---
title: "Mixed-effects regression tables"
description: >
  Publication-ready tables for mixed-effects (multilevel) regression in R
  with `table_regression()`: random effects as table rows with SE and CI,
  the boundary-correct chi-bar-squared test, opt-in per-term LRT / RLRT,
  ICC and Nakagawa R-squared, Satterthwaite inference, odds ratios for
  `glmer`, and the `lme4`, `glmmTMB`, and `nlme` engines.
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Mixed-effects regression tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
have_mixed <- requireNamespace("lme4", quietly = TRUE) &&
  requireNamespace("lmerTest", quietly = TRUE) &&
  requireNamespace("merDeriv", quietly = TRUE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = have_mixed
)

build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true")
have_glmmTMB <- have_mixed && requireNamespace("glmmTMB", quietly = TRUE)
have_nlme    <- have_mixed && requireNamespace("nlme", quietly = TRUE)
have_rlrsim  <- have_mixed && requireNamespace("RLRsim", quietly = TRUE)
have_clubsw  <- have_mixed && requireNamespace("clubSandwich", quietly = TRUE)
have_broom   <- have_mixed && requireNamespace("broom", quietly = TRUE)
source("_pkgdown-helpers.R")
```

```{r setup, message = FALSE}
library(spicy)
library(lmerTest)   # masks lme4::lmer() with the Satterthwaite-aware version
```

This vignette covers **mixed-effects (multilevel) regression** — models
for data with a grouping structure, such as repeated measurements
nested in subjects or respondents nested in regions. The companion
vignette [*Publication-ready regression
tables*](table-regression.html) covers the shared mechanics (`vcov`,
`ci_level`, output formats, multi-model layouts, broom integration);
here we focus on what is *specific* to mixed fits — above all, how the
**random effects** are reported.

`table_regression()` supports four mixed engines:

* **`lme4::lmer()`** — linear mixed models. Load **`lmerTest`** before
  fitting to get Satterthwaite degrees of freedom; plain `lme4` fits
  fall back to a large-sample Wald-z (the footer says which you got).
* **`lme4::glmer()`** — generalized linear mixed models (Wald-z).
* **`glmmTMB::glmmTMB()`** — a wider family space, plus zero-inflation
  and dispersion submodels.
* **`nlme::lme()`** — the classical engine (containment-df t-tests).

The running example is `lme4::sleepstudy`: reaction times of 18
subjects measured on 10 consecutive days of a sleep-restriction
study. Each subject contributes 10 rows, so observations are not
independent — the
canonical case for a random intercept and slope.

## The mixed model in one paragraph

A mixed model splits the coefficients in two. **Fixed effects** are the
population-level slopes you would report from any regression. **Random
effects** let selected coefficients vary across groups: `(Days |
Subject)` gives every subject their own intercept and their own `Days`
slope, drawn from a common distribution. What the model *estimates* for
the random part is not 18 pairs of coefficients but the **variance
components** of that distribution — how much subjects differ at
baseline, how much their slopes differ, and how the two go together. A
regression table therefore needs two kinds of rows, and that is exactly
what `table_regression()` prints.

## Basic table

```{r basic}
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
table_regression(fit)
```

Reading the table, top to bottom:

* **Fixed effects** first, as in any regression table. With `lmerTest`
  loaded, inference is a **t-test with Satterthwaite degrees of
  freedom** — the `lmerTest` default (Kuznetsova et al. 2017), offered
  by SAS PROC MIXED as `DDFM=SATTERTHWAITE` — and the footer says so.
* The **`Random effects:`** block reports the variance components as
  rows, each with its own SE and CI: the between-subject SD of the
  intercept and of the `Days` slope (`σ`), their correlation (`ρ`), and
  the residual SD. The SD scale is the default because it is in the
  units of the outcome (Gelman & Hill 2007) — sleep deprivation adds
  about 10.5 ms per day on average (fixed `Days`), while individual
  slopes spread around that mean with an SD of about 5.9 ms.
* The p column of the random-effect rows shows a **dash, by design**
  — see the next section.
* Below the rule: `n` (observations), **`N (Subject)`** (groups),
  and the **Nakagawa marginal / conditional R²** — the variance
  explained by the fixed effects alone versus fixed and random
  together (Nakagawa & Schielzeth 2013). The gap between the two
  (0.28 vs 0.80 here) is itself a summary of how much the grouping
  structure matters.
* The footer closes with the **likelihood-ratio test of the whole
  random part** against the same model without random effects,
  referred to a **chi-bar-squared** mixture distribution.

## Why the variance components carry no p-value

A variance cannot be negative, so the null hypothesis "this variance
component is zero" sits **on the boundary** of the parameter space.
There, the usual Wald z-test is invalid — its reference distribution is
wrong, and the resulting p-value is conservative in an unknown degree
(Self & Liang 1987). Printing a Wald p next to each `σ` would look
rigorous and be meaningless. None of the dedicated mixed-model
engines prints one by default — `lme4`, `nlme`, Stata `mixed`, and
MLwiN all decline; SAS PROC MIXED offers a Wald Z only behind its
`COVTEST` option, with documented cautions.

What *is* valid is a **likelihood-ratio test with a boundary-corrected
reference**: a 50:50 mixture of chi-squared distributions
(chi-bar-squared; Self & Liang 1987; Stram & Lee 1994). The footer
reports such a test for the random part as a whole — across its
several parameters jointly it follows Stata's conservative halved-p
convention, while the per-term tests below use the exact mixture. Here
chi-bar-squared(3) = 148.35, p < .001: the random structure earns its
place.

## When the fit is singular

The boundary is not only a testing problem — estimates land *on* it.
When a variance component is estimated at exactly zero (`lme4` calls
this a **singular fit**), its Wald SE and CI are meaningless, so
`table_regression()` omits them and says why in the footer:

```{r singular, message = FALSE, warning = FALSE}
# (the accompanying console warning is suppressed in this rendering)
set.seed(2026)
d <- data.frame(x = rnorm(120), g = factor(rep(1:12, each = 10)))
d$y <- 2 + 0.5 * d$x + rnorm(120)   # no group effect at all
sfit <- lmer(y ~ x + (1 | g), data = d)
table_regression(sfit)
```

The chi-bar-squared test reads p = 0.500 — the boundary value when the
data show no group variance at all. The note states the fact for the
table's reader; the actionable advice arrives as a console warning
when the table is built: simplify the random structure, or test the
component with `re_test = "lrt"`, rather than report a zero variance
with invented uncertainty.

## Testing individual components: `re_test`

When a reviewer asks about one *specific* component — "do you really
need the random slope?" — request an opt-in per-term test.
`re_test = "lrt"` refits the model without each testable component and
fills the p column of its row with the boundary-corrected
likelihood-ratio test. The statistic is the one `lmerTest::ranova()`
computes (a REML deviance difference); `ranova` refers it to a plain
χ², which is conservative at the boundary, so spicy applies the
chi-bar-squared mixture instead:

```{r re-test-lrt}
table_regression(fit, re_test = "lrt")
```

Only the `Days` slope row gains a p-value: following the marginality
rule of `lmerTest::ranova()`, an intercept is tested only when it is
the bar's sole term, and a correlation is not removable on its own.
The footer names the test.

For a model with a **single** variance component — here the
random-intercept-only baseline, which returns in the comparison below —
`re_test = "rlrt"` uses the exact restricted likelihood-ratio test of
Crainiceanu & Ruppert (2004) instead, with a simulated null
distribution (via `RLRsim::exactRLRT()`) — preferable in small samples where the
chi-bar-squared approximation is at its weakest:

```{r fit-ri}
fit_ri <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
```

```{r re-test-rlrt, eval = have_rlrsim}
table_regression(fit_ri, re_test = "rlrt")
```

## The random-effects block on your terms

Three arguments control the block without touching the estimates.
`re_scale = "variance"` switches the rows from SD (`σ`) to variance
(`σ²`); `re_columns` restricts which quantities the variance-component
rows display (the SE and CI on a variance component come from
`merDeriv` and are themselves boundary-fragile — dropping them is a
defensible editorial choice); `show_re = FALSE` removes the block and
all random-effects footer lines, leaving a fixed-effects-only table
(the `N (groups)` and R² fit-stat rows remain):

```{r re-scale}
table_regression(fit, re_scale = "variance", re_columns = "est")
```

Whatever the display, `broom::tidy()` always carries the full set
(estimate, SE, CI) for the variance-component rows — display and data
are decoupled.

## Random intercept or random slope? Models side by side

Model comparison is where the row layout pays off: the variance
components align across columns like any coefficient, and each model
gets its own footer line.

```{r side-by-side}
table_regression(
  list("Intercept only" = fit_ri, "+ random slope" = fit),
  show_columns = c("b", "se", "p")
)
```

**What is — and is not — being tested here.** Each footer's
chi-bar-squared line answers one question only: *does this model need
its random part at all?* — the fit against a plain linear regression.
Neither footer compares the two columns to each other. That comparison
— *does the slope earn its place on top of the intercept?* — is the
per-term test of the previous section: `re_test = "lrt"` on the slope
model refits exactly the intercept-only model of column 1 and refers
the REML deviance difference (42.8) to the 50:50 mixture of χ²(1) and
χ²(2), p ≈ 3 × 10⁻¹⁰ — the textbook procedure for retaining a random
slope (Snijders & Bosker 2012, §6.2; Stram & Lee 1994). AIC and BIC
agree (1794.5 → 1755.6; 1807.2 → 1774.8). For hierarchies that grow
the *fixed* part instead, `nested = TRUE` adds chi-squared, AIC, and BIC
change rows — an ML-refit test with a plain χ² reference,
appropriate there but conservative for random-structure changes, which
is why the slope decision belongs to `re_test`.

**Reading the two R² rows — without over-reading them.** The Nakagawa
pair is descriptive, and each member answers a different question.
*Marginal* R² is the share of variance explained by the fixed effects
alone: it is 0.28 in both columns because both models have the same
fixed part — the average trajectory explains what it explains,
whatever the random structure around it. *Conditional* R² adds the
random effects: it rises (0.70 → 0.80) because individual slopes let
each subject's own trajectory absorb variance the first model left in
the residual. The trap: a conditional R² can hardly go *down* when the
random structure grows, so its increase is **not** evidence that the
slope is needed — it measures how much the grouping structure
explains, never whether a component earns its place. Selection belongs
to the boundary-corrected test and the information criteria; the R²
pair then describes the model you selected.

**The ICC row is an interpretation, not a test.** For the
intercept-only model it comes straight from the two σ rows above it:
ICC = σ²(Subject) / (σ²(Subject) + σ²(Residual)) =
37.12² / (37.12² + 30.99²) ≈ 0.59 — squares, because the block
displays standard deviations (`re_scale = "variance"` shows the variances directly).
Read it two equivalent ways: 59% of the variance remaining after the
fixed effects lies *between* subjects rather than within them — the
variance partition coefficient of the multilevel textbooks (Snijders &
Bosker 2012, §3.3) — and, equivalently, two observations from the same
subject are expected to correlate at 0.59. It appears only for the
intercept-only model: with a random slope, the correlation between two
observations from the same subject depends on *when* they were taken,
so a single ICC no longer exists — none is printed.

One further detail repays attention: the fixed `Days` SE **doubles**
(0.80 → 1.55). A model without the random slope overstates the
precision of the average slope by ignoring between-subject slope
variation (Schielzeth & Forstmeier 2009) — the inferential stake of
the decision, visible in the table itself.

Whether to *test* the slope at all is its own debate. The design-driven
position includes every slope the design permits (Barr et al. 2013);
parsimony advocates prune components the data cannot support
(Matuschek et al. 2017); Gelman & Hill (2007) would rather estimate
and let partial pooling shrink a near-zero variance. The table serves
all three positions: the components are always displayed with their
uncertainty, and the test stays opt-in.

## Generalized linear mixed models

`glmer()` fits render identically, with two family-driven changes:
inference is **Wald-z** (no Satterthwaite for GLMMs), and
`exponentiate = TRUE` is link-gated exactly as for `glm` — odds ratios
under logit, rate ratios under log, and a hard error on links whose
exponential has no meaning (probit). The example is `lme4::cbpp`,
herd-level incidence of a cattle disease across four periods:

```{r glmer}
gfit <- glmer(
  cbind(incidence, size - incidence) ~ period + (1 | herd),
  data = cbpp, family = binomial()
)
table_regression(gfit, exponentiate = TRUE)
```

The random-effect row is now on the **log-odds scale** (the linear
predictor's scale), not the outcome scale; σ = 0.64 means herds differ
substantially in baseline incidence. The footer's chi-bar-squared test
(14.01 on 1 df, p < .001) confirms the herd effect, and the Nakagawa
R² pair carries over unchanged. No ICC row appears — for a
trial-weighted `cbind()` binomial response the single-trial
latent-scale ICC is not defined, so none is printed; a Bernoulli 0/1
`glmer` shows one, computed on the latent logit scale (the residual
variance is fixed at π²/3 there).

## `glmmTMB`: beyond `lme4`

`glmmTMB` fits reach the same table through the same code path, and
bring two extra submodels. A zero-inflation (`ziformula =`) or
dispersion (`dispformula =`) component renders as its own labelled
block of rows, with a footer line stating what the component models and
on which scale — here with the `Salamanders` count data:

```{r glmmtmb, eval = have_glmmTMB}
data("Salamanders", package = "glmmTMB")
zfit <- glmmTMB::glmmTMB(
  count ~ mined + (1 | site),
  ziformula = ~ mined,
  family = poisson(), data = Salamanders
)
table_regression(zfit, exponentiate = TRUE)
```

Note the per-block exponentiation: the count coefficients become
**IRR**, while the logit zero-inflation coefficients become **odds
ratios of a structural zero**. Component
blocks are covered in depth in the counts documentation
(`?table_regression_counts`); opt out with `show_components = FALSE`.

## `nlme::lme`

The classical engine renders through the same layout; its fixed-effect
inference is the **containment-df t-test** native to `nlme`, and the
footer says so:

```{r nlme, eval = have_nlme}
lfit <- nlme::lme(Reaction ~ Days, random = ~ 1 | Subject,
                  data = sleepstudy)
table_regression(lfit)
```

## Cluster-robust and other variance estimators

Mixed fits honour the cluster-robust family (`"CR0"`–`"CR3"`) through
`clubSandwich`, with Satterthwaite small-sample degrees of freedom
computed from the robust covariance — the footer attributes them
accordingly. Reach for `CR*` when you suspect the random-effects
structure does not fully capture the within-group dependence:

```{r cr2, eval = have_clubsw}
table_regression(fit_ri, vcov = "CR2", cluster = ~Subject)
```

Requests that have no valid backend are refused with a clear error
rather than silently ignored: `HC*` (an OLS concept, undefined for
mixed models), `CR*` on `glmer` (no `clubSandwich` method), and the
resampling estimators. For `glmmTMB`, `CR*` covers the conditional
component only, and the footer discloses it.

## Standardized coefficients

Mixed fits support `standardized = "refit"` — the model is refit on
z-scored data, the gold standard under correlated predictors (Cohen et
al. 2003). The β rows inherit the same Satterthwaite reference
distribution as their unstandardized counterparts, so B and β carry one
consistent p per term:

```{r standardized}
table_regression(fit, standardized = "refit", show_columns = c("b", "beta", "p"))
```

The algebraic shortcuts (`"posthoc"`, `"basic"`, `"smart"`) are not
defined for mixed models — the marginal SD(y) has no unique
decomposition across levels — and fall back to `"refit"` with a
warning. Formulas with inline transforms (`log(x)`, `poly()`) decline
the refit and omit the β rows, with a warning; pre-compute the
transformed column in `data` for an exact refit.

## Output formats

Everything above used the default console output. The same table
renders as a raw data frame, a long broom-style tibble, or — with the
corresponding Suggests package — a rich `gt`, `flextable`,
`tinytable`, Excel, or Word table; the random-effects block carries
through to every format.

```{r outputs-df}
head(table_regression(fit, output = "data.frame"), 8)
```

```{r outputs-gt, eval = build_rich_tables && have_mixed}
table_regression(fit, output = "gt")
```

In [`broom::tidy()`](https://broom.tidymodels.org) the
variance-component rows are marked `estimate_type = "vc"` and their
terms are prefixed `re::` (correlations carry a `::cor` suffix), so
they filter cleanly:

```{r broom, eval = have_broom}
td <- broom::tidy(table_regression(fit))
td[td$estimate_type == "vc", c("term", "estimate", "std.error", "conf.low", "conf.high")]
```

## References

* Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random
  effects structure for confirmatory hypothesis testing: Keep it
  maximal. *Journal of Memory and Language*, 68(3), 255–278.
* Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting
  linear mixed-effects models using lme4. *Journal of Statistical
  Software*, 67(1).
* Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). *Applied
  Multiple Regression/Correlation Analysis for the Behavioral
  Sciences* (3rd ed.). Lawrence Erlbaum.
* Crainiceanu, C. M., & Ruppert, D. (2004). Likelihood ratio tests in
  linear mixed models with one variance component. *Journal of the Royal
  Statistical Society, Series B*, 66(1), 165–185.
* Gelman, A., & Hill, J. (2007). *Data Analysis Using Regression and
  Multilevel/Hierarchical Models*. Cambridge University Press.
* Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017).
  lmerTest package: Tests in linear mixed effects models. *Journal of
  Statistical Software*, 82(13).
* Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D.
  (2017). Balancing Type I error and power in linear mixed models.
  *Journal of Memory and Language*, 94, 305–315.
* Nakagawa, S., & Schielzeth, H. (2013). A general and simple method
  for obtaining R² from generalized linear mixed-effects models.
  *Methods in Ecology and Evolution*, 4(2), 133–142.
* Schielzeth, H., & Forstmeier, W. (2009). Conclusions beyond support:
  overconfident estimates in mixed models. *Behavioral Ecology*,
  20(2), 416–420.
* Self, S. G., & Liang, K.-Y. (1987). Asymptotic properties of maximum
  likelihood estimators and likelihood ratio tests under nonstandard
  conditions. *JASA*, 82(398), 605–610.
* Snijders, T. A. B., & Bosker, R. J. (2012). *Multilevel Analysis: An
  Introduction to Basic and Advanced Multilevel Modeling* (2nd ed.).
  Sage.
* Stram, D. O., & Lee, J. W. (1994). Variance components testing in
  the longitudinal mixed effects model. *Biometrics*, 50(4),
  1171–1177.