--- title: "Mixed-effects regression tables" description: > Publication-ready tables for mixed-effects (multilevel) regression in R with `table_regression()`: random effects as table rows with SE and CI, the boundary-correct chi-bar-squared test, opt-in per-term LRT / RLRT, ICC and Nakagawa R-squared, Satterthwaite inference, odds ratios for `glmer`, and the `lme4`, `glmmTMB`, and `nlme` engines. output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Mixed-effects regression tables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} have_mixed <- requireNamespace("lme4", quietly = TRUE) && requireNamespace("lmerTest", quietly = TRUE) && requireNamespace("merDeriv", quietly = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = have_mixed ) build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true") have_glmmTMB <- have_mixed && requireNamespace("glmmTMB", quietly = TRUE) have_nlme <- have_mixed && requireNamespace("nlme", quietly = TRUE) have_rlrsim <- have_mixed && requireNamespace("RLRsim", quietly = TRUE) have_clubsw <- have_mixed && requireNamespace("clubSandwich", quietly = TRUE) have_broom <- have_mixed && requireNamespace("broom", quietly = TRUE) source("_pkgdown-helpers.R") ``` ```{r setup, message = FALSE} library(spicy) library(lmerTest) # masks lme4::lmer() with the Satterthwaite-aware version ``` This vignette covers **mixed-effects (multilevel) regression** — models for data with a grouping structure, such as repeated measurements nested in subjects or respondents nested in regions. The companion vignette [*Publication-ready regression tables*](table-regression.html) covers the shared mechanics (`vcov`, `ci_level`, output formats, multi-model layouts, broom integration); here we focus on what is *specific* to mixed fits — above all, how the **random effects** are reported. `table_regression()` supports four mixed engines: * **`lme4::lmer()`** — linear mixed models. Load **`lmerTest`** before fitting to get Satterthwaite degrees of freedom; plain `lme4` fits fall back to a large-sample Wald-z (the footer says which you got). * **`lme4::glmer()`** — generalized linear mixed models (Wald-z). * **`glmmTMB::glmmTMB()`** — a wider family space, plus zero-inflation and dispersion submodels. * **`nlme::lme()`** — the classical engine (containment-df t-tests). The running example is `lme4::sleepstudy`: reaction times of 18 subjects measured on 10 consecutive days of a sleep-restriction study. Each subject contributes 10 rows, so observations are not independent — the canonical case for a random intercept and slope. ## The mixed model in one paragraph A mixed model splits the coefficients in two. **Fixed effects** are the population-level slopes you would report from any regression. **Random effects** let selected coefficients vary across groups: `(Days | Subject)` gives every subject their own intercept and their own `Days` slope, drawn from a common distribution. What the model *estimates* for the random part is not 18 pairs of coefficients but the **variance components** of that distribution — how much subjects differ at baseline, how much their slopes differ, and how the two go together. A regression table therefore needs two kinds of rows, and that is exactly what `table_regression()` prints. ## Basic table ```{r basic} fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy) table_regression(fit) ``` Reading the table, top to bottom: * **Fixed effects** first, as in any regression table. With `lmerTest` loaded, inference is a **t-test with Satterthwaite degrees of freedom** — the `lmerTest` default (Kuznetsova et al. 2017), offered by SAS PROC MIXED as `DDFM=SATTERTHWAITE` — and the footer says so. * The **`Random effects:`** block reports the variance components as rows, each with its own SE and CI: the between-subject SD of the intercept and of the `Days` slope (`σ`), their correlation (`ρ`), and the residual SD. The SD scale is the default because it is in the units of the outcome (Gelman & Hill 2007) — sleep deprivation adds about 10.5 ms per day on average (fixed `Days`), while individual slopes spread around that mean with an SD of about 5.9 ms. * The p column of the random-effect rows shows a **dash, by design** — see the next section. * Below the rule: `n` (observations), **`N (Subject)`** (groups), and the **Nakagawa marginal / conditional R²** — the variance explained by the fixed effects alone versus fixed and random together (Nakagawa & Schielzeth 2013). The gap between the two (0.28 vs 0.80 here) is itself a summary of how much the grouping structure matters. * The footer closes with the **likelihood-ratio test of the whole random part** against the same model without random effects, referred to a **chi-bar-squared** mixture distribution. ## Why the variance components carry no p-value A variance cannot be negative, so the null hypothesis "this variance component is zero" sits **on the boundary** of the parameter space. There, the usual Wald z-test is invalid — its reference distribution is wrong, and the resulting p-value is conservative in an unknown degree (Self & Liang 1987). Printing a Wald p next to each `σ` would look rigorous and be meaningless. None of the dedicated mixed-model engines prints one by default — `lme4`, `nlme`, Stata `mixed`, and MLwiN all decline; SAS PROC MIXED offers a Wald Z only behind its `COVTEST` option, with documented cautions. What *is* valid is a **likelihood-ratio test with a boundary-corrected reference**: a 50:50 mixture of chi-squared distributions (chi-bar-squared; Self & Liang 1987; Stram & Lee 1994). The footer reports such a test for the random part as a whole — across its several parameters jointly it follows Stata's conservative halved-p convention, while the per-term tests below use the exact mixture. Here chi-bar-squared(3) = 148.35, p < .001: the random structure earns its place. ## When the fit is singular The boundary is not only a testing problem — estimates land *on* it. When a variance component is estimated at exactly zero (`lme4` calls this a **singular fit**), its Wald SE and CI are meaningless, so `table_regression()` omits them and says why in the footer: ```{r singular, message = FALSE, warning = FALSE} # (the accompanying console warning is suppressed in this rendering) set.seed(2026) d <- data.frame(x = rnorm(120), g = factor(rep(1:12, each = 10))) d$y <- 2 + 0.5 * d$x + rnorm(120) # no group effect at all sfit <- lmer(y ~ x + (1 | g), data = d) table_regression(sfit) ``` The chi-bar-squared test reads p = 0.500 — the boundary value when the data show no group variance at all. The note states the fact for the table's reader; the actionable advice arrives as a console warning when the table is built: simplify the random structure, or test the component with `re_test = "lrt"`, rather than report a zero variance with invented uncertainty. ## Testing individual components: `re_test` When a reviewer asks about one *specific* component — "do you really need the random slope?" — request an opt-in per-term test. `re_test = "lrt"` refits the model without each testable component and fills the p column of its row with the boundary-corrected likelihood-ratio test. The statistic is the one `lmerTest::ranova()` computes (a REML deviance difference); `ranova` refers it to a plain χ², which is conservative at the boundary, so spicy applies the chi-bar-squared mixture instead: ```{r re-test-lrt} table_regression(fit, re_test = "lrt") ``` Only the `Days` slope row gains a p-value: following the marginality rule of `lmerTest::ranova()`, an intercept is tested only when it is the bar's sole term, and a correlation is not removable on its own. The footer names the test. For a model with a **single** variance component — here the random-intercept-only baseline, which returns in the comparison below — `re_test = "rlrt"` uses the exact restricted likelihood-ratio test of Crainiceanu & Ruppert (2004) instead, with a simulated null distribution (via `RLRsim::exactRLRT()`) — preferable in small samples where the chi-bar-squared approximation is at its weakest: ```{r fit-ri} fit_ri <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy) ``` ```{r re-test-rlrt, eval = have_rlrsim} table_regression(fit_ri, re_test = "rlrt") ``` ## The random-effects block on your terms Three arguments control the block without touching the estimates. `re_scale = "variance"` switches the rows from SD (`σ`) to variance (`σ²`); `re_columns` restricts which quantities the variance-component rows display (the SE and CI on a variance component come from `merDeriv` and are themselves boundary-fragile — dropping them is a defensible editorial choice); `show_re = FALSE` removes the block and all random-effects footer lines, leaving a fixed-effects-only table (the `N (groups)` and R² fit-stat rows remain): ```{r re-scale} table_regression(fit, re_scale = "variance", re_columns = "est") ``` Whatever the display, `broom::tidy()` always carries the full set (estimate, SE, CI) for the variance-component rows — display and data are decoupled. ## Random intercept or random slope? Models side by side Model comparison is where the row layout pays off: the variance components align across columns like any coefficient, and each model gets its own footer line. ```{r side-by-side} table_regression( list("Intercept only" = fit_ri, "+ random slope" = fit), show_columns = c("b", "se", "p") ) ``` **What is — and is not — being tested here.** Each footer's chi-bar-squared line answers one question only: *does this model need its random part at all?* — the fit against a plain linear regression. Neither footer compares the two columns to each other. That comparison — *does the slope earn its place on top of the intercept?* — is the per-term test of the previous section: `re_test = "lrt"` on the slope model refits exactly the intercept-only model of column 1 and refers the REML deviance difference (42.8) to the 50:50 mixture of χ²(1) and χ²(2), p ≈ 3 × 10⁻¹⁰ — the textbook procedure for retaining a random slope (Snijders & Bosker 2012, §6.2; Stram & Lee 1994). AIC and BIC agree (1794.5 → 1755.6; 1807.2 → 1774.8). For hierarchies that grow the *fixed* part instead, `nested = TRUE` adds chi-squared, AIC, and BIC change rows — an ML-refit test with a plain χ² reference, appropriate there but conservative for random-structure changes, which is why the slope decision belongs to `re_test`. **Reading the two R² rows — without over-reading them.** The Nakagawa pair is descriptive, and each member answers a different question. *Marginal* R² is the share of variance explained by the fixed effects alone: it is 0.28 in both columns because both models have the same fixed part — the average trajectory explains what it explains, whatever the random structure around it. *Conditional* R² adds the random effects: it rises (0.70 → 0.80) because individual slopes let each subject's own trajectory absorb variance the first model left in the residual. The trap: a conditional R² can hardly go *down* when the random structure grows, so its increase is **not** evidence that the slope is needed — it measures how much the grouping structure explains, never whether a component earns its place. Selection belongs to the boundary-corrected test and the information criteria; the R² pair then describes the model you selected. **The ICC row is an interpretation, not a test.** For the intercept-only model it comes straight from the two σ rows above it: ICC = σ²(Subject) / (σ²(Subject) + σ²(Residual)) = 37.12² / (37.12² + 30.99²) ≈ 0.59 — squares, because the block displays standard deviations (`re_scale = "variance"` shows the variances directly). Read it two equivalent ways: 59% of the variance remaining after the fixed effects lies *between* subjects rather than within them — the variance partition coefficient of the multilevel textbooks (Snijders & Bosker 2012, §3.3) — and, equivalently, two observations from the same subject are expected to correlate at 0.59. It appears only for the intercept-only model: with a random slope, the correlation between two observations from the same subject depends on *when* they were taken, so a single ICC no longer exists — none is printed. One further detail repays attention: the fixed `Days` SE **doubles** (0.80 → 1.55). A model without the random slope overstates the precision of the average slope by ignoring between-subject slope variation (Schielzeth & Forstmeier 2009) — the inferential stake of the decision, visible in the table itself. Whether to *test* the slope at all is its own debate. The design-driven position includes every slope the design permits (Barr et al. 2013); parsimony advocates prune components the data cannot support (Matuschek et al. 2017); Gelman & Hill (2007) would rather estimate and let partial pooling shrink a near-zero variance. The table serves all three positions: the components are always displayed with their uncertainty, and the test stays opt-in. ## Generalized linear mixed models `glmer()` fits render identically, with two family-driven changes: inference is **Wald-z** (no Satterthwaite for GLMMs), and `exponentiate = TRUE` is link-gated exactly as for `glm` — odds ratios under logit, rate ratios under log, and a hard error on links whose exponential has no meaning (probit). The example is `lme4::cbpp`, herd-level incidence of a cattle disease across four periods: ```{r glmer} gfit <- glmer( cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial() ) table_regression(gfit, exponentiate = TRUE) ``` The random-effect row is now on the **log-odds scale** (the linear predictor's scale), not the outcome scale; σ = 0.64 means herds differ substantially in baseline incidence. The footer's chi-bar-squared test (14.01 on 1 df, p < .001) confirms the herd effect, and the Nakagawa R² pair carries over unchanged. No ICC row appears — for a trial-weighted `cbind()` binomial response the single-trial latent-scale ICC is not defined, so none is printed; a Bernoulli 0/1 `glmer` shows one, computed on the latent logit scale (the residual variance is fixed at π²/3 there). ## `glmmTMB`: beyond `lme4` `glmmTMB` fits reach the same table through the same code path, and bring two extra submodels. A zero-inflation (`ziformula =`) or dispersion (`dispformula =`) component renders as its own labelled block of rows, with a footer line stating what the component models and on which scale — here with the `Salamanders` count data: ```{r glmmtmb, eval = have_glmmTMB} data("Salamanders", package = "glmmTMB") zfit <- glmmTMB::glmmTMB( count ~ mined + (1 | site), ziformula = ~ mined, family = poisson(), data = Salamanders ) table_regression(zfit, exponentiate = TRUE) ``` Note the per-block exponentiation: the count coefficients become **IRR**, while the logit zero-inflation coefficients become **odds ratios of a structural zero**. Component blocks are covered in depth in the counts documentation (`?table_regression_counts`); opt out with `show_components = FALSE`. ## `nlme::lme` The classical engine renders through the same layout; its fixed-effect inference is the **containment-df t-test** native to `nlme`, and the footer says so: ```{r nlme, eval = have_nlme} lfit <- nlme::lme(Reaction ~ Days, random = ~ 1 | Subject, data = sleepstudy) table_regression(lfit) ``` ## Cluster-robust and other variance estimators Mixed fits honour the cluster-robust family (`"CR0"`–`"CR3"`) through `clubSandwich`, with Satterthwaite small-sample degrees of freedom computed from the robust covariance — the footer attributes them accordingly. Reach for `CR*` when you suspect the random-effects structure does not fully capture the within-group dependence: ```{r cr2, eval = have_clubsw} table_regression(fit_ri, vcov = "CR2", cluster = ~Subject) ``` Requests that have no valid backend are refused with a clear error rather than silently ignored: `HC*` (an OLS concept, undefined for mixed models), `CR*` on `glmer` (no `clubSandwich` method), and the resampling estimators. For `glmmTMB`, `CR*` covers the conditional component only, and the footer discloses it. ## Standardized coefficients Mixed fits support `standardized = "refit"` — the model is refit on z-scored data, the gold standard under correlated predictors (Cohen et al. 2003). The β rows inherit the same Satterthwaite reference distribution as their unstandardized counterparts, so B and β carry one consistent p per term: ```{r standardized} table_regression(fit, standardized = "refit", show_columns = c("b", "beta", "p")) ``` The algebraic shortcuts (`"posthoc"`, `"basic"`, `"smart"`) are not defined for mixed models — the marginal SD(y) has no unique decomposition across levels — and fall back to `"refit"` with a warning. Formulas with inline transforms (`log(x)`, `poly()`) decline the refit and omit the β rows, with a warning; pre-compute the transformed column in `data` for an exact refit. ## Output formats Everything above used the default console output. The same table renders as a raw data frame, a long broom-style tibble, or — with the corresponding Suggests package — a rich `gt`, `flextable`, `tinytable`, Excel, or Word table; the random-effects block carries through to every format. ```{r outputs-df} head(table_regression(fit, output = "data.frame"), 8) ``` ```{r outputs-gt, eval = build_rich_tables && have_mixed} table_regression(fit, output = "gt") ``` In [`broom::tidy()`](https://broom.tidymodels.org) the variance-component rows are marked `estimate_type = "vc"` and their terms are prefixed `re::` (correlations carry a `::cor` suffix), so they filter cleanly: ```{r broom, eval = have_broom} td <- broom::tidy(table_regression(fit)) td[td$estimate_type == "vc", c("term", "estimate", "std.error", "conf.low", "conf.high")] ``` ## References * Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. *Journal of Memory and Language*, 68(3), 255–278. * Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. *Journal of Statistical Software*, 67(1). * Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). *Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences* (3rd ed.). Lawrence Erlbaum. * Crainiceanu, C. M., & Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. *Journal of the Royal Statistical Society, Series B*, 66(1), 165–185. * Gelman, A., & Hill, J. (2007). *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge University Press. * Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. *Journal of Statistical Software*, 82(13). * Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. *Journal of Memory and Language*, 94, 305–315. * Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. *Methods in Ecology and Evolution*, 4(2), 133–142. * Schielzeth, H., & Forstmeier, W. (2009). Conclusions beyond support: overconfident estimates in mixed models. *Behavioral Ecology*, 20(2), 416–420. * Self, S. G., & Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. *JASA*, 82(398), 605–610. * Snijders, T. A. B., & Bosker, R. J. (2012). *Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling* (2nd ed.). Sage. * Stram, D. O., & Lee, J. W. (1994). Variance components testing in the longitudinal mixed effects model. *Biometrics*, 50(4), 1171–1177.