---
title: "Ordinal regression tables"
description: >
  Publication-ready tables for ordinal (proportional-odds / cumulative
  logit) regression in R with `table_regression()`: shared slope
  coefficients and ordered thresholds, odds ratios, per-outcome-category
  average marginal effects laid out as a probability matrix,
  cluster-robust variance, the `MASS::polr` and `ordinal::clm` engines,
  and the partial-proportional-odds caveat.
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Ordinal regression tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true")
have_ordinal <- requireNamespace("ordinal", quietly = TRUE)
source("_pkgdown-helpers.R")
```

```{r setup}
library(spicy)
library(MASS)        # polr()
```

This vignette covers **ordinal regression** — models for an *ordered*
categorical response such as a Poor &lt; Fair &lt; Good &lt; Very good
health rating or a Likert agreement scale. The companion vignette
[*Publication-ready regression tables*](table-regression.html) covers
the shared mechanics (`vcov`, `ci_level`, output formats, multi-model
layouts, broom integration); here we focus on what is *specific* to
ordinal fits.

`table_regression()` supports both ordinal engines:

* **`MASS::polr()`** — proportional-odds cumulative logit (also probit /
  cloglog / cauchit via `method =`);
* **`ordinal::clm()`** — the more flexible cumulative link model, with
  optional scale and partial-proportional-odds (nominal) effects.

The model used throughout is a self-rated-health rating regressed on
age, sex, smoking, and physical activity, using the bundled `sochealth`
survey data (`?sochealth`). `self_rated_health` is an ordered factor:

```{r data}
levels(sochealth$self_rated_health)
```

## The proportional-odds model in one paragraph

A cumulative logit model with $K$ ordered response categories estimates
**one slope per predictor** (shared across all $K-1$ cumulative
splits — the *proportional-odds assumption*) plus $K-1$ ordered
**thresholds** (cut-points). A positive slope means that higher values
of the predictor push the response toward *higher* categories. There is
deliberately **no per-category coefficient**: that is the whole point of
the proportional-odds restriction. The per-category structure reappears
only in the *marginal effects* (below), where a one-unit change has a
different effect on the probability of each category.

## Basic table

Pass a fitted `polr()` object. `table_regression()` renders the shared
slope coefficients with a Wald-$z$ inference regime, followed by a
subordinate **`Thresholds`** block listing the ordered cut-points with
their own B / SE / CI / p (the field convention: `summary.polr()`, SPSS
PLUM, SAS, Stata `ologit`, Bender & Grouven 1997):

```{r basic}
fit <- polr(
  self_rated_health ~ age + sex + smoking + physical_activity,
  data = sochealth, Hess = TRUE
)
table_regression(fit)
```

Reading the table:

* Each predictor has **one** coefficient row (proportional odds);
  factor predictors are grouped under their parent variable with the
  reference level carrying `(ref.)` and an em-dash.
* Inference is **Wald-$z$** (`df = Inf`): ordinal MLE has no residual
  degrees of freedom, matching `summary.polr()`, Stata `ologit`, and
  SPSS PLUM.
* A subordinate **`Thresholds`** block lists the cut-points
  (`Poor | Fair`, `Fair | Good`, `Good | Very good`) with B / SE / CI /
  p, like the predictor rows. They locate the category boundaries on the
  latent logit scale and replace the single intercept of a binary logit.
  They are reported on the log-odds scale and are **never
  exponentiated**. The $z$-test against zero is rarely of interest (it
  only asks whether the baseline cumulative split sits at 50/50), so do
  not over-read a "significant" threshold -- the substantive headline is
  the odds ratios and marginal effects below. Hide the block with
  `show_thresholds = FALSE` to fall back to a compact footer line.
* Below the fit-stats rule, the model-fit block reports **N**, two
  pseudo-$R^2$ (**McFadden**, the Stata `ologit` default, and
  **Nagelkerke**, the SPSS PLUM default), and **AIC**. Override with
  `show_fit_stats` (e.g. `show_fit_stats = c("nobs", "AIC", "BIC")`).

## Odds ratios: `exponentiate = TRUE`

On the logit scale a coefficient is a log cumulative-odds ratio.
`exponentiate = TRUE` reports **odds ratios** instead, exponentiating
the estimate and its CI bounds and rebranding the column header:

```{r or}
table_regression(fit, exponentiate = TRUE)
```

An OR above 1 raises the odds of being in a *higher* health category;
below 1 lowers them. Smoking's OR here is interpretable as
"smokers have lower odds of better self-rated health." The `Thresholds`
rows stay on the **log-odds scale** (a cut-point is not an odds ratio,
so it is never exponentiated); only the predictor coefficients become
ORs, and the footer flags this.

## Average marginal effects: a probability matrix

Odds ratios are multiplicative and live on the latent scale. For a
reader-friendly, **probability-scale** summary, request average marginal
effects (AME). Because an ordinal model predicts a probability for
*every* response category, each predictor has **one AME per category** —
the effect of the predictor on the probability of that category.
`table_regression()` lays these out as the field-standard **matrix**:
predictors in rows, response categories in columns (Long &amp; Freese
2014; Williams 2012).

```{r ame}
table_regression(fit, show_columns = c("b", "ame"))
```

How to read it, with the `smoking = Yes` row:

* The four `AME ...` columns are the average change in the probability
  of each health category for a smoker versus a non-smoker, holding the
  other predictors at their observed values.
* The effects **sum to ≈ 0 across the row**: a predictor *redistributes*
  probability mass between categories (the total probability is always
  1). Here smoking moves mass *out of* the better categories and *into*
  the worse ones.
* Cells are **probabilities** (the 0–1 scale, matching
  `marginaleffects::avg_slopes()` and the binary-`glm` AME). An AME of
  `0.04` is a change of **4 percentage points** — *not* 4 percent. The
  footer note states this; reserve "percent" for multiplicative
  quantities such as odds ratios.

Request only the AME matrix (drop the coefficient column) with
`show_columns = "ame"`, or add the AME standard errors / CIs / p-values
with `"ame_se"`, `"ame_ci"`, `"ame_p"` (or the `"all_ame"` bundle):

```{r ame-only}
table_regression(fit, show_columns = "ame")
```

Marginal effects are computed with
[`marginaleffects::avg_slopes()`](https://marginaleffects.com) on the
response (probability) scale, then cross-validated to it to machine
precision. For a single-outcome model (binary `glm`, `lm`, mixed) the
AME stays a single column — the per-category matrix appears only when
the response has more than two categories.

## Cluster-robust standard errors

Ordinal fits honour the cluster-robust `vcov` family
(`"CR0"`–`"CR3"`) via `sandwich::vcovCL()`. Pass the cluster as a
formula, a column name, or a vector (see the main vignette,
*How to specify `cluster`*):

```{r cluster}
table_regression(
  fit, vcov = "CR2", cluster = ~region,
  show_columns = c("b", "ame")
)
```

The footer switches to name the estimator and the clustering variable.
Heteroskedasticity-consistent (`HC*`) and the `bootstrap` / `jackknife`
resamplers are *not* defined for ordinal fits and are refused with a
clear `spicy_unsupported_vcov` error rather than a silent fallback.

## Standard errors and confidence intervals

The three inference regimes differ in **how the standard error and the
confidence interval relate**:

* **Wald** (default): both come from the model information matrix. The CI
  is `estimate ± z × SE` (symmetric) and the *p*-value uses the same SE —
  SE, CI and *p* are one coherent set.
* **Robust / cluster-robust** (`vcov = "CR*"`): the whole set switches to
  the sandwich estimator — SE, CI (`± z × SE_robust`) and *p* shift
  together, still coupled.
* **Profile likelihood** (`ci_method = "profile"`): the CI is inverted
  from the likelihood-ratio statistic and is **asymmetric** — *not*
  `estimate ± z × SE`. Profile is a **CI-only refinement**: the estimate,
  SE, statistic and *p*-value stay Wald; only the CI changes. It covers
  the predictor coefficients (via `confint()`); the thresholds stay Wald.
  A robust `vcov` takes precedence (profile is model-based), so requesting
  both uses the robust Wald CIs.

Because a profile CI cannot be reconstructed from the SE, the footer
**discloses** it (`95% CIs: profile likelihood.`) alongside the SE method
— following APA 7 / SAMPL / STROBE and matching
`parameters::model_parameters()`:

```{r profile}
table_regression(fit, ci_method = "profile", show_columns = c("b", "ci", "p"))
```

## The `ordinal::clm()` engine

`ordinal::clm()` is rendered through the same code path and supports the
same columns. It is the engine to reach for when you need flexible link
functions, scale effects, or a relaxation of proportional odds.

```{r clm, eval = have_ordinal}
clm_fit <- ordinal::clm(
  self_rated_health ~ age + sex + smoking + physical_activity,
  data = sochealth
)
table_regression(clm_fit, show_columns = c("b", "ame"))
```

### Scale effects and partial proportional odds

`clm` offers two relaxations of the basic model.

A **scale** component (`scale = ~`) lets the latent dispersion depend on
covariates. The location coefficients remain a single shared block, so
the table renders normally. Cluster-robust SEs are not available for
scale fits (`sandwich` has no estimating-functions method for them), so
`CR*` is refused up front with a clear `spicy_unsupported_vcov` error:

```{r clm-scale, eval = have_ordinal}
clm_scale <- ordinal::clm(
  self_rated_health ~ age + smoking, scale = ~ smoking,
  data = sochealth
)
table_regression(clm_scale)             # classical: renders normally
```

```{r clm-scale-refuse, eval = have_ordinal, error = TRUE}
table_regression(clm_scale, vcov = "CR2", cluster = ~region)
```

A **nominal** component (`nominal = ~`, *partial* proportional odds)
lets a predictor's effect vary across the cut-points: it estimates a
separate coefficient **per cut-point** for the nominal terms. These
render as a labelled **`Non-proportional effects`** block — one row per
cut-point — between the proportional coefficients and the thresholds:

```{r clm-nominal, eval = have_ordinal}
clm_npo <- ordinal::clm(
  self_rated_health ~ age, nominal = ~ smoking,
  data = sochealth
)
table_regression(clm_npo)
```

Two caveats for the non-proportional terms, both surfaced by the table:
robust / cluster SEs are **not available** (`sandwich` has no
estimating-functions method, so a robust `vcov` is refused), and
`ci_method = "profile"` covers the proportional coefficients only (the
non-proportional and threshold rows stay Wald). Under
`exponentiate = TRUE` each non-proportional coefficient becomes an odds
ratio for its cut-point.

## Several models side by side

The same fit can appear several times with different estimators, or
different fits can be compared, with `table_regression(list(...))`.
Per-model `vcov` is supplied as a list:

```{r multi}
fit_min <- polr(self_rated_health ~ smoking, data = sochealth, Hess = TRUE)
table_regression(
  list(Unadjusted = fit_min, Adjusted = fit),
  show_columns = c("b", "ame_p")
)
```

## Output formats

The default console table shown above is one of several targets. The
`output` argument also produces a raw data frame, a long broom-style
tibble, and — with the corresponding Suggests package — rich `gt`,
`flextable`, `tinytable`, Excel, or Word tables. The structure (the
per-category AME matrix included) carries through to every format.

```{r outputs-df}
head(table_regression(fit, show_columns = c("b", "ame"), output = "data.frame"))
```

```{r outputs-gt, eval = build_rich_tables}
table_regression(fit, show_columns = c("b", "ame"), output = "gt")
```

[`broom::tidy()`](https://broom.tidymodels.org) returns the long frame,
one row per `(term, estimate_type)`; per-category AME rows carry the
response category in the `outcome_level`-derived structure:

```{r broom}
broom::tidy(table_regression(fit, show_columns = c("b", "ame")))
```

## References

* Agresti, A. (2010). *Analysis of Ordinal Categorical Data* (2nd ed.).
  Wiley.
* Long, J. S., &amp; Freese, J. (2014). *Regression Models for
  Categorical Dependent Variables Using Stata* (3rd ed.). Stata Press.
* Williams, R. (2012). Using the margins command to estimate and
  interpret adjusted predictions and marginal effects. *The Stata
  Journal*, 12(2), 308–331.
* Arel-Bundock, V., Greifer, N., &amp; Heiss, A. (2024). How to Interpret
  Statistical Models Using marginaleffects in R and Python. *Journal of
  Statistical Software*.