--- title: "Survival regression tables" description: > Publication-ready tables for survival regression in R with `table_regression()`: Cox hazard ratios with events and concordance, cluster-robust (Lin-Wei) variance for multi-centre data, hierarchical Cox comparisons by partial-likelihood ratio test, accelerated failure time models with time ratios, and the `survival`, `rms`, and `flexsurv` engines. output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Survival regression tables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} have_survival <- requireNamespace("survival", quietly = TRUE) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = have_survival ) build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true") have_rms <- have_survival && requireNamespace("rms", quietly = TRUE) have_flexsurv <- have_survival && requireNamespace("flexsurv", quietly = TRUE) have_broom <- have_survival && requireNamespace("broom", quietly = TRUE) source("_pkgdown-helpers.R") ``` ```{r setup, message = FALSE} library(spicy) library(survival) ``` This vignette covers **survival (time-to-event) regression** — models for how long until an event occurs, with censoring for the subjects who never reach it. The companion vignette [*Publication-ready regression tables*](table-regression.html) covers the shared mechanics (`vcov`, `ci_level`, output formats, multi-model layouts, broom integration); here we focus on what is specific to survival fits: hazard ratios and time ratios, event counts and concordance, and the variance estimators of multi-centre studies. `table_regression()` supports four survival engines: * **`survival::coxph()`** — the semiparametric Cox proportional hazards model, the field default; * **`survival::survreg()`** — parametric accelerated failure time (AFT) models; * **`rms::cph()`** — the Cox model in the `rms` ecosystem; * **`flexsurv::flexsurvreg()`** — fully parametric models over a wide family of distributions. The running example is `survival::lung`: survival of 228 patients with advanced lung cancer, with age, sex, and the ECOG performance score (0 = fully active, higher is worse) as predictors. One row lacks `ph.ecog` and one lacks the enrolling institution, so we declare the analytic sample explicitly — the same 226 rows serve every model and, later, the cluster variable: ```{r data} lung2 <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "inst")]) lung2$sex <- factor(lung2$sex, levels = 1:2, labels = c("Male", "Female")) nrow(lung2) ``` ## The Cox model in one paragraph The Cox model works on the **hazard** — the instantaneous event rate among those still at risk. Each coefficient multiplies that hazard by a constant factor at every point in time (the **proportional-hazards assumption**), and the baseline hazard itself is left completely unspecified: that is the semiparametric trick that made the model universal (Cox 1972; Therneau & Grambsch 2000). Two consequences shape the table. There is **no intercept row** — the baseline hazard absorbs it — and there is no residual variance, so the model-fit block reports what a survival reader expects instead: the **number of events** and the **concordance**. ## Basic table: hazard ratios ```{r cox} cx <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung2) table_regression(cx, exponentiate = TRUE) ``` Reading the table: * `exponentiate = TRUE` turns each log-hazard coefficient into a **hazard ratio**. Women's hazard is 0.57 times men's — a 43% lower hazard at any given moment, all else equal; each ECOG point multiplies the hazard by 1.60. Leave `exponentiate` off for the log-hazard `B` column. * Inference is **Wald-z** on the log scale, matching `summary.coxph()` and Stata `stcox`; the statistic and p-value are invariant under the exponentiation. * The footer reports **events** (163 deaths among 226 patients — the precision and power of a survival model are governed chiefly by its event count, not its n) and the **concordance** C = 0.64: the probability that, of two comparable patients, the one predicted to fail earlier actually does. 0.5 is a coin flip; 0.64 is modest discrimination (Harrell 2015). The table reports the model, not its assumption: test proportionality with `survival::cox.zph()` before reporting (here the global test gives p = .17 — no evidence against it; Therneau & Grambsch 2000). ## Why there is no AME column Requesting an `"ame"` column on a Cox fit is refused with an explanation rather than answered: an average marginal effect needs a response scale, and a Cox model deliberately does not commit to one — a per-coefficient "effect on survival" would be ambiguous (at what time? on which scale?) with unreliable standard errors. The hazard ratio *is* the per-coefficient summary of this model. Absolute-scale summaries (risk differences at a horizon, restricted mean survival time) are a different estimand family, not a per-coefficient column. ## Multi-centre data: cluster-robust variance The `lung` patients were enrolled by 18 institutions, and outcomes within an institution may correlate. The `CR*` request routes to the **Lin & Wei (1989)** robust variance — the grouped score (dfbeta) sandwich that `coxph(..., cluster = )` and Stata `stcox, vce(cluster)` use: ```{r cox-cluster} table_regression(cx, vcov = "CR0", cluster = ~inst, exponentiate = TRUE) ``` Two lessons here. First, the mechanics: the cluster variable must provide **one value per row of the model data** — declaring the analytic sample up front (as we did) keeps the cluster aligned when predictors carry missing values. Second, the direction: robust standard errors are not always larger — the sex SE *shrinks* here. Clustering corrects the variance in whichever direction the within-centre correlation points; treating it as a conservative inflation ritual misreads it. ## Hierarchical Cox models Does the performance score improve on age and sex? `nested = TRUE` compares adjacent models by the **partial-likelihood ratio test** (the `anova.coxph()` convention) and appends the change rows: ```{r cox-nested} cx0 <- coxph(Surv(time, status) ~ age + sex, data = lung2) table_regression(list(cx0, cx), nested = TRUE, exponentiate = TRUE, show_columns = c("b", "p")) ``` Adding `ph.ecog` buys a chi-squared change of 16.78 on one degree of freedom (p < .001), an AIC drop of 14.8, and lifts the concordance from 0.60 to 0.64 — three readings of the same improvement, on the likelihood, information, and discrimination scales respectively. ## Accelerated failure time: `survreg()` The parametric alternative models **time itself** (Wei 1992): covariates stretch or compress survival time by a constant factor. Under `exponentiate = TRUE` the coefficients become **time ratios** (TR): ```{r survreg} sr <- survreg(Surv(time, status) ~ age + sex + ph.ecog, data = lung2, dist = "weibull") table_regression(sr, exponentiate = TRUE) ``` The intercept is back — a parametric model must anchor the time scale, and exp(intercept) ≈ 801 days is that anchor (the extrapolated characteristic time at the reference levels and age 0), not a ratio despite the column header. Set the TR next to the Cox HR and the numbers *look* contradictory: women have TR 1.50 but HR 0.57. They are the same finding on opposite scales — a protective effect **multiplies survival time up** (50% longer) and **multiplies the hazard down** (43% lower). Neither direction is more correct; the Weibull family is the one distribution that is simultaneously AFT and proportional-hazards, and the two scales are linked by an exact identity within the fit: HR = TR^(−1/scale) = 1.50^(−1/0.73) ≈ 0.57, with the scale parameter read from the footer — matching the semiparametric Cox estimate to displayed precision (0.575 vs 0.573). The footer names the distribution because the interpretation depends on it — `exponentiate` produces time ratios only for log-scale distributions (Weibull, exponential, lognormal, loglogistic); for an identity-scale distribution (`dist = "gaussian"`) the request is skipped with a warning, since the coefficients already act on the time scale. ## `rms::cph()` The `rms` Cox engine renders through the same layout. One requirement carried over from `rms` itself: cluster-robust variance uses `rms::robcov()`, which needs the design matrices stored at fit time — fit with `x = TRUE, y = TRUE` or the robust request errors with that exact instruction: ```{r cph, eval = have_rms} cph_fit <- rms::cph(Surv(time, status) ~ age + sex, data = lung2, x = TRUE, y = TRUE) table_regression(cph_fit, vcov = "CR0", cluster = ~inst, exponentiate = TRUE) ``` ## Fully parametric: `flexsurv` `flexsurv::flexsurvreg()` (Jackson 2016) fits parametric survival models over a wide family of distributions (and the Royston–Parmar 2002 spline models). The location coefficients exponentiate to the generic `exp(B)` — the substantive reading (time ratio, hazard ratio) depends on the distribution, which the footer names together with its ancillary parameters: ```{r flexsurv, eval = have_flexsurv} fs <- flexsurv::flexsurvreg(Surv(time, status) ~ age + sex, data = lung2, dist = "weibull") table_regression(fs, exponentiate = TRUE) ``` Two guardrails apply. A spline model on the `"normal"` scale places its coefficients on a probit-like scale whose exponential has no meaning, so `exponentiate = TRUE` is refused there; and covariates on ancillary parameters (`anc =`) act on their own scales, so they refuse exponentiation as a whole rather than print meaningless ratios. Two Cox extensions flow through the same table without ceremony: stratified models (`strata()` terms are absorbed into the baseline, no coefficient row) and counting-process data (`Surv(tstart, tstop, event)` for time-varying covariates) render like any other fit. ## Output formats Everything above used the default console output. The same table renders as a raw data frame, a long broom-style tibble, or — with the corresponding Suggests package — a rich `gt`, `flextable`, `tinytable`, Excel, or Word table: ```{r outputs-df} table_regression(cx, exponentiate = TRUE, output = "data.frame") ``` ```{r outputs-gt, eval = build_rich_tables && have_survival} table_regression(cx, exponentiate = TRUE, output = "gt") ``` [`broom::tidy()`](https://broom.tidymodels.org) returns the long frame; with `exponentiate = TRUE` the estimates and CI bounds are on the ratio scale, as displayed: ```{r broom, eval = have_broom} broom::tidy(table_regression(cx, exponentiate = TRUE)) ``` ## References * Cox, D. R. (1972). Regression models and life-tables. *Journal of the Royal Statistical Society, Series B*, 34(2), 187–220. * Harrell, F. E. (2015). *Regression Modeling Strategies* (2nd ed.). Springer. * Jackson, C. H. (2016). `flexsurv`: A platform for parametric survival modeling in R. *Journal of Statistical Software*, 70(8). * Lin, D. Y., & Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. *Journal of the American Statistical Association*, 84(408), 1074–1078. * Royston, P., & Parmar, M. K. B. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data. *Statistics in Medicine*, 21(15), 2175–2197. * Therneau, T. M., & Grambsch, P. M. (2000). *Modeling Survival Data: Extending the Cox Model*. Springer. * Wei, L. J. (1992). The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. *Statistics in Medicine*, 11(14–15), 1871–1879.