---
title: "Summary tables for APA-style reporting"
description: >
  Learn when to use table_categorical(), table_continuous(),
  table_continuous_lm(), and table_regression() for APA-style
  reporting in R, how their shared arguments fit together, and which
  output format to choose for console, Quarto, Word, or Excel
  workflows.
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Summary tables for APA-style reporting}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

build_rich_tables <- identical(Sys.getenv("IN_PKGDOWN"), "true")

source("_pkgdown-helpers.R")
```

```{r setup}
library(spicy)
```

spicy's four reporting helpers cover the table sequence typically used
in empirical articles that follow the APA *Publication Manual* (7th ed.):

- `table_categorical()` and `table_continuous()` build **Table 1**
  (sample characteristics) and **Table 2** (group comparisons);
- `table_continuous_lm()` extends Table 2 to the linear-model regime
  when group means need robust SE, weights, or covariate adjustment;
- `table_regression()` builds **Table 3** (the coefficient table) from
  one or several fitted `lm()` / `glm()` models.

The four functions share the same output grammar — the same `output`
formats (`gt`, `tinytable`, `flextable`, `word`, `excel`, `clipboard`),
the same `decimal_mark`, `digits`, `p_digits`, `labels`, and `align`
arguments — so a single reporting workflow can move smoothly from
descriptive to inferential without juggling different APIs. This
vignette focuses on that shared logic; the function-specific articles
cover the methodological options in depth.

## Choose the right function

Use the function that matches the unit you want to report:

| Function | Reports | Selection grammar | Typical additions |
|:--|:--|:--|:--|
| `table_categorical()` | Categorical variables (factors, labelled) | `select`, `by` | Chi-squared test, association measure (`phi`, `cramer_v`, `tau_b`, ...), confidence interval |
| `table_continuous()` | Numeric / continuous variables | `select`, `by` | Group-comparison test (Student / Welch *t*, Wilcoxon, ANOVA, Kruskal–Wallis), effect size (`d`, `g`, `r`, `eta²`, `omega²`) |
| `table_continuous_lm()` | Numeric outcomes through one linear model per outcome | `select`, `by` (single predictor) | Robust / cluster-robust / bootstrap / jackknife SE, case weights, additive covariate adjustment, four effect-size families with noncentral CIs |
| `table_regression()` | One or several fitted `lm()` / `glm()` models | Fit-first: pass the model object(s) directly, no `select` / `by` | APA-aligned coefficient table with `B`, `β`, `95% CI`, `p`, AME, robust variance, side-by-side and hierarchical layouts |

In practice, follow the APA sequence:

- start with `table_categorical()` for smoking, education, or
  activity — APA Table 1 categorical descriptors;
- use `table_continuous()` for BMI, well-being, or income — Table 1
  continuous descriptors and Table 2 unadjusted group comparisons;
- switch to `table_continuous_lm()` when the same comparison must
  account for survey weights, robust SE, or covariate adjustment;
- finish with `table_regression()` once the substantive model is
  fitted — APA Table 3 with all predictors, factor groupings,
  reference rows, and (optionally) standardised coefficients,
  marginal effects, or nested model comparisons.

The first three functions live inside a `select` / `by` data-frame
grammar; `table_regression()` is **fit-first** — you build the model
the usual R way (`lm()` or `glm()`) and hand the object in. All four
share the post-construction grammar (`output`, `labels`, `digits`,
`decimal_mark`, `align`), so swapping functions never breaks your
rendering pipeline.
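
As a hedged illustration of that shared grammar, the sketch below passes
the same formatting arguments to a descriptive helper and to
`table_regression()`. The argument names come from the list above; the
specific values (`digits = 1`, `decimal_mark = ","`) are assumptions
chosen for illustration, so check each function's help page for the
values it accepts.

```{r sketch-shared-formatting, eval = FALSE}
library(spicy)

# Descriptive table: one decimal place, comma as decimal mark
# (both values are assumptions for illustration).
tab_desc <- table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score),
  by = education,
  digits = 1,
  decimal_mark = ",",
  output = "tinytable"
)

# Coefficient table: the same formatting arguments, unchanged.
fit_fmt <- lm(wellbeing_score ~ age + sex, data = sochealth)
tab_reg <- table_regression(
  fit_fmt,
  digits = 1,
  decimal_mark = ",",
  output = "tinytable"
)
```

Only the first argument and the selection step differ between the two
calls; the formatting and rendering arguments are written once and
reused.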

## A shared interface

The three descriptive functions share the same core arguments:

```{r grammar-categorical}
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  labels = c("Smoking status", "Regular physical activity"),
  output = "tinytable"
)
```

```{r grammar-continuous}
table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score, life_sat_health),
  by = education,
  labels = c(
    bmi = "Body mass index",
    wellbeing_score = "Well-being score",
    life_sat_health = "Satisfaction with health"
  ),
  output = "tinytable"
)
```

```{r grammar-continuous-lm}
table_continuous_lm(
  sochealth,
  select = c(bmi, wellbeing_score, life_sat_health),
  by = education,
  weights = weight
)
```

The same argument pattern is used in all three cases (`labels` and
`output` are optional, as the third call shows):

- `select` chooses the reported variables;
- `by` defines the grouping structure;
- `labels` cleans up the row labels;
- `output` decides how the result is rendered or exported.

For model-based continuous tables, the same pattern applies, but `by`
must be a single predictor because one linear model is fit per outcome.
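
To make "one linear model per outcome" concrete, here is a minimal
base-R sketch of the idea. It is not the package's internal code: it
only assumes that each selected outcome is regressed on the single `by`
predictor. The names `outcomes` and `fits` are illustrative.

```{r sketch-one-model-per-outcome, eval = FALSE}
library(spicy)

outcomes <- c("bmi", "wellbeing_score", "life_sat_health")

# One lm() per outcome, each with the same single predictor (education).
fits <- lapply(outcomes, function(y) {
  lm(reformulate("education", response = y), data = sochealth)
})
names(fits) <- outcomes

# Each element is an ordinary lm fit with its own formula.
lapply(fits, formula)
```

With several predictors in `by`, there would be no single set of group
contrasts to report per outcome, which is why the model-based helper
restricts `by` to one variable.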

`table_regression()` joins the same `labels` / `output` /
`decimal_mark` / `digits` grammar but is **fit-first**: rather than
expressing model structure inline through `select` and `by`, you pass
one or several already-fitted `lm()` or `glm()` objects:

```{r grammar-regression}
fit <- lm(
  wellbeing_score ~ age + sex + smoking + physical_activity,
  data = sochealth
)
table_regression(
  fit,
  labels = c(
    age               = "Age (years)",
    sex               = "Sex",
    smoking           = "Smoking status",
    physical_activity = "Regular physical activity"
  ),
  output = "tinytable"
)
```

This split is intentional. The descriptive trio (categorical,
continuous, continuous_lm) reports the *data* — `select` and `by`
describe what you want to see. `table_regression()` reports the
*model* — the model formula has already declared which predictors,
interactions, polynomials, transformations, splines, and contrasts to
report, so passing those again through `select` / `by` would
duplicate the model object's information and risk diverging from it.
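
A small base-R sketch shows what the fitted object already records. It
uses only `terms()` and `update()` from the stats package; the
interaction term added here is purely illustrative and not part of the
example model above.

```{r sketch-model-terms, eval = FALSE}
library(spicy)

fit_demo <- lm(
  wellbeing_score ~ age + sex + smoking + physical_activity,
  data = sochealth
)

# The fitted object stores the terms declared in the formula.
attr(terms(fit_demo), "term.labels")

# Adding an interaction changes the model object itself, so anything
# that reads the object sees the new term without a second declaration.
fit_demo_int <- update(fit_demo, . ~ . + age:sex)
attr(terms(fit_demo_int), "term.labels")
```

The names used in `labels` in the `table_regression()` example above
match these term labels.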

## A practical reporting sequence

A common report contains several of these tables, often with the same
grouping variable. For example, you might first summarize categorical
health behaviors, then continuous well-being indicators, and finish
with the model-based tables.

### Categorical variables

```{r report-categorical, eval = build_rich_tables}
pkgdown_dark_gt(
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity, dentist_12m),
    by = education,
    labels = c(
      "Smoking status",
      "Regular physical activity",
      "Visited a dentist in the last 12 months"
    ),
    output = "gt"
  )
)
```

### Continuous variables

```{r report-continuous, eval = build_rich_tables}
pkgdown_dark_gt(
  table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score, life_sat_health),
    by = education,
    labels = c(
      bmi = "Body mass index",
      wellbeing_score = "Well-being score",
      life_sat_health = "Satisfaction with health"
    ),
    p_value = TRUE,
    effect_size = TRUE,
    output = "gt"
  )
)
```

This keeps the reporting structure consistent while still using the
function that fits each variable type.

### Model-based continuous variables

```{r report-continuous-lm, eval = build_rich_tables}
pkgdown_dark_gt(
  table_continuous_lm(
    sochealth,
    select = c(bmi, wellbeing_score, life_sat_health),
    by = sex,
    vcov = "HC3",
    statistic = TRUE,
    output = "gt"
  )
)
```

This is the better summary-table path when the article is already
organized around simple linear models, weighted analyses, or robust
standard errors.

### The coefficient table

Once the substantive model is fitted, `table_regression()` produces
the APA Table 3 coefficient summary. The same `output` argument
controls rendering, so the regression table sits in the same
reporting pipeline as the descriptive ones above:

```{r report-regression, eval = build_rich_tables}
fit <- lm(
  wellbeing_score ~ age + sex + smoking + physical_activity,
  data = sochealth
)
pkgdown_dark_gt(
  table_regression(
    fit,
    standardized = "refit",
    show_columns = c("b", "beta", "ci", "p"),
    vcov = "HC3",
    output = "gt"
  )
)
```

The default footer documents the variance estimator and any
methodological choice that affected the rendered values (robust SE,
standardisation method, multiplicity correction) so the inferential
regime is visible without leaving the table.

Side-by-side reporting of competing specifications (e.g., unadjusted
vs. covariate-adjusted, or `lm` vs. `glm`) is supported by passing a
list of fits:

```{r report-regression-multi, eval = build_rich_tables}
fit_unadj <- lm(wellbeing_score ~ smoking, data = sochealth)
fit_adj   <- lm(
  wellbeing_score ~ smoking + age + sex + physical_activity,
  data = sochealth
)
pkgdown_dark_gt(
  table_regression(
    list("Unadjusted" = fit_unadj, "Adjusted" = fit_adj),
    show_columns = c("b", "ci", "p"),
    output = "gt"
  )
)
```

For binary or count outcomes, swap `lm()` for `glm()` and request
exponentiated coefficients (odds ratios, incidence rate ratios, etc.):

```{r report-regression-glm, eval = build_rich_tables}
fit_glm <- glm(
  smoking ~ age + sex + physical_activity,
  data = sochealth,
  family = binomial()
)
pkgdown_dark_gt(
  table_regression(
    fit_glm,
    exponentiate = TRUE,
    show_columns = c("b", "ci", "p", "ame", "ame_ci"),
    output = "gt"
  )
)
```

Average marginal effects (`ame`) are useful next to the odds ratio
because they express each predictor's effect as an average change in
the predicted probability, which is easier to interpret directly than a
ratio of odds.
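
To see how the two quantities relate, the following base-R sketch
computes an odds ratio and a derivative-based average marginal effect
for `age` by hand. It assumes `age` is numeric and enters the model
linearly, and it is not necessarily how `table_regression()` computes
its `ame` column.

```{r sketch-ame-by-hand, eval = FALSE}
library(spicy)

fit_logit <- glm(
  smoking ~ age + sex + physical_activity,
  data = sochealth,
  family = binomial()
)

b_age <- coef(fit_logit)[["age"]]
p_hat <- fitted(fit_logit) # predicted probabilities for the rows used in the fit

# Odds ratio: multiplicative change in the odds per one-unit increase in age.
or_age <- exp(b_age)

# Logit-model AME: average over observations of dP/d(age) = b * p * (1 - p).
ame_age <- mean(b_age * p_hat * (1 - p_hat))

c(odds_ratio = or_age, ame = ame_age)
```

Because `p * (1 - p)` never exceeds 0.25, the hand-computed AME is at
most a quarter of the log-odds coefficient in absolute value, whereas
the odds ratio carries no information about the baseline probability.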

## Choose the output format

All four functions support the same reporting formats:

| Output | Best use |
|:--|:--|
| `"default"` | Quick console review in plain ASCII |
| `"tinytable"` | Quarto or R Markdown documents |
| `"gt"` | HTML output with styled reporting tables |
| `"flextable"` | Office-first workflows; also renders in HTML |
| `"excel"` | Spreadsheet handoff or downstream editing |
| `"word"` | Direct `.docx` export |
| `"clipboard"` | Fast pasting into another application |

Pick the output based on where the table is going, not on the analysis
itself. The underlying selection and grouping pattern stays the same.
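
One hedged way to keep that separation explicit is to store the
destination-to-format mapping from the table above in a named vector
and pass the chosen value to `output`. The destination names
(`console`, `quarto`, `html`, `office`) and the objects `output_for`
and `destination` are hypothetical labels, not part of spicy.

```{r sketch-output-by-destination, eval = FALSE}
library(spicy)

# Hypothetical destination labels mapped to output values listed above.
output_for <- c(
  console = "default",
  quarto  = "tinytable",
  html    = "gt",
  office  = "flextable"
)

destination <- "quarto"

# The selection and grouping arguments do not change with the destination.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  output = output_for[[destination]]
)
```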

If you want an object that fits naturally into Word and PowerPoint
workflows but can also be rendered in HTML documents, `flextable` is a
good choice:

```{r output-flextable, eval = FALSE}
if (requireNamespace("flextable", quietly = TRUE)) {
  table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score, life_sat_health),
    by = education,
    output = "flextable"
  )
}
```

## Post-process the returned table object

When `output` is `"gt"`, `"tinytable"`, or `"flextable"`, all four
summary-table helpers return a regular object of that class, so you can
keep styling it with the native package API. This includes
`table_regression()`: nothing about the fit-first interface changes
what the rendering engine produces.

Use `gt::` functions when you want to keep the `gt` workflow:

```{r postprocess-gt, eval = build_rich_tables}
tab <- pkgdown_dark_gt(table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  labels = c("Smoking status", "Regular physical activity"),
  output = "gt"
))

tab |>
  gt::tab_header(
    title = "Health behaviors by education",
    subtitle = "Categorical summary table"
  ) |>
  gt::tab_source_note(
    gt::md("*Percentages are computed within each education group.*")
  )
```

Use `tinytable::` functions when you want lightweight table-specific
styling:

```{r postprocess-tinytable, eval = build_rich_tables}
tab <- table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  labels = c("Smoking status", "Regular physical activity"),
  output = "tinytable"
)

tab |>
  tinytable::style_tt(
    i = 2:3,
    j = 2:5,
    background = "red",
    color = "white",
    bold = TRUE
  )
```

Use `flextable::` functions when you want to keep working toward Office
or HTML document output. The example is shown as code here because the
dark pkgdown theme is not a reliable preview of the final `flextable`
HTML rendering:

```{r postprocess-flextable, eval = FALSE}
if (requireNamespace("flextable", quietly = TRUE)) {
  tab <- table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = education,
    output = "flextable"
  )

  tab |>
    flextable::theme_booktabs() |>
    flextable::autofit() |>
    flextable::fontsize(size = 10, part = "all")
}
```
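
Because the result is a regular `flextable`, that package's own export
helpers should also apply. The sketch below uses
`flextable::save_as_docx()` and writes to a temporary file; the file
location and the name `docx_path` are assumptions for illustration.

```{r sketch-flextable-docx, eval = FALSE}
if (requireNamespace("flextable", quietly = TRUE)) {
  library(spicy)

  tab <- table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = education,
    output = "flextable"
  )

  # Write the styled table to a .docx file in the session's temp directory.
  docx_path <- tempfile(fileext = ".docx")
  tab |>
    flextable::theme_booktabs() |>
    flextable::save_as_docx(path = docx_path)

  file.exists(docx_path)
}
```

This route is useful when the table needs `flextable` styling before it
reaches Word; `output = "word"` remains the direct export path.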

## Keep the detailed options in the function-specific articles

The dedicated articles go deeper into each function:

- `table_categorical()` covers missing values, level filtering,
  association measures, and one-way frequency-style tables.
- `table_continuous()` covers grouped descriptive statistics,
  parametric and nonparametric tests, and effect sizes.
- `table_continuous_lm()` covers estimated marginal means or slopes
  from linear models, robust / cluster-robust / bootstrap / jackknife
  variance, case weights, additive covariate adjustment
  (G-computation or equal-weight), and four effect-size families
  with noncentral CIs.
- `table_regression()` covers single- and multi-model coefficient
  tables for `lm` / `glm`, four standardisation methods, partial
  effect sizes with noncentral-F CIs, average marginal effects,
  hierarchical (`nested = TRUE`) comparisons, multiplicity
  correction, and response-scale reporting for GLMs.

Use this vignette as an overview of the reporting workflow, then
consult the function-specific articles when you need the detailed
controls.