| Title: | Descriptive Statistics, Summary Tables, and Data Management Tools |
|---|---|
| Description: | Provides tabulation, descriptive-summary, and variable-inspection tools for applied data analysis. Frequency tables and cross-tabulations with contingency-table association measures (Cramer's V, Phi, Goodman-Kruskal Gamma, Kendall's Tau-b, Somers' D, and others); categorical and continuous summary tables; regression coefficient tables for one or more 'lm' or 'glm' fits side by side; and outcome-by-group comparison tables from linear models with optional additive covariate adjustment. All table outputs follow APA conventions and expose 'broom'-compatible 'tidy()' / 'glance()' methods for downstream pipelines. Helpers cover interactive codebooks, variable-label extraction, clipboard export, and row-wise descriptive summaries. |
| Authors: | Amal Tawfik [aut, cre, cph] (ORCID: <https://orcid.org/0009-0006-2422-1555>, ROR: <https://ror.org/04j47fz63>) |
| Maintainer: | Amal Tawfik <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.12.0.9000 |
| Built: | 2026-05-21 02:45:22 UTC |
| Source: | https://github.com/amaltawfik/spicy |
spicy_regression_table
table_regression() returns a display representation by default:
a character data.frame with significance-star suffixes, an em dash in
reference rows, bracketed "[L, U]" confidence intervals, and APA-style
padding of p-values. as_structured() returns the typed view that
the output engines (Excel, gt, tinytable, flextable, clipboard)
consume internally: a fully numeric body with the CI pre-split into
LL / UL columns, NA for non-applicable and reference cells, plus
per-cell markers and a format specification.
as_structured(x)
x |
A |
This is the right entry point for users who want to:
Filter coefficients programmatically, e.g.
as_structured(tbl)$body[which(as_structured(tbl)$body$p < 0.05), ]
(which() skips the NA p-values of reference and header rows).
Aggregate raw values across rows, e.g.
mean(as_structured(tbl)$body[["B"]], na.rm = TRUE).
Build a custom downstream renderer that consumes the same structured contract as spicy's built-in engines.
A list with the structured view (see Details for the schema).
body – data.frame with a Variable character column and one
or more numeric columns. Confidence intervals are split into
LL / UL columns named like "95% CI: LL" / "95% CI: UL" (or
prefixed with the model label in multi-model output). Cells
that have no value (reference levels, non-applicable rows in
multi-model output, factor headers) are NA.
reference_rows, factor_header_rows, fit_stat_rows,
level_rows, outcome_row – integer row indices.
col_meta – per-column metadata keyed by structured column
name (token, model_id, precision, p-style, below-threshold,
CI pair / role / label).
spanners – named list mapping model labels to their column
indices in body (multi-model only).
ci_pairs – list of (label, cols) entries describing each
CI pair in body.
format_spec – global format defaults (decimal mark, digits,
p-style, CI level, etc.).
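Because reference and header rows carry NA in body, a plain logical filter such as body[body$p < 0.05, ] also returns all-NA rows. The base-R sketch below uses a small hand-built data.frame, invented for illustration and only loosely shaped like the body element (it does not call spicy; the column names B and p are assumed from this entry's examples), to show the difference and the which() fix.

```r
# Hypothetical stand-in for as_structured(tbl)$body; all values are invented.
body <- data.frame(
  Variable = c("(Intercept)", "x", "group", "  A (ref)", "  B"),
  B        = c(10.5, -2.1, NA, NA, 1.3),
  p        = c(0.001, 0.030, NA, NA, 0.200),
  stringsAsFactors = FALSE
)

# Naive logical indexing: NA comparisons produce extra all-NA rows.
naive <- body[body$p < 0.05, ]

# which() keeps only rows where the comparison is TRUE, dropping NAs.
significant <- body[which(body$p < 0.05), ]

# Aggregate raw estimates while skipping header/reference NAs.
mean_b <- mean(body$B, na.rm = TRUE)
```

Here naive has four rows (two real, two all-NA), while significant keeps only the two rows with p < 0.05.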
table_regression() for the user-facing entry point.
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
tbl <- table_regression(fit)
s <- as_structured(tbl)
s$body                           # raw numeric body
s$body[which(s$body$p < 0.05), ] # significant rows (which() skips NA p-values)
s$col_meta$B                     # column metadata for B
assoc_measures() computes a range of association measures for a
two-way contingency table and returns them in a tidy data frame.
assoc_measures(
  x,
  type = c("all", "nominal", "ordinal"),
  conf_level = 0.95,
  digits = 3L
)
x |
A contingency table (of class |
type |
Which family of measures to compute:
|
conf_level |
A number between 0 and 1 giving the confidence
level (default |
digits |
Number of decimal places used when printing the
result (default |
type = "all" (the default) returns all nominal and ordinal
measures. Use type = "nominal" or type = "ordinal" to
restrict the output to a single family.
The nominal family includes cramer_v(), contingency_coef(),
lambda_gk(), goodman_kruskal_tau(), uncertainty_coef(),
and (for 2x2 tables) phi() and yule_q().
The ordinal family includes gamma_gk(), kendall_tau_b(),
kendall_tau_c(), and somers_d().
Standard error formulas follow the DescTools implementations (Signorell et al., 2024).
A data frame with columns measure, estimate, se,
ci_lower, ci_upper, and p_value. The p_value comes
from two test families:
Pearson chi-squared test of independence for Cramer's V, Phi, and the Contingency Coefficient (the three chi-squared-derived nominal measures). All three carry the same chi-squared p-value on a given table.
Wald z-test of H0: measure = 0 for every other measure: Yule's Q, Lambda, Goodman-Kruskal's Tau, the Uncertainty Coefficient, and all ordinal measures (Gamma, Tau-b, Tau-c, Somers' D).
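For the Wald family, the p-value follows from the estimate and its standard error. The sketch below assumes a two-sided test; the helper name wald_p_value is hypothetical, and the DescTools-style SE formulas that spicy uses are not reproduced here.

```r
# Two-sided Wald z-test p-value for H0: measure = 0.
# `wald_p_value` is a hypothetical helper used only for illustration.
wald_p_value <- function(estimate, se) {
  z <- estimate / se      # Wald statistic
  2 * pnorm(-abs(z))      # two-sided tail probability
}

wald_p_value(0.30, 0.10)  # estimate three SEs away from 0
```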
Direction-dependent measures (lambda_gk(),
goodman_kruskal_tau(), uncertainty_coef(), somers_d())
contribute one row per direction (symmetric / R|C / C|R
where applicable), so the output has more rows than the
number of helper functions.
Agresti, A. (2002). Categorical Data Analysis (2nd ed.). Wiley.
Liebetrau, A. M. (1983). Measures of Association. Sage.
Signorell, A. et al. (2024). DescTools: Tools for Descriptive Statistics. R package.
cramer_v(), gamma_gk(), kendall_tau_b()
Other association measures:
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$smoking, sochealth$education)
assoc_measures(tab)
assoc_measures(tab, type = "nominal")
assoc_measures(tab, type = "ordinal")
code_book() creates an interactive and exportable codebook summarizing
selected variables of a data frame. It builds upon varlist() to provide
an overview of variable names, labels, classes, and representative values in
a sortable, searchable table.
The output is displayed as an interactive DT::datatable() in the Viewer pane
(for example in RStudio or Positron), allowing searching, sorting, and export
(copy, print, CSV, Excel, PDF) directly.
code_book(
  x,
  ...,
  values = FALSE,
  include_na = FALSE,
  title = "Codebook",
  filename = NULL,
  factor_levels = c("all", "observed")
)
x |
A data frame or tibble. |
... |
Optional tidyselect-style column selectors (e.g.
|
values |
Logical. If |
include_na |
Logical. If |
title |
Optional character string displayed as the table caption.
Defaults to |
filename |
Optional character string used as the base for exported CSV,
Excel, and PDF filenames. If |
factor_levels |
Character. Controls how factor values are displayed
in |
The interactive datatable supports column sorting, global
searching, and client-side export (copy, print, CSV, Excel,
PDF) directly from the Viewer.
Variable selection uses the same tidyselect interface as
varlist(); the underlying summary tibble is built by
varlist() with tbl = TRUE.
A DT::datatable object.
Requires the following package:
DT
varlist() for generating the underlying variable summaries.
Other variable inspection:
label_from_names(),
varlist()
## Not run:
if (requireNamespace("DT", quietly = TRUE)) {
  code_book(sochealth)
  code_book(sochealth, starts_with("bmi"))
  code_book(sochealth, starts_with("bmi"), values = TRUE, include_na = TRUE)
  factors <- data.frame(
    group = factor(c("A", "B", NA), levels = c("A", "B", "C"))
  )
  code_book(
    factors,
    values = TRUE,
    include_na = TRUE,
    factor_levels = "observed"
  )
  code_book(
    sochealth,
    starts_with("bmi"),
    title = "BMI codebook",
    filename = "bmi_codebook"
  )
}
## End(Not run)
contingency_coef() computes Pearson's contingency coefficient C
for a two-way contingency table.
contingency_coef(
  x,
  detail = FALSE,
  conf_level = 0.95,
  digits = 3L,
  .include_se = FALSE
)
x |
A contingency table (of class |
detail |
Logical. If |
conf_level |
A number between 0 and 1 giving the confidence
level (default |
digits |
Number of decimal places used when printing the
result (default |
.include_se |
Internal parameter; do not use. |
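The contingency coefficient is
$C = \sqrt{\chi^2 / (\chi^2 + n)}$, where $\chi^2$ is the Pearson
chi-squared statistic and $n$ is the total count.
It ranges from 0 (independence) to a maximum below 1 that depends on
the table dimensions. No widely used asymptotic standard error exists
for C, so the confidence interval is not computed.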
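The formula can be checked with base R. This is a sketch, not spicy's code; it assumes the uncorrected Pearson statistic (no Yates correction) and uses a hypothetical helper name. A perfectly associated 2x2 table gives sqrt(1/2) rather than 1, which illustrates the dimension-dependent maximum.

```r
# Pearson's contingency coefficient from a two-way table (base R sketch).
# `contingency_c_sketch` is a hypothetical name used only for illustration.
contingency_c_sketch <- function(tab) {
  # Uncorrected Pearson chi-squared statistic (assumption: no Yates correction)
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)
  sqrt(chi2 / (chi2 + n))
}

perfect <- matrix(c(10, 0, 0, 10), nrow = 2)  # perfect association
independent <- matrix(10, nrow = 2, ncol = 2) # no association
contingency_c_sketch(perfect)
contingency_c_sketch(independent)
```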
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests the null hypothesis of no association
(Pearson chi-squared test). CI values are NA because no
standard asymptotic SE exists for C.
Other association measures:
assoc_measures(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$smoking, sochealth$education)
contingency_coef(tab)
Copies a data.frame, matrix, 2D or higher array, table, or
atomic vector to the system clipboard, ready to paste into a
text editor, spreadsheet, or word processor. Wraps
clipr::write_clip() (a Suggests dependency); requires clipr
to be installed and a clipboard backend to be available on the
platform.
copy_clipboard(
  x,
  row.names.as.col = FALSE,
  row.names = TRUE,
  col.names = TRUE,
  show_message = TRUE,
  quiet = FALSE,
  ...
)
x |
A |
row.names.as.col |
Logical or character. If |
row.names |
Logical. If |
col.names |
Logical. If |
show_message |
Logical. If |
quiet |
Logical. If |
... |
Additional arguments passed to |
Objects that are not data.frames or 2D matrices (atomic
vectors, arrays, tables) are automatically coerced to character
on the way to the clipboard, as required by
clipr::write_clip(). The R-side object passed to x is never
mutated.
Multidimensional arrays (3D and higher) are flattened to a 1D
character vector with one element per line. To preserve a
tabular layout, extract a 2D slice first, e.g.
copy_clipboard(my_array[, , 1]).
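The shape difference can be seen in base R without touching the clipboard. This sketch assumes the flattening behaves like as.character() on the whole array, which is an inference from the description above rather than a confirmed detail of copy_clipboard().

```r
arr <- array(1:8, dim = c(2, 2, 2))

# A 2D slice keeps its rows and columns, so it pastes as a grid.
slice <- arr[, , 1]

# Coercing the whole 3D array drops all dimensions: one value per element,
# which becomes one line per element when written out.
flat <- as.character(arr)
```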
Invisibly returns x; the function is called for its
clipboard side effect.
if (requireNamespace("clipr", quietly = TRUE) && clipr::clipr_available()) {
  # Data frame
  copy_clipboard(sochealth)
  # Data frame with row names as column
  copy_clipboard(head(sochealth), row.names.as.col = "id")
  # Matrix
  mat <- matrix(1:6, nrow = 2)
  copy_clipboard(mat)
  # Table
  tbl <- table(sochealth$education)
  copy_clipboard(tbl)
  # Array (3D) -- flattened to character
  arr <- array(1:8, dim = c(2, 2, 2))
  copy_clipboard(arr)
  # Recommended: copy 2D slice for tabular layout
  copy_clipboard(arr[, , 1])
  # Numeric vector
  copy_clipboard(c(3.14, 2.71, 1.618))
  # Character vector
  copy_clipboard(c("apple", "banana", "cherry"))
  # Quiet mode (no messages shown)
  copy_clipboard(sochealth, quiet = TRUE)
}
Counts, for each row of a data.frame or matrix, how many
times one or more values appear across selected columns. Supports
type-safe comparison (allow_coercion = FALSE), case-insensitive
string matching (ignore_case = TRUE), and detection of special
values (NA, NaN, Inf, -Inf) via special. Designed to
flow inside dplyr::mutate() pipelines.
count_n(
  data = NULL,
  select = tidyselect::everything(),
  exclude = NULL,
  count = NULL,
  special = NULL,
  allow_coercion = TRUE,
  ignore_case = FALSE,
  regex = FALSE,
  verbose = FALSE
)
data |
A |
select |
Columns to include. Defaults to |
exclude |
Character vector of column names to exclude after selection.
Defaults to |
count |
Value(s) to count. Defaults to |
special |
Character vector of special values to count: |
allow_coercion |
Logical. If |
ignore_case |
Logical. If |
regex |
Logical. If |
verbose |
Logical. If |
A numeric vector of row-wise counts (unnamed), of length
nrow(data).
With allow_coercion = FALSE, comparison falls back to identical()
when types differ, which also inspects factor levels. Two consequences:
count = "b" does not match a factor "b" value: pass a
factor, e.g. count = factor("b", levels = levels(df$x)).
Even with a factor count, comparisons against columns
whose level set differs will return 0. To guarantee a
perfect match (label and levels), reuse a value taken from
the data itself (e.g. df$x[2]).
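The base-R sketch below does not call count_n(); it only illustrates the identical() behaviour these consequences rely on, using the x and y factors from the examples.

```r
x <- factor(c("a", "b", "c"))
y <- factor(c("b", "B", "a"))

# `==` on a factor compares labels, so a character "b" matches.
loose <- x[2] == "b"

# identical() compares type and attributes: character vs factor fails.
strict_chr <- identical(x[2], "b")

# A factor value built with the same levels as x matches exactly.
strict_fct <- identical(x[2], factor("b", levels = levels(x)))

# The same factor value does not match y[1]: its level set differs.
strict_other <- identical(y[1], factor("b", levels = levels(x)))
```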
With ignore_case = TRUE, all values are converted to lowercase via tolower() before
matching; factor columns are first coerced to character. This
mode takes precedence over allow_coercion: equality becomes
lowercase string equality, so "b" and "B" match even when
allow_coercion = FALSE.
Note on count itself: R coerces mixed-type vectors at construction time, so count = c(2, "2") becomes c("2", "2") before count_n() ever sees it.
To get type-sensitive matching, keep count homogeneous.
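The coercion happens in c() itself, before any function is called, as this base-R check shows.

```r
# Mixed numeric/character input is coerced to character by c().
mixed <- c(2, "2")

# A homogeneous vector keeps its numeric type.
homogeneous <- c(2, 3)
```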
datawizard::row_count() for a closely related row-wise
counter; count_n() adds element-wise type-safe matching,
multi-value count, and special-value detection.
Other row-wise summaries:
mean_n(),
sum_n()
library(dplyr)
library(tibble)
library(labelled)
# Basic usage
df <- tibble(
  x = c(1, 2, 2, 3, NA),
  y = c(2, 2, NA, 3, 2),
  z = c("2", "2", "2", "3", "2")
)
count_n(df, count = 2)
count_n(df, count = 2, allow_coercion = FALSE)
df |> mutate(num_twos = count_n(count = 2))
# Mixed types and special values
df <- tibble(
  num = c(1, 2, NA, -Inf, NaN),
  char = c("a", "B", "b", "a", NA),
  fact = factor(c("a", "b", "b", "a", "c")),
  date = as.Date(c("2023-01-01", "2023-01-01", NA, "2023-01-02", "2023-01-01")),
  lab = labelled(c(1, 2, 1, 2, NA), labels = c(No = 1, Yes = 2)),
  logic = c(TRUE, FALSE, NA, TRUE, FALSE)
)
count_n(df, count = 2)
count_n(df, count = "b", ignore_case = TRUE)
count_n(df, count = "a", select = fact)
count_n(df, count = as.Date("2023-01-01"), select = date)
# Count special values
count_n(df, special = "NA")
# Column selection strategies
df <- tibble(
  score_math = c(1, 2, 2, 3, NA),
  score_science = c(2, 2, NA, 3, 2),
  score_lang = c("2", "2", "2", "3", "2"),
  name = c("Jean", "Marie", "Ali", "Zoe", "Nina")
)
count_n(df, select = c(score_math, score_science), count = 2)
count_n(df, select = starts_with("score_"), exclude = "score_lang", count = 2)
count_n(df, select = "^score_", regex = TRUE, count = 2)
df |> mutate(nb_two = count_n(count = 2))
# Strict type-safe matching with factor columns
df <- tibble(
  x = factor(c("a", "b", "c")),
  y = factor(c("b", "B", "a"))
)
# Coercion: character "b" matches both x and y
count_n(df, count = "b")
# Strict match: fails because "b" is character, not factor (returns only 0s)
count_n(df, count = "b", allow_coercion = FALSE)
# Strict match with factor value: works only where levels match
count_n(df, count = factor("b", levels = levels(df$x)), allow_coercion = FALSE)
cramer_v() computes Cramer's V for a two-way contingency table,
measuring the strength of association between two categorical variables.
cramer_v(
  x,
  detail = FALSE,
  conf_level = 0.95,
  digits = 3L,
  .include_se = FALSE
)
x |
A contingency table (of class |
detail |
Logical. If |
conf_level |
A number between 0 and 1 giving the confidence
level (default |
digits |
Number of decimal places used when printing the
result (default |
.include_se |
Internal parameter; do not use. |
Cramer's V is computed as
$V = \sqrt{\chi^2 / (n \, (k - 1))}$, where $\chi^2$ is the Pearson
chi-squared statistic, $n$ is the total count, and $k = \min(r, c)$ for a
table with $r$ rows and $c$ columns. The point estimate matches the
DescTools (Signorell et al., 2024) and SPSS implementations.
The confidence interval uses the Fisher z-transformation of V
($z = \operatorname{atanh}(V)$), which differs from the noncentral chi-squared
or bootstrap CIs reported by DescTools::CramerV().
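The point estimate can be reproduced in base R. This sketch assumes the uncorrected Pearson statistic and uses a hypothetical helper name; it does not reproduce spicy's standard error or the Fisher-z confidence interval.

```r
# Cramer's V point estimate from a two-way table (base R sketch).
# `cramer_v_sketch` is a hypothetical name used only for illustration.
cramer_v_sketch <- function(tab) {
  tab <- as.matrix(tab)
  # Uncorrected Pearson chi-squared (assumption: no Yates correction)
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))  # smaller table dimension
  sqrt(chi2 / (n * (k - 1)))
}

cramer_v_sketch(diag(5, 3))            # perfect association on a 3x3 table
cramer_v_sketch(matrix(10, 3, 3))      # no association
```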
When detail = FALSE: a single numeric value (the
estimate).
When detail = TRUE and conf_level is non-NULL:
c(estimate, ci_lower, ci_upper, p_value).
When detail = TRUE and conf_level = NULL:
c(estimate, p_value).
The p-value tests the null hypothesis of no association
(Pearson chi-squared test).
Agresti, A. (2002). Categorical Data Analysis (2nd ed.). Wiley.
Liebetrau, A. M. (1983). Measures of Association. Sage.
Signorell, A. et al. (2024). DescTools: Tools for Descriptive Statistics. R package.
phi(), contingency_coef(), assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$smoking, sochealth$education)
cramer_v(tab)
cramer_v(tab, detail = TRUE)
cramer_v(tab, detail = TRUE, conf_level = NULL)
Computes a two-way cross-tabulation with optional weights, grouping
(including combinations of multiple variables via interaction()),
row / column percentages, and inferential statistics (Chi-squared
test with an APA-style association measure).
Both x and y are required; for one-way frequency tables, use
freq().
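For unweighted counts, row and column percentages in the usual sense can be cross-checked with base R's prop.table(). This sketch assumes percent = "column" means each column sums to 100 (and "row" each row); the toy counts are invented and the table does not come from cross_tab().

```r
# Toy 2x2 table; counts are invented for illustration.
tab <- matrix(c(20, 30, 10, 40), nrow = 2,
              dimnames = list(x = c("a", "b"), y = c("c1", "c2")))

# Column percentages: each column sums to 100.
col_pct <- 100 * prop.table(tab, margin = 2)

# Row percentages: each row sums to 100.
row_pct <- 100 * prop.table(tab, margin = 1)
```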
cross_tab(
  data,
  x,
  y = NULL,
  by = NULL,
  weights = NULL,
  rescale = FALSE,
  percent = c("none", "column", "row"),
  include_stats = TRUE,
  assoc_measure = c("auto", "cramer_v", "phi", "gamma", "tau_b", "tau_c",
    "somers_d", "lambda", "none"),
  assoc_ci = FALSE,
  correct = FALSE,
  simulate_p = FALSE,
  simulate_B = 2000,
  digits = NULL,
  styled = TRUE,
  show_n = TRUE,
  decimal_mark = ".",
  p_digits = 3L
)
data |
A data frame. Alternatively, a vector when using the vector-based interface. |
x |
Row variable (unquoted). |
y |
Column variable (unquoted). Required; the |
by |
Optional grouping variable or expression. Can be a single variable
or a combination of multiple variables (e.g. |
weights |
Optional numeric weights. |
rescale |
Logical. If |
percent |
One of |
include_stats |
Logical. If |
assoc_measure |
Character. Which association measure to report.
|
assoc_ci |
Logical. If |
correct |
Logical. If |
simulate_p |
Logical. If |
simulate_B |
Integer. Number of replicates for Monte Carlo simulation.
Defaults to |
digits |
Number of decimals for cell values. Defaults to
|
styled |
Logical. If |
show_n |
Logical. If |
decimal_mark |
Character used as the decimal mark in printed
numeric values (cells, chi-squared, association estimate, CI
bounds, p-value). Defaults to |
p_digits |
Integer number of decimals used to format the
p-value (and to determine the small- |
Depends on styled and by:
styled = TRUE, no by: a spicy_cross_table object
(a data.frame carrying rendering metadata as attributes:
title, digits, decimal_mark, n_row_idx, n_col_name,
and the inferential block when include_stats = TRUE).
Printing dispatches to print.spicy_cross_table().
styled = TRUE, by supplied: a spicy_cross_table_list,
i.e. a named list of spicy_cross_table objects (one element
per group level, named by that level). Printing dispatches to
print.spicy_cross_table_list() which renders each table in
turn separated by a blank line.
styled = FALSE: the same payload returned as a plain
data.frame (or named list of data.frames with by),
stripped of the spicy_* classes for downstream programmatic
use.
Cell columns are the levels of y; rows are the levels of x.
When percent != "none", the N column (or N row) is added
according to show_n. When include_stats = TRUE, the result
carries a Chi-squared row (statistic, df, p) and an
association-measure row (estimate, optional CI via assoc_ci).
The function recognizes the following global options that modify its default behavior:
options(spicy.percent = "column")
Sets the default percentage mode for all calls to cross_tab().
Valid values are "none", "row", and "column".
Equivalent to setting percent = "column" (or another choice) in each call.
options(spicy.simulate_p = TRUE)
Enables Monte Carlo simulation for all Chi-squared tests by default.
Equivalent to setting simulate_p = TRUE in every call.
options(spicy.rescale = TRUE)
Automatically rescales weights so that total weighted N equals the raw N.
Equivalent to setting rescale = TRUE in each call.
These options are convenient for users who wish to enforce consistent behavior
across multiple calls to cross_tab() and other spicy table functions.
They can be disabled or reset by setting them to NULL:
options(spicy.percent = NULL, spicy.simulate_p = NULL, spicy.rescale = NULL).
Example:
options(spicy.simulate_p = TRUE, spicy.rescale = TRUE)
cross_tab(sochealth, smoking, education, weights = weight)
# Basic crosstab
cross_tab(sochealth, smoking, education)
# Column percentages
cross_tab(sochealth, smoking, education, percent = "column")
# Weighted (rescaled)
cross_tab(sochealth, smoking, education, weights = weight, rescale = TRUE)
# Grouped by sex
cross_tab(sochealth, smoking, education, by = sex)
# Grouped by combination of variables
cross_tab(sochealth, smoking, education, by = interaction(sex, age_group))
# Ordinal variables: auto-selects Kendall's Tau-b
cross_tab(sochealth, education, self_rated_health)
# 2x2 table with Yates correction
cross_tab(sochealth, smoking, physical_activity, correct = TRUE)
# APA-style p-value precision and European decimal mark
cross_tab(sochealth, smoking, education, decimal_mark = ",", p_digits = 4)
Creates a frequency table for a vector or variable from a data frame, with options for weighting, sorting, handling labelled data, defining custom missing values, and displaying cumulative percentages.
When styled = TRUE, the function prints a spicy-formatted ASCII table
using print.spicy_freq_table() and spicy_print_table(); otherwise, it
returns a data.frame containing frequencies and proportions.
freq(
  data,
  x = NULL,
  weights = NULL,
  digits = 1L,
  valid = TRUE,
  cum = FALSE,
  sort = "",
  na_val = NULL,
  labelled_levels = c("prefixed", "labels", "values"),
  factor_levels = c("observed", "all"),
  rescale = TRUE,
  decimal_mark = ".",
  styled = TRUE,
  ...
)
data |
A |
x |
A variable from |
weights |
Optional numeric vector of weights (same length as |
digits |
Number of decimal digits to display for percentages (default: |
valid |
Logical. If |
cum |
Logical. If |
sort |
Sorting method for values:
|
na_val |
Atomic vector of numeric or character values to be treated as missing ( For labelled variables (from haven or labelled), this argument must refer to the underlying coded values, not the visible labels. Example: x <- labelled(c(1, 2, 3, 1, 2, 3), c("Low" = 1, "Medium" = 2, "High" = 3))
freq(x, na_val = 1) # Treat all "Low" as missing
|
labelled_levels |
For
|
factor_levels |
Character. Controls how factor and labelled values
are displayed in the frequency table. |
rescale |
Logical. If |
decimal_mark |
Character used as the decimal mark in printed
percentages. Either |
styled |
Logical. If |
... |
Additional arguments passed to |
Designed to mimic common frequency procedures from SPSS or Stata
while integrating the flexibility of R's data structures. The
input type (vector, factor, labelled) is auto-detected; see the
labelled_levels and factor_levels arguments for the
schema-vs-observed level controls, and na_val for
optional sentinel-value recoding.
Weighting (weights): frequencies and percentages are computed
proportionally to the weights. Missing values in weights cause
those observations to be dropped from the table entirely (with a
warning), matching the behaviour of cross_tab() in spicy
0.11.0+. With rescale = TRUE, the remaining (non-NA-weighted)
weights are normalised so the total weighted N equals the count
of non-NA-weighted rows. With rescale = FALSE, the total
weighted N is the actual sum of non-NA weights.
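The weighting rule above can be sketched in base R. The helper name is hypothetical and this is not spicy's implementation; it also ignores missing values of x (tapply() drops NA categories), whereas freq() reports them.

```r
# Hypothetical helper mirroring the weighting rule described above.
weighted_counts_sketch <- function(x, w, rescale = TRUE) {
  keep <- !is.na(w)               # rows with NA weights are dropped
  x <- x[keep]
  w <- w[keep]
  if (rescale) {
    w <- w * length(w) / sum(w)   # total weighted N = number of kept rows
  }
  tapply(w, x, sum)               # weighted count per category
}

x <- c("a", "b", "a", "b")
w <- c(1, 3, NA, 2)
rescaled <- weighted_counts_sketch(x, w, rescale = TRUE)
raw <- weighted_counts_sketch(x, w, rescale = FALSE)
```

With one NA weight dropped, the rescaled counts sum to the three remaining rows, while the unrescaled counts sum to the actual weights (1 + 3 + 2).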
For schema-level inspection without computing frequencies, use
varlist() or code_book().
With styled = FALSE, a plain data.frame with no extra attributes
and columns:
value - unique values or factor levels
n - frequency count (weighted if applicable)
prop - proportion of total
valid_prop - proportion of valid responses (if valid = TRUE)
cum_prop, cum_valid_prop - cumulative proportions (if cum = TRUE)
With styled = TRUE (default), prints the formatted table to the
console and invisibly returns a spicy_freq_table object: the same
data.frame carrying rendering metadata as attributes (digits,
data_name, var_name, var_label, class_name, n_total,
n_valid, weighted, rescaled, weight_var) used by
print.spicy_freq_table().
cross_tab() for two-way cross-tabulations;
table_categorical() for multi-variable categorical summary
tables; varlist() / code_book() for variable inspection;
print.spicy_freq_table() for formatted printing;
spicy_print_table() for the underlying ASCII rendering engine.
# Frequency table with labelled ordered factor
freq(sochealth, education)
freq(sochealth, self_rated_health, sort = "-")
library(labelled)
# Simple numeric vector
x <- c(1, 2, 2, 3, 3, 3, NA)
freq(x)
# Plain vector with a sentinel value recoded as missing
freq(c(1, 2, 3, 99, 99), na_val = 99)
# Labelled variable (haven-style)
x_lbl <- labelled(
  c(1, 2, 3, 1, 2, 3, 1, 2, NA),
  labels = c("Low" = 1, "Medium" = 2, "High" = 3)
)
var_label(x_lbl) <- "Satisfaction level"
# Treat value 1 ("Low") as missing
freq(x_lbl, na_val = 1)
# Display only labels, add cumulative %
freq(x_lbl, labelled_levels = "labels", cum = TRUE)
# Display values only, sorted descending
freq(x_lbl, labelled_levels = "values", sort = "-")
# Show all declared factor levels, including unused ones (SPSS-style).
# The default "observed" mirrors Stata's `tab` and drops unused levels.
f <- factor(c("Yes", "No", "Yes"), levels = c("Yes", "No", "Maybe"))
freq(f, factor_levels = "all")
# With weighting
df <- data.frame(
  sex = factor(c("Male", "Female", "Female", "Male", NA, "Female")),
  weight = c(12, 8, 10, 15, 7, 9)
)
# Weighted frequencies (normalized)
freq(df, sex, weights = weight, rescale = TRUE)
# Weighted frequencies (without rescaling)
freq(df, sex, weights = weight, rescale = FALSE)
# Base R style, with weights and cumulative percentages
freq(df$sex, weights = df$weight, cum = TRUE)
# Piped version (tidy syntax) and sort alphabetically descending ("name-")
df |> freq(sex, sort = "name-")
# European decimal mark (matches `cross_tab()` and the `table_*()` family)
freq(sochealth, education, decimal_mark = ",")
# Non-styled return (for programmatic use)
f <- freq(df, sex, styled = FALSE)
head(f)
gamma_gk() computes the Goodman-Kruskal Gamma statistic for a
two-way contingency table of ordinal variables.
gamma_gk( x, detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE )
x |
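A contingency table (of class table). |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |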
.include_se |
Internal parameter; do not use. |
Gamma is computed as $\gamma = (C - D)/(C + D)$, where $C$
and $D$ are the numbers of concordant and discordant pairs.
Tied pairs are excluded from both the numerator and the
denominator, making it appropriate for ordinal variables with
many ties.
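The Gamma definition can be checked with a few lines of base R. This is a minimal sketch, not spicy's implementation: the helpers pair_counts() and gamma_sketch() are hypothetical, and the table's rows and columns are assumed to be already in ordinal order. For each cell, pairs with cells strictly below-right are counted as concordant and pairs with cells strictly below-left as discordant; ties never enter either count.

```r
# Hypothetical helper: concordant (C) and discordant (D) pair counts for an
# r x c table whose rows and columns are both in increasing ordinal order.
pair_counts <- function(tab) {
  tab <- as.matrix(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  C <- 0
  D <- 0
  for (i in seq_len(r - 1)) {
    below <- (i + 1):r
    for (j in seq_len(k)) {
      # Cells below and to the right order both variables the same way
      if (j < k) C <- C + tab[i, j] * sum(tab[below, (j + 1):k])
      # Cells below and to the left order them in opposite ways
      if (j > 1) D <- D + tab[i, j] * sum(tab[below, 1:(j - 1)])
    }
  }
  c(C = C, D = D)
}

# Goodman-Kruskal Gamma: tied pairs are simply not counted
gamma_sketch <- function(tab) {
  pc <- pair_counts(tab)
  (pc[["C"]] - pc[["D"]]) / (pc[["C"]] + pc[["D"]])
}

tab <- matrix(c(10, 2, 3, 8), nrow = 2)  # rows: (10, 3) and (2, 8)
pair_counts(tab)   # C = 80, D = 6
gamma_sketch(tab)  # (80 - 6) / (80 + 6)
```

For a 2x2 table Gamma reduces to Yule's Q, (ad - bc) / (ad + bc), which is why the example gives 74 / 86.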
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: gamma = 0 (Wald z-test).
kendall_tau_b(), kendall_tau_c(), somers_d(),
assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$education, sochealth$self_rated_health)
gamma_gk(tab)
gamma_gk(tab, detail = TRUE)
goodman_kruskal_tau() computes Goodman-Kruskal's Tau, a
proportional reduction in error (PRE) measure for nominal
variables.
goodman_kruskal_tau( x, direction = c("row", "column"), detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE )
x |
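A contingency table (of class table). |
direction |
Direction of prediction: "row" (the default) or "column". |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |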
.include_se |
Internal parameter; do not use. |
Unlike lambda_gk(), Goodman-Kruskal's Tau uses all cell
frequencies rather than only the modal categories, making it
more sensitive to association patterns where lambda may be
zero. Goodman-Kruskal's Tau is intrinsically directional and
has no canonical symmetric form (unlike lambda_gk() or
uncertainty_coef()); only "row" and "column" are
supported.
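A minimal base-R sketch of the textbook Goodman-Kruskal Tau for predicting the column variable from the row variable: the proportional drop in Gini-type prediction error when the conditional (within-row) distribution replaces the marginal one. The function name is hypothetical, which spicy direction ("row" or "column") corresponds to this orientation is an assumption left open here, and the sketch assumes no empty rows.

```r
# Hypothetical sketch: Goodman-Kruskal Tau, column variable predicted from row.
gk_tau_col_given_row <- function(tab) {
  p <- as.matrix(tab) / sum(tab)
  p_row <- rowSums(p)
  p_col <- colSums(p)
  # Error when guessing the column category from its marginal distribution
  err_marginal <- 1 - sum(p_col^2)
  # Error when guessing within each row; p^2 / p_row divides row i by p_row[i]
  err_conditional <- 1 - sum(p^2 / p_row)
  (err_marginal - err_conditional) / err_marginal
}

indep <- matrix(c(3, 6, 1, 2), nrow = 2)  # rows (3, 1) and (6, 2): same row profile
gk_tau_col_given_row(indep)               # 0 (up to floating-point error)
gk_tau_col_given_row(diag(c(4, 6)))       # 1: the row fully determines the column
```

Because the measure is directional, the other orientation is obtained by transposing the table, e.g. gk_tau_col_given_row(t(tab)).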
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: tau = 0 (Wald z-test).
lambda_gk(), uncertainty_coef(), assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$smoking, sochealth$education)
goodman_kruskal_tau(tab)
goodman_kruskal_tau(tab, direction = "column", detail = TRUE)
kendall_tau_b() computes Kendall's Tau-b for a two-way
contingency table of ordinal variables.
kendall_tau_b( x, detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE )
x |
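A contingency table (of class table). |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |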
.include_se |
Internal parameter; do not use. |
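Kendall's Tau-b is computed as
$\tau_b = (C - D) / \sqrt{(n_0 - T_r)(n_0 - T_c)}$,
where $C$ and $D$ are the numbers of concordant and discordant
pairs, $n_0 = n(n-1)/2$, $T_r$ is the number of pairs tied on
the row variable, and $T_c$ is the number tied on the column
variable. Tau-b corrects for ties and is appropriate for square
tables.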
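A minimal base-R sketch of the Tau-b definition, assuming rows and columns are already in ordinal order. The function name tau_b_sketch() is hypothetical and this is not spicy's code; the pair-counting loop is repeated inline so the block runs on its own.

```r
# Hypothetical sketch of Kendall's Tau-b for an ordinal contingency table.
tau_b_sketch <- function(tab) {
  tab <- as.matrix(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  C <- 0
  D <- 0
  for (i in seq_len(r - 1)) {
    below <- (i + 1):r
    for (j in seq_len(k)) {
      if (j < k) C <- C + tab[i, j] * sum(tab[below, (j + 1):k])  # concordant
      if (j > 1) D <- D + tab[i, j] * sum(tab[below, 1:(j - 1)])  # discordant
    }
  }
  n <- sum(tab)
  n0 <- n * (n - 1) / 2
  # Pairs tied on the row variable (T_r) and on the column variable (T_c)
  T_r <- sum(rowSums(tab) * (rowSums(tab) - 1) / 2)
  T_c <- sum(colSums(tab) * (colSums(tab) - 1) / 2)
  (C - D) / sqrt((n0 - T_r) * (n0 - T_c))
}

tab <- matrix(c(10, 2, 3, 8), nrow = 2)  # rows: (10, 3) and (2, 8)
tau_b_sketch(tab)
```

For a 2x2 table, $n_0 - T_r$ and $n_0 - T_c$ reduce to the products of the row and column totals, so Tau-b equals the signed phi coefficient, here 74 / sqrt(13 * 10 * 12 * 11).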
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: tau-b = 0 (Wald z-test).
kendall_tau_c(), gamma_gk(), somers_d(),
assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$education, sochealth$self_rated_health)
kendall_tau_b(tab)
kendall_tau_c() computes Stuart's Tau-c (also known as
Kendall's Tau-c) for a two-way contingency table of ordinal
variables.
kendall_tau_c( x, detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE )
x |
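A contingency table (of class table). |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |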
.include_se |
Internal parameter; do not use. |
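Stuart's Tau-c is computed as
$\tau_c = 2m(C - D) / \left(n^2 (m - 1)\right)$, where
$m = \min(r, c)$ for an $r \times c$ table, $n$ is the total
count, and $C$ and $D$ are the numbers of concordant and
discordant pairs. It is designed for rectangular tables: the
estimate always lies in $[-1, 1]$ and, unlike Tau-b, it can
reach $\pm 1$ even when the table is not square.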
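A minimal base-R sketch of Stuart's Tau-c, assuming ordinal row and column order; tau_c_sketch() is a hypothetical name, not spicy's implementation. The second example is a 2 x 3 table in which every pair that differs on the row variable is concordant, so Tau-c equals exactly 1 although the table is not square.

```r
# Hypothetical sketch of Stuart's Tau-c for an ordinal r x c table.
tau_c_sketch <- function(tab) {
  tab <- as.matrix(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  C <- 0
  D <- 0
  for (i in seq_len(r - 1)) {
    below <- (i + 1):r
    for (j in seq_len(k)) {
      if (j < k) C <- C + tab[i, j] * sum(tab[below, (j + 1):k])  # concordant
      if (j > 1) D <- D + tab[i, j] * sum(tab[below, 1:(j - 1)])  # discordant
    }
  }
  n <- sum(tab)
  m <- min(dim(tab))  # smaller of the number of rows and columns
  2 * m * (C - D) / (n^2 * (m - 1))
}

tab <- matrix(c(10, 2, 3, 8), nrow = 2)
tau_c_sketch(tab)    # 296 / 529

rect <- matrix(c(2, 0, 0, 1, 0, 1), nrow = 2)  # rows: (2, 0, 0) and (0, 1, 1)
tau_c_sketch(rect)   # 1
```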
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: tau-c = 0 (Wald z-test).
kendall_tau_b(), gamma_gk(), somers_d(),
assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$education, sochealth$self_rated_health)
kendall_tau_c(tab)
name<sep>label
Splits each column name at the first occurrence of sep,
renames the column to the part before sep (the name, trimmed
of surrounding whitespace), and assigns the part after sep as a
"label" attribute on the column. The label attribute follows
the haven convention also used
by labelled::var_label(), so labelled-aware tooling
(labelled, haven, varlist(), code_book(), ...) reads it
transparently. Splitting at the first sep means the label
itself may contain the separator.
label_from_names(df, sep = ". ")
df |
A data.frame or tibble. |
sep |
Character string used as separator between name and
label. Default ". " (a period followed by a space). |
Designed primarily for LimeSurvey CSV exports with Headings:
Question code & question text, which produce column names like
"code. question text". The default separator ". " matches
that export.
LimeSurvey question codes (the part before sep) are
restricted to alphanumerics, must start with a letter, and
contain no spaces – so the column name has to carry both the
code and the question text. If your export uses Headings:
Question code (codes only), re-export with Question code &
question text before calling this function; there is no way to
recover a label from a code alone.
Whitespace handling: the name (left of sep) is trimmed
of surrounding whitespace, because R column names are intended
to be referenced bare (without backticks) and leading / trailing
whitespace would force quoting throughout the user's downstream
code. The label (right of sep) is preserved verbatim,
following the Stata / SPSS convention that variable labels are
faithful user content – spicy does not silently mutate label
strings. To trim labels yourself, post-process with
labelled::var_label(df) <- lapply(labelled::var_label(df), trimws).
An object of the same class as df – a base
data.frame if df was a base data.frame, a tbl_df if df
was a tibble. The output has column names equal to the trimmed
names (before sep) and, for every column whose original name
contained sep, a "label" attribute equal to the label (after
sep). Columns whose name does not contain sep are passed
through unchanged with no label attached.
The function raises an actionable error – rather than letting the downstream constructor raise a cryptic one – when the split produces:
duplicate column names (two original names share the same
prefix before sep); or
an empty column name (the original name starts with sep
and has nothing before it).
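A minimal base-R sketch of the split rule described above (first occurrence of sep, trimmed name, verbatim label, errors on empty or duplicated names). split_names_sketch() is a hypothetical name and this is an illustration of the documented behaviour, not spicy's code.

```r
# Hypothetical re-implementation of the "name<sep>label" split rule.
split_names_sketch <- function(df, sep = ". ") {
  nms <- names(df)
  # regexpr() returns the position of the FIRST match (-1 if absent);
  # fixed = TRUE matches sep literally, so "." and "|" are not regex syntax.
  pos <- regexpr(sep, nms, fixed = TRUE)
  has_sep <- pos > 0
  new_names <- nms
  labels <- rep(NA_character_, length(nms))
  # Name: text before the first sep, trimmed of surrounding whitespace
  new_names[has_sep] <- trimws(substr(nms[has_sep], 1, pos[has_sep] - 1))
  # Label: everything after the first sep, kept verbatim
  labels[has_sep] <- substring(nms[has_sep], pos[has_sep] + nchar(sep))
  if (any(new_names == "")) {
    stop("Splitting at `sep` produced an empty column name.")
  }
  if (anyDuplicated(new_names) > 0) {
    stop("Splitting at `sep` produced duplicate column names.")
  }
  names(df) <- new_names
  for (i in which(has_sep)) attr(df[[i]], "label") <- labels[i]
  df
}

df <- data.frame(
  "age. Age of respondent" = c(25, 30),
  "score. Total score. Manually computed." = c(12, 14),
  id = 1:2,
  check.names = FALSE
)
out <- split_names_sketch(df)
names(out)                # "age" "score" "id"
attr(out$score, "label")  # "Total score. Manually computed."
```

The "id" column has no separator, so it passes through with no label attached.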
labelled::var_label() reads the "label" attribute set
by this function; varlist() and code_book() surface it in
their inspection outputs.
Other variable inspection:
code_book(),
varlist()
# LimeSurvey-style column names (default sep = ". ").
df <- data.frame(
  "age. Age of respondent" = c(25, 30),
  "score. Total score. Manually computed." = c(12, 14),
  check.names = FALSE
)
out <- label_from_names(df)
attr(out$age, "label")
attr(out$score, "label")
# Custom separator.
df2 <- data.frame(
  "id|Identifier" = 1:3,
  "score|Total score" = c(10, 20, 30),
  check.names = FALSE
)
out2 <- label_from_names(df2, sep = "|")
lambda_gk() computes Goodman-Kruskal's Lambda, a proportional
reduction in error (PRE) measure for nominal variables.
lambda_gk( x, direction = c("symmetric", "row", "column"), detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE )
x |
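A contingency table (of class table). |
direction |
Direction of prediction: "symmetric" (the default), "row", or "column". |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |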
.include_se |
Internal parameter; do not use. |
Lambda measures how much prediction error is reduced when
the independent variable is used to predict the dependent
variable. It ranges from 0 (no reduction) to 1 (perfect
prediction). Lambda can equal zero even when the variables
are associated, if the same category of the dependent variable
is the modal one within every column (or row).
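A minimal base-R sketch of the asymmetric Lambda for predicting the column variable from the row variable, chosen to show the zero-despite-association case; lambda_col_given_row() is a hypothetical name, not spicy's API. In the example the first column is modal in both rows, so knowing the row never changes the best guess, even though the row profiles differ (75% vs about 57%).

```r
# Hypothetical sketch: Goodman-Kruskal Lambda, column predicted from row.
lambda_col_given_row <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  # Errors without the predictor: always guess the overall modal column
  e1 <- n - max(colSums(tab))
  # Errors with the predictor: within each row, guess that row's modal column
  e2 <- n - sum(apply(tab, 1, max))
  (e1 - e2) / e1
}

tab <- matrix(c(30, 20, 10, 15), nrow = 2)  # rows: (30, 10) and (20, 15)
lambda_col_given_row(tab)   # 0
prop.table(tab, 1)          # yet the row profiles differ
lambda_col_given_row(diag(c(3, 4)))  # 1: perfect prediction
```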
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: lambda = 0 (Wald z-test).
goodman_kruskal_tau(), uncertainty_coef(),
assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
phi(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$smoking, sochealth$education)
lambda_gk(tab)
lambda_gk(tab, direction = "row")
lambda_gk(tab, direction = "column", detail = TRUE)
Computes row-wise means across selected numeric columns of a
data.frame or matrix. Missing values are handled per row via
min_valid (an integer count or proportion of non-NA values
required); rows that fail the rule return NA. Non-numeric
columns are dropped silently (set verbose = TRUE to see which).
Designed to flow inside dplyr::mutate(): when called without
an explicit data argument, the current data context is used.
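The min_valid rule can be sketched in base R as follows. This is a hypothetical re-statement, not spicy's code: in particular, converting a proportion to a count by rounding up (ceiling) is an assumption, as is the function name row_mean_sketch(). The data are those of the Examples.

```r
# Hypothetical sketch of a row-wise mean with a min_valid rule.
row_mean_sketch <- function(x, min_valid = NULL, digits = NULL) {
  x <- as.matrix(x)
  k <- ncol(x)
  n_valid <- rowSums(!is.na(x))
  needed <- if (is.null(min_valid)) {
    k                       # default: every selected value must be present
  } else if (min_valid < 1) {
    ceiling(min_valid * k)  # ASSUMPTION: proportion of columns, rounded up
  } else {
    min_valid               # integer count
  }
  out <- rowMeans(x, na.rm = TRUE)
  out[n_valid < needed] <- NA_real_  # rows failing the rule return NA
  if (!is.null(digits)) out <- round(out, digits)
  out
}

df <- data.frame(
  var1 = c(10, NA, 30, 40, 50),
  var2 = c(5, NA, 15, NA, 25),
  var3 = c(NA, 30, 20, 50, 10)
)
row_mean_sketch(df)                 # only complete rows 3 and 5 get a mean
row_mean_sketch(df, min_valid = 2)  # rows with at least 2 values
```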
mean_n( data = NULL, select = tidyselect::everything(), exclude = NULL, min_valid = NULL, digits = NULL, regex = FALSE, verbose = FALSE )
data |
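A data.frame or numeric matrix. If NULL (the default) and called inside dplyr::mutate(), the current data context is used. |
select |
Columns to include, using tidyselect syntax (default: everything()). If regex = TRUE, a regular expression matched against column names. |
exclude |
Columns to exclude (default: NULL). |
min_valid |
Minimum number of valid (non-NA) values required per row, given either as an integer count or as a proportion of the selected columns (e.g. 0.5 for 50%). Rows that fail the rule return NA. Defaults to NULL, which requires every selected value to be valid. |
digits |
Optional non-negative integer giving the number of decimal places to round the result to. Defaults to NULL (no rounding). |
regex |
Logical. If TRUE, select is interpreted as a regular expression matched against column names. Defaults to FALSE. |
verbose |
Logical. If TRUE, print processing information, such as which non-numeric columns were dropped. Defaults to FALSE. |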
A numeric vector of row-wise means.
Other row-wise summaries:
count_n(),
sum_n()
library(dplyr)
# Create a simple numeric data frame
df <- tibble(
  var1 = c(10, NA, 30, 40, 50),
  var2 = c(5, NA, 15, NA, 25),
  var3 = c(NA, 30, 20, 50, 10)
)
# Compute row-wise mean (all values must be valid by default)
mean_n(df)
# Require at least 2 valid (non-NA) values per row
mean_n(df, min_valid = 2)
# Require at least 50% valid (non-NA) values per row
mean_n(df, min_valid = 0.5)
# Round the result to 1 decimal
mean_n(df, digits = 1)
# Select specific columns
mean_n(df, select = c(var1, var2))
# Select specific columns using a pipe
df |> select(var1, var2) |> mean_n()
# Exclude a column
mean_n(df, exclude = "var3")
# Select columns ending with "1"
mean_n(df, select = ends_with("1"))
# Use with native pipe
df |> mean_n(select = starts_with("var"))
# Use inside dplyr::mutate()
df |> mutate(mean_score = mean_n(min_valid = 2))
# Select columns directly inside mutate()
df |> mutate(mean_score = mean_n(select = c(var1, var2), min_valid = 1))
# Select columns before mutate
df |> select(var1, var2) |> mutate(mean_score = mean_n(min_valid = 1))
# Show verbose processing info
df |> mutate(mean_score = mean_n(min_valid = 2, digits = 1, verbose = TRUE))
# Add character and grouping columns
df_mixed <- mutate(df,
  name = letters[1:5],
  group = c("A", "A", "B", "B", "A")
)
df_mixed
# Non-numeric columns are ignored
mean_n(df_mixed)
# Use within mutate() on mixed data
df_mixed |> mutate(mean_score = mean_n(select = starts_with("var")))
# Use everything() but exclude non-numeric columns manually
mean_n(df_mixed, select = everything(), exclude = "group")
# Select columns using regex
mean_n(df_mixed, select = "^var", regex = TRUE)
mean_n(df_mixed, select = "ar", regex = TRUE)
# Apply to a subset of rows (first 3)
df_mixed[1:3, ] |> mean_n(select = starts_with("var"))
# Store the result in a new column
df_mixed$mean_score <- mean_n(df_mixed, select = starts_with("var"))
df_mixed
# With a numeric matrix
mat <- matrix(c(1, 2, NA, 4, 5, NA, 7, 8, 9), nrow = 3, byrow = TRUE)
mat
mat |> mean_n(min_valid = 2)
phi() computes the phi coefficient for a 2x2 contingency table.
phi(x, detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE)
x |
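A contingency table (of class table). |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |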
.include_se |
Internal parameter; do not use. |
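The phi coefficient is $\phi = \sqrt{\chi^2 / n}$, where $\chi^2$ is the Pearson chi-squared statistic and $n$ the total count. It is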
equivalent to Cramer's V for 2x2 tables and equals the absolute
value of the Pearson correlation between the two binary
variables – spicy returns only the magnitude (always
non-negative), matching the DescTools (Signorell et al., 2024)
and SPSS conventions. To recover the signed direction of the
2x2 association, compute the Pearson correlation directly
(e.g. cor(x, y) after coding both variables 0/1).
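A short base-R check of the relationships described above, under the assumption that phi uses the uncorrected Pearson chi-squared statistic (which is what makes it equal to the absolute Pearson correlation). All names are local to this example; nothing here calls spicy.

```r
tab <- matrix(c(40, 8, 12, 30), nrow = 2)  # rows: (40, 12) and (8, 30)
n <- sum(tab)
# Pearson chi-squared without continuity correction
expected <- outer(rowSums(tab), colSums(tab)) / n
chi2 <- sum((tab - expected)^2 / expected)
phi_value <- sqrt(chi2 / n)  # magnitude only, as returned by phi()

# Signed version: code each variable 0/1 and correlate; as.vector(tab)
# lists the cells column by column: (1,1), (2,1), (1,2), (2,2)
row01 <- rep(c(0, 1, 0, 1), times = as.vector(tab))
col01 <- rep(c(0, 0, 1, 1), times = as.vector(tab))
signed_phi <- cor(row01, col01)

c(phi = phi_value, signed = signed_phi)
```

Reversing the 0/1 coding of one variable flips the sign of signed_phi but leaves phi_value unchanged.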
The confidence interval uses the Fisher z-transformation on
$\phi$; see cramer_v() for the formula and full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests the null hypothesis of no association
(Pearson chi-squared test).
cramer_v(), yule_q(), assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
somers_d(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$smoking, sochealth$sex)
phi(tab)
phi(tab, detail = TRUE)
A simulated dataset of 1200 respondents from a fictional social-health survey, designed to illustrate the main features of the spicy package: variable labels, ordered factors, survey weights, association measures, and APA-style reporting.
sochealth
A tibble with 1200 rows and 24 variables:
Factor. Sex of the respondent.
Numeric. Age in years (25–75).
Ordered factor. Age group (25–34, 35–49, 50–64, 65–75).
Ordered factor. Highest education level (Lower secondary, Upper secondary, Tertiary).
Ordered factor. Subjective social class (Lower, Working, Lower middle, Middle, Upper middle).
Factor. Region of residence (6 regions).
Factor. Employment status (Employed, Student, Unemployed, Inactive).
Ordered factor. Household income group (Low, Lower middle, Upper middle, High). Contains missing values.
Numeric. Monthly household income in CHF (1000–7400).
Factor. Current smoker (No, Yes). Contains missing values.
Factor. Regular physical activity (No, Yes).
Factor. Dentist visit in the last 12 months (No, Yes).
Ordered factor. Self-rated health (Poor, Fair, Good, Very good). Contains missing values.
Numeric. WHO-5 wellbeing index (0–100).
Numeric. Body mass index in kg/m² (16–39). Contains missing values.
Ordered factor. BMI category (Normal weight, Overweight, Obesity). Contains missing values.
Ordered factor. Trust in institutions (Very low, Low, High, Very high).
Numeric. Political position on a 0 (left) to 10 (right) scale. Contains missing values.
Integer. Satisfaction with own health (1–5 Likert scale). Contains missing values.
Integer. Satisfaction with work or main activity (1–5 Likert scale). Contains missing values.
Integer. Satisfaction with personal relationships (1–5 Likert scale). Contains missing values.
Integer. Satisfaction with standard of living (1–5 Likert scale). Contains missing values.
POSIXct. Date and time of survey response (September–November 2024).
Numeric. Survey design weight (range
0.29–3.45); calibrated so that sum(weight) matches the
unweighted N and mean(weight) is approximately 1. See
Details.
Every variable carries a "label" attribute (read by
labelled::var_label() and surfaced by varlist() /
code_book()). The mix of factor types is deliberate: nominal
factors (sex, region, ...) and ordered factors (education,
self_rated_health, ...) live side by side so that
cross_tab() and table_categorical() can demonstrate the
automatic ordinal-vs-nominal dispatch (Cramer's V, Phi, Kendall's
Tau-b, Goodman-Kruskal Gamma) on the same dataset.
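Survey weights (weight) are calibrated: sum(weight) matches
the unweighted N to within rounding, and mean(weight) is
approximately 1. Weighted frequency counts therefore sum to the
sample size without further rescaling; weighted means are not
affected by this normalization, because multiplying all weights
by a constant leaves a weighted mean unchanged.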
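What "sum(weight) matches the unweighted N" means can be illustrated with a generic normalization in base R. This is only a sketch with arbitrary simulated weights; it is not the calibration performed in data-raw/sochealth.R. Rescaling changes weighted counts (they now add up to N) but not weighted means.

```r
set.seed(1)
raw_w <- runif(10, min = 0.5, max = 3)  # arbitrary positive weights
y <- rnorm(10)
# Normalize so that the weights sum to the sample size N = 10
cal_w <- raw_w * length(raw_w) / sum(raw_w)
sum(cal_w)   # 10 (up to floating-point rounding)
mean(cal_w)  # 1
# Weighted means are invariant to multiplying every weight by a constant
c(weighted.mean(y, raw_w), weighted.mean(y, cal_w))
```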
Simulated data for illustration purposes; reproducible by
sourcing data-raw/sochealth.R. The script seeds the main
generation block with set.seed(2025), and the two
missing-value injection blocks with set.seed(2027) (the
four life_sat_* items) and set.seed(2026) (smoking,
self_rated_health, income_group, political_position,
bmi).
data(sochealth)
varlist(sochealth)
freq(sochealth, education)
cross_tab(sochealth, education, self_rated_health)
somers_d() computes Somers' D for a two-way contingency
table of ordinal variables.
somers_d( x, direction = c("row", "column", "symmetric"), detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE )
x |
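A contingency table (of class table). |
direction |
Direction of prediction: "row" (the default), "column", or "symmetric". |
detail |
Logical. If TRUE, return a named vector (estimate, confidence interval, p-value) instead of a scalar. Defaults to FALSE. |
conf_level |
A number between 0 and 1 giving the confidence level (default 0.95). |
digits |
Number of decimal places used when printing the result (default 3). |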
.include_se |
Internal parameter; do not use. |
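Somers' D is an asymmetric ordinal measure defined as
$d = (C - D)/(n_0 - T_x)$, where $C$ and $D$ are the numbers of
concordant and discordant pairs, $n_0 = n(n-1)/2$, and $T_x$ is the
number of pairs tied on the independent variable. The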
symmetric version (direction = "symmetric") is the
harmonic mean of the two asymmetric values, matching the
SPSS / PSPP convention; this is not identical to
Kendall's Tau-b (which is the geometric mean of the same
two quantities), although the two often agree to two
decimals. No analytic SE / CI is reported for the symmetric
form (DescTools follows the same convention).
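A minimal base-R sketch comparing the two asymmetric Somers' D values with their harmonic mean (the symmetric form) and their signed geometric mean (Kendall's Tau-b). somers_sketch() is a hypothetical name, and the directions are labelled by which variable is treated as independent because the mapping to spicy's "row" / "column" arguments is not spelled out here.

```r
# Hypothetical sketch: asymmetric and symmetric Somers' D, plus Tau-b.
somers_sketch <- function(tab) {
  tab <- as.matrix(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  C <- 0
  D <- 0
  for (i in seq_len(r - 1)) {
    below <- (i + 1):r
    for (j in seq_len(k)) {
      if (j < k) C <- C + tab[i, j] * sum(tab[below, (j + 1):k])  # concordant
      if (j > 1) D <- D + tab[i, j] * sum(tab[below, 1:(j - 1)])  # discordant
    }
  }
  n <- sum(tab)
  n0 <- n * (n - 1) / 2
  untied_row <- n0 - sum(rowSums(tab) * (rowSums(tab) - 1) / 2)  # n0 - T_row
  untied_col <- n0 - sum(colSums(tab) * (colSums(tab) - 1) / 2)  # n0 - T_col
  c(
    col_given_row = (C - D) / untied_row,                # row variable independent
    row_given_col = (C - D) / untied_col,                # column variable independent
    symmetric = 2 * (C - D) / (untied_row + untied_col), # harmonic mean of the two
    tau_b = (C - D) / sqrt(untied_row * untied_col)      # signed geometric mean
  )
}

tab <- matrix(c(10, 2, 3, 8), nrow = 2)  # rows: (10, 3) and (2, 8)
round(somers_sketch(tab), 5)
```

Writing the harmonic mean as 2(C - D) / (untied_row + untied_col) avoids dividing by zero when C equals D. In the example the symmetric value and Tau-b differ only in the fifth decimal.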
Standard error formulas for the asymmetric directions follow
the DescTools implementations (Signorell et al., 2024); see
cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: D = 0 (Wald z-test).
kendall_tau_b(), gamma_gk(), assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
uncertainty_coef(),
yule_q()
tab <- table(sochealth$education, sochealth$self_rated_health)
somers_d(tab, direction = "row")
somers_d(tab, direction = "column", detail = TRUE)
User-facing helper that prints a spicy-styled ASCII table to the
console with optional title and note, table-type-aware alignment
defaults, and automatic horizontal panelling when the table is
wider than the console. Wraps the internal renderer
build_ascii_table().
spicy_print_table( x, title = attr(x, "title"), note = attr(x, "note"), padding = 2L, first_column_line = TRUE, row_total_line = TRUE, column_total_line = TRUE, bottom_line = FALSE, lines_color = "darkgrey", align_left_cols = NULL, align_center_cols = integer(0), center_headers = FALSE, spanners = NULL, group_sep_rows = integer(0), total_row_idx = attr(x, "total_row_idx"), display_labels = NULL, ... )
x |
A data.frame. |
title |
Optional title displayed above the table. Defaults to the
title attribute of x. |
note |
Optional note displayed below the table. Defaults to the note attribute of x. |
padding |
Non-negative integer giving the number of extra
characters added to each column's auto-computed width
(max of cell-content width and header width). Defaults to
2. |
first_column_line |
Logical. If TRUE (the default), a separator line is drawn after the first column. |
row_total_line, column_total_line, bottom_line |
Logical flags controlling
the presence of horizontal lines before total rows/columns or at the bottom
of the table (defaults: TRUE, TRUE, and FALSE, respectively).
Both |
lines_color |
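Character. Color for table separators. Defaults to "darkgrey". |
align_left_cols |
Integer vector of column indices to left-align.
If NULL (the default), the left-aligned columns are chosen from the
detected table type (see Details). |
align_center_cols |
Integer vector of column indices to
center-align. Defaults to integer(0) (none). |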
center_headers |
Logical. When TRUE, column headers are centered. Defaults to FALSE. |
spanners |
Optional named list of column-group labels
(label -> integer column indices). Passed through to
|
group_sep_rows |
Integer vector of row indices before which a
light dashed separator line is drawn. Defaults to integer(0) (no separators). |
total_row_idx |
Optional integer vector of 1-based row indices
identifying the totals rows; defaults to the total_row_idx attribute of x. |
display_labels |
Optional character vector of length |
... |
Additional arguments passed to |
Table type is auto-detected from x and drives the default
alignment when align_left_cols = NULL:
frequency table (a Category column is present): the
first two columns (Category, Values) are left-aligned.
cross table (otherwise): only the first column (row variable) is left-aligned.
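The detection rule above amounts to a one-line check on the column names. The sketch below restates it in base R; default_left_cols() is a hypothetical name and not a function exported by spicy.

```r
# Hypothetical restatement of the default alignment rule
default_left_cols <- function(x) {
  # Frequency tables carry a "Category" column: left-align Category and Values
  if ("Category" %in% names(x)) 1:2 else 1L
}

default_left_cols(data.frame(Category = "Valid", Values = "Yes", Freq. = 12))  # 1 2
default_left_cols(data.frame(Education = "Tertiary", Good = 10))              # 1
```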
If the table is wider than the console, it is split into stacked
horizontal panels with the left-most identifier columns repeated
on each panel. Unicode line-drawing characters are used by
default; coloured separators are drawn when the terminal supports
ANSI colour (crayon::has_color()) and fall back to monochrome
otherwise.
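The panelling idea can be sketched as a greedy packing of columns. This is a hypothetical illustration, not the algorithm used by build_ascii_table(): split_panels() and its arguments are invented for the example, column widths are taken as given, and separator or padding characters are ignored. A column wider than the limit still gets a panel of its own.

```r
# Hypothetical sketch: pack columns into panels no wider than `width`,
# repeating the first `n_id` identifier columns on every panel.
split_panels <- function(col_widths, width, n_id = 1L) {
  id <- seq_len(n_id)
  id_width <- sum(col_widths[id])
  panels <- list()
  current <- integer(0)
  used <- id_width
  for (j in setdiff(seq_along(col_widths), id)) {
    # Close the current panel when the next column would overflow it
    if (length(current) > 0 && used + col_widths[j] > width) {
      panels[[length(panels) + 1]] <- c(id, current)
      current <- integer(0)
      used <- id_width
    }
    current <- c(current, j)
    used <- used + col_widths[j]
  }
  if (length(current) > 0) panels[[length(panels) + 1]] <- c(id, current)
  panels
}

split_panels(c(10, 8, 8, 8, 8), width = 30)  # panels: 1 2 3, then 1 4 5
```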
Invisibly returns x, after printing the formatted ASCII table to the console.
build_ascii_table() for the underlying text rendering engine.
print.spicy_freq_table() for the specialized printing method used by freq().
# Simple demonstration
df <- data.frame(
  Category = c("Valid", "", "Missing", "Total"),
  Values = c("Yes", "No", "NA", ""),
  Freq. = c(12, 8, 1, 21),
  Percent = c(57.1, 38.1, 4.8, 100.0)
)
spicy_print_table(df,
  title = "Frequency table: Example",
  note = "Class: data.frame\nData: demo"
)
Computes row-wise sums across selected numeric columns of a
data.frame or matrix. Missing values are handled per row via
min_valid (an integer count or proportion of non-NA values
required); rows that fail the rule return NA. Non-numeric
columns are dropped silently (set verbose = TRUE to see which).
Designed to flow inside dplyr::mutate(): when called without
an explicit data argument, the current data context is used.
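A hypothetical base-R restatement of the min_valid rule for sums, using the matrix from the Examples. The function name row_sum_sketch() is invented, and rounding a proportion up to a count is an assumption. The sketch sums only the available values, so a row that passes with a missing item is summed over fewer terms than a complete row.

```r
# Hypothetical sketch of a row-wise sum with a min_valid rule.
row_sum_sketch <- function(x, min_valid = NULL) {
  x <- as.matrix(x)
  k <- ncol(x)
  n_valid <- rowSums(!is.na(x))
  needed <- if (is.null(min_valid)) {
    k                       # default: every selected value must be present
  } else if (min_valid < 1) {
    ceiling(min_valid * k)  # ASSUMPTION: proportion of columns, rounded up
  } else {
    min_valid               # integer count
  }
  out <- rowSums(x, na.rm = TRUE)  # sum of the values that are present
  out[n_valid < needed] <- NA_real_
  out
}

mat <- matrix(c(1, 2, NA, 4, 5, NA, 7, 8, 9), nrow = 3, byrow = TRUE)
row_sum_sketch(mat)                 # NA NA 24
row_sum_sketch(mat, min_valid = 2)  # 3 9 24
```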
sum_n( data = NULL, select = tidyselect::everything(), exclude = NULL, min_valid = NULL, digits = NULL, regex = FALSE, verbose = FALSE )
data |
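A data.frame or numeric matrix. If NULL (the default) and called inside dplyr::mutate(), the current data context is used. |
select |
Columns to include, using tidyselect syntax (default: everything()). If regex = TRUE, a regular expression matched against column names. |
exclude |
Columns to exclude (default: NULL). |
min_valid |
Minimum number of valid (non-NA) values required per row, given either as an integer count or as a proportion of the selected columns (e.g. 0.5 for 50%). Rows that fail the rule return NA. Defaults to NULL, which requires every selected value to be valid. |
digits |
Optional non-negative integer giving the number of decimal places to round the result to. Defaults to NULL (no rounding). |
regex |
Logical. If TRUE, select is interpreted as a regular expression matched against column names. Defaults to FALSE. |
verbose |
Logical. If TRUE, print processing information, such as which non-numeric columns were dropped. Defaults to FALSE. |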
A numeric vector of row-wise sums.
Other row-wise summaries:
count_n(),
mean_n()
library(dplyr)
# Create a simple numeric data frame
df <- tibble(
  var1 = c(10, NA, 30, 40, 50),
  var2 = c(5, NA, 15, NA, 25),
  var3 = c(NA, 30, 20, 50, 10)
)
# Compute row-wise sums (all values must be valid by default)
sum_n(df)
# Require at least 2 valid (non-NA) values per row
sum_n(df, min_valid = 2)
# Require at least 50% valid (non-NA) values per row
sum_n(df, min_valid = 0.5)
# Round the results to 1 decimal
sum_n(df, digits = 1)
# Select specific columns
sum_n(df, select = c(var1, var2))
# Select specific columns using a pipe
df |> select(var1, var2) |> sum_n()
# Exclude a column
sum_n(df, exclude = "var3")
# Select columns ending with "1"
sum_n(df, select = ends_with("1"))
# Use with native pipe
df |> sum_n(select = starts_with("var"))
# Use inside dplyr::mutate()
df |> mutate(sum_score = sum_n(min_valid = 2))
# Select columns directly inside mutate()
df |> mutate(sum_score = sum_n(select = c(var1, var2), min_valid = 1))
# Select columns before mutate
df |> select(var1, var2) |> mutate(sum_score = sum_n(min_valid = 1))
# Show verbose message
df |> mutate(sum_score = sum_n(min_valid = 2, digits = 1, verbose = TRUE))
# Add character and grouping columns
df_mixed <- mutate(df,
  name = letters[1:5],
  group = c("A", "A", "B", "B", "A")
)
df_mixed
# Non-numeric columns are ignored
sum_n(df_mixed)
# Use inside mutate with mixed data
df_mixed |> mutate(sum_score = sum_n(select = starts_with("var")))
# Use everything(), but exclude a known non-numeric column
sum_n(df_mixed, select = everything(), exclude = "group")
# Select columns using regex
sum_n(df_mixed, select = "^var", regex = TRUE)
sum_n(df_mixed, select = "ar", regex = TRUE)
# Apply to a subset of rows
df_mixed[1:3, ] |> sum_n(select = starts_with("var"))
# Store the result in a new column
df_mixed$sum_score <- sum_n(df_mixed, select = starts_with("var"))
df_mixed
# With a numeric matrix
mat <- matrix(c(1, 2, NA, 4, 5, NA, 7, 8, 9), nrow = 3, byrow = TRUE)
mat
mat |> sum_n(min_valid = 2)
Builds a publication-ready frequency or cross-tabulation table for one or many categorical variables selected with tidyselect syntax.
With by, produces grouped cross-tabulation summaries (using
cross_tab() internally) with Chi-squared p-values and optional
association measures.
Without by, produces one-way frequency-style summaries.
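The omnibus chi-squared test and Cramer's V reported for each grouped row can be checked by hand from a two-way table. The sketch below uses base R and the textbook formulas V = sqrt(chi2 / (n * (min(r, c) - 1))) and, for a 2x2 table, phi = (ad - bc) / sqrt(product of the margins). It assumes unweighted counts and no continuity correction (matching the `correct = FALSE` default); the counts are invented for illustration, and spicy's own computation may differ in detail (for example, in how weights or the sign of phi are handled).

```r
# Toy 2x2 table of counts (invented for illustration).
tab <- matrix(c(20, 10, 5, 15), nrow = 2,
              dimnames = list(smoker = c("yes", "no"), group = c("A", "B")))

# Omnibus Pearson chi-squared test without Yates' correction.
ct <- chisq.test(tab, correct = FALSE)
chi2 <- unname(ct$statistic)
n <- sum(tab)

# Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1))).
cramer_v <- sqrt(chi2 / (n * (min(dim(tab)) - 1)))

# For a 2x2 table, phi = (ad - bc) / sqrt(product of row and column totals);
# its absolute value equals Cramer's V.
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

round(c(chi2 = chi2, p = ct$p.value, V = cramer_v, phi = phi), 3)
```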
Multiple output formats are available via output: a printed ASCII
table ("default"), a wide or long numeric data.frame
("data.frame", "long"), or publication-ready tables
("tinytable", "gt", "flextable", "excel", "clipboard",
"word").
```r
table_categorical(
  data,
  select,
  by = NULL,
  labels = NULL,
  levels_keep = NULL,
  include_total = TRUE,
  drop_na = TRUE,
  weights = NULL,
  rescale = FALSE,
  correct = FALSE,
  simulate_p = FALSE,
  simulate_B = 2000,
  percent_digits = 1,
  p_digits = 3,
  v_digits = 2,
  assoc_measure = "auto",
  assoc_ci = FALSE,
  decimal_mark = ".",
  align = c("decimal", "center", "right"),
  output = c("default", "data.frame", "long", "tinytable", "gt", "flextable",
    "excel", "clipboard", "word"),
  indent_text = " ",
  indent_text_excel_clipboard = strrep(" ", 6),
  add_multilevel_header = TRUE,
  blank_na_wide = FALSE,
  excel_path = NULL,
  excel_sheet = "Categorical",
  clipboard_delim = "\t",
  word_path = NULL
)
```
data |
A data frame. |
select |
Columns to include as row variables. Supports tidyselect syntax and character vectors of column names. |
by |
Optional grouping column used for columns/groups. Accepts an unquoted column name or a single character column name. |
labels |
Optional display labels for the variables. Two
forms are accepted (matching
When |
levels_keep |
Optional character vector of levels to keep/order for row
modalities. If |
include_total |
Logical. If |
drop_na |
Logical. If |
weights |
Optional weights. Either |
rescale |
Logical. If |
correct |
Logical. If |
simulate_p |
Logical. If |
simulate_B |
Integer. Number of Monte Carlo replicates when
|
percent_digits |
Number of digits for percentages in report outputs.
Defaults to |
p_digits |
Integer >= 1. Number of decimal places used to
render p-values in the |
v_digits |
Number of digits for the association measure. Defaults
to |
assoc_measure |
Which association measure to report alongside the chi-squared p-value. Accepts four input shapes:
When a single measure is used for every row, the column header is
that measure's name (e.g.
|
assoc_ci |
Passed to |
decimal_mark |
Decimal separator ( |
align |
Horizontal alignment of numeric columns in the
printed ASCII table and in the
The |
output |
Output format. One of:
|
indent_text |
Prefix used for modality labels in report table building.
Defaults to |
indent_text_excel_clipboard |
Stronger indentation used in Excel and clipboard exports. Defaults to six non-breaking spaces. |
add_multilevel_header |
Logical. If |
blank_na_wide |
Logical. If |
excel_path |
Path for |
excel_sheet |
Sheet name for Excel export. Defaults to |
clipboard_delim |
Delimiter for clipboard text export. Defaults to |
word_path |
Path for |
Depends on output:
"default": prints a styled ASCII table and returns the
underlying data.frame invisibly (S3 class
"spicy_categorical_table").
"data.frame": a wide data.frame with one row per
variable–level combination.
When by is used, the columns are Variable, Level, and one
pair of n / % columns per group level (plus Total when
include_total = TRUE), followed by Chi2, df, p, and the
association measure column.
When by = NULL, the columns are Variable, Level, n, %.
"long": a long data.frame with columns variable,
level, group, n, percent (and chi2, df, p,
association measure columns when by is used).
"tinytable": a tinytable object.
"gt": a gt_tbl object.
"flextable": a flextable object.
"excel" / "word": writes to disk and returns the file
path invisibly.
"clipboard": copies the table and returns the display
data.frame invisibly.
When by is used, each selected variable is cross-tabulated
against the grouping variable with cross_tab() and the omnibus
chi-squared p-value is reported in the p column. See
the correct / simulate_p arguments to switch on Yates' continuity
correction or Monte Carlo p-values, and the assoc_measure argument
for the per-row dispatch table used by "auto" (2x2 -> Phi,
both ordered -> Kendall's Tau-b, otherwise Cramer's V). Without
by, the table reports the marginal frequency distribution of
each variable with no inferential statistics.
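The `"auto"` dispatch rule described above can be expressed as a small predicate on the two variables. The helper below is hypothetical and only mirrors the documented rule (2x2 -> Phi, both ordered -> Kendall's Tau-b, otherwise Cramer's V). It assumes that the 2x2 check takes precedence over the ordered check and that levels are counted after dropping unused ones; spicy's internal code may differ.

```r
# Hypothetical helper mirroring the documented "auto" rule.
auto_assoc_measure <- function(x, y) {
  x <- droplevels(as.factor(x))   # ordered factors stay ordered
  y <- droplevels(as.factor(y))
  if (nlevels(x) == 2L && nlevels(y) == 2L) {
    "phi"                         # 2x2 table
  } else if (is.ordered(x) && is.ordered(y)) {
    "tau_b"                       # both variables ordinal
  } else {
    "cramer_v"                    # everything else
  }
}

sex    <- factor(c("F", "M", "F", "M"))
smoker <- factor(c("yes", "no", "no", "yes"))
edu    <- factor(c("low", "mid", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
health <- factor(c("poor", "good", "fair", "good"),
                 levels = c("poor", "fair", "good"), ordered = TRUE)

auto_assoc_measure(smoker, sex)   # "phi"
auto_assoc_measure(edu, health)   # "tau_b"
auto_assoc_measure(edu, sex)      # "cramer_v"
```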
For model-based comparisons (cluster-robust SE, weighted contrasts,
fitted means) on continuous outcomes, see table_continuous_lm().
For descriptive (empirical) comparisons on continuous outcomes, see
table_continuous().
Decimal alignment, p-value formatting, and the suggested packages
required by each output engine are documented under the align,
p_digits, and output arguments, respectively.
table_continuous() for empirical comparisons on
continuous outcomes; table_continuous_lm() for the model-based
companion (heteroskedasticity-consistent / cluster-robust /
bootstrap / jackknife SE, fitted means, weighted contrasts);
cross_tab() for two-way cross-tabulations; freq() for
one-way frequency tables.
Other spicy tables:
table_continuous(),
table_continuous_lm()
```r
# --- Basic usage ---------------------------------------------------------

# Default: ASCII console table grouped by sex.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex
)

# One-way frequency-style table (no `by`).
table_categorical(
  sochealth,
  select = c(smoking, physical_activity)
)

# Pretty labels keyed by column name.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  labels = c(
    smoking = "Current smoker",
    physical_activity = "Physical activity"
  )
)

# Survey weights with rescaling.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  weights = "weight",
  rescale = TRUE
)

# Confidence interval for the association measure.
table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_ci = TRUE
)

# --- Per-variable association measure ----------------------------------

# Default (`assoc_measure = "auto"`): one measure per row variable based on
# the variable type (2x2 -> Phi, both ordered factors -> Kendall's Tau-b,
# otherwise Cramer's V). When the chosen measures differ across rows, the
# column header collapses to `"Effect size"` and an APA-style `Note.` line
# documents which measure was used for which variable.
table_categorical(
  sochealth,
  select = c(smoking, education),
  by = sex
)

# Force a uniform measure across all row variables.
table_categorical(
  sochealth,
  select = c(smoking, education),
  by = sex,
  assoc_measure = "cramer_v"
)

# Per-variable override (recommended named form).
table_categorical(
  sochealth,
  select = c(smoking, education, self_rated_health),
  by = sex,
  assoc_measure = c(
    smoking = "phi",             # binary x binary
    education = "cramer_v",      # multi-category nominal
    self_rated_health = "tau_b"  # ordinal x binary, Tau-b
  )
)

# --- Output formats -----------------------------------------------------

# The rendered outputs below all wrap the same call:
#   table_categorical(sochealth,
#                     select = c(smoking, physical_activity),
#                     by = sex)
# only `output` changes. Assign each result to a variable -- some
# engines auto-print as a console-friendly text fallback inside
# the `?` help viewer.

# Wide data.frame (one row per modality).
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "data.frame"
)

# Long data.frame (one row per (modality x group)).
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "long"
)

# Rendered HTML / docx objects -- best viewed inside a
# Quarto / R Markdown document or a pkgdown article.
if (requireNamespace("tinytable", quietly = TRUE)) {
  tt <- table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = sex,
    output = "tinytable"
  )
}
if (requireNamespace("gt", quietly = TRUE)) {
  tbl <- table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = sex,
    output = "gt"
  )
}
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = sex,
    output = "flextable"
  )
}

# Excel and Word: write to a temporary file.
if (requireNamespace("openxlsx2", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = sex,
    output = "excel",
    excel_path = tmp
  )
  unlink(tmp)
}
if (
  requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)
) {
  tmp <- tempfile(fileext = ".docx")
  table_categorical(
    sochealth,
    select = c(smoking, physical_activity),
    by = sex,
    output = "word",
    word_path = tmp
  )
  unlink(tmp)
}

## Not run:
# Clipboard: writes to the system clipboard.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "clipboard"
)
## End(Not run)
```
Computes descriptive statistics (mean, SD, min, max, confidence interval of the mean, n) for one or many continuous variables selected with tidyselect syntax.
With by, produces grouped summaries and reports a group-comparison
p-value by default (Welch test; change via test). Additional
inferential output is opt-in: test statistics (statistic) and
effect sizes (effect_size / effect_size_ci). Set p_value = FALSE
to suppress the p-value column. Without by, produces one-way
descriptive summaries.
Multiple output formats are available via output: a printed ASCII
table ("default"), a plain data.frame ("data.frame" or
"long" – synonyms for the underlying long-format data, see
Details), or publication-ready tables ("tinytable", "gt",
"flextable", "excel", "clipboard", "word").
This is the descriptive companion to table_continuous_lm(). The
two functions share their layout, alignment, and reporting precision
so descriptive and model-based analyses of the same data look
uniform side by side. Use table_continuous_lm() when you need
robust SE, weighted contrasts, fitted means, or covariate
adjustment.
```r
table_continuous(
  data,
  select = tidyselect::everything(),
  by = NULL,
  exclude = NULL,
  regex = FALSE,
  test = c("welch", "student", "nonparametric"),
  p_value = NULL,
  statistic = FALSE,
  show_n = TRUE,
  effect_size = c("none", "auto", "hedges_g", "eta_sq", "r_rb", "epsilon_sq"),
  effect_size_ci = FALSE,
  ci = TRUE,
  labels = NULL,
  ci_level = 0.95,
  digits = 2,
  effect_size_digits = 2,
  p_digits = 3,
  decimal_mark = ".",
  align = c("decimal", "center", "right"),
  output = c("default", "data.frame", "long", "tinytable", "gt", "flextable",
    "excel", "clipboard", "word"),
  excel_path = NULL,
  excel_sheet = "Descriptives",
  clipboard_delim = "\t",
  word_path = NULL,
  verbose = FALSE
)
```
data |
A |
select |
Columns to include. If |
by |
Optional grouping column. Accepts an unquoted column name or a single character column name. Coerced to factor for grouping; non-numeric grouping columns (factor, character, logical) are supported as-is. |
exclude |
Columns to exclude. Supports tidyselect syntax and character vectors of column names. |
regex |
Logical. If |
test |
Character. Statistical test to use when comparing groups.
One of
Used whenever |
p_value |
Logical or |
statistic |
Logical. If |
show_n |
Logical. If |
effect_size |
Effect-size measure to include in the rendered outputs. One of:
For backward compatibility, |
effect_size_ci |
Logical. If |
ci |
Logical. If |
labels |
An optional named character vector of variable labels.
Names must match column names in |
ci_level |
Confidence level for the mean confidence interval
(default: |
digits |
Number of decimal places for descriptive values and test
statistics (default: |
effect_size_digits |
Number of decimal places for effect-size values
in formatted displays (default: |
p_digits |
Integer >= 1. Number of decimal places used to
render p-values in the |
decimal_mark |
Character used as decimal separator.
Either |
align |
Horizontal alignment of numeric columns in the printed
ASCII table and in the
The |
output |
Output format. One of:
|
excel_path |
File path for |
excel_sheet |
Sheet name for |
clipboard_delim |
Delimiter for |
word_path |
File path for |
verbose |
Logical. If |
Depends on output:
"default": prints a styled ASCII table and returns the
underlying data.frame invisibly (S3 class
"spicy_continuous_table" / "spicy_table"). The object can
be re-coerced via as.data.frame.spicy_continuous_table() or
piped into broom::tidy() / broom::glance().
"data.frame" / "long": a plain data.frame with
columns variable, label, group (when by is used),
mean, sd, min, max, ci_lower, ci_upper, n. When
by is used together with p_value = TRUE, statistic = TRUE,
or effect_size != "none", additional columns are appended
(populated on the first row of each variable block only):
test_type – test identifier (e.g., "welch_t",
"welch_anova", "student_t", "anova", "wilcoxon",
"kruskal").
statistic, df1, df2, p.value – test results.
es_type – effect-size identifier ("hedges_g",
"eta_sq", "r_rb", or "epsilon_sq"), when
effect_size != "none".
es_value, es_ci_lower, es_ci_upper – effect-size
estimate and confidence interval bounds.
The two names "data.frame" and "long" are synonyms (the
descriptive output is naturally already long). Pick whichever
reads better in your code.
"tinytable": a tinytable object.
"gt": a gt_tbl object.
"flextable": a flextable object.
"excel" / "word": writes to disk and returns the file
path invisibly.
"clipboard": copies the table and returns the display
data.frame invisibly.
The omnibus test is computed only when by is supplied and at
least two groups remain after dropping NAs, with every group
contributing at least two observations. The test family is chosen
by the test argument (see its entry for the full dispatch and the
underlying stats:: functions called).
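The precondition and the test families above can be sketched with base `stats` functions. The mapping below (Welch two-sample t test or Welch one-way ANOVA, Student's t or classical ANOVA, Wilcoxon rank-sum or Kruskal-Wallis) is an assumption inferred from the `test_type` identifiers listed under the return value; `compare_groups()` is a hypothetical name, and the authoritative dispatch is the one documented under the test argument.

```r
# Hypothetical helper: choose a stats:: test from `test` and the number of groups.
compare_groups <- function(y, g, test = c("welch", "student", "nonparametric")) {
  test <- match.arg(test)
  ok <- !is.na(y) & !is.na(g)
  y <- y[ok]
  g <- droplevels(as.factor(g[ok]))

  # Precondition: at least two groups, each with at least two observations.
  if (nlevels(g) < 2L || any(table(g) < 2L)) return(NULL)
  k2 <- nlevels(g) == 2L

  res <- switch(test,
    welch = if (k2) t.test(y ~ g) else oneway.test(y ~ g),
    student = if (k2) t.test(y ~ g, var.equal = TRUE) else oneway.test(y ~ g, var.equal = TRUE),
    nonparametric = if (k2) wilcox.test(y ~ g, exact = FALSE) else kruskal.test(y ~ g)
  )
  test_type <- switch(test,
    welch = if (k2) "welch_t" else "welch_anova",
    student = if (k2) "student_t" else "anova",
    nonparametric = if (k2) "wilcoxon" else "kruskal"
  )
  list(test_type = test_type,
       statistic = unname(res$statistic),
       p.value = res$p.value)
}

# Two groups -> Welch t test; three groups -> Welch one-way ANOVA.
compare_groups(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 11), rep(c("a", "b"), each = 5))
compare_groups(1:9, rep(c("a", "b", "c"), each = 3))
```

`t.test()` and `oneway.test()` both default to `var.equal = FALSE`, which is what makes the `"welch"` branch a Welch test.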
For model-based contrasts (heteroskedasticity-consistent SE,
cluster-robust SE, weighted contrasts, fitted means, covariate
adjustment), use table_continuous_lm().
See the effect_size argument for the dispatch table (the canonical
measure for each (test, n_groups) combination) and the validation
rules applied to explicit requests.
Confidence intervals (enabled with effect_size_ci = TRUE) use
noncentral F inversion for $\eta^2$, the Hedges-Olkin
normal approximation for $g$, the Fisher z-transform for $r$,
and a percentile bootstrap (2,000 replicates) for
$\varepsilon^2$.
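The point estimates behind these intervals follow standard textbook formulas. The sketch below computes Hedges' g (the pooled-SD standardized difference times the small-sample correction J = 1 - 3 / (4(n1 + n2) - 9)) with a Hedges-Olkin normal-approximation interval, eta-squared as SS_between / SS_total, and epsilon-squared as the Kruskal-Wallis H divided by (n - 1), with a percentile bootstrap interval. The helper names, the sign convention of g, the exact SE formula, and the within-group (stratified) resampling are all assumptions; spicy's internals may differ in detail.

```r
# Hedges' g for two groups with a Hedges-Olkin normal-approximation CI.
hedges_g <- function(y1, y2, level = 0.95) {
  n1 <- length(y1); n2 <- length(y2)
  sp <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
  d <- (mean(y1) - mean(y2)) / sp
  J <- 1 - 3 / (4 * (n1 + n2) - 9)            # small-sample correction
  g <- J * d
  se <- sqrt((n1 + n2) / (n1 * n2) + g^2 / (2 * (n1 + n2)))
  z <- qnorm(1 - (1 - level) / 2)
  c(g = g, lower = g - z * se, upper = g + z * se)
}

# Eta-squared: between-group sum of squares over total sum of squares.
eta_sq <- function(y, g) {
  ss_total <- sum((y - mean(y))^2)
  ss_between <- sum(tapply(y, g, function(v) length(v) * (mean(v) - mean(y))^2))
  ss_between / ss_total
}

# Epsilon-squared from the Kruskal-Wallis H statistic.
epsilon_sq <- function(y, g) {
  unname(kruskal.test(y, g)$statistic) / (length(y) - 1)
}

# Percentile bootstrap CI; rows are resampled within each group (assumption).
epsilon_sq_boot_ci <- function(y, g, B = 2000, level = 0.95) {
  idx_by_group <- split(seq_along(y), g)
  reps <- replicate(B, {
    i <- unlist(lapply(idx_by_group, function(ix) ix[sample.int(length(ix), replace = TRUE)]))
    epsilon_sq(y[i], g[i])
  })
  alpha <- 1 - level
  quantile(reps, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE)
}

a <- c(1, 2, 3, 4, 5)
b <- c(6, 7, 8, 9, 11)
y <- c(a, b)
g <- factor(rep(c("a", "b"), each = 5))

hedges_g(a, b)
eta_sq(y, g)
set.seed(1)
epsilon_sq_boot_ci(y, g, B = 200)
```

For two groups, eta-squared equals t^2 / (t^2 + df) from the pooled-variance t test, which is a convenient sanity check.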
For Cohen's $d$, Hays' $\omega^2$, and Cohen's $f^2$
(derived from a fitted, possibly weighted lm()), use the
model-based companion table_continuous_lm().
Decimal alignment, p-value formatting, and the suggested packages
required by each output engine are documented under the align,
p_digits, and output arguments, respectively.
Non-numeric columns are silently dropped (set verbose = TRUE to
see which columns were excluded). When a constant column is
passed, SD and CI are shown as "--" in the ASCII table.
table_continuous_lm() for the model-based companion
(heteroskedasticity-consistent SE, cluster-robust SE, weighted
contrasts, fitted means);
table_categorical() for categorical variables;
freq() for one-way frequency tables;
cross_tab() for two-way cross-tabulations.
Other spicy tables:
table_categorical(),
table_continuous_lm()
```r
# --- Basic usage ---------------------------------------------------------

# Default: ASCII console table.
table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score)
)

# Grouped by education (Welch p-value added by default).
table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score),
  by = education
)

# Test statistic alongside the p-value.
table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score),
  by = education,
  statistic = TRUE
)

# --- Effect sizes -------------------------------------------------------

# Auto-selected effect size with confidence interval (Hedges' g for
# binary `by`, eta-squared for k > 2).
table_continuous(
  sochealth,
  select = wellbeing_score,
  by = sex,
  effect_size = "auto",
  effect_size_ci = TRUE
)

# Explicit effect-size measure.
table_continuous(
  sochealth,
  select = wellbeing_score,
  by = education,
  effect_size = "eta_sq",
  effect_size_ci = TRUE,
  effect_size_digits = 3
)

# --- Selection helpers --------------------------------------------------

# Regex selection.
table_continuous(
  sochealth,
  select = "^life_sat",
  regex = TRUE
)

# Pretty labels keyed by column name.
table_continuous(
  sochealth,
  select = c(bmi, life_sat_health),
  labels = c(
    bmi = "Body mass index",
    life_sat_health = "Satisfaction with health"
  )
)

# --- Output formats -----------------------------------------------------

# The rendered outputs below all wrap the same call:
#   table_continuous(sochealth,
#                    select = c(bmi, wellbeing_score),
#                    by = sex)
# only `output` changes. Assign each result to a variable -- some
# engines auto-print as a console-friendly text fallback inside
# the `?` help viewer.

# Wide / long data.frame (synonyms): one row per (variable x group).
table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score),
  by = sex,
  output = "data.frame"
)

# Rendered HTML / docx objects -- best viewed inside a
# Quarto / R Markdown document or a pkgdown article.
if (requireNamespace("tinytable", quietly = TRUE)) {
  tt <- table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = sex,
    output = "tinytable"
  )
}
if (requireNamespace("gt", quietly = TRUE)) {
  tbl <- table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = sex,
    output = "gt"
  )
}
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = sex,
    output = "flextable"
  )
}

# Excel and Word: write to a temporary file.
if (requireNamespace("openxlsx2", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")
  table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = sex,
    output = "excel",
    excel_path = tmp
  )
  unlink(tmp)
}
if (
  requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)
) {
  tmp <- tempfile(fileext = ".docx")
  table_continuous(
    sochealth,
    select = c(bmi, wellbeing_score),
    by = sex,
    output = "word",
    word_path = tmp
  )
  unlink(tmp)
}

## Not run:
# Clipboard: writes to the system clipboard.
table_continuous(
  sochealth,
  select = c(bmi, wellbeing_score),
  by = sex,
  output = "clipboard"
)
## End(Not run)
```
Builds APA-style summary tables from a series of linear models for one or many continuous outcomes selected with tidyselect syntax.
A single focal predictor is supplied with by; each selected numeric
outcome is fit as lm(outcome ~ by, ...), optionally extended with
additive covariates via covariates and case weights via weights.
Categorical by produces model-based estimated marginal means by
level (covariate-adjusted via adjustment when covariates are
present), plus an optional single difference for dichotomous
predictors. Numeric by produces the slope and its confidence
interval.
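The link between "model-based means" and the fitted `lm()` can be checked in base R. The sketch below uses the built-in `mtcars` data (not `sochealth`) to show three things: for an unweighted `lm(y ~ factor)` the fitted mean of each level equals the empirical group mean; with case weights the fitted means become weighted group means; and a numeric predictor yields a slope with its confidence interval. It illustrates the underlying model only and is not a reimplementation of `table_continuous_lm()`; the car weight column `wt` is used as an arbitrary positive case weight purely for illustration.

```r
d <- transform(mtcars, cyl_f = factor(cyl))

# Categorical predictor: one fitted mean per level, with model-based CIs.
fit <- lm(mpg ~ cyl_f, data = d)
grid <- data.frame(cyl_f = factor(levels(d$cyl_f), levels = levels(d$cyl_f)))
fitted_means <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
cbind(grid, fitted_means)

# Unweighted, no covariates: fitted means equal the empirical group means.
group_means <- tapply(d$mpg, d$cyl_f, mean)

# With case weights (here `wt`, for illustration only), the fitted means
# become weighted group means.
fit_w <- lm(mpg ~ cyl_f, data = d, weights = wt)
weighted_means <- sapply(split(d, d$cyl_f), function(s) weighted.mean(s$mpg, s$wt))
predict(fit_w, newdata = grid)

# Numeric predictor: the slope and its confidence interval.
fit_num <- lm(mpg ~ hp, data = d)
coef(fit_num)[["hp"]]
confint(fit_num, "hp", level = 0.95)
```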
Inference adapts via vcov: classical OLS, "HC0"-"HC5"
(heteroscedasticity-consistent), "CR0"-"CR3" (cluster-robust,
requires cluster), or "bootstrap" / "jackknife" resampling.
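To make the heteroscedasticity-consistent option concrete, the sketch below computes the HC0 and HC1 sandwich covariance matrices by hand in base R, using the standard definitions V_HC0 = (X'X)^-1 X' diag(e_i^2) X (X'X)^-1 and V_HC1 = n / (n - p) * V_HC0. It illustrates the estimator family only, for unweighted fits, and makes no claim about how spicy computes these internally; `hc_vcov()` is a hypothetical name.

```r
# Hypothetical helper: HC0 / HC1 sandwich covariance for an unweighted lm fit.
hc_vcov <- function(fit, type = c("HC0", "HC1")) {
  type <- match.arg(type)
  X <- model.matrix(fit)
  e <- residuals(fit)
  bread <- solve(crossprod(X))            # (X'X)^{-1}
  meat <- crossprod(X * e)                # X' diag(e^2) X (row i scaled by e_i)
  v <- bread %*% meat %*% bread
  if (type == "HC1") v <- v * nrow(X) / (nrow(X) - ncol(X))  # small-sample scaling
  v
}

fit <- lm(mpg ~ factor(cyl), data = mtcars)
se_classical <- sqrt(diag(vcov(fit)))
se_hc1 <- sqrt(diag(hc_vcov(fit, "HC1")))
cbind(classical = se_classical, HC1 = se_hc1)
```

For production work, a dedicated package is the usual choice; the hand-rolled version is only meant to show what the HC estimators compute.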
Effect sizes (Cohen's "d", Hedges' "g", Hays' "omega2",
Cohen's "f2") are reported with optional noncentral t / F
confidence intervals via effect_size_ci, and adapt under
covariate adjustment (see effect_size).
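For an unadjusted model `lm(y ~ by)`, two of the listed effect sizes follow directly from the ANOVA decomposition: Cohen's f^2 = R^2 / (1 - R^2) and Hays' omega^2 = (SS_effect - df_effect * MS_error) / (SS_total + MS_error). The base-R sketch below applies those textbook formulas to `mtcars`; it does not reproduce how the measures adapt under covariate adjustment (see the effect_size argument), and it is not spicy's code.

```r
fit <- lm(mpg ~ factor(cyl), data = mtcars)
r2 <- summary(fit)$r.squared
f2 <- r2 / (1 - r2)                      # Cohen's f^2

tab <- anova(fit)                        # rows: factor(cyl), Residuals
ss_effect <- tab[1, "Sum Sq"]
df_effect <- tab[1, "Df"]
ms_error <- tab["Residuals", "Mean Sq"]
ss_total <- sum(tab[, "Sum Sq"])

# Hays' omega^2: a less biased counterpart of eta^2 (= R^2 here).
omega2 <- (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

round(c(R2 = r2, f2 = f2, omega2 = omega2), 3)
```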
Multiple output formats are available via output: a printed ASCII table
("default"), a plain wide data.frame ("data.frame"), a raw long
data.frame ("long"), or rendered outputs ("tinytable", "gt",
"flextable", "excel", "clipboard", "word").
```r
table_continuous_lm(
  data,
  select = tidyselect::everything(),
  by,
  covariates = NULL,
  adjustment = c("proportional", "balanced"),
  exclude = NULL,
  regex = FALSE,
  weights = NULL,
  vcov = c("classical", "HC0", "HC1", "HC2", "HC3", "HC4", "HC4m", "HC5", "CR0",
    "CR1", "CR2", "CR3", "bootstrap", "jackknife"),
  cluster = NULL,
  boot_n = 1000,
  contrast = c("auto", "none"),
  statistic = FALSE,
  p_value = TRUE,
  show_n = TRUE,
  show_weighted_n = FALSE,
  effect_size = c("none", "f2", "d", "g", "omega2"),
  effect_size_ci = FALSE,
  r2 = c("r2", "adj_r2", "none"),
  ci = TRUE,
  labels = NULL,
  ci_level = 0.95,
  digits = 2,
  fit_digits = 2,
  effect_size_digits = 2,
  p_digits = 3,
  decimal_mark = ".",
  align = c("decimal", "center", "right"),
  output = c("default", "data.frame", "long", "tinytable", "gt", "flextable",
    "excel", "clipboard", "word"),
  excel_path = NULL,
  excel_sheet = "Linear models",
  clipboard_delim = "\t",
  word_path = NULL,
  verbose = FALSE
)
```
data |
A |
select |
Outcome columns to include. If |
by |
A single predictor column. Accepts an unquoted column name or a single character column name. The predictor can be:
Rows with |
covariates |
Optional additive covariates to adjust each per-outcome
linear model for. Accepts a tidyselect expression (e.g.
When non-empty, each model is fitted as v1 supports additive covariates only. Formula syntax with
interactions or transforms ( Rows with |
adjustment |
How the covariate-adjusted estimated marginal
means (the
Both methods reduce to the same linear-contrast formula
|
exclude |
Columns to exclude from |
regex |
Logical. If |
weights |
Optional case weights. Accepts:
Validation: weights must be finite, non-negative, and contain at least
one positive value (otherwise the function errors). Rows with |
vcov |
Variance estimator used for standard errors, confidence intervals, and Wald test statistics. One of:
The |
cluster |
Cluster identifier for cluster-aware variance
estimators. Required when
Rows with |
boot_n |
Integer. Number of bootstrap replicates used when
|
contrast |
Contrast display for categorical predictors. One of:
|
statistic |
Logical. If |
p_value |
Logical. If |
show_n |
Logical. If |
show_weighted_n |
Logical. If |
effect_size |
Character. Effect-size column to include in the wide and rendered outputs. One of:
When Under covariate adjustment (
|
effect_size_ci |
Logical. If |
r2 |
Character. Fit statistic to include in the wide and rendered outputs. One of:
When |
ci |
Logical. If |
labels |
An optional named character vector of outcome labels. Names
must match column names in |
ci_level |
Confidence level for coefficient and model-based mean
intervals (default: |
digits |
Number of decimal places for descriptive values, regression
coefficients, and test statistics (default: |
fit_digits |
Number of decimal places for model-fit columns ( |
effect_size_digits |
Number of decimal places for the effect-size
column ( |
p_digits |
Integer >= 1. Number of decimal places used to render
p-values in the |
decimal_mark |
Character used as decimal separator. Either |
align |
Horizontal alignment of numeric columns in the printed
ASCII table and in the
The |
output |
Output format. One of:
|
excel_path |
File path for |
excel_sheet |
Sheet name for |
clipboard_delim |
Delimiter for |
word_path |
File path for |
verbose |
Logical. If |
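The covariates and adjustment arguments concern covariate-adjusted estimated marginal means. As a minimal sketch of the simplest case, the code below fits `lm(mpg ~ cyl_f + hp)` on `mtcars` and predicts each level of the focal factor with the single numeric covariate held at its sample mean, which is one common definition of an adjusted mean. It does not implement either the `"proportional"` or the `"balanced"` scheme, and whether its results coincide with either one for this case is an assumption not verified here.

```r
d <- transform(mtcars, cyl_f = factor(cyl))
fit <- lm(mpg ~ cyl_f + hp, data = d)

# Reference grid: every level of the focal factor, covariate at its mean.
grid <- data.frame(
  cyl_f = factor(levels(d$cyl_f), levels = levels(d$cyl_f)),
  hp = mean(d$hp)
)
adj <- predict(fit, newdata = grid, interval = "confidence")
cbind(grid, adj)

# Raw (unadjusted) group means, for comparison.
tapply(d$mpg, d$cyl_f, mean)
```

Because the covariate enters additively, differences between adjusted means equal the factor's regression coefficients, whatever value the covariate is held at.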
Depends on output:
"default": prints a styled ASCII table and invisibly returns
the underlying long data.frame with class
"spicy_continuous_lm_table" / "spicy_table".
"data.frame": a plain wide data.frame with one row per
outcome and numeric columns for means (categorical by) or slope
(numeric by), optional contrast and CI, optional test statistic,
p, fit statistic (R^2 or adjusted R^2), effect size, optional
effect_size_ci_lower / effect_size_ci_upper (when
effect_size_ci = TRUE), n, and Weighted n.
"long": a raw data.frame with one block per outcome and 28
columns covering identification (variable, label,
predictor_type, predictor_label, level, reference),
fitted means and their CI (emmean, emmean_se, emmean_ci_lower,
emmean_ci_upper), contrast or slope estimates and CI
(estimate_type, estimate, estimate_se, estimate_ci_lower,
estimate_ci_upper), inferential output (test_type, statistic,
df1, df2, p.value), effect size with its CI (es_type,
es_value, es_ci_lower, es_ci_upper), fit (r2, adj_r2),
and sample size (n, weighted_n).
"tinytable": a tinytable object.
"gt": a gt_tbl object.
"flextable": a flextable object.
"excel" / "word": writes to disk and returns the file path.
"clipboard": copies the wide table and returns it invisibly.
If no numeric outcome columns remain after applying select, exclude,
and regex, the function emits a warning and returns an empty
data.frame() regardless of output.
table_continuous_lm() is designed for article-style reporting around
a single focal predictor: one model per selected continuous outcome,
fitted as lm(outcome ~ by, ...) and optionally extended with case
weights and additive covariates (lm(outcome ~ by + cov1 + ...)).
For categorical by, the reported means are model-based fitted means
(or covariate-adjusted estimated marginal means; see adjustment) for
each level, and contrasts come from the same fitted linear model. For
an unweighted lm(y ~ factor) with classical variance and no
covariates, the fitted means coincide numerically with empirical
subgroup means; the model-based qualifier matters because (a) under
weights the means become weighted least-squares estimates, (b) their
CIs derive from the model vcov (classical, HC*, CR*,
bootstrap or jackknife), (c) under covariates they become
adjusted marginal means, and (d) tests, p-values and effect sizes
all come from the same fitted model, keeping the table internally
consistent.
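The coincidence of model-based and empirical means can be checked in base R. The sketch below does not call spicy. It uses the built-in sleep data and treats the two groups as independent purely for illustration (the real design is paired). It shows that the predicted value at each level of an unweighted lm(y ~ factor) equals the subgroup mean, and that the dummy-coded slope equals the level2 - level1 difference.

```r
# Unweighted one-factor linear model, no covariates.
fit <- lm(extra ~ group, data = sleep)

# Model-based means: predicted value at each level of the factor.
grid <- data.frame(group = factor(levels(sleep$group),
                                  levels = levels(sleep$group)))
model_means <- as.numeric(predict(fit, newdata = grid))

# Empirical subgroup means computed directly from the data.
empirical_means <- as.numeric(tapply(sleep$extra, sleep$group, mean))

# The dummy-coded slope is the contrast level2 - level1.
delta <- unname(coef(fit)["group2"])

print(rbind(model = model_means, empirical = empirical_means))
print(delta)
```

Under weights, covariates, or a non-classical vcov, the means or their CIs no longer reduce to this simple identity. That is why the table labels them as model-based.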
Compared with table_continuous(), this function is the model-based
companion: choose it when you want heteroskedasticity-consistent standard
errors (vcov = "HC*"), model fit statistics, or case weights via
lm(..., weights = ...). Because the function exists to report a fitted
model, its inferential output is on by default: p_value = TRUE and
r2 = "r2" are the defaults; set p_value = FALSE or r2 = "none" to
suppress them.
Effect size is selected explicitly via effect_size (defaults to
"none"). All variants are derived from the same fitted model as the
displayed coefficients, R^2, and CIs, so the effect size stays
internally consistent with the rest of the table.
"f2": Cohen's f^2 = R^2 / (1 - R^2) (Cohen 1988). Defined
for any predictor type. For a single-predictor model, f^2 is a
monotone transform of R^2 and adds no information beyond it; its
primary use is in a priori power analysis (e.g. G*Power).
"d", "g": standardized mean difference (Cohen's d or Hedges'
g), defined only when by has exactly two non-empty levels.
d = beta_hat / sigma_hat with sigma_hat = summary(fit)$sigma (the
pooled within-group SD for the unweighted two-group case);
g = J * d with J = 1 - 3 / (4 * df_resid - 1) (Hedges and Olkin
1985). The sign matches the displayed Delta (level2 - level1).
For published reports of two-group comparisons, g is the
convention recommended by Hedges and Olkin (1985).
"omega2": Hays' omega^2, computed from weighted sums of squares as
(SS_effect - df_effect * MSE) / (SS_total + MSE) and truncated at 0
for small or null effects (Hays 1963; Olejnik and Algina 2003). It is
less biased than eta^2 (which equals R^2 in this single-predictor
design) and is recommended for reporting variance explained in
ANOVA-style designs (Olejnik and Algina 2003).
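The point-estimate formulas above can be reproduced with base R. This is a minimal sketch, not spicy's implementation. It makes two assumptions: the built-in sleep data serves as a two-group example, with its paired design ignored for arithmetic illustration only, and the built-in iris data serves as a 3-level example for omega^2.

```r
## Two-group example (sleep, groups treated as independent)
fit <- lm(extra ~ group, data = sleep)

# "f2": Cohen's f^2 = R^2 / (1 - R^2); the map is invertible.
r2 <- summary(fit)$r.squared
f2 <- r2 / (1 - r2)

# "d" / "g": d = beta_hat / sigma_hat, g = J * d.
beta_hat  <- unname(coef(fit)["group2"])   # Delta = level2 - level1
sigma_hat <- summary(fit)$sigma
d <- beta_hat / sigma_hat
J <- 1 - 3 / (4 * df.residual(fit) - 1)
g <- J * d

# Unweighted two-group case: sigma_hat equals the pooled within-group SD.
n_k   <- tapply(sleep$extra, sleep$group, length)
var_k <- tapply(sleep$extra, sleep$group, var)
pooled_sd <- sqrt(sum((n_k - 1) * var_k) / (sum(n_k) - 2))

## Three-level example (iris) for "omega2"
fit3 <- lm(Sepal.Length ~ Species, data = iris)
tab  <- anova(fit3)
ss_effect <- tab["Species", "Sum Sq"]
df_effect <- tab["Species", "Df"]
mse       <- tab["Residuals", "Mean Sq"]
ss_total  <- sum(tab[["Sum Sq"]])

# Hays' omega^2, truncated at zero for small or null effects.
omega2 <- max(0, (ss_effect - df_effect * mse) / (ss_total + mse))
eta2   <- ss_effect / ss_total   # equals R^2 with a single predictor

print(c(f2 = f2, d = d, g = g, eta2 = eta2, omega2 = omega2))
```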
All four effect sizes are point estimates derived from the OLS/WLS fit
and are invariant to vcov: choosing HC* changes the SE, CI, and
test statistic of the contrast but not the standardized magnitude
itself.
Under covariate adjustment (covariates non-empty), "f2" and
"omega2" become the partial f^2 / partial omega^2 of by, derived
from the partial F via stats::drop1() restricted to the focal
term. "d" and "g" raise a spicy_unsupported error: the pooled
standard deviation has no canonical extension under adjustment, so
Cohen's d and Hedges' g are undefined for adjusted models. See
effect_size for the full dispatch.
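For the adjusted case, a partial f^2 can be computed from the partial F that stats::drop1() returns when restricted to the focal term. The model below is an assumed example: iris, with Petal.Width standing in for a covariate. This section does not spell out the exact partial omega^2 formula spicy uses, so the sketch covers only f^2.

```r
# Focal factor plus one additive covariate.
fit <- lm(Sepal.Length ~ Species + Petal.Width, data = iris)

# Partial F for the focal term only (scope restricts drop1 to Species).
d1 <- drop1(fit, scope = ~ Species, test = "F")
F_focal <- d1["Species", "F value"]
df1     <- d1["Species", "Df"]
df2     <- df.residual(fit)

# Partial f^2 from the partial F statistic.
partial_f2 <- F_focal * df1 / df2

# The same quantity written as SS_effect / SS_residual.
partial_f2_ss <- d1["Species", "Sum of Sq"] / deviance(fit)

print(c(partial_f2 = partial_f2, partial_f2_ss = partial_f2_ss))
```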
Confidence intervals for the effect size are available via
effect_size_ci = TRUE and use the modern noncentral-distribution
inversion approach, the consensus standard in commercial statistical
software (Stata esize / estat esize, SAS PROC TTEST and
PROC GLM EFFECTSIZE 14.2+) and in mainstream R packages
(effectsize, MOTE, TOSTER, effsize):
"d", "g": noncentral t inversion (Steiger and Fouladi 1997;
Goulet-Pelletier and Cousineau 2018). Empirical coverage is nominal
across sample sizes (Cousineau and Goulet-Pelletier 2021), unlike the
older Hedges-Olkin normal approximation which is biased for small
samples. For Hedges' g the bounds inherit the J small-sample
correction.
"omega2", "f2": noncentral F inversion (Steiger 2004).
Bounds are converted from the noncentrality parameter using
omega^2 = ncp / (ncp + N) and f^2 = ncp / N respectively, with
N = df1 + df2 + 1 (total sample size).
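Both inversions can be done in base R with uniroot() and the noncentral pt() / pf() distribution functions. This is a minimal sketch under stated assumptions, not spicy's code. It uses two independent groups from the built-in sleep data for d, and warpbreaks with its 3-level tension factor for omega^2 / f^2. For Hedges' g, multiplying both d bounds by J gives the corrected interval.

```r
level <- 0.95
alpha <- 1 - level

## Noncentral t inversion for Cohen's d (two independent groups)
fit <- lm(extra ~ group, data = sleep)
n_k <- as.numeric(table(sleep$group))
df_t <- df.residual(fit)
t_obs <- coef(summary(fit))["group2", "t value"]
ncp_per_d <- sqrt(n_k[1] * n_k[2] / sum(n_k))  # noncentrality = d * ncp_per_d
d <- t_obs / ncp_per_d

# pt() decreases as ncp grows, so each bound is the ncp that places
# t_obs at the matching tail probability.
ncp_t_for <- function(p) {
  uniroot(function(ncp) pt(t_obs, df_t, ncp) - p,
          interval = c(t_obs - 5, t_obs + 5),
          extendInt = "downX", tol = 1e-10)$root
}
ncp_t <- c(ncp_t_for(1 - alpha / 2), ncp_t_for(alpha / 2))
d_ci <- ncp_t / ncp_per_d

## Noncentral F inversion for omega^2 and f^2
fit_f <- lm(breaks ~ tension, data = warpbreaks)
fs <- summary(fit_f)$fstatistic
F_obs <- unname(fs["value"])
df1 <- unname(fs["numdf"])
df2 <- unname(fs["dendf"])
N <- df1 + df2 + 1  # total sample size

ncp_F_for <- function(p) {
  # pf() also decreases in ncp; if ncp = 0 already sits at or below p,
  # the bound is truncated at 0.
  if (pf(F_obs, df1, df2) <= p) return(0)
  uniroot(function(ncp) pf(F_obs, df1, df2, ncp) - p,
          interval = c(0, max(100, 10 * F_obs * df1)), tol = 1e-10)$root
}
ncp_F <- c(ncp_F_for(1 - alpha / 2), ncp_F_for(alpha / 2))
omega2_ci <- ncp_F / (ncp_F + N)
f2_ci <- ncp_F / N

print(list(d = d, d_ci = d_ci, omega2_ci = omega2_ci, f2_ci = f2_ci))
```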
For the weighted case, the CI uses raw (unweighted) group counts and
df.residual(fit) = n - p, consistent with the WLS reporting convention
(DuMouchel and Duncan 1983). For propensity-score balance assessment or
complex-survey designs, dedicated packages (cobalt::bal.tab() for the
Austin and Stuart 2015 formulation; survey for design-based effect
sizes) are more appropriate.
When vcov is one of the HC* variants, the standard errors, CIs, and
Wald test statistics use a heteroskedasticity-consistent sandwich
estimator computed via sandwich::vcovHC() (Zeileis 2004), the
canonical R implementation. For a brief guide:
"HC0" is the original White (1980) form; "HC1" adds the
n / (n - p) degrees-of-freedom correction (MacKinnon and White 1985)
and matches Stata's vce(robust) option.
"HC2" and "HC3" use leverage-based residual rescalings
(MacKinnon and White 1985); "HC3" is the sandwich::vcovHC()
default and is recommended for small to moderate samples (Long and Ervin 2000).
"HC4" adapts the leverage exponent for influential
observations (Cribari-Neto 2004); "HC4m" is a modified-exponent
refinement (Cribari-Neto and da Silva 2011); "HC5" is an
alternative leverage-adaptive variant (Cribari-Neto, Souza and
Vasconcellos 2007).
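The HC0, HC1, and HC3 formulas can be written out directly as the sandwich (X'X)^-1 X' diag(omega) X (X'X)^-1. This sketch is unweighted and uses the built-in mtcars data as an assumed example. It is not how spicy computes these internally. For reference, sandwich::vcovHC(fit, type = "HC3") should return the same HC3 matrix.

```r
fit <- lm(mpg ~ wt, data = mtcars)
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
n <- nrow(X); p <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^{-1}

# Sandwich with per-observation weights omega on the diagonal of the meat.
hc_sandwich <- function(omega) bread %*% crossprod(X * omega, X) %*% bread

vc <- list(
  HC0 = hc_sandwich(e^2),                  # White (1980)
  HC1 = hc_sandwich(e^2) * n / (n - p),    # degrees-of-freedom correction
  HC3 = hc_sandwich(e^2 / (1 - h)^2)       # leverage-based rescaling
)
se <- sapply(vc, function(v) sqrt(diag(v)))
print(round(se, 4))
```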
When observations are not independent (repeated measurements per
individual, students nested in classes, patients in hospitals,
country-year panels), classical and HC* standard errors are biased
downward. Use the CR* variants together with cluster = id_var to
get cluster-robust inference (Liang and Zeger 1986). The
implementation dispatches to clubSandwich::vcovCR() for the
variance and to clubSandwich::coef_test() (single-coefficient,
Satterthwaite t) and clubSandwich::Wald_test() (multi-coefficient
Hotelling-T-squared with Satterthwaite df, "HTZ") for inference.
"CR2" (Bell and McCaffrey 2002; Pustejovsky and Tipton 2018) is the
modern recommended default; it generally produces fractional
Satterthwaite degrees of freedom in df2, which the displayed
t(df) / F(df1, df2) header renders to one decimal. "CR1"
matches Stata's vce(cluster id) option. Effect sizes remain invariant
to vcov (including CR*); only the SE, CI, test statistic, and
df2 of the contrast change.
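A cluster-robust sandwich replaces the per-observation meat with sums over clusters. The base-R sketch below uses the sleep data with ID as the cluster and applies a simple G / (G - 1) small-cluster factor. That factor is an assumption for illustration; it is not necessarily the exact finite-sample correction behind spicy's "CR1". The sketch leaves CR2's bias-reduced adjustment matrices and Satterthwaite degrees of freedom to clubSandwich.

```r
fit <- lm(extra ~ group, data = sleep)
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))

# Meat = sum over clusters of (X_g' e_g)(X_g' e_g)'.
cr0_vcov <- function(cluster) {
  scores <- rowsum(X * e, group = cluster)   # G x p matrix of X_g' e_g
  bread %*% crossprod(scores) %*% bread
}

vc_cr0 <- cr0_vcov(sleep$ID)
G <- length(unique(sleep$ID))
vc_cr1 <- vc_cr0 * G / (G - 1)   # assumed simple small-cluster factor

print(sqrt(diag(vc_cr1)))
```

With one observation per cluster, the meat collapses to the HC0 meat. That is a useful sanity check on any hand-rolled implementation.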
Two resampling-based estimators are also available without adding
any dependency: vcov = "bootstrap" (nonparametric case-resampling
bootstrap; Davison and Hinkley 1997) and vcov = "jackknife"
(delete-1, leave-one-out jackknife; Quenouille 1956; MacKinnon and White 1985).
Supplying cluster switches both to their cluster-aware variants
(cluster bootstrap, Cameron, Gelbach and Miller 2008;
leave-one-cluster-out jackknife). The number of bootstrap replicates
is controlled by boot_n (default 1000). Replicates that fail to
fit on rank-deficient resamples are dropped; an explicit warning is
issued if more than half fail, and the classical OLS variance is used
as a fallback when fewer than 10 valid replicates remain. Inference for
both estimators is asymptotic (z for single-coefficient contrasts,
chi^2(q) for the multi-coefficient global Wald test on categorical
predictors with k > 2 levels), as reflected in the displayed test header.
Use the bootstrap when the residual distribution is non-standard or the
sample is small; use the jackknife as a closed-form, deterministic
alternative.
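The two resampling variance formulas fit in a few lines of base R. This illustration uses the built-in mtcars data and an unclustered design, and it does no handling of failed refits. It shows only the variance formulas and omits the replicate-dropping and fallback rules described above.

```r
fit <- lm(mpg ~ wt, data = mtcars)
n <- nrow(mtcars)
k <- length(coef(fit))

# Delete-1 jackknife: refit without each row in turn.
jack <- t(vapply(seq_len(n),
                 function(i) coef(lm(mpg ~ wt, data = mtcars[-i, ])),
                 numeric(k)))
centered <- sweep(jack, 2, colMeans(jack))
vc_jack <- (n - 1) / n * crossprod(centered)

# Case-resampling bootstrap: resample rows with replacement and refit.
set.seed(2024)
boot_n <- 1000
boot <- t(replicate(boot_n, {
  idx <- sample.int(n, n, replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))
}))
vc_boot <- cov(boot)

print(rbind(jackknife = sqrt(diag(vc_jack)),
            bootstrap = sqrt(diag(vc_boot))))
```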
R^2, adjusted R^2, and the effect sizes remain ordinary
least-squares (or weighted least-squares) statistics regardless of
vcov.
When weights is supplied, table_continuous_lm() fits weighted
linear models via lm(..., weights = ...). Means become weighted
least-squares estimates and contrasts and slopes are weighted. The
fit statistics R^2 and adjusted R^2, as well as Hays' omega^2
and Cohen's f^2, use the corresponding weighted sums of squares
from the WLS fit. Cohen's d and Hedges' g use the WLS
coefficient and the model's weighted residual standard deviation
(summary(fit)$sigma), which is the standard convention for
case-weighted regression-style reporting (DuMouchel and Duncan
1983); the noncentral t CI for d / g uses the raw (unweighted)
group counts and the residual degrees of freedom of the WLS fit
(n - p). This case-weighted workflow is appropriate for weighted
article tables, but is not a substitute for a full complex-survey
design (see e.g. the survey package), nor for propensity-score
balance assessment under the Austin and Stuart (2015) convention
(see e.g. cobalt::bal.tab()).
The n column always reports the unweighted analytic sample size for
each outcome. When show_weighted_n = TRUE, an additional
Weighted n column reports the sum of case weights in the same
analytic sample.
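Here is a base-R sketch of the weighted case, using simulated data with hypothetical case weights w. The WLS fitted mean for each level equals the weighted subgroup mean. The two sample-size figures differ as described: one is the row count, the other the sum of weights.

```r
set.seed(123)
dat <- data.frame(
  y   = rnorm(40, mean = 50, sd = 10),
  grp = factor(rep(c("a", "b"), each = 20)),
  w   = runif(40, min = 0.5, max = 2)   # hypothetical case weights
)

fit_w <- lm(y ~ grp, data = dat, weights = w)

# WLS fitted mean for each level ...
grid <- data.frame(grp = factor(levels(dat$grp), levels = levels(dat$grp)))
wls_means <- as.numeric(predict(fit_w, newdata = grid))

# ... equals the weighted subgroup mean.
weighted_means <- as.numeric(sapply(split(dat, dat$grp),
                                    function(d) weighted.mean(d$y, d$w)))

# n is the unweighted row count; Weighted n is the sum of case weights.
n_col <- nrow(dat)
weighted_n_col <- sum(dat$w)

print(list(wls_means = wls_means, n = n_col, weighted_n = weighted_n_col))
```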
For dichotomous categorical predictors, the wide outputs report fitted
means in reference-level order and label the contrast column
explicitly as Delta (level2 - level1). For categorical predictors
with more than two levels, no single contrast or contrast CI is shown
in the wide outputs; instead, the table reports level-specific means
plus the overall F test when statistic = TRUE (the column header shows
F(df1, df2) when the degrees of freedom are constant across outcomes).
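For a factor with more than two levels, you can reproduce the level means and the overall F test by comparing the fitted model with an intercept-only model. The sketch uses the built-in iris data as an assumed 3-level example and does not call spicy.

```r
fit  <- lm(Sepal.Length ~ Species, data = iris)
fit0 <- lm(Sepal.Length ~ 1, data = iris)

# Level-specific means, as shown in the wide table for k > 2 levels.
grid <- data.frame(Species = factor(levels(iris$Species),
                                    levels = levels(iris$Species)))
level_means <- setNames(as.numeric(predict(fit, newdata = grid)),
                        levels(iris$Species))

# Omnibus F test of the factor against the intercept-only model.
cmp <- anova(fit0, fit)
F_stat <- cmp$F[2]
df1 <- cmp$Df[2]
df2 <- cmp$Res.Df[2]

print(level_means)
cat(sprintf("F(%d, %d) = %.2f\n", df1, df2, F_stat))
```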
When covariates is non-empty, the printed ASCII table appends an
APA-style footer naming the covariates and the chosen estimand, e.g.
Note. Adjusted for age, education (proportional).
Optional output engines require the corresponding suggested packages:
tinytable for output = "tinytable"
gt for output = "gt"
flextable for output = "flextable"
flextable + officer for output = "word"
openxlsx2 for output = "excel"
clipr for output = "clipboard"
Austin, P. C., & Stuart, E. A. (2015). Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28), 3661–3679. doi:10.1002/sim.6607
Bell, R. M., & McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology, 28(2), 169–181.
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-based improvements for inference with clustered errors. Review of Economics and Statistics, 90(3), 414–427. doi:10.1162/rest.90.3.414
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cousineau, D., & Goulet-Pelletier, J.-C. (2021). Expected and empirical coverages of different methods for generating noncentral t confidence intervals for a standardized mean difference. Behavior Research Methods, 53, 2376–2394. doi:10.3758/s13428-021-01550-4
Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics & Data Analysis, 45(2), 215–233. doi:10.1016/S0167-9473(02)00366-3
Cribari-Neto, F., Souza, T. C., & Vasconcellos, K. L. P. (2007). Inference under heteroskedasticity and leveraged data. Communications in Statistics – Theory and Methods, 36(10), 1877–1888. doi:10.1080/03610920601126589
Cribari-Neto, F., & da Silva, W. B. (2011). A new heteroskedasticity-consistent covariance matrix estimator for the linear regression model. AStA Advances in Statistical Analysis, 95(2), 129–146. doi:10.1007/s10182-010-0141-2
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843
DuMouchel, W. H., & Duncan, G. J. (1983). Using sample survey weights in multiple regression analyses of stratified samples. Journal of the American Statistical Association, 78(383), 535–543. doi:10.1080/01621459.1983.10478006
Goulet-Pelletier, J.-C., & Cousineau, D. (2018). A review of effect sizes and their confidence intervals, Part I: The Cohen's d family. The Quantitative Methods for Psychology, 14(4), 242–265. doi:10.20982/tqmp.14.4.p242
Hays, W. L. (1963). Statistics for Psychologists. New York: Holt, Rinehart and Winston.
Hedges, L. V., & Olkin, I. (1985). Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73(1), 13–22. doi:10.1093/biomet/73.1.13
Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54(3), 217–224. doi:10.1080/00031305.2000.10474549
MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29(3), 305–325. doi:10.1016/0304-4076(85)90158-7
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8(4), 434–447. doi:10.1037/1082-989X.8.4.434
Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672–683. doi:10.1080/07350015.2016.1247004
Quenouille, M. H. (1956). Notes on bias in estimation. Biometrika, 43(3/4), 353–360. doi:10.1093/biomet/43.3-4.353
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164–182. doi:10.1037/1082-989X.9.2.164
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838. doi:10.2307/1912934
Zeileis, A. (2004). Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software, 11(10), 1–17. doi:10.18637/jss.v011.i10
table_continuous(), table_categorical().
For broader workflows on the same statistical building blocks:
sandwich::vcovHC() (the canonical R implementation of the HC*
sandwich estimators, used internally for vcov = "HC*");
clubSandwich::vcovCR(), clubSandwich::coef_test() and
clubSandwich::Wald_test() (the canonical R implementation of
cluster-robust variance and Satterthwaite-style inference, used
internally for vcov = "CR*"); effectsize::cohens_d(),
effectsize::hedges_g(), and effectsize::omega_squared()
(alternative effect-size computations and CIs); cobalt::bal.tab()
for propensity-score covariate balance with weighted standardized
mean differences (Austin and Stuart 2015); the
survey package for
design-based inference on complex-survey samples.
Other spicy tables:
table_categorical(),
table_continuous()
# --- Basic usage ---------------------------------------------------------
# Default: ASCII table with model-based means, p, and R^2.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex
)

# --- Effect sizes -------------------------------------------------------
# Cohen's d (binary by required).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  effect_size = "d"
)

# Hedges' g with weighted analysis and weighted n column.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  weights = weight,
  statistic = TRUE,
  effect_size = "g",
  show_weighted_n = TRUE
)

# Hedges' g with noncentral t confidence interval (bracket notation).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  effect_size = "g",
  effect_size_ci = TRUE
)

# Cohen's f^2 alongside R^2 (familiar power-analysis effect size).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  effect_size = "f2"
)

# Hays' omega-squared for a 3-level predictor (d / g would error here).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = education,
  effect_size = "omega2"
)

# --- Robust SE for a numeric predictor ----------------------------------
# HC3 standard errors for the slope of a continuous predictor.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = age,
  vcov = "HC3",
  ci = FALSE
)

# Cluster-robust SE for repeated-measures data: the `sleep` dataset
# has 10 subjects measured twice (one observation per group).
table_continuous_lm(
  sleep,
  select = extra,
  by = group,
  cluster = ID,
  vcov = "CR2"
)

# --- Covariate adjustment ----------------------------------------------
# Adjust the comparison of `wellbeing_score` and `bmi` by `sex` for `age`
# and `education`. The footer surfaces the adjustment estimand
# ("proportional" by default = G-computation, matching Stata `margins`).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  covariates = c(age, education),
  vcov = "HC3"
)

# Same model with the emmeans / SPSS UNIANOVA convention (equal-weight
# marginal means on a synthetic covariate grid).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  covariates = c(age, education),
  adjustment = "balanced",
  vcov = "HC3"
)

# Effect sizes adjust automatically: f2 / omega2 become partial
# effect sizes via partial F (drop1) restricted to the focal `by`.
# d / g are undefined under adjustment and raise spicy_unsupported.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  covariates = c(age, education),
  effect_size = "f2",
  effect_size_ci = TRUE
)

# --- Article-style polish -----------------------------------------------
# Pretty outcome labels and adjusted R^2.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  labels = c(
    wellbeing_score = "WHO-5 wellbeing (0-100)",
    bmi = "Body-mass index (kg/m^2)"
  ),
  r2 = "adj_r2"
)

# European decimal comma.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  decimal_mark = ","
)

# Regex selection of all columns starting with "life_sat".
table_continuous_lm(
  sochealth,
  select = "^life_sat",
  by = sex,
  regex = TRUE
)

# --- Output formats -----------------------------------------------------
# The rendered outputs below all wrap the same call:
#   table_continuous_lm(sochealth,
#                       select = c(wellbeing_score, bmi),
#                       by = sex)
# only `output` changes. Assign the result to a variable to avoid the
# plain-text fallback that some engines print when the object is
# displayed directly from `?` help.

# Wide data.frame (one row per outcome).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  output = "data.frame"
)

# Raw long data.frame (one block per outcome).
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  output = "long"
)

# Rendered HTML / docx objects, best viewed inside a
# Quarto / R Markdown document or a pkgdown article.
if (requireNamespace("tinytable", quietly = TRUE)) {
  tt <- table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi),
    by = sex,
    output = "tinytable"
  )
}
if (requireNamespace("gt", quietly = TRUE)) {
  tbl <- table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi),
    by = sex,
    output = "gt"
  )
}
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi),
    by = sex,
    output = "flextable"
  )
}

# Excel and Word: write to a temporary file.
if (requireNamespace("openxlsx2", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")
  table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi),
    by = sex,
    output = "excel",
    excel_path = tmp
  )
  unlink(tmp)
}
if (
  requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)
) {
  tmp <- tempfile(fileext = ".docx")
  table_continuous_lm(
    sochealth,
    select = c(wellbeing_score, bmi),
    by = sex,
    output = "word",
    word_path = tmp
  )
  unlink(tmp)
}

## Not run:
# Clipboard: writes to the system clipboard.
table_continuous_lm(
  sochealth,
  select = c(wellbeing_score, bmi),
  by = sex,
  output = "clipboard"
)
## End(Not run)
Publication-ready coefficient table from one or more fitted
lm / glm models. Supports standardised coefficients
(beta), average marginal effects (AME), partial
effect sizes (partial f^2 / eta^2 / omega^2
for lm; partial chi^2
for glm), pseudo-R^2
(glm), and a full vocabulary of variance estimators
(classical / HC* / cluster-robust with Satterthwaite-corrected
df / bootstrap / jackknife). glm covers binomial / poisson /
Gamma / inverse.gaussian / quasi families with any link.
table_regression(
  models,
  vcov = "classical",
  cluster = NULL,
  ci_level = 0.95,
  ci_method = c("wald", "profile"),
  boot_n = 1000L,
  standardized = c("none", "refit", "posthoc", "basic", "smart", "pseudo"),
  exponentiate = FALSE,
  p_adjust = "none",
  show_columns = NULL,
  keep = NULL,
  drop = NULL,
  show_intercept = TRUE,
  intercept_position = c("first", "last"),
  factor_layout = c("grouped", "flat"),
  reference_style = c("row", "annotation", "footer", "none"),
  reference_label = "(ref.)",
  show_fit_stats = NULL,
  fit_stats_layout = c("first_col", "merged"),
  model_labels = NULL,
  outcome_labels = NULL,
  stars = FALSE,
  nested = FALSE,
  digits = 2L,
  p_digits = 3L,
  effect_size_digits = 2L,
  fit_digits = 2L,
  ic_digits = 1L,
  decimal_mark = ".",
  align = c("decimal", "center", "right"),
  padding = 0L,
  labels = NULL,
  title = NULL,
  note = NULL,
  output = c("default", "data.frame", "long", "gt", "flextable", "tinytable",
    "excel", "clipboard", "word"),
  excel_path = NULL,
  excel_sheet = "Regression",
  clipboard_delim = "\t",
  word_path = NULL,
  word_template = NULL
)
models |
An |
vcov |
Variance-covariance estimator: |
cluster |
Cluster identifier for cluster-robust variance
(used when
For multi-model use, pass a list of one form per model
(mix-and-match allowed). Bare unquoted names
( |
ci_level |
Confidence level for all reported CIs (B, |
ci_method |
CI construction. |
boot_n |
Number of bootstrap replicates when
|
standardized |
Standardisation method for the |
exponentiate |
Logical. When |
p_adjust |
Multiple-comparison adjustment method applied
to the family of estimated coefficient p-values within each
model (intercept and reference rows excluded). One of
|
show_columns |
Character vector of tokens selecting the
per-coefficient columns and their display order. Accepts
atomic tokens ( |
keep |
Character vector of regexes. Only coefficient rows
whose term name (as in |
drop |
Character vector of regexes. Coefficient rows
matching any pattern are removed. Mutually exclusive with
|
show_intercept |
Whether to display the intercept row.
Default |
intercept_position |
Where to place the intercept when
shown. |
factor_layout |
Layout of factor predictors. Applies to
any categorical predictor –
|
reference_style |
Rendering of factor reference levels. Four modes, distinguishing WHERE the reference information is exposed (in a row, inline, in the footer, or nowhere):
Ordered factors with AME: under R's default |
reference_label |
Suffix shown after the reference level in
|
show_fit_stats |
Character vector of tokens for the
model-level rows below the coefficients; row order follows
token order.
Under |
fit_stats_layout |
Layout of the fit-stat values (
Cell merging is supported by
Decimal alignment of every numeric column is preserved in
both modes: the |
model_labels |
Per-model labels used as the column-group
spanner above each model's sub-columns (console + gt /
flextable / tinytable / Excel / Word renderers). |
outcome_labels |
Optional Outcome body row override.
|
stars |
Significance asterisks. |
nested |
Whether to inject pairwise change-statistic rows
for adjacent models (M2 vs M1, M3 vs M2, ...). |
digits |
Decimal places for general numeric tokens
( |
p_digits |
Decimal places for p-values ( |
effect_size_digits |
Decimals for per-coefficient effect
sizes ( |
fit_digits |
Decimals for variance-explained / model-level
effect-size fit stats ( |
ic_digits |
Decimals for information criteria ( |
decimal_mark |
Decimal mark used in numeric display.
|
align |
Numeric column alignment.
|
padding |
Non-negative integer giving the extra characters
added to each data column's auto-computed width when the
default |
labels |
Named character vector overriding per-coefficient
row labels. Names are coefficient term names (from
|
title, note
|
Override or suppress the auto-built caption / methodological footer. Three modes per argument:
Validation messages, the spanner row, and the in-body change-stat rows are not affected: they belong to the table structure, not to the banner. |
output |
Output type. |
excel_path |
File path for |
excel_sheet |
Sheet name when writing to Excel. Default
|
clipboard_delim |
Field delimiter for
Paste behaviour by target:
|
word_path |
File path for R Markdown / Quarto: for embedded use, prefer |
word_template |
Optional path to a custom .docx file used as the
template for Customising the caption appearance: the table caption is
tagged with the Word named style |
A spicy_regression_table object (a data.frame
subclass with classes c("spicy_regression_table", "spicy_table", "data.frame")) when output = "default".
The result carries rendering attributes (title, note,
align, padding) and provenance attributes (outcome,
model_ids) consumed by the print method and the broom
methods. For other output values, returns the
format-specific object (gt_tbl, flextable, tinytable,
data.frame, tbl_df, or invisible(x) for side-effect
outputs).
Two vector arguments – show_columns and show_fit_stats –
accept named tokens that select what to display and in
what order. All tokens are lowercase
(snake_case for compound tokens). Group tokens
("all_b", "all_ame", ...) expand to a fixed vector of
atomic tokens; see show_columns below.
show_columns: per-coefficient columns
Each token is one displayed column.
Coefficient family: "b", "beta" (standardised),
"se", "ci", "t", "p".
Marginal effects: "ame", "ame_se", "ame_ci",
"ame_p". "p" always refers to the B-coefficient
p-value; for the AME-specific p-value use "ame_p".
Partial effect sizes – lm only: "partial_f2",
"partial_eta2", "partial_omega2", each with a paired
_ci companion ("partial_f2_ci", ...).
Partial effect size – glm only: "partial_chi2"
(likelihood-ratio chi-square via drop1(test = "LRT");
SAS PROC LOGISTIC TYPE3; Long & Freese 2014 Section 3.5).
Rendered as value (df) to disambiguate factor terms
(k-1 df) from numeric terms (1 df).
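To show roughly what the partial effect-size tokens measure, here is a base-R sketch. For each lm term it derives partial eta-squared, partial f-squared, and partial omega-squared from drop-one (marginal) sums of squares. For the glm case it uses the likelihood-ratio chi-square from drop1(test = "LRT"), formatted as value (df). The example data is mtcars rather than sochealth. Using drop1() sums of squares for the lm measures is an assumption, and spicy's own computation (including the paired _ci intervals) may differ.

```r
# Sketch only: partial effect sizes from drop-one tests in base R.
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)
d_lm <- drop1(fit, test = "F")[-1, ]   # drop the "<none>" row
ss_eff <- d_lm[["Sum of Sq"]]          # SS attributable to each term
df_eff <- d_lm[["Df"]]                 # 1 for wt, 2 for factor(cyl)
f_val  <- d_lm[["F value"]]
ss_err <- deviance(fit)                # residual sum of squares
n      <- nobs(fit)

partial_eta2   <- ss_eff / (ss_eff + ss_err)
partial_f2     <- partial_eta2 / (1 - partial_eta2)   # equals ss_eff / ss_err
partial_omega2 <- df_eff * (f_val - 1) / (df_eff * (f_val - 1) + n)
names(partial_eta2) <- rownames(d_lm)

# glm: likelihood-ratio chi-square per term, displayed as "value (df)"
gfit  <- glm(vs ~ mpg + factor(am), family = binomial, data = mtcars)
d_glm <- drop1(gfit, test = "LRT")[-1, ]
partial_chi2 <- paste0(format(round(d_glm$LRT, 2), nsmall = 2), " (", d_glm$Df, ")")
partial_chi2
```

A factor term contributes k - 1 numerator df (2 for the three-level cyl), which is why the glm value carries its df in parentheses.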
Group tokens (presets) expand to a fixed atomic vector before validation:
"all_b" -> c("b", "se", "ci", "p")
"all_b_compact" -> c("b", "se", "p")
"all_b_full" -> c("b", "se", "ci", "t", "p")
"all_beta" -> c("b", "beta", "se", "ci", "p")
"all_ame" -> c("ame", "ame_se", "ame_ci", "ame_p")
"all_ame_compact" -> c("ame", "ame_p")
"all_f2" / "all_eta2" / "all_omega2" -> partial_*
+ its _ci companion.
Mix groups and atomic tokens:
show_columns = c("all_b", "ame", "ame_p"). Duplicates after
expansion are deduplicated; the order of tokens controls the
order of the displayed columns. If standardized != "none" and
"beta" is not already requested, it is auto-injected after
"b". Asking for "beta" while standardized = "none" raises
spicy_invalid_input.
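The expansion rules above can be sketched in a few lines of base R. The helper `expand_column_tokens()` is hypothetical and only restates the documented behaviour. Group tokens are expanded and duplicates are dropped with first-occurrence order kept. "beta" is injected after "b" when a standardisation method is set, and an error is raised for "beta" under standardized = "none". Appending "beta" at the end when "b" is absent is an assumption.

```r
# Hypothetical re-implementation for illustration; not spicy's internal code.
column_groups <- list(
  all_b           = c("b", "se", "ci", "p"),
  all_b_compact   = c("b", "se", "p"),
  all_b_full      = c("b", "se", "ci", "t", "p"),
  all_beta        = c("b", "beta", "se", "ci", "p"),
  all_ame         = c("ame", "ame_se", "ame_ci", "ame_p"),
  all_ame_compact = c("ame", "ame_p"),
  all_f2          = c("partial_f2", "partial_f2_ci"),
  all_eta2        = c("partial_eta2", "partial_eta2_ci"),
  all_omega2      = c("partial_omega2", "partial_omega2_ci")
)

expand_column_tokens <- function(tokens, standardized = "none") {
  # Replace each group token by its atomic vector; keep atomic tokens as-is
  expanded <- unlist(lapply(tokens, function(tok) {
    if (tok %in% names(column_groups)) column_groups[[tok]] else tok
  }), use.names = FALSE)
  expanded <- unique(expanded)  # deduplicate, keeping first occurrence
  if (standardized == "none" && "beta" %in% expanded) {
    stop("'beta' requested but standardized = \"none\"")
  }
  if (standardized != "none" && !"beta" %in% expanded) {
    # Inject right after "b" (assumed: at the end if "b" is absent)
    pos <- match("b", expanded, nomatch = length(expanded))
    expanded <- append(expanded, "beta", after = pos)
  }
  expanded
}

expand_column_tokens(c("all_b", "ame", "ame_p"))
expand_column_tokens(c("all_b", "b", "p"), standardized = "refit")
```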
Default (show_columns = NULL) is context-aware:
"all_b" for a single model (APA-7 Section 6.46 publication layout),
"all_b_compact" for two or more models (CI dropped to fit the
side-by-side layout; restore it explicitly when needed).
show_fit_stats: model-level rows below the coefficients
Counts: "nobs", "weighted_nobs".
Variance explained (lm only): "r2", "adj_r2",
"omega2".
Pseudo-$R^2$ (glm only): "pseudo_r2_mcfadden"
(McFadden 1974), "pseudo_r2_nagelkerke" (Nagelkerke 1991),
"pseudo_r2_tjur" (Tjur 2009; binomial only).
Residual scale: "sigma" (lm
/ glm dispersion), "rmse".
Effect size: "f2".
Information criteria: "AIC", "AICc", "BIC",
"deviance".
Change-stats for hierarchical comparison
(active under nested = TRUE; see Hierarchical
comparison below): "r2_change", "adj_r2_change",
"f_change", "f2_change", "lrt_change",
"aic_change", "aicc_change", "bic_change",
"deviance_change", "p_change".
Default (resolved when NULL) is class-aware: lm fits get
c("nobs", "r2", "adj_r2"); glm fits get
c("nobs", "pseudo_r2_mcfadden", "pseudo_r2_nagelkerke", "AIC");
mixed lm + glm sets union both groups (the renderer per-row
em-dashes the inappropriate cell). When nested = TRUE, the
class-aware default is extended with change tokens
(c("r2_change", "f_change", "p_change") for lm,
c("lrt_change", "p_change") for glm). The order of tokens in
show_fit_stats controls the order of the rows.
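The class-aware default can be restated as a small decision function. `default_fit_stats()` is hypothetical. glm objects also inherit from "lm", so the sketch tests for "glm" first. The doc does not describe a nested extension for mixed lm + glm sets, so the sketch leaves that case unextended.

```r
# Hypothetical helper restating the documented defaults; not spicy's code.
default_fit_stats <- function(fits, nested = FALSE) {
  is_glm <- vapply(fits, inherits, logical(1), what = "glm")
  lm_default  <- c("nobs", "r2", "adj_r2")
  glm_default <- c("nobs", "pseudo_r2_mcfadden", "pseudo_r2_nagelkerke", "AIC")
  if (all(is_glm)) {
    out <- glm_default
  } else if (!any(is_glm)) {
    out <- lm_default
  } else {
    out <- union(lm_default, glm_default)  # mixed lm + glm set
  }
  if (nested && all(is_glm)) out <- c(out, "lrt_change", "p_change")
  if (nested && !any(is_glm)) out <- c(out, "r2_change", "f_change", "p_change")
  out
}

m_lm  <- lm(mpg ~ wt, data = mtcars)
m_glm <- glm(vs ~ mpg, family = binomial, data = mtcars)
default_fit_stats(list(m_lm, m_glm))
```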
Pass a single fit or a list() of fits. Multi-model layout
draws a centred spanner label above each model's
sub-columns:
list("Naive" = m1, "Adjusted" = m2) -> spanner labels
"Naive" / "Adjusted". Partial naming (list("Naive" = m1, m2))
auto-fills missing slots as "Model <position>".
list(m1, m2) (unnamed) -> if all response variables
differ, the bare DV name (from formula(fit)[[2]]) becomes
the spanner label and the redundant Outcome body row is
suppressed. If DVs match, the labels default to
"Model 1, 2, ...".
model_labels = c("A", "B") overrides everything.
Duplicate explicit names in the list are rejected
(spicy_invalid_input) – they would silently collide in the
internal model_id key.
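The spanner-label rules can be sketched as follows. `resolve_model_labels()` is a hypothetical name, and reading "all response variables differ" as "no response name is repeated" is an assumption.

```r
# Hypothetical helper illustrating the documented labelling rules.
resolve_model_labels <- function(fits, model_labels = NULL) {
  if (!is.null(model_labels)) return(model_labels)  # explicit override wins
  nms <- names(fits)
  if (is.null(nms)) nms <- rep("", length(fits))
  if (anyDuplicated(nms[nms != ""])) stop("duplicate model names")
  if (all(nms == "")) {
    # Unnamed list: use the response name when every DV is distinct
    dvs <- vapply(fits, function(f) deparse(formula(f)[[2]]), character(1))
    if (!anyDuplicated(dvs)) return(dvs)
    return(paste("Model", seq_along(fits)))
  }
  # Partial naming: fill the empty slots by position
  ifelse(nms == "", paste("Model", seq_along(fits)), nms)
}

fit_mpg <- lm(mpg ~ wt, data = mtcars)
fit_hp  <- lm(hp ~ wt, data = mtcars)
resolve_model_labels(list(fit_mpg, fit_hp))
resolve_model_labels(list(Naive = fit_mpg, fit_mpg))
```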
vcov selects the variance-covariance estimator:
"classical" – OLS (lm) / MLE inverse Hessian (glm).
"HC0" to "HC5" – heteroskedasticity-consistent
(via sandwich::vcovHC()).
"CR0" to "CR3" – cluster-robust with
Satterthwaite-corrected df (via clubSandwich::vcovCR()).
Requires cluster.
"bootstrap" – nonparametric or cluster bootstrap
(boot_n replicates).
"jackknife" – leave-one-out / leave-one-cluster-out.
For multi-model use, both vcov and cluster accept a single
value (recycled to all models) or a list (one per model). The
same fit can appear several times with different estimators to
compare standard errors side-by-side.
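For intuition about the heteroskedasticity-consistent options, the base-R sketch below builds the textbook HC0 and HC3 sandwich estimators for an lm fit. For HC0 the middle term weights each observation by its squared residual. HC3 additionally inflates each weight by 1/(1 - h_ii)^2. These are the formulas sandwich::vcovHC() implements for lm. This is an illustration on mtcars, not the package's code path.

```r
# HC0 / HC3 sandwich estimators by hand (illustration only).
fit_hc <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit_hc)
e <- residuals(fit_hc)
h <- hatvalues(fit_hc)
bread <- solve(crossprod(X))                 # (X'X)^-1

# (X'X)^-1 X' diag(omega) X (X'X)^-1; X * omega scales row i by omega[i]
sandwich_vcov <- function(omega) bread %*% crossprod(X * omega, X) %*% bread

vcov_hc0 <- sandwich_vcov(e^2)
vcov_hc3 <- sandwich_vcov(e^2 / (1 - h)^2)

se <- cbind(
  classical = sqrt(diag(vcov(fit_hc))),
  HC0       = sqrt(diag(vcov_hc0)),
  HC3       = sqrt(diag(vcov_hc3))
)
se
```

HC3 standard errors are never smaller than HC0 ones, because every leverage weight 1/(1 - h_ii)^2 is at least 1.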
Inferential regimes (B and AME share the same regime):
classical, HC* -> t with df.residual.
bootstrap, jackknife -> z asymptotic.
CR0-CR3 -> t with Satterthwaite-corrected df (B
via clubSandwich::coef_test(); AME via
clubSandwich::linear_contrast(); Pustejovsky & Tipton
2018). Under non-linear terms (poly(), I(), log(),
splines::ns()), AME falls back to z-asymptotic with a
spicy_fallback warning.
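The difference between the t and z regimes is only the reference distribution used for the same Wald statistic. The sketch below computes both from a classical lm fit. In a real bootstrap or jackknife run the standard error itself would also change. Only the p-value step is shown here.

```r
# Same Wald statistic, two reference distributions (illustration only).
fit_tz <- lm(mpg ~ wt + hp, data = mtcars)
est  <- coef(summary(fit_tz))
stat <- est[, "Estimate"] / est[, "Std. Error"]

# classical / HC*: t with df.residual(fit)
p_t <- 2 * pt(abs(stat), df = df.residual(fit_tz), lower.tail = FALSE)
# bootstrap / jackknife: asymptotic standard normal
p_z <- 2 * pnorm(abs(stat), lower.tail = FALSE)

cbind(stat, p_t, p_z)
```

The z p-values are never larger than the t ones, since the t distribution has heavier tails; the gap shrinks as df.residual grows.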
cluster
Three accepted forms, in order of preference:
Formula – cluster = ~region (or
cluster = ~region:year for the interaction of two
variables). The variables are looked up in
model.frame(fit) first, then in the original data
argument captured by the fit. Recommended: independent
of the dataset's name, composable for multi-way clustering,
consistent with sandwich::vcovCL() /
clubSandwich::vcovCR().
String – cluster = "region". A single column
name resolved the same way as the formula. Convenient but
cannot express interactions.
Vector – cluster = df$region. An atomic vector of
length nobs(fit). Use this when the cluster key is
derived on the fly (cluster = interaction(df$region, df$year),
cluster = as.integer(format(df$date, "%Y"))), comes from
a different dataset with matching row order, or is
otherwise not a column of the model's data.
Bare unquoted names (cluster = region) are not accepted –
they would require non-standard evaluation magic that breaks
under programmatic use (function wrapping, dynamic column
choice, loops). Use ~region or "region" instead.
For multi-model use, mix forms freely:
cluster = list(~region, "region", df$region).
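The three accepted forms can be normalised to a single cluster vector as sketched below. `resolve_cluster()` is hypothetical. It follows the documented lookup order: model frame first, then the data argument captured in the fit's call. It assumes the data's row names line up with the model frame when restricting to the rows used in the fit. mtcars stands in for the package data.

```r
# Hypothetical helper normalising the three `cluster` forms to one vector.
resolve_cluster <- function(cluster, fit) {
  mf <- model.frame(fit)
  lookup <- function(var) {
    if (var %in% names(mf)) return(mf[[var]])
    # Fall back to the data argument captured by the fit, keeping only
    # the rows used in the fit (assumes row names align).
    dat <- eval(fit$call$data, environment(formula(fit)))
    dat[rownames(mf), var]
  }
  if (inherits(cluster, "formula")) {
    vars <- lapply(all.vars(cluster), lookup)
    if (length(vars) == 1) return(vars[[1]])
    return(interaction(vars, drop = TRUE))   # e.g. ~region:year
  }
  if (is.character(cluster) && length(cluster) == 1) return(lookup(cluster))
  if (is.atomic(cluster) && length(cluster) == nobs(fit)) return(cluster)
  stop("`cluster` must be a formula, a column name, or a vector of length nobs(fit)")
}

fit_cl <- lm(mpg ~ wt, data = mtcars)
table(resolve_cluster(~cyl:gear, fit_cl))
```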
nested = TRUE adds per-pair change statistics as in-table
rows (APA Table 7.13 / Stata esttab / SPSS Model Summary
convention). Each adjacent pair (M2 vs M1, M3 vs M2, ...)
contributes one column of change stats; the FIRST model column
gets em-dashes (no previous model to compare to). Validation
requires identical nobs and identical response variable
across all models.
Default change tokens auto-injected when show_fit_stats is
NULL:
All-lm: c("r2_change", "f_change", "p_change") –
APA hierarchical regression standard.
All-glm: c("lrt_change", "p_change") – Hosmer &
Lemeshow Section 3.5; Long & Freese 2014 Section 3.6.
To customise, pass the change tokens directly to
show_fit_stats. Variance-explained change tokens on an
all-glm hierarchy raise spicy_invalid_input (the
residual-sum-of-squares partition does not apply outside the
least-squares framework – the renderer points the user at
lrt_change).
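The default change statistics correspond to classical nested-model tests that base R already provides. The sketch below computes the R-squared change, F-change, and p-change for two nested lm fits by hand and checks them against anova(). It does the same for the likelihood-ratio change of two nested glm fits. mtcars is used for illustration.

```r
# lm: R^2 change and the F-change test for M2 vs M1
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
r2 <- function(f) summary(f)$r.squared
r2_change <- r2(m2) - r2(m1)
df_num <- df.residual(m1) - df.residual(m2)   # parameters added
df_den <- df.residual(m2)
f_change <- (r2_change / df_num) / ((1 - r2(m2)) / df_den)
p_change <- pf(f_change, df_num, df_den, lower.tail = FALSE)
a_lm <- anova(m1, m2)

# glm: likelihood-ratio change (deviance drop) for G2 vs G1
g1 <- glm(vs ~ mpg, family = binomial, data = mtcars)
g2 <- glm(vs ~ mpg + wt, family = binomial, data = mtcars)
lrt_change <- deviance(g1) - deviance(g2)
p_lrt <- pchisq(lrt_change, df.residual(g1) - df.residual(g2), lower.tail = FALSE)
a_glm <- anova(g1, g2, test = "Chisq")

c(r2_change = r2_change, f_change = f_change, p_change = p_change,
  lrt_change = lrt_change, p_lrt = p_lrt)
```

The R-squared form and the residual-sum-of-squares form of the F-change coincide only because both models share the same response and observations. That is why identical nobs and an identical response are required.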
standardized controls the method when "beta" is in
show_columns:
"refit" – refit on z-scored data. For lm both X
and Y are z-scored (Cohen et al. 2003 gold standard); for
glm only numeric X (Long & Freese 2014 Section 4.3.4
"x-standardization").
"posthoc" – post-hoc scaling. lm:
$\beta_j = b_j \, s_{x_j} / s_y$;
glm: X-only $\beta_j = b_j \, s_{x_j}$
(Y is undefined on the link scale).
"basic" – like "posthoc" but factor dummies are
scaled by their column SD.
"smart" – Gelman (2008): divide binary predictors
by 2 * SD instead of SD.
"pseudo" – glm only. Menard (2004, 2011)
fully-standardised $\beta$,
with the latent variable $y^*$ on the link scale and
error variance $\sigma^2_\varepsilon$ ($\pi^2/3$ logit, 1 probit,
$\pi^2/6$ cloglog). Binomial families only;
non-binomial returns NA with a spicy_caveat.
"none" (default) – no $\beta$ computed.
Under interactions or transformed predictors (I(), poly(),
log(), splines::ns()), a spicy_caveat warns that
standardised coefficients on such terms are subtle to interpret
(Cohen et al. 2003 Section 7.7; Aiken & West 1991). The caveat is
auto-documented in the footer.
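For an lm fit with only numeric predictors and no interactions, the "posthoc" and "refit" approaches give the same standardised coefficients. The sketch below checks this in base R on mtcars. It also shows the glm X-only scaling, which stays on the link (log-odds) scale. It does not reproduce the "basic", "smart", or "pseudo" variants.

```r
fit_std <- lm(mpg ~ wt + hp, data = mtcars)
preds <- c("wt", "hp")
sd_x <- vapply(mtcars[preds], sd, numeric(1))

# "posthoc" for lm: beta_j = b_j * sd(x_j) / sd(y)
beta_posthoc <- coef(fit_std)[preds] * sd_x / sd(mtcars$mpg)

# "refit" for lm: z-score Y and every X, then refit
z <- as.data.frame(scale(mtcars[c("mpg", preds)]))
beta_refit <- coef(lm(mpg ~ wt + hp, data = z))[preds]

# glm, X-only: beta_j = b_j * sd(x_j), still on the log-odds scale
gfit_std <- glm(vs ~ wt + hp, family = binomial, data = mtcars)
beta_glm_x <- coef(gfit_std)[preds] * sd_x

rbind(posthoc = beta_posthoc, refit = beta_refit)
```

With factor, transformed, or interaction terms the two approaches diverge. That is the situation in which the caveat above is raised.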
Adjusting the p-values of all coefficients of a single
regression model is not the standard convention. Each
coefficient tests a distinct hypothesis on a distinct
predictor – not the situation multiple-testing procedures
were designed for (Rothman 1990; Greenland 2017; APA Manual 7
Section 6.46; Harrell Regression Modeling Strategies Section 5.4; Gelman,
Hill & Yajima 2012). Hence the default p_adjust = "none".
Adjustment is appropriate for: mass screening with no prior
hypothesis (typically "BH" / FDR), pre-registered
multi-endpoint confirmatory designs (typically "holm"), or
when a journal / SAP explicitly requests it.
The adjustment runs before any keep / drop filtering,
so the family is the model's full coefficient set (intercept
and reference rows excluded), not the displayed subset –
filtering is a display choice and must not change the
inferential family.
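The order of operations (adjust over the full family first, filter the display second) can be illustrated with stats::p.adjust() on mtcars. This sketch only mirrors the described rule. Dropping a row from the display afterwards leaves the adjusted p-values of the remaining rows unchanged.

```r
fit_p <- lm(mpg ~ wt + hp + qsec, data = mtcars)
p_all <- coef(summary(fit_p))[, "Pr(>|t|)"]

# The family: every coefficient except the intercept
p_family <- p_all[names(p_all) != "(Intercept)"]
p_holm <- p.adjust(p_family, method = "holm")   # confirmatory multi-endpoint
p_bh   <- p.adjust(p_family, method = "BH")     # FDR screening

# Display filtering happens afterwards: hiding "qsec" does not
# change the adjusted values shown for "wt" and "hp".
p_holm[c("wt", "hp")]
```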
output selects the return type:
"default" – a spicy_regression_table
(data.frame subclass) printed via spicy_print_table().
"data.frame" / "long" – raw data.frame /
long-format tibble.
"gt" / "flextable" / "tinytable" – rich-format
HTML / Word / PDF tables (require the corresponding
Suggests package).
"excel" – writes to excel_path via
openxlsx2::write_xlsx().
"word" – writes to word_path via
flextable::save_as_docx().
"clipboard" – copies to the system clipboard via
clipr::write_clip().
broom::tidy() returns a long tibble with one row per
(model_id, term, estimate_type) and broom-canonical column
names (estimate, std.error, conf.low, conf.high,
statistic, p.value). broom::glance() returns one row per
model with the model-level statistics; df.residual is kept
numeric so cluster-robust Satterthwaite df is preserved.
No weights argument: weights are a property of the fit
(extracted via stats::weights()). Pass them when fitting:
lm(y ~ x, data = df, weights = w). All downstream
computations (vcov, AME, standardisation, weighted_nobs)
extract them automatically.
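As a quick illustration of weights living on the fit, the sketch below fits a weighted lm on mtcars and reads the weights back with stats::weights(). Treating "weighted_nobs" as the sum of the weights is an assumption made for this example only.

```r
# Weights are supplied when fitting, then read back from the fit.
fit_w <- lm(mpg ~ hp, data = mtcars, weights = cyl)
w <- stats::weights(fit_w)

weighted_n   <- sum(w)        # assumed meaning of "weighted_nobs"
unweighted_n <- nobs(fit_w)   # counts observations with non-zero weight
c(unweighted = unweighted_n, weighted = weighted_n)
```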
Output is in English. Override user-facing strings via
reference_label, model_labels, outcome_labels, and
labels. The title and footer are post-processable via
attr(result, "title") and attr(result, "note").
Every error and warning emitted by table_regression() carries
a classed condition for programmatic dispatch via tryCatch()
or withCallingHandlers(). Errors inherit from spicy_error
(root); warnings from spicy_warning. Specific leaves used by
this function include spicy_invalid_input,
spicy_invalid_data, spicy_unsupported,
spicy_missing_pkg, spicy_missing_column,
spicy_ignored_arg, spicy_caveat, spicy_fallback. See
spicy for the full taxonomy.
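Class-based dispatch works with plain base-R handlers. The sketch below builds conditions with the same class chains (leaf, then root, then "error"/"warning"). The constructors `abort_invalid_input()` and `warn_caveat()` are hypothetical stand-ins that show how a caller could route them. In tryCatch() the first listed handler whose class matches wins, so the specific leaf is listed before the root. withCallingHandlers() can record and muffle caveats without aborting.

```r
# Hypothetical constructors mimicking the documented class hierarchy.
spicy_condition <- function(msg, classes) {
  structure(list(message = msg, call = NULL), class = c(classes, "condition"))
}
abort_invalid_input <- function(msg) {
  stop(spicy_condition(msg, c("spicy_invalid_input", "spicy_error", "error")))
}
warn_caveat <- function(msg) {
  warning(spicy_condition(msg, c("spicy_caveat", "spicy_warning", "warning")))
}

# Errors: specific leaf first, root class as a catch-all
res <- tryCatch(
  abort_invalid_input("'beta' requested but standardized = \"none\""),
  spicy_invalid_input = function(e) paste("invalid input:", conditionMessage(e)),
  spicy_error = function(e) "some other spicy error"
)

# Warnings: record caveats and keep going
caught <- character()
value <- withCallingHandlers(
  {
    warn_caveat("standardised coefficient on an interaction term")
    "table built"
  },
  spicy_caveat = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```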
APA Manual 7 (American Psychological Association, 2020), Tables 7.13-7.15.
Aiken, L.S. & West, S.G. (1991). Multiple regression: Testing and interpreting interactions.
Cohen, J., Cohen, P., West, S.G., & Aiken, L.S. (2003). Applied multiple regression / correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum.
Pustejovsky, J.E. & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672-683.
Wasserstein, R.L., Schirm, A.L., & Lazar, N.A. (2019). Moving to a world beyond "p < 0.05". The American Statistician, 73(sup1), 1-19.
Other regression-table functions:
table_continuous_lm() for one-predictor-by-many-outcomes
descriptive tables.
Other spicy table functions:
freq(), cross_tab(), table_categorical(),
table_continuous().
Underlying machinery:
spicy_print_table() for ASCII rendering;
build_ascii_table() for the low-level renderer.
Inferential infrastructure (internal):
compute_lm_vcov(), compute_lm_coef_inference(),
compute_lm_wald_test().
broom integration:
broom::tidy(), broom::glance().
# ---- Single-model usage ------------------------------------------
fit <- lm(wellbeing_score ~ age + sex + smoking, data = sochealth)

# Default APA layout: B / SE / 95% CI / p plus the n / R^2 /
# Adj.R^2 fit-stats footer. Factor reference level is annotated
# with `(ref.)` and shows an em-dash in the statistic columns.
table_regression(fit)

# Standardised coefficients (beta) injected next to B. Four
# methods available; "refit" is the SPSS / Stata regress, beta
# gold standard.
table_regression(fit, standardized = "refit")

# Custom column set: B + AME + AME-specific p-value. Note that
# the `p` token always belongs to B, never to AME -- use the
# explicit `ame_p` token for AME inference.
table_regression(
  fit,
  show_columns = c("b", "p", "ame", "ame_ci", "ame_p")
)

# Group-token shortcut: "all_b" + "all_ame" expands to the full
# B / AME column families side by side.
table_regression(fit, show_columns = c("all_b", "all_ame"))

# ---- Cluster-robust variance -------------------------------------
# CR2 (Bell-McCaffrey) with Satterthwaite-corrected df is the
# recommended default under few clusters. Three forms are accepted
# for `cluster`; the formula is preferred for composability with
# multi-way clustering and for programmatic robustness.
table_regression(fit, vcov = "CR2", cluster = ~region)
table_regression(fit, vcov = "CR2", cluster = "region")
table_regression(fit, vcov = "CR2", cluster = ~region:age_group)

# ---- Hierarchical (nested) regression ----------------------------
# Adds in-table change-statistic rows (Delta R^2 / F-change /
# p-change for lm; LRT / p-change for glm) below the fit-stats.
# Note: hierarchical comparison requires identical observations
# across all models -- prepare a complete-case subset first so
# R's listwise deletion does not produce different `nobs` per
# model (which the function rejects).
sochealth_cc <- na.omit(
  sochealth[, c("wellbeing_score", "age", "sex", "smoking")]
)
m1 <- lm(wellbeing_score ~ age, data = sochealth_cc)
m2 <- lm(wellbeing_score ~ age + sex, data = sochealth_cc)
m3 <- lm(wellbeing_score ~ age + sex + smoking, data = sochealth_cc)
table_regression(
  list("Step 1" = m1, "Step 2" = m2, "Step 3" = m3),
  nested = TRUE
)

# ---- Side-by-side variance comparison ----------------------------
# Same fit, three vcovs in one wide table. Useful for showing the
# sensitivity of inference to the variance assumption.
table_regression(
  list("Classical" = fit, "HC3" = fit, "CR2" = fit),
  vcov = list("classical", "HC3", "CR2"),
  cluster = list(NULL, NULL, ~region)
)

# ---- Tidy long format for downstream pipelines -------------------
broom::tidy(table_regression(fit))

## Not run:
# ---- Rich-format outputs (require optional Suggests packages) ----
table_regression(fit, output = "gt")
table_regression(fit, output = "flextable")
table_regression(fit, output = "tinytable")

# ---- File outputs ------------------------------------------------
table_regression(fit, output = "excel",
                 excel_path = tempfile(fileext = ".xlsx"))
table_regression(fit, output = "word",
                 word_path = tempfile(fileext = ".docx"))

# ---- System clipboard (interactive use) --------------------------
table_regression(fit, output = "clipboard")
## End(Not run)
uncertainty_coef() computes the Uncertainty Coefficient
(Theil's U) for a two-way contingency table, based on
information entropy.
uncertainty_coef(x, direction = c("symmetric", "row", "column"), detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE)
x |
A contingency table (of class |
direction |
Direction of prediction:
|
detail |
Logical. If |
conf_level |
A number between 0 and 1 giving the confidence
level (default |
digits |
Number of decimal places used when printing the
result (default |
.include_se |
Internal parameter; do not use. |
The uncertainty coefficient measures association using Shannon
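entropy. Let $H(X)$ and $H(Y)$ be the marginal entropies
of the row and column variables respectively, and $H(X, Y)$
the joint entropy.
direction = "row" (column predicts row):
$U(X \mid Y) = \dfrac{H(X) + H(Y) - H(X, Y)}{H(X)}$.
direction = "column" (row predicts column):
$U(Y \mid X) = \dfrac{H(X) + H(Y) - H(X, Y)}{H(Y)}$.
direction = "symmetric":
$U = \dfrac{2\,[H(X) + H(Y) - H(X, Y)]}{H(X) + H(Y)}$.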
The entropy terms use the standard mathematical convention
$0 \log 0 = 0$, matching SPSS / PSPP CROSSTABS and the
definition in Cover & Thomas (2006). Note that
DescTools::UncertCoef() applies an additional Laplace
correction (replacing zero cells with a small positive constant) before the
entropy computation, which produces slightly different point
estimates on tables with empty cells; that correction is
uncommon in the information-theory literature and is not used
here. The asymptotic standard errors follow the DescTools delta
method; see cramer_v() for full references.
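The three directional formulas can be checked with a few lines of base R. The sketch below computes the entropies with the 0 log 0 = 0 convention by dropping empty cells, and derives the row, column, and symmetric coefficients from them. The base of the logarithm cancels in each ratio. `uncertainty_sketch()` is a hypothetical name and mtcars stands in for sochealth. Only point estimates are covered, not the delta-method standard errors.

```r
entropy <- function(p) {
  p <- p[p > 0]                 # 0 * log(0) is taken as 0
  -sum(p * log(p))
}

uncertainty_sketch <- function(tab) {
  p <- tab / sum(tab)
  h_x  <- entropy(rowSums(p))   # row variable
  h_y  <- entropy(colSums(p))   # column variable
  h_xy <- entropy(p)            # joint
  mi <- h_x + h_y - h_xy        # mutual information
  c(row = mi / h_x, column = mi / h_y, symmetric = 2 * mi / (h_x + h_y))
}

tab <- table(mtcars$cyl, mtcars$gear)   # contains an empty cell
u <- uncertainty_sketch(tab)
u
```

The symmetric value is the harmonic mean of the two directional ones, so it always lies between them.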
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: U = 0 (Wald z-test).
lambda_gk(), goodman_kruskal_tau(), assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
yule_q()
tab <- table(sochealth$smoking, sochealth$education)
uncertainty_coef(tab)
uncertainty_coef(tab, direction = "row", detail = TRUE)
varlist() lists the variables of a data frame and extracts
essential metadata: variable names, labels, summary values,
classes, number of distinct values, number of valid (non-missing)
observations, and number of missing values. Tidyselect-style
selectors can be supplied to pick or reorder columns dynamically.
vl() is a convenient shorthand for varlist() that offers identical
functionality with a shorter name.
varlist(
  x,
  ...,
  values = FALSE,
  tbl = FALSE,
  include_na = FALSE,
  factor_levels = c("observed", "all")
)

vl(
  x,
  ...,
  values = FALSE,
  tbl = FALSE,
  include_na = FALSE,
  factor_levels = c("observed", "all")
)
x |
A data frame, or a transformation of one. |
... |
Optional tidyselect-style column selectors (e.g.
|
values |
Logical. If |
tbl |
Logical. If |
include_na |
Logical. If |
factor_levels |
Character. Controls how factor values are displayed
in |
In an interactive session (RStudio, Positron, ...), the summary
opens in the Viewer pane with a contextual title like
vl: sochealth. If the data frame has been transformed or
subsetted, the title is suffixed with * (e.g. vl: sochealth*);
anonymous or ambiguous calls fall back to vl: <data>. Pass
tbl = TRUE to return a tibble instead.
The default factor_levels = "observed" mirrors what is actually
in the data; code_book() defaults to "all" to document the
declared schema. Use the factor_levels argument to override either
default.
A tibble with one row per selected variable, containing the following columns:
Variable: variable names
Label: variable labels (if available via the label attribute)
Values: a summary of the variable's values, depending on the values
and include_na arguments. If values = FALSE, a compact summary is
shown: all unique values when there are at most four, otherwise
3 + ... + last. If values = TRUE, all unique non-missing values are
displayed. For labelled variables, prefixed labels are displayed using
labelled::to_factor(levels = "prefixed").
For factors, levels are displayed according to factor_levels.
Matrix and array columns are summarized by their dimensions.
Missing value markers (<NA>, <NaN>) are optionally appended at the
end (controlled via include_na). Literal strings "NA", "NaN", and
"" are quoted to distinguish them from missing markers.
Class: the class of each variable (possibly multiple, e.g.
"labelled", "numeric")
N_distinct: number of distinct non-missing values
N_valid: number of non-missing observations
NAs: number of missing observations
For matrix and array columns, observations are counted per row:
a row is treated as missing if any of its cells is NA. N_valid
/ NAs therefore count complete vs. incomplete rows, not
individual cells.
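The compact Values summary and the per-row missingness rule for matrix columns can be approximated in base R. `compact_values()` is hypothetical. Its sorting, the ", " separator, and the <NA>-before-<NaN> order are assumptions chosen for the example, so spicy's exact rendering may differ.

```r
# Hypothetical approximation of the compact Values summary.
compact_values <- function(x, include_na = FALSE) {
  vals  <- sort(unique(x[!is.na(x)]))
  shown <- as.character(vals)
  if (is.character(x)) {
    # Quote literal strings that could be confused with missing markers
    literal <- shown %in% c("NA", "NaN", "")
    shown[literal] <- paste0('"', shown[literal], '"')
  }
  if (length(shown) > 4) {
    shown <- c(shown[1:3], "...", shown[length(shown)])  # 3 + ... + last
  }
  if (include_na) {
    is_nan <- if (is.numeric(x)) is.nan(x) else rep(FALSE, length(x))
    if (any(is.na(x) & !is_nan)) shown <- c(shown, "<NA>")
    if (any(is_nan)) shown <- c(shown, "<NaN>")
  }
  paste(shown, collapse = ", ")
}

compact_values(1:10)
compact_values(c(2, 1, NA, NaN), include_na = TRUE)

# Matrix columns: a row counts as missing if any of its cells is NA
m <- matrix(c(1, NA, 3, 4, 5, 6), ncol = 2)
row_missing <- rowSums(is.na(m)) > 0
c(N_valid = sum(!row_missing), NAs = sum(row_missing))
```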
With tbl = FALSE (the default) the tibble is sent to the
Viewer (interactive) or surfaced via a message (non-interactive)
and the function returns NULL invisibly. Set tbl = TRUE to
return the tibble directly for downstream use.
Other variable inspection:
code_book(),
label_from_names()
varlist(sochealth, tbl = TRUE)
sochealth |> varlist(tbl = TRUE)
varlist(sochealth, where(is.numeric), values = TRUE, tbl = TRUE)
varlist(
  sochealth,
  starts_with("bmi"),
  values = TRUE,
  include_na = TRUE,
  tbl = TRUE
)

df <- data.frame(
  group = factor(c("A", "B", NA), levels = c("A", "B", "C"))
)
varlist(
  df,
  values = TRUE,
  include_na = TRUE,
  factor_levels = "all",
  tbl = TRUE
)

vl(sochealth, tbl = TRUE)
sochealth |> vl(tbl = TRUE)
vl(sochealth, starts_with("bmi"), tbl = TRUE)
vl(sochealth, where(is.numeric), values = TRUE, tbl = TRUE)
yule_q() computes Yule's Q coefficient of association for a 2x2
contingency table.
yule_q(x, detail = FALSE, conf_level = 0.95, digits = 3L, .include_se = FALSE)
x |
A contingency table (of class |
detail |
Logical. If |
conf_level |
A number between 0 and 1 giving the confidence
level (default |
digits |
Number of decimal places used when printing the
result (default |
.include_se |
Internal parameter; do not use. |
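For a 2x2 table with cells $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, Yule's Q is
$Q = \dfrac{ad - bc}{ad + bc}$. It is equivalent to the
Goodman-Kruskal Gamma for 2x2 tables. The asymptotic standard
error is
$\mathrm{SE}(Q) = \tfrac{1}{2}\,(1 - Q^2)\sqrt{\tfrac{1}{a} + \tfrac{1}{b} + \tfrac{1}{c} + \tfrac{1}{d}}$.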
Edge cases: when ad + bc = 0, Q itself is undefined and the
function returns NA with a spicy_undefined_stat warning.
When any cell is zero (and ad + bc > 0), Q is well-defined
but the SE formula divides by zero – the point estimate is
returned, and se, ci_lower, ci_upper, and p_value are
all NA.
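The point estimate, its asymptotic SE, a Wald interval, and the two edge cases can be sketched in base R. `yule_q_sketch()` is hypothetical and raises a plain warning rather than the classed spicy_undefined_stat. It does not truncate the interval to [-1, 1], since the doc does not say whether spicy does.

```r
# Hypothetical sketch of Yule's Q with a Wald interval and z-test.
yule_q_sketch <- function(tab, conf_level = 0.95) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]; n21 <- tab[2, 1]; n22 <- tab[2, 2]
  ad <- n11 * n22
  bc <- n12 * n21
  out <- c(estimate = NA_real_, se = NA_real_, ci_lower = NA_real_,
           ci_upper = NA_real_, p_value = NA_real_)
  if (ad + bc == 0) {
    warning("Q is undefined when ad + bc = 0")
    return(out)
  }
  q <- (ad - bc) / (ad + bc)
  out[["estimate"]] <- q
  # Any zero cell makes the SE formula divide by zero: keep only Q
  if (any(c(n11, n12, n21, n22) == 0)) return(out)
  se <- 0.5 * (1 - q^2) * sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  z <- qnorm(1 - (1 - conf_level) / 2)
  out[c("se", "ci_lower", "ci_upper", "p_value")] <-
    c(se, q - z * se, q + z * se, 2 * pnorm(-abs(q / se)))
  out
}

yule_q_sketch(matrix(c(10, 5, 3, 12), nrow = 2))   # a = 10, b = 3, c = 5, d = 12
```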
Standard error formulas follow the DescTools implementations
(Signorell et al., 2024); see cramer_v() for full references.
Same structure as cramer_v(): a scalar when
detail = FALSE, a named vector when detail = TRUE.
The p-value tests H0: Q = 0 (Wald z-test).
phi(), gamma_gk(), assoc_measures()
Other association measures:
assoc_measures(),
contingency_coef(),
cramer_v(),
gamma_gk(),
goodman_kruskal_tau(),
kendall_tau_b(),
kendall_tau_c(),
lambda_gk(),
phi(),
somers_d(),
uncertainty_coef()
tab <- table(sochealth$smoking, sochealth$sex)
yule_q(tab)