| Title: | Publication-Ready Summary Tables and Forest Plots |
|---|---|
| Description: | A comprehensive framework for descriptive statistics and regression analysis that produces publication-ready tables and forest plots. Provides a unified interface from descriptive statistics through multivariable modeling, with support for linear models, generalized linear models, Cox proportional hazards, and mixed-effects models. Also includes univariable screening, multivariate regression, model comparison, and export to multiple formats including PDF, DOCX, PPTX, 'LaTeX', HTML, and RTF. Built on 'data.table' for computational efficiency. |
| Authors: | Paul Hsin-ti McClelland [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-3119-6531>) |
| Maintainer: | Paul Hsin-ti McClelland <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.11.5 |
| Built: | 2026-06-07 10:52:17 UTC |
| Source: | https://github.com/phmcc/summata |
A convenience wrapper function that automatically detects the input type and routes to the appropriate specialized forest plot function. This eliminates the need to remember which forest function to call for different model types or analysis objects, making it ideal for exploratory analysis and rapid prototyping.
autoforest(x, data = NULL, title = NULL, ...)
x |
One of the following:
|
data |
Data frame or data.table containing the original data. Required
when |
title |
Character string for plot title. If
|
... |
Additional arguments passed to the specific forest plot function. Common arguments include:
See the documentation for the specific forest function for all available options. |
This function provides a convenient wrapper around the specialized forest plot functions, automatically routing to the appropriate function based on the model class or result type. All parameters are passed through to the underlying function, so the full range of options remains available.
For model-specific advanced features, individual forest functions may be called directly.
Automatic Detection Logic:
The function uses the following priority order for detection:
1. uniscreen results: Detected by class "uniscreen_result" or presence of attributes outcome, predictors, model_type, and model_scope = "Univariable". Routes to uniforest().
2. multifit results: Detected by presence of attributes predictor, outcomes, model_type, and raw_data. Routes to multiforest().
3. Cox models: Classes coxph or clogit. Routes to coxforest().
4. GLM models: Class glm. Routes to glmforest().
5. Linear models: Class lm (but not glm). Routes to lmforest().
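The priority order above can be sketched in base R. This is only an illustration of class-based routing, not the package's implementation: route_forest() and has_attrs() are hypothetical helpers that return the name of the function that would be chosen, and the models are fitted on the built-in mtcars data rather than clintrial.

```r
# Hypothetical helper: TRUE if every attribute name in `nms` is set on `x`
has_attrs <- function(x, nms) all(nms %in% names(attributes(x)))

# Hypothetical router: returns the NAME of the forest function that the
# documented priority order would select (it does not call summata)
route_forest <- function(x) {
  if (inherits(x, "uniscreen_result")) {
    "uniforest"
  } else if (has_attrs(x, c("predictor", "outcomes", "model_type", "raw_data"))) {
    "multiforest"
  } else if (inherits(x, c("coxph", "clogit"))) {
    "coxforest"
  } else if (inherits(x, "glm")) {
    # Checked before "lm" because glm objects have class c("glm", "lm")
    "glmforest"
  } else if (inherits(x, "lm")) {
    "lmforest"
  } else {
    stop("Unsupported input of class: ", paste(class(x), collapse = ", "))
  }
}

glm_model <- glm(am ~ wt, family = binomial, data = mtcars)
lm_model  <- lm(mpg ~ wt, data = mtcars)

route_forest(glm_model)
route_forest(lm_model)
```

The order of the last two checks matters: testing for lm first would send every GLM to the linear-model branch.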
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
width: Numeric. Recommended plot width in specified units
height: Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
glmforest for GLM forest plots,
coxforest for Cox model forest plots,
lmforest for linear model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
fit for single-model regression,
fullfit for combined univariable/multivariable regression,
uniscreen for univariable screening,
multifit for multi-outcome analysis
Other visualization functions:
coxforest(),
glmforest(),
lmforest(),
multiforest(),
uniforest()
data(clintrial)
data(clintrial_labels)
library(survival)

# Create example model
glm_model <- glm(surgery ~ age + sex + bmi + smoking,
                 family = binomial, data = clintrial)

# Example 1: Logistic regression model
p <- autoforest(glm_model, data = clintrial)
# Automatically detects GLM and routes to glmforest()

# Example 2: Cox proportional hazards model
cox_model <- coxph(Surv(os_months, os_status) ~ age + sex + treatment + stage,
                   data = clintrial)
plot2 <- autoforest(cox_model, data = clintrial)
# Automatically detects coxph and routes to coxforest()

# Example 3: Linear regression model
lm_model <- lm(biomarker_x ~ age + sex + bmi + treatment, data = clintrial)
plot3 <- autoforest(lm_model, data = clintrial)
# Automatically detects lm and routes to lmforest()

# Example 4: With custom labels and formatting options
plot4 <- autoforest(
  cox_model,
  data = clintrial,
  labels = clintrial_labels,
  title = "Prognostic Factors for Overall Survival",
  zebra_stripes = TRUE,
  indent_groups = TRUE
)

# Example 5: From fit() result - data and labels extracted automatically
fit_result <- fit(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "bmi", "treatment"),
  labels = clintrial_labels
)
plot5 <- autoforest(fit_result)
# No need to pass data or labels - extracted from fit_result

# Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "forest.pdf"), plot5,
                width = dims$width, height = dims$height)
Automatically detects the output format based on file extension and exports the table using the appropriate specialized function. Provides a unified interface for table export across all supported formats.
autotable(table, file, ...)
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output filename. The file extension determines the export format:
|
... |
Additional arguments passed to the format-specific function. See the documentation for individual functions for available parameters:
Common parameters across formats include:
|
This function provides a convenient wrapper around format-specific export functions, automatically routing to the appropriate function based on the file extension. All parameters are passed through to the underlying function, so the full range of format-specific options remains available.
For format-specific advanced features, you may prefer to use the individual export functions directly:
PDF exports support orientation, paper size, margins, and auto-sizing
DOCX/PPTX/RTF support font customization and flextable formatting
HTML supports CSS styling, responsive design, and custom themes
TeX generates standalone LaTeX source with booktabs styling
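Extension-based routing of this kind can be sketched with tools::file_ext() and switch(). The helper route_export() is hypothetical and only returns the name of the export function that would be selected. The mapping covers just the six extensions used in this page's examples, and any further extensions the package may accept are not represented.

```r
# Hypothetical helper: maps a file name to the name of an export function
# based on its (case-insensitive) extension
route_export <- function(file) {
  ext <- tolower(tools::file_ext(file))
  switch(ext,
    pdf  = "table2pdf",
    docx = "table2docx",
    pptx = "table2pptx",
    html = "table2html",
    rtf  = "table2rtf",
    tex  = "table2tex",
    stop("Unsupported file extension: '", ext, "'")
  )
}

route_export(file.path(tempdir(), "results.docx"))
route_export("Results.PDF")
```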
Invisibly returns the file path. Called primarily for its side effect of creating the output file.
table2pdf, table2docx, table2pptx,
table2html, table2rtf, table2tex
Other export functions:
table2docx(),
table2html(),
table2pdf(),
table2pptx(),
table2rtf(),
table2tex()
# Create example data
data(clintrial)
data(clintrial_labels)
tbl <- desctable(clintrial, by = "treatment",
                 variables = c("age", "sex"), labels = clintrial_labels)

# Auto-detect format from extension
if (requireNamespace("xtable", quietly = TRUE)) {
  autotable(tbl, file.path(tempdir(), "example.html"))
}

# Load example data
data(clintrial)
data(clintrial_labels)

# Create a regression table
results <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment"),
  labels = clintrial_labels
)

# Test that LaTeX can actually compile (needed for PDF export)
has_latex <- local({
  if (!nzchar(Sys.which("pdflatex"))) return(FALSE)
  test_tex <- file.path(tempdir(), "summata_latex_test.tex")
  writeLines(c("\\documentclass{article}", "\\usepackage{booktabs}",
               "\\begin{document}", "test", "\\end{document}"), test_tex)
  result <- tryCatch(
    system2("pdflatex",
            c("-interaction=nonstopmode",
              paste0("-output-directory=", tempdir()), test_tex),
            stdout = FALSE, stderr = FALSE),
    error = function(e) 1L)
  result == 0L
})

# Export automatically detects format from extension
autotable(results, file.path(tempdir(), "results.html"))  # Creates HTML file
autotable(results, file.path(tempdir(), "results.docx"))  # Creates Word document
autotable(results, file.path(tempdir(), "results.pptx"))  # Creates PowerPoint slide
autotable(results, file.path(tempdir(), "results.tex"))   # Creates LaTeX source
autotable(results, file.path(tempdir(), "results.rtf"))   # Creates RTF document
if (has_latex) {
  autotable(results, file.path(tempdir(), "results.pdf"))  # Creates PDF
}

# Pass format-specific parameters
if (has_latex) {
  autotable(results, file.path(tempdir(), "results.pdf"),
            orientation = "landscape", paper = "a4", font_size = 10)
}
autotable(results, file.path(tempdir(), "results.docx"),
          caption = "Table 1: Logistic Regression Results",
          font_family = "Times New Roman", condense_table = TRUE)
autotable(results, file.path(tempdir(), "results.html"),
          zebra_stripes = TRUE, dark_header = TRUE, bold_significant = TRUE)

# Works with any summata table output
desc <- desctable(clintrial, by = "treatment",
                  variables = c("age", "sex", "bmi"))
if (has_latex) {
  autotable(desc, file.path(tempdir(), "demographics.pdf"))
}

comparison <- compfit(
  data = clintrial,
  outcome = "os_status",
  model_list = list(
    base = c("age", "sex"),
    full = c("age", "sex", "treatment", "stage")
  )
)
autotable(comparison, file.path(tempdir(), "model_comparison.docx"))
A simulated dataset from a hypothetical multi-center oncology clinical trial comparing two experimental drugs against control. Designed to demonstrate the full capabilities of descriptive and regression analysis functions.
clintrial
A data frame with 850 observations and 32 variables:
Unique patient identifier (character)
Age at enrollment in years (numeric: 18-90)
Biological sex (factor: Female, Male)
Self-reported race (factor: White, Black, Asian, Other)
Hispanic ethnicity (factor: Non-Hispanic, Hispanic)
Body mass index in kg/m² (numeric)
Smoking history (factor: Never, Former, Current)
Hypertension diagnosis (factor: No, Yes)
Diabetes diagnosis (factor: No, Yes)
ECOG performance status (factor: 0, 1, 2, 3)
Baseline creatinine in mg/dL (numeric)
Baseline hemoglobin in g/dL (numeric)
Serum biomarker A in ng/mL (numeric)
Serum biomarker B in U/L (numeric)
Enrolling site (factor: Site Alpha through Site Kappa)
Tumor grade (factor: Well/Moderately/Poorly differentiated)
Disease stage at diagnosis (factor: I, II, III, IV)
Randomized treatment (factor: Control, Drug A, Drug B)
Surgical resection (factor: No, Yes)
Any post-operative complication (factor: No, Yes)
Post-operative wound infection (factor: No, Yes)
ICU admission required (factor: No, Yes)
Hospital readmission within 30 days (factor: No, Yes)
Pain score at discharge (numeric: 0-10)
Days to functional recovery (numeric)
Hospital length of stay in days (numeric)
Adverse event count (integer). Overdispersed count suitable for negative binomial or quasipoisson regression.
Follow-up visit count (integer). Equidispersed count suitable for standard Poisson regression.
Progression-Free Survival Time (months)
Progression or Death Event
Overall survival time in months (numeric)
Death indicator (numeric: 0=censored, 1=death)
This dataset includes realistic correlations between variables:
- Survival is worse with higher stage, ECOG, age, and biomarker_x
- Treatment effects show Drug B > Drug A > Control
- ae_count is overdispersed (variance > mean) for negative binomial demos
- fu_count is equidispersed (variance ≈ mean) for Poisson demos
- Approximately 2% of values are missing at random
- Median follow-up is approximately 30 months
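The distinction between equidispersed and overdispersed counts can be checked with a variance-to-mean ratio. The sketch below uses freshly simulated vectors (equi and over are illustrative names, not columns of clintrial) with an arbitrary mean of 3. It does not reproduce how the dataset itself was generated.

```r
set.seed(42)
n <- 10000

# Poisson counts: variance equals the mean in expectation
equi <- rpois(n, lambda = 3)

# Negative binomial counts: variance = mu + mu^2 / size, so larger than the mean
over <- rnbinom(n, size = 2, mu = 3)

# Ratio near 1 suggests Poisson is adequate; well above 1 suggests
# negative binomial or quasipoisson
dispersion_ratio <- function(y) var(y) / mean(y)

dispersion_ratio(equi)
dispersion_ratio(over)
```

The same ratio can be applied to a real count column as a quick screen before choosing between family = "poisson" and model_type = "negbin".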
Simulated data for demonstration purposes
Other sample data:
clintrial_labels
data(clintrial)
data(clintrial_labels)

# Descriptive statistics by treatment arm
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "stage", "ecog",
                        "biomarker_x", "Surv(os_months, os_status)"),
          labels = clintrial_labels)

# Poisson regression for equidispersed counts
fit(clintrial,
    outcome = "fu_count",
    predictors = c("age", "stage", "treatment"),
    model_type = "glm",
    family = "poisson",
    labels = clintrial_labels)

# Negative binomial for overdispersed counts
fit(clintrial,
    outcome = "ae_count",
    predictors = c("age", "treatment", "diabetes"),
    model_type = "negbin",
    labels = clintrial_labels)

# Complete analysis pipeline
fullfit(clintrial,
        outcome = "Surv(os_months, os_status)",
        predictors = c("age", "sex", "stage", "grade", "ecog",
                       "smoking", "biomarker_x", "biomarker_y", "treatment"),
        method = "screen",
        p_threshold = 0.20,
        model_type = "coxph",
        labels = clintrial_labels)
A named character vector providing descriptive labels for all variables in the clintrial dataset. Use with the labels parameter of summata functions.
clintrial_labels
Named character vector with 24 elements
Other sample data:
clintrial
Fits multiple regression models and provides a comprehensive comparison table with model quality metrics, convergence diagnostics, and selection guidance. Computes a composite score combining multiple quality metrics to facilitate rapid model comparison and selection.
compfit(
  data,
  outcome,
  model_list,
  model_names = NULL,
  interactions_list = NULL,
  random = NULL,
  model_type = "auto",
  family = "binomial",
  conf_level = 0.95,
  p_digits = 3,
  include_coefficients = FALSE,
  scoring_weights = NULL,
  labels = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)
data |
Data frame or data.table containing the dataset. |
outcome |
Character string specifying the outcome variable. For survival
analysis, use |
model_list |
List of character vectors, each containing predictor names for one model. Can also be a single character vector to auto-generate nested models. |
model_names |
Character vector of names for each model. If |
interactions_list |
List of character vectors specifying interaction
terms for each model. Each element corresponds to one model in model_list.
Use |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
model_type |
Character string specifying model type. If
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Common options include:
For negative binomial, use |
conf_level |
Numeric confidence level for intervals. Default is 0.95. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
include_coefficients |
Logical. If TRUE, includes a second table with coefficient estimates. Default is FALSE. |
scoring_weights |
Named list of scoring weights. Each weight should be
between 0 and 1, and they should sum to 1. Available metrics depend on model
type. If |
labels |
Named character vector providing custom display labels for
variables. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to model fitting functions. |
This function fits all specified models and computes comprehensive quality metrics for comparison. It generates a Composite Model Score (CMS) that combines multiple metrics: lower AIC/BIC (information criteria), higher concordance (discrimination), and model convergence status.
For GLMs, McFadden's pseudo-R-squared is calculated as 1 - (logLik/logLik_null). For survival models, the global p-value comes from the log-rank test.
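The pseudo-R-squared formula above can be reproduced by hand for any binomial GLM. This minimal sketch uses the built-in mtcars data and illustrative object names (full_model, null_model), and is independent of compfit()'s internals.

```r
# Fitted model and intercept-only (null) model on the same data
full_model <- glm(am ~ wt, family = binomial, data = mtcars)
null_model <- glm(am ~ 1,  family = binomial, data = mtcars)

# McFadden's pseudo-R-squared: 1 - logLik(model) / logLik(null)
mcfadden_r2 <- 1 - as.numeric(logLik(full_model)) / as.numeric(logLik(null_model))
mcfadden_r2
```

Both models must be fitted to the same observations. If the predictors contain missing values, the null model should be refitted on the complete cases used by the full model.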
Models that fail to converge are flagged and penalized in the composite score.
Interaction Terms:
When interactions_list is provided, each element specifies the
interaction terms for the corresponding model in model_list. This is
particularly useful for testing whether adding interactions improves model fit:
Use NULL for models without interactions
Specify interactions using colon notation: c("age:treatment", "sex:stage")
Main effects for all variables in interactions must be in the predictor list
Common pattern: Compare main effects model vs model with interactions
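One way to picture these rules is a small formula builder that checks for missing main effects before appending interaction terms. build_formula() is a hypothetical helper written for illustration. It is not how compfit() necessarily assembles its formulas.

```r
# Hypothetical helper: combine predictors and colon-notation interactions
# into a model formula, enforcing that all main effects are present
build_formula <- function(outcome, predictors, interactions = NULL) {
  if (!is.null(interactions)) {
    vars <- unique(unlist(strsplit(interactions, ":", fixed = TRUE)))
    missing_main <- setdiff(vars, predictors)
    if (length(missing_main) > 0) {
      stop("Main effects missing for: ", paste(missing_main, collapse = ", "))
    }
  }
  reformulate(c(predictors, interactions), response = outcome)
}

# Main-effects model versus the same model with an interaction
f_main     <- build_formula("os_status", c("age", "treatment", "sex"))
f_interact <- build_formula("os_status", c("age", "treatment", "sex"),
                            interactions = "treatment:sex")
f_main
f_interact
```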
Scoring weights can be customized based on model type:
GLM: "convergence", "aic", "concordance", "pseudo_r2", "brier"
Cox: "convergence", "aic", "concordance", "global_p"
Linear: "convergence", "aic", "pseudo_r2", "rmse"
Default weights emphasize discrimination (concordance) and model fit (AIC).
The composite score is designed as a tool to quickly rank models by their quality metrics. It should be used alongside traditional model selection criteria rather than as a definitive model selection method.
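To make the idea of a weighted composite concrete, the sketch below ranks three models from a table of metrics. Everything in it is an assumption made for illustration: composite_score() is hypothetical, the min-max rescaling is one of many possible choices, and the metric values are made-up toy numbers. It is not the formula summata uses for the CMS.

```r
# Hypothetical scoring function: weighted sum of metrics rescaled to [0, 1]
composite_score <- function(metrics, weights) {
  stopifnot(abs(sum(unlist(weights)) - 1) < 1e-8)  # weights must sum to 1

  # Min-max rescaling; constant columns get full credit
  rescale <- function(v) {
    r <- range(v)
    if (diff(r) == 0) rep(1, length(v)) else (v - r[1]) / diff(r)
  }

  parts <- list(
    convergence = as.numeric(metrics$converged),  # 1 if converged, else 0
    aic         = 1 - rescale(metrics$aic),       # lower AIC is better
    concordance = rescale(metrics$concordance)    # higher concordance is better
  )
  unknown <- setdiff(names(weights), names(parts))
  if (length(unknown) > 0) stop("Unknown metric(s): ", paste(unknown, collapse = ", "))

  weighted <- Map(function(nm) weights[[nm]] * parts[[nm]], names(weights))
  Reduce(`+`, weighted)
}

# Toy metrics (invented values, for illustration only)
metrics <- data.frame(
  model       = c("base", "clinical", "full"),
  converged   = c(TRUE, TRUE, TRUE),
  aic         = c(520, 505, 498),
  concordance = c(0.61, 0.68, 0.72)
)
weights <- list(convergence = 0.2, aic = 0.4, concordance = 0.4)

scores <- composite_score(metrics, weights)
metrics$model[which.max(scores)]
```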
A data.table with class "compfit_result" containing:
Model name/identifier
Composite Model Score for model selection (higher is better)
Sample size
Number of events (for survival/logistic)
Number of predictors
Whether model converged properly
Akaike Information Criterion
Bayesian Information Criterion
/ Pseudo-R
McFadden pseudo-R-squared (GLM)
C-statistic (logistic/survival)
Brier accuracy score (logistic)
Overall model p-value
Attributes include:
models: List of fitted model objects
coefficients: Coefficient comparison table (if requested)
best_model: Name of recommended model
fit for individual model fitting,
fullfit for automated variable selection,
table2pdf for exporting results
Other regression functions:
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
# Load example data
data(clintrial)
data(clintrial_labels)

# Example 1: Compare nested logistic regression models
models <- list(
  base = c("age", "sex"),
  clinical = c("age", "sex", "smoking", "diabetes"),
  full = c("age", "sex", "smoking", "diabetes", "stage", "ecog")
)
comparison <- compfit(
  data = clintrial,
  outcome = "os_status",
  model_list = models,
  model_names = c("Base", "Clinical", "Full")
)
comparison

# Example 2: Compare Cox survival models
library(survival)
surv_models <- list(
  simple = c("age", "sex"),
  clinical = c("age", "sex", "stage", "grade")
)
surv_comparison <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = surv_models,
  model_type = "coxph"
)
surv_comparison

# Example 3: Test effect of adding interaction terms
interaction_models <- list(
  main = c("age", "treatment", "sex"),
  interact = c("age", "treatment", "sex")
)
interaction_comp <- compfit(
  data = clintrial,
  outcome = "os_status",
  model_list = interaction_models,
  model_names = c("Main Effects", "With Interaction"),
  interactions_list = list(
    NULL,
    c("treatment:sex")
  )
)
interaction_comp

# Example 4: Include coefficient comparison table
detailed <- compfit(
  data = clintrial,
  outcome = "os_status",
  model_list = models,
  include_coefficients = TRUE,
  labels = clintrial_labels
)

# Access coefficient table
coef_table <- attr(detailed, "coefficients")
coef_table

# Example 5: Access fitted model objects
fitted_models <- attr(comparison, "models")
names(fitted_models)

# Example 6: Get best model recommendation
best <- attr(comparison, "best_model")
cat("Recommended model:", best, "\n")
Generates a publication-ready forest plot that combines a formatted data table with a graphical representation of hazard ratios from a Cox proportional hazards survival model. The plot integrates variable names, group levels, sample sizes, event counts, hazard ratios with confidence intervals, p-values, and model diagnostics in a single comprehensive visualization designed for manuscripts and presentations.
coxforest(
  x,
  data = NULL,
  title = "Cox Proportional Hazards Model",
  effect_label = "Hazard Ratio",
  digits = 2,
  p_digits = 3,
  conf_level = 0.95,
  font_size = 1,
  annot_size = 3.88,
  header_size = 5.82,
  title_size = 23.28,
  plot_width = NULL,
  plot_height = NULL,
  table_width = 0.6,
  show_n = TRUE,
  show_events = TRUE,
  indent_groups = FALSE,
  condense_table = FALSE,
  bold_variables = FALSE,
  center_padding = 4,
  zebra_stripes = TRUE,
  ref_label = "reference",
  labels = NULL,
  color = "#8A61D8",
  qc_footer = TRUE,
  units = "in",
  number_format = NULL
)
x |
Either a fitted Cox model object (class |
data |
Data frame or data.table containing the original data used to
fit the model. If |
title |
Character string specifying the plot title displayed at the top.
Default is |
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
digits |
Integer specifying the number of decimal places for hazard ratios and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6 (60% table, 40% forest plot). |
show_n |
Logical. If |
show_events |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
ref_label |
Character string to display for reference categories.
Default is |
labels |
Named character vector providing custom display labels for
variables. Example: |
color |
Character string specifying the color for hazard ratio point
estimates in the forest plot. Default is |
qc_footer |
Logical. If |
units |
Character string specifying units for plot dimensions:
|
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Survival-Specific Features:
The Cox forest plot includes several survival analysis-specific components:
Event counts: Number of events (deaths, failures) shown for each predictor category, critical for assessing statistical power
Hazard ratios: Always exponentiated coefficients (never raw), interpreted as the multiplicative change in hazard
Log scale: Forest plot uses log scale for HR (reference line at 1)
Model diagnostics: Includes concordance (C-index), global log-rank test p-value, and AIC
Plot Components:
Title: Centered at top
Data Table (left): Contains:
Variable and Group columns
n: Sample sizes by group
Events: Event counts by group (critical for survival)
aHR (95% CI); p-value: Adjusted hazard ratios with CIs and p-values
Forest Plot (right):
Point estimates (squares sized by sample size)
95% confidence intervals
Reference line at HR = 1
Log scale for hazard ratios
Model Statistics (footer):
Events analyzed (with percentage of total)
Global log-rank test p-value
Concordance (C-index) with standard error
AIC
Interpreting Hazard Ratios:
HR = 1: No effect on hazard (reference)
HR > 1: Increased hazard (worse survival)
HR < 1: Decreased hazard (better survival)
Example: HR = 2.0 means twice the hazard of the event at any time
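Hazard ratios and their confidence intervals are obtained by exponentiating the Cox model coefficients and their interval limits. The sketch below shows this with the lung dataset shipped with the survival package (used here in place of clintrial). It illustrates the quantities the plot displays, not the plotting code itself.

```r
library(survival)

# Cox model on the lung data bundled with survival
fit_hr <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Coefficients are log hazard ratios; exponentiate to get HRs
hr <- exp(coef(fit_hr))

# Confidence limits are computed on the log scale, then exponentiated
ci <- exp(confint(fit_hr, level = 0.95))

hr_table <- cbind(HR = hr, lower = ci[, 1], upper = ci[, 2])
round(hr_table, 2)
```

Because the interval is symmetric on the log scale, it appears asymmetric around the HR on the natural scale, which is one reason the forest plot uses a log axis.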
Event Counts:
The "Events" column is particularly important in survival analysis:
Indicates the number of actual events (not censored observations) in each group
Essential for assessing statistical power
Categories with very few events may have unreliable HR estimates
The footer shows total events analyzed and percentage of all events in the original data
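Per-group sample sizes and event counts of the kind shown in the n and Events columns can be tabulated directly from the data. This sketch again uses survival's lung dataset, where status 2 marks a death and 1 a censored observation, and the group labels are added here for readability.

```r
library(survival)

# In lung, status == 2 is an event (death); status == 1 is censored
is_event <- lung$status == 2

event_counts <- data.frame(
  sex    = c("Male", "Female"),          # lung codes sex as 1 = male, 2 = female
  n      = as.vector(table(lung$sex)),
  events = as.vector(tapply(is_event, lung$sex, sum))
)
event_counts$event_pct <- round(100 * event_counts$events / event_counts$n, 1)
event_counts
```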
Concordance (C-index):
The concordance statistic displayed in the footer indicates discrimination:
Typical range: 0.5 to 1.0 (values below 0.5 indicate predictions worse than chance)
0.5 = random prediction (coin flip)
0.7-0.8 = acceptable discrimination
> 0.8 = excellent discrimination
Standard error provided for confidence interval calculation
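The concordance and its standard error can be retrieved from a fitted Cox model with survival::concordance(), and a normal-approximation confidence interval follows from the two. This is a sketch on the lung data and assumes a survival version that provides concordance() for fitted models.

```r
library(survival)

fit_c <- coxph(Surv(time, status) ~ age + sex, data = lung)

cf      <- concordance(fit_c)
c_index <- unname(cf$concordance)   # C-index
c_se    <- sqrt(cf$var)             # standard error

# Approximate 95% confidence interval
c_ci <- c_index + c(-1, 1) * qnorm(0.975) * c_se
c(C = c_index, lower = c_ci[1], upper = c_ci[2])
```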
Global Log-Rank Test:
The global p-value tests the null hypothesis that all coefficients are zero:
Significant p-value (< 0.05) indicates the model as a whole predicts survival
Non-significant global test doesn't preclude significant individual predictors
Based on the score (log-rank) test
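The global score test is stored in the summary of a coxph fit, alongside the likelihood ratio and Wald tests. A short sketch on the lung data shows where to find them.

```r
library(survival)

fit_g <- coxph(Surv(time, status) ~ age + sex, data = lung)
s <- summary(fit_g)

# Score (log-rank) test: statistic, degrees of freedom, p-value
s$sctest

# The other two global tests, for comparison
s$logtest    # likelihood ratio test
s$waldtest   # Wald test

global_p <- s$sctest[["pvalue"]]
```

The three tests usually agree closely. Large disagreements between them can signal small samples or sparse events.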
Stratification and Clustering:
If the model includes stratification (strata()) or clustering
(cluster()):
Stratified variables are not shown in the forest plot (they don't have HRs)
Clustering affects standard errors but not point estimates
Both are handled automatically by the function
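Both behaviours can be seen directly in survival, independent of the plotting function: a strata() term produces no coefficient, and a cluster() term leaves coefficients unchanged while switching to robust standard errors. The sketch uses the lung data, with rows missing inst dropped so that both models use the same observations.

```r
library(survival)

# Stratification: sex defines separate baseline hazards, so it has no HR
fit_strat <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)
names(coef(fit_strat))

# Clustering: same point estimates, different (robust) standard errors
d <- lung[!is.na(lung$inst), ]
fit_plain <- coxph(Surv(time, status) ~ age + sex, data = d)
fit_clust <- coxph(Surv(time, status) ~ age + sex + cluster(inst), data = d)

same_coefs <- isTRUE(all.equal(coef(fit_plain), coef(fit_clust)))
rbind(plain = sqrt(diag(vcov(fit_plain))),
      clustered = sqrt(diag(vcov(fit_clust))))
```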
Proportional Hazards Assumption:
The forest plot assumes proportional hazards (constant HR over time). Users should verify this assumption using:
cox.zph(model) for testing
Stratification for variables violating the assumption
Time-dependent coefficients if needed
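A short sketch of the check described above, assuming a model fitted with `survival::coxph` on the `lung` data (not `clintrial`). The 0.05 threshold and the `violators` name are illustrative choices.

```r
library(survival)

model <- coxph(Surv(time, status) ~ age + sex + ph.karno, data = lung)

# Test the proportional hazards assumption per term and globally
zph <- cox.zph(model)
print(zph)

# Terms whose test suggests non-proportional hazards (GLOBAL row excluded)
p_vals    <- zph$table[, "p"]
violators <- setdiff(names(p_vals)[p_vals < 0.05], "GLOBAL")
print(violators)
```

Categorical terms flagged here are candidates for `strata()`; continuous terms generally need a time-dependent coefficient instead.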
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
width: Numeric. Recommended plot width in specified units
height: Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
autoforest for automatic model detection,
glmforest for logistic/GLM forest plots,
lmforest for linear model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
coxph for fitting Cox models,
fit for regression modeling
Other visualization functions:
autoforest(),
glmforest(),
lmforest(),
multiforest(),
uniforest()
data(clintrial)
data(clintrial_labels)
library(survival)

# Create example model
model1 <- coxph(
  survival::Surv(os_months, os_status) ~ age + sex + treatment,
  data = clintrial)

# Example 1: Basic Cox model forest plot
p <- coxforest(model1, data = clintrial)

old_width <- options(width = 180)

# Example 2: With custom labels and title
plot2 <- coxforest(
  x = model1,
  data = clintrial,
  title = "Prognostic Factors for Overall Survival",
  labels = clintrial_labels
)

# Example 3: Comprehensive model with indented layout
model3 <- coxph(
  Surv(os_months, os_status) ~ age + sex + bmi + smoking +
    treatment + stage + grade,
  data = clintrial
)
plot3 <- coxforest(
  x = model3,
  data = clintrial,
  labels = clintrial_labels,
  indent_groups = TRUE,
  zebra_stripes = TRUE
)

# Example 4: Condensed layout for many binary predictors
model4 <- coxph(
  Surv(os_months, os_status) ~ age + sex + smoking +
    hypertension + diabetes + surgery,
  data = clintrial
)
plot4 <- coxforest(
  x = model4,
  data = clintrial,
  condense_table = TRUE,
  labels = clintrial_labels
)

# Example 5: Stratified Cox model
model5 <- coxph(
  Surv(os_months, os_status) ~ age + sex + treatment + strata(site),
  data = clintrial
)
plot5 <- coxforest(
  x = model5,
  data = clintrial,
  title = "Stratified by Study Site",
  labels = clintrial_labels
)

# Example 6: Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "survival_forest.pdf"),
                plot5, width = dims$width, height = dims$height)

options(old_width)
Generates comprehensive descriptive statistics tables with automatic variable type detection, group comparisons, and appropriate statistical testing. This function is designed to create "Table 1"-style summaries commonly used in clinical and epidemiological research, with full support for continuous, categorical, and time-to-event variables.
desctable(
  data,
  by = NULL,
  variables,
  stats_continuous = c("median_iqr"),
  stats_categorical = "n_percent",
  digits = 1,
  p_digits = 3,
  conf_level = 0.95,
  p_per_stat = FALSE,
  na_include = FALSE,
  na_label = "Unknown",
  na_percent = FALSE,
  test = TRUE,
  test_continuous = "auto",
  test_categorical = "auto",
  total = TRUE,
  total_label = "Total",
  labels = NULL,
  number_format = NULL,
  ...
)
data |
Data frame or data.table containing the dataset to summarize. Automatically converted to a data.table for efficient processing. |
by |
Character string specifying the column name of the grouping
variable for stratified analysis (e.g., treatment arm, exposure
status). When |
variables |
Character vector of variable names to summarize. Can
include standard column names for continuous or categorical variables,
and survival expressions using |
stats_continuous |
Character vector specifying which statistics to compute for continuous variables. Multiple values create separate rows for each variable. Options:
Default is "median_iqr". |
stats_categorical |
Character string specifying the format for categorical variable summaries:
|
digits |
Integer specifying the number of decimal places for continuous statistics. Default is 1. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals in survival variable summaries (median survival time with CI). Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
p_per_stat |
Logical. If |
na_include |
Logical. If |
na_label |
Character string used to label the missing values row when
|
na_percent |
Logical. Controls how percentages are calculated for
categorical variables when
Only affects categorical variables. Default is FALSE. |
test |
Logical. If |
test_continuous |
Character string specifying the statistical test for continuous variables:
|
test_categorical |
Character string specifying the statistical test for categorical variables:
|
total |
Logical or character string controlling the total column:
|
total_label |
Character string for the total column header.
Default is "Total". |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names (or |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
... |
Additional arguments passed to the underlying statistical test
functions (e.g., |
Variable Type Detection:
The function automatically detects variable types and applies appropriate summaries:
Continuous: Numeric variables (integer or double) receive
statistics specified in stats_continuous
Categorical: Character, factor, or logical variables receive frequency counts and percentages
Time-to-Event: Variables specified as
Surv(time, event) display median survival with confidence
intervals (level controlled by conf_level)
Statistical Testing:
When test = TRUE and by is specified:
Continuous with "auto": Parametric tests (t-test, ANOVA) for mean-based statistics; non-parametric tests (Wilcoxon, Kruskal-Wallis) for median-based statistics
Categorical with "auto": Fisher exact test when any
expected cell frequency < 5; chi-squared test otherwise
Survival: Log-rank test for comparing survival curves
Range statistics: No p-value computed (ranges are descriptive)
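The "auto" rule for categorical variables can be sketched in base R as follows. This is only an illustration of the stated rule (Fisher exact test when any expected cell count is below 5, chi-squared otherwise); `choose_categorical_test()` is a hypothetical helper, not a function exported by this package, and `mtcars` stands in for real data.

```r
# Hypothetical helper illustrating the documented selection rule
choose_categorical_test <- function(x, g) {
  tab <- table(x, g)
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < 5)) {
    list(test = "fisher", p_value = fisher.test(tab)$p.value)
  } else {
    list(test = "chisq", p_value = chisq.test(tab)$p.value)
  }
}

# All expected counts are at least 5, so the chi-squared test is chosen
large <- choose_categorical_test(mtcars$am, mtcars$vs)

# Sparse cells (few 5-gear cars), so Fisher's exact test is chosen
small <- choose_categorical_test(mtcars$am, mtcars$gear)

print(c(large$test, small$test))
```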
Missing Data Handling:
Missing values are handled differently by variable type:
Continuous: NAs excluded from calculations; optionally
shown as count when na_include = TRUE
Categorical: NAs can be included as a category when
na_include = TRUE. The na_percent parameter controls
whether percentages are calculated with or without NAs in the
denominator
Survival: NAs in time or event excluded from analysis
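To make the two percentage denominators concrete, here is a small base R sketch with a made-up `smoking` vector. It mirrors the described behavior of `na_percent` but does not call the package itself.

```r
# Toy categorical variable with two missing values (invented data)
smoking <- c("Never", "Former", "Current", "Never", NA, "Never", NA, "Former")

counts <- table(smoking, useNA = "ifany")

# Analogue of na_percent = TRUE: NAs stay in the denominator
pct_with_na <- 100 * counts / length(smoking)

# Analogue of na_percent = FALSE: denominator is the non-missing count
non_missing    <- counts[!is.na(names(counts))]
pct_without_na <- 100 * non_missing / sum(non_missing)

print(round(pct_with_na, 1))
print(round(pct_without_na, 1))
```

With NAs in the denominator every row including the missing row sums to 100%; without them, only the non-missing categories do.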
Formatting Conventions:
All numeric output respects the number_format parameter. Separators
within ranges and confidence intervals adapt automatically to avoid
ambiguity:
Mean ± SD: "45.2 ± 12.3" (US) or
"45,2 ± 12,3" (EU)
Median [IQR]: "38.0 [28.0-52.0]" (US) or
"38,0 [28,0–52,0]" (EU, en-dash separator)
Range: "18.0-75.0" (positive, US),
"-5.0 to 10.0" (when bounds are negative)
Survival: "24.5 (21.2-28.9)" (US) or
"24,5 (21,2-28,9)" (EU)
Counts ≥ 1000: "1,234" (US) or "1.234" (EU)
p-values: "< 0.001" (US) or "< 0,001" (EU)
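The separator conventions above can be reproduced with base R's `formatC()`. The helper below is a sketch only; `fmt_number()` and its `style` argument are hypothetical names and are not the package's internal formatter.

```r
# Hypothetical formatter illustrating US and EU separator presets
fmt_number <- function(x, digits = 0, style = c("us", "eu")) {
  style <- match.arg(style)
  marks <- switch(style,
    us = c(big = ",", dec = "."),
    eu = c(big = ".", dec = ",")
  )
  formatC(x, format = "f", digits = digits,
          big.mark = marks[["big"]], decimal.mark = marks[["dec"]])
}

count_us <- fmt_number(1234, 0, "us")
count_eu <- fmt_number(1234, 0, "eu")

# Mean and SD joined with a plus-minus sign
mean_sd_eu <- paste0(fmt_number(45.2, 1, "eu"), " \u00b1 ",
                     fmt_number(12.3, 1, "eu"))

print(c(count_us, count_eu, mean_sd_eu))
```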
A data.table with S3 class "desctable" containing formatted
descriptive statistics. The table structure includes:
Variable name or label (from labels)
For continuous variables: statistic type (e.g.,
"Mean ± SD", "Median [IQR]"). For categorical variables:
category level. Empty for variable name rows.
Statistics for the total sample (if
total = TRUE)
Statistics for each group level (when by
is specified). Column names match group levels.
Formatted p-values from statistical tests
(when test = TRUE and by is specified)
The first row always shows sample sizes for each column. All numeric
output (counts, statistics, p-values) respects the
number_format setting for locale-appropriate formatting.
The returned object includes the following attributes accessible via
attr():
raw_data: A data.table containing unformatted numeric values suitable for further statistical analysis or custom formatting. Includes additional columns for standard deviations, quartiles, etc.
by_variable: The grouping variable name used (value of
by)
The variables analyzed (value of
variables)
survtable for detailed survival summary tables,
fit for regression modeling,
table2pdf for PDF export,
table2docx for Word export,
table2html for HTML export
Other descriptive functions:
print.survtable(),
survtable()
# Load example clinical trial data
data(clintrial)

# Example 1: Basic descriptive table without grouping
desctable(clintrial,
          variables = c("age", "sex", "bmi"))

# Example 2: Grouped comparison with default tests
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "race", "bmi"))

# Example 3: Customize continuous statistics
desctable(clintrial,
          by = "treatment",
          variables = c("age", "bmi", "creatinine"),
          stats_continuous = c("median_iqr", "range"))

# Example 4: Change categorical display format
desctable(clintrial,
          by = "treatment",
          variables = c("sex", "race", "smoking"),
          stats_categorical = "n")  # Show counts only

# Example 5: Include missing values
desctable(clintrial,
          by = "treatment",
          variables = c("age", "smoking", "hypertension"),
          na_include = TRUE,
          na_label = "Missing")

# Example 6: Disable statistical testing
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "bmi"),
          test = FALSE)

# Example 7: Force specific tests
desctable(clintrial,
          by = "surgery",
          variables = c("age", "sex"),
          test_continuous = "t",         # t-test instead of auto
          test_categorical = "fisher")   # Fisher test instead of auto

# Example 8: Adjust decimal places
desctable(clintrial,
          by = "treatment",
          variables = c("age", "bmi"),
          digits = 2,      # 2 decimals for continuous
          p_digits = 4)    # 4 decimals for p-values

# Example 9: Custom variable labels
labels <- c(
  age = "Age (years)",
  sex = "Sex",
  bmi = "Body Mass Index (kg/m\u00b2)",
  treatment = "Treatment Arm"
)
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "bmi"),
          labels = labels)

# Example 10: Position total column last
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex"),
          total = "last")

# Example 11: Exclude total column
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex"),
          total = FALSE)

# Example 12: Survival analysis
desctable(clintrial,
          by = "treatment",
          variables = "Surv(os_months, os_status)")

# Example 13: Multiple survival endpoints
desctable(clintrial,
          by = "treatment",
          variables = c(
            "Surv(pfs_months, pfs_status)",
            "Surv(os_months, os_status)"
          ),
          labels = c(
            "Surv(pfs_months, pfs_status)" = "Progression-Free Survival",
            "Surv(os_months, os_status)" = "Overall Survival"
          ))

# Example 14: Mixed variable types
desctable(clintrial,
          by = "treatment",
          variables = c(
            "age", "sex", "race",           # Demographics
            "bmi", "creatinine",            # Labs
            "smoking", "hypertension",      # Risk factors
            "Surv(os_months, os_status)"    # Survival
          ))

# Example 15: Three or more groups
desctable(clintrial,
          by = "stage",  # Assuming stage has 3+ levels
          variables = c("age", "sex", "bmi"))
# Automatically uses ANOVA/Kruskal-Wallis and chi-squared

# Example 16: Access raw unformatted data
result <- desctable(clintrial,
                    by = "treatment",
                    variables = c("age", "bmi"))
raw_data <- attr(result, "raw_data")
print(raw_data)
# Raw data includes unformatted numbers, SDs, quartiles, etc.

# Example 17: Check which grouping variable was used
result <- desctable(clintrial,
                    by = "treatment",
                    variables = c("age", "sex"))
attr(result, "by_variable")  # "treatment"

# Example 18: NA percentage calculation options
# Include NAs in percentage denominator (all sum to 100%)
desctable(clintrial,
          by = "treatment",
          variables = "smoking",
          na_include = TRUE,
          na_percent = TRUE)

# Exclude NAs from denominator (non-missing sum to 100%)
desctable(clintrial,
          by = "treatment",
          variables = "smoking",
          na_include = TRUE,
          na_percent = FALSE)

# Example 19: Passing additional test arguments
# Equal variance t-test
desctable(clintrial,
          by = "sex",
          variables = "age",
          test_continuous = "t",
          var.equal = TRUE)

# Example 20: European number formatting
desctable(clintrial,
          by = "treatment",
          variables = c("age", "sex", "bmi"),
          number_format = "eu")

# Example 21: Complete Table 1 for publication
table1 <- desctable(
  data = clintrial,
  by = "treatment",
  variables = c(
    "age", "sex", "race", "ethnicity", "bmi",
    "smoking", "hypertension", "diabetes",
    "ecog", "creatinine", "hemoglobin",
    "site", "stage", "grade",
    "Surv(os_months, os_status)"
  ),
  labels = clintrial_labels,
  stats_continuous = c("median_iqr", "range"),
  total = TRUE,
  na_include = FALSE
)
print(table1)
Provides a unified interface for fitting various types of regression models with automatic formatting of results for publication. Supports generalized linear models, linear models, survival models, and mixed-effects models with consistent syntax and output formatting. Handles both univariable and multivariable models automatically.
fit(
  data = NULL,
  outcome = NULL,
  predictors = NULL,
  model = NULL,
  model_type = "glm",
  family = "binomial",
  random = NULL,
  interactions = NULL,
  strata = NULL,
  cluster = NULL,
  weights = NULL,
  conf_level = 0.95,
  reference_rows = TRUE,
  show_n = TRUE,
  show_events = TRUE,
  digits = 2,
  p_digits = 3,
  labels = NULL,
  keep_qc_stats = TRUE,
  exponentiate = NULL,
  conf_method = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)
data |
Data frame or data.table containing the analysis dataset. Required for formula-based workflow; optional for model-based workflow (extracted from model if not provided). |
outcome |
Character string specifying the outcome variable name. For
survival analysis, use |
predictors |
Character vector of predictor variable names to include in
the model. All predictors are included simultaneously (multivariable model).
For univariable models, provide a single predictor. Can include continuous,
categorical (factor), or binary variables. Required for formula-based
workflow; ignored if |
model |
Optional pre-fitted model object to format. When provided,
|
model_type |
Character string specifying the type of regression model.
Ignored if
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
interactions |
Character vector of interaction terms using colon
notation (e.g., |
strata |
For Cox or conditional logistic models, character string naming
the stratification variable. Creates separate baseline hazards for each
stratum level without estimating stratum effects. Default is NULL. |
cluster |
For Cox models, character string naming the variable for
robust clustered standard errors. Accounts for within-cluster correlation
(e.g., patients within hospitals). Default is NULL. |
weights |
Character string naming the weights variable in |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names, values are display
labels. Default is |
keep_qc_stats |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients. Default
is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting. |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting
function ( |
Model Scope Detection:
The function automatically detects whether the model is:
Univariable: Single predictor (e.g., predictors = "age").
Effect estimates are labeled without a prefix ("OR", "HR", etc.) and
represent crude (unadjusted) associations.
Multivariable: Multiple predictors (e.g.,
predictors = c("age", "sex", "treatment")).
Effect estimates are labeled as adjusted ("aOR", "aHR", etc.) and represent
associations adjusted for the other predictors in the model.
Interaction Terms:
Interactions are specified using colon notation and added to the model:
interactions = c("age:treatment") creates interaction
between age and treatment
Main effects for both variables are automatically included
Multiple interactions can be specified:
c("age:sex", "treatment:stage")
For interactions between categorical variables, separate terms are created for each combination of levels
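The way colon-notation interactions expand into model terms can be sketched with base R's `reformulate()` and `model.matrix()`. The helper `build_formula()` is hypothetical and only mimics the documented behavior (main effects plus interaction terms); `mtcars` is used as stand-in data.

```r
# Hypothetical helper: combine predictors and interactions into a formula
build_formula <- function(outcome, predictors, interactions = NULL) {
  reformulate(c(predictors, interactions), response = outcome)
}

# Two categorical predictors derived from mtcars (illustrative only)
dat <- transform(mtcars, cyl_f = factor(cyl), am_f = factor(am))

f <- build_formula("mpg", c("cyl_f", "am_f"), interactions = "cyl_f:am_f")
print(f)

# Each non-reference combination of levels gets its own design column
design_cols      <- colnames(model.matrix(f, data = dat))
interaction_cols <- grep(":", design_cols, value = TRUE)
print(interaction_cols)
```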
Stratification (Cox/Conditional Logistic):
The strata parameter creates separate baseline hazards:
Allows baseline hazard to vary across strata without estimating stratum effects
Useful when the proportional hazards assumption is violated for the stratification variable
Example: strata = "center" for multicenter studies
Stratification variable is not included as a predictor
Clustering (Cox Models):
The cluster parameter computes robust standard errors:
Accounts for within-cluster correlation (e.g., multiple observations per patient)
Uses sandwich variance estimator
Does not change point estimates, only standard errors and p-values
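The effect of clustering can be checked directly with `survival::coxph`. This sketch uses the `lung` data with `inst` (enrolling institution) as the cluster, after dropping the row with a missing institution so both models use the same observations; it illustrates the principle rather than this package's internals.

```r
library(survival)

# Keep rows with a known institution so both fits use identical data
d <- lung[!is.na(lung$inst), ]

naive  <- coxph(Surv(time, status) ~ age + sex, data = d)
robust <- coxph(Surv(time, status) ~ age + sex + cluster(inst), data = d)

# Point estimates are unchanged by clustering
same_coef <- isTRUE(all.equal(coef(naive), coef(robust)))

# Standard errors differ: vcov() on the clustered fit is the sandwich estimate
se_compare <- cbind(
  naive_se  = sqrt(diag(vcov(naive))),
  robust_se = sqrt(diag(vcov(robust)))
)
print(same_coef)
print(se_compare)
```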
Weighting:
The weights parameter enables weighted regression:
For survey data with sampling weights
Inverse probability weighting for causal inference
Frequency weights for aggregated data
Weights should be in a column of data
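As a sketch of the frequency-weight case, the base R example below shows that a weighted logistic regression on aggregated rows reproduces the fit on the expanded data. The data frame is invented for illustration, and the call uses `glm()` directly rather than this package's `weights` argument, which takes a column name.

```r
# Aggregated data: one row per exposure/outcome pattern with a count
agg <- data.frame(
  exposed = c(0, 0, 1, 1),
  event   = c(0, 1, 0, 1),
  n       = c(40, 10, 25, 25)
)

# Expand to one row per subject for comparison
expanded <- agg[rep(seq_len(nrow(agg)), agg$n), c("exposed", "event")]

fit_weighted <- glm(event ~ exposed, family = binomial, data = agg, weights = n)
fit_expanded <- glm(event ~ exposed, family = binomial, data = expanded)

# Odds ratio for exposure: (25/25) / (10/40) = 4
or_weighted <- unname(exp(coef(fit_weighted)["exposed"]))
print(or_weighted)
```

Sampling weights and inverse probability weights change the variance as well as the estimate, so they generally call for robust or design-based standard errors rather than the model-based ones shown here.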
Mixed-Effects Models (lmer/glmer/coxme):
Mixed effects models handle hierarchical or clustered data:
Use model_type = "lmer" for continuous/normal outcomes
Use model_type = "glmer" with appropriate family for GLM outcomes
Use model_type = "coxme" for survival outcomes with clustering
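Random effects are specified in predictors using lme4 syntax:
"(1|site)" - Random intercepts by site
"(treatment|site)" - Random intercepts and random slopes for treatment by site (lme4 adds the intercept implicitly)
"(1 + treatment|site)" - The same model with the random intercept written explicitly; use "(0 + treatment|site)" for random slopes without random intercepts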
Include random effects as part of the predictors vector
Example: predictors = c("age", "treatment", "(1|site)")
Effect Measures by Model Type:
Logistic (family = "binomial"/"quasibinomial"): Odds ratios (OR/aOR)
Cox (model_type = "coxph"): Hazard ratios (HR/aHR)
Poisson/Count (family = "poisson"/"quasipoisson"): Rate ratios (RR/aRR)
Negative binomial (model_type = "negbin"): Rate ratios (RR/aRR)
Gamma/Log-link: Ratios (multiplicative effects)
Linear/Gaussian: Raw coefficient estimates (additive effects)
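The mapping from model type to effect measure comes down to whether coefficients are exponentiated. A base R sketch using built-in datasets (`infert`, `warpbreaks`, `mtcars`), with illustrative object names:

```r
# Logistic regression: exponentiated coefficients are odds ratios
logit_fit   <- glm(case ~ spontaneous + induced, family = binomial, data = infert)
odds_ratios <- exp(coef(logit_fit))[-1]

# Poisson regression: exponentiated coefficients are rate ratios
pois_fit    <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
rate_ratios <- exp(coef(pois_fit))[-1]

# Linear regression: coefficients are reported on the raw (additive) scale
lin_fit      <- lm(mpg ~ wt + hp, data = mtcars)
linear_coefs <- coef(lin_fit)[-1]

print(round(odds_ratios, 2))
print(round(rate_ratios, 2))
print(round(linear_coefs, 3))
```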
Confidence Intervals:
Confidence interval computation is tailored to each model class using the best available method:
GLM and negative binomial: Profile likelihood intervals via
MASS::confint.glm(), which invert the profile deviance and account
for asymmetry in the likelihood surface. More accurate than the Wald
approximation when subgroup sizes are small or estimates are near boundary
values. Quasi-likelihood families (quasibinomial, quasipoisson)
fall back to Wald intervals because they lack a true likelihood function.
Linear models: Exact t-distribution intervals via
confint.lm(), based on the known sampling distribution under
normality.
Cox proportional hazards: Wald intervals (i.e.,
coefficient ± z × SE), the standard approach in
the survival analysis literature.
Mixed-effects models (lmer, glmer, coxme): Wald intervals.
Profile likelihood is available for lme4 models via
confint(model, method = "profile") but can be prohibitively slow
for complex random-effects structures and is not used by default.
If profile likelihood computation fails for any reason (e.g., non-convergence during profiling), the function falls back silently to Wald intervals.
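The profile-versus-Wald distinction, including a fallback of the kind described, can be sketched in base R. This assumes a logistic model on the built-in `infert` data; the `tryCatch()` wrapper is an illustration of the fallback idea, not the package's actual code.

```r
model <- glm(case ~ spontaneous + induced, family = binomial, data = infert)

# Profile likelihood limits on the log-odds scale, falling back to
# Wald limits if profiling raises an error
ci_profile <- tryCatch(
  suppressMessages(confint(model)),
  error = function(e) confint.default(model)
)

# Wald limits: estimate +/- z * SE
ci_wald <- confint.default(model)

# Exponentiate to obtain odds ratio intervals
print(round(exp(ci_profile), 2))
print(round(exp(ci_wald), 2))
```

On a dataset of this size the two sets of limits are close; the gap widens with sparse subgroups or estimates near a boundary.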
A data.table with S3 class "fit_result" containing formatted
regression results. The table structure includes:
Character. Predictor name or custom label
Character. For factor variables: category level. For interactions: interaction term. For continuous: typically empty
Integer. Total sample size (if show_n = TRUE)
Integer. Sample size for this factor level
Integer. Total number of events (if show_events = TRUE)
Integer. Events for this factor level
Character. Formatted effect estimate with confidence interval. Column name depends on model type and scope. Univariable models use: OR, HR, RR, Coefficient. Multivariable models use adjusted notation: aOR, aHR, aRR, Adj. Coefficient
Character. Formatted p-value from Wald test
The returned object includes the following attributes accessible via attr():
model: The fitted model object (glm, lm, coxph, etc.). Access for diagnostics, predictions, or further analysis
raw_data: data.table. Unformatted numeric results with columns for coefficients, standard errors, confidence bounds, quality statistics, etc.
Character. The outcome variable name
Character vector. The predictor variable names
Character. The complete model formula as a string
Character. "Univariable" (one predictor) or "Multivariable" (multiple predictors)
Character. The regression model type used
Character vector (if interactions specified). The interaction terms included
Character (if stratification used). The stratification variable
Character (if clustering used). The cluster variable
Character (if weighting used). The weights variable
Character vector. Names of predictors with p-value below 0.05, suitable for downstream variable selection workflows
uniscreen for univariable screening of multiple predictors,
fullfit for complete univariable-to-multivariable workflow,
compfit for comparing multiple models,
m2dt for model-to-table conversion
Other regression functions:
compfit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
# Load example data
data(clintrial)
data(clintrial_labels)
library(survival)

# Example 1: Univariable logistic regression
uni_model <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = "age"
)
print(uni_model)
# Labeled as "Univariable OR"

# Example 2: Multivariable logistic regression
multi_model <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "bmi", "treatment"),
  labels = clintrial_labels
)
print(multi_model)

# Example 3: Cox proportional hazards model
cox_model <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage"),
  model_type = "coxph",
  labels = clintrial_labels
)
print(cox_model)

# Example 4: Model with interaction terms
interact_model <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "treatment", "sex"),
  interactions = c("age:treatment"),
  labels = clintrial_labels
)
print(interact_model)

# Example 5: Cox model with stratification
strat_model <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment"),
  model_type = "coxph",
  strata = "site",  # Separate baseline hazards by site
  labels = clintrial_labels
)
print(strat_model)

# Example 6: Cox model with clustering
cluster_model <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "treatment"),
  model_type = "coxph",
  cluster = "site",  # Robust SEs accounting for site clustering
  labels = clintrial_labels
)
print(cluster_model)

# Example 7: Linear regression
linear_model <- fit(
  data = clintrial,
  outcome = "bmi",
  predictors = c("age", "sex", "smoking"),
  model_type = "lm",
  labels = clintrial_labels
)
print(linear_model)

# Example 8: Poisson regression for equidispersed count data
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_model <- fit(
  data = clintrial,
  outcome = "fu_count",
  predictors = c("age", "stage", "treatment", "surgery"),
  model_type = "glm",
  family = "poisson",
  labels = clintrial_labels
)
print(poisson_model)
# Returns rate ratios (RR/aRR)

# Example 9: Negative binomial regression for overdispersed counts
# ae_count has variance > mean (overdispersed), use negbin or quasipoisson
if (requireNamespace("MASS", quietly = TRUE)) {
  nb_result <- fit(
    data = clintrial,
    outcome = "ae_count",
    predictors = c("age", "treatment", "diabetes", "surgery"),
    model_type = "negbin",
    labels = clintrial_labels
  )
  print(nb_result)
}

# Example 10: Gamma regression for positive continuous outcomes
gamma_model <- fit(
  data = clintrial,
  outcome = "los_days",
  predictors = c("age", "treatment", "surgery"),
  model_type = "glm",
  family = Gamma(link = "log"),
  labels = clintrial_labels
)
print(gamma_model)

# Example 11: Access the underlying fitted model
result <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "bmi")
)

# Get the model object
model_obj <- attr(result, "model")
summary(model_obj)

# Model diagnostics
plot(model_obj)

# Predictions
preds <- predict(model_obj, type = "response")

# Example 12: Access raw numeric data
raw_data <- attr(result, "raw_data")
print(raw_data)
# Contains unformatted coefficients, SEs, CIs, AIC, BIC, etc.

# Example 13: Multiple interactions
complex_model <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "bmi"),
  interactions = c("age:treatment", "sex:bmi"),
  labels = clintrial_labels
)
print(complex_model)

# Example 14: Customize output columns
minimal <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment"),
  show_n = FALSE,
  show_events = FALSE,
  reference_rows = FALSE
)
print(minimal)

# Example 15: Different confidence levels
ci90 <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "treatment"),
  conf_level = 0.90  # 90% confidence intervals
)
print(ci90)

# Example 16: Force coefficient display instead of OR
coef_model <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "bmi"),
  exponentiate = FALSE  # Show log odds instead of OR
)
print(coef_model)

# Example 17: Confidence interval method
# Default: profile likelihood CIs for GLM (more accurate)
profile_result <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "treatment"),
  p_digits = 4,
  conf_method = "profile"
)
print(profile_result)

# Wald CIs (faster, suitable for simulation or exploratory work)
wald_result <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "treatment"),
  p_digits = 4,
  conf_method = "wald"
)
print(wald_result)

# Example 18: Check model quality statistics
result <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "stage"),
  keep_qc_stats = TRUE
)
raw <- attr(result, "raw_data")
cat("AIC:", raw$AIC[1], "\n")
cat("BIC:", raw$BIC[1], "\n")
cat("C-statistic:", raw$c_statistic[1], "\n")

# Example 19: Interaction effects - treatment effect modified by stage
interaction_model <- fit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "treatment", "stage"),
  interactions = c("treatment:stage"),
  model_type = "coxph",
  labels = clintrial_labels
)
print(interaction_model)
# Shows main effects plus all treatment×stage interaction terms

# Example 20: Multiple interactions in logistic regression
multi_interaction <- fit(
  data = clintrial,
  outcome = "readmission_30d",
  predictors = c("age", "sex", "surgery", "diabetes"),
  interactions = c("surgery:diabetes", "age:sex"),
  labels = clintrial_labels
)
print(multi_interaction)

# Example 21: Quasipoisson for overdispersed count data
# Alternative to negative binomial when MASS not available
quasi_model <- fit(
  data = clintrial,
  outcome = "ae_count",
  predictors = c("age", "treatment", "diabetes", "surgery"),
  model_type = "glm",
  family = "quasipoisson",
  labels = clintrial_labels
)
print(quasi_model)
# Adjusts standard errors for overdispersion

# Example 22: Quasibinomial for overdispersed binary data
quasi_logistic <- fit(
  data = clintrial,
  outcome = "any_complication",
  predictors = c("age", "bmi", "diabetes", "surgery"),
  model_type = "glm",
  family = "quasibinomial",
  labels = clintrial_labels
)
print(quasi_logistic)

# Example 23: Gamma regression with identity link for additive effects
gamma_identity <- fit(
  data = clintrial,
  outcome = "los_days",
  predictors = c("age", "treatment", "surgery", "any_complication"),
  model_type = "glm",
  family = Gamma(link = "identity"),
  labels = clintrial_labels
)
print(gamma_identity)
# Shows additive effects (coefficients) instead of multiplicative (ratios)

# Example 24: Inverse Gaussian regression for highly skewed data
inverse_gaussian <- fit(
  data = clintrial,
  outcome = "recovery_days",
  predictors = c("age", "surgery", "pain_score"),
  model_type = "glm",
  family = inverse.gaussian(link = "log"),
  labels = clintrial_labels
)
print(inverse_gaussian)

# Example 25: Linear mixed effects with random intercepts
# Accounts for clustering of patients within sites
if (requireNamespace("lme4", quietly = TRUE)) {
  lmer_model <- fit(
    data = clintrial,
    outcome = "los_days",
    predictors = c("age", "treatment", "stage", "(1|site)"),
    model_type = "lmer",
    labels = clintrial_labels
  )
  print(lmer_model)
}

# Example 26: Generalized linear mixed effects (logistic with random effects)
if (requireNamespace("lme4", quietly = TRUE)) {
  glmer_model <- fit(
    data = clintrial,
    outcome = "readmission_30d",
    predictors = c("age", "surgery", "los_days", "(1|site)"),
    model_type = "glmer",
    family = "binomial",
    labels = clintrial_labels
  )
  print(glmer_model)
}

# Example 27: Cox mixed effects for clustered survival data
if (requireNamespace("coxme", quietly = TRUE)) {
  coxme_model <- fit(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    predictors = c("age", "treatment", "stage", "(1|site)"),
    model_type = "coxme",
    labels = clintrial_labels
  )
  print(coxme_model)
}

# Example 28: Random slopes - treatment effect varies by site
if (requireNamespace("lme4", quietly = TRUE)) {
  random_slopes <- fit(
    data = clintrial,
    outcome = "los_days",
    predictors = c("age", "treatment", "stage", "(treatment|site)"),
    model_type = "lmer",
    labels = clintrial_labels
  )
  print(random_slopes)
}

# Example 29: Format a pre-fitted model (model-based workflow)
# Useful for models fitted outside of fit()
pre_fitted <- glm(os_status ~ age + sex + treatment,
                  family = binomial, data = clintrial)
result <- fit(model = pre_fitted,
              data = clintrial,
              labels = clintrial_labels)
print(result)
interact_model <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "treatment", "sex"), interactions = c("age:treatment"), labels = clintrial_labels ) print(interact_model) # Example 5: Cox model with stratification strat_model <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "sex", "treatment"), model_type = "coxph", strata = "site", # Separate baseline hazards by site labels = clintrial_labels ) print(strat_model) # Example 6: Cox model with clustering cluster_model <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "treatment"), model_type = "coxph", cluster = "site", # Robust SEs accounting for site clustering labels = clintrial_labels ) print(cluster_model) # Example 7: Linear regression linear_model <- fit( data = clintrial, outcome = "bmi", predictors = c("age", "sex", "smoking"), model_type = "lm", labels = clintrial_labels ) print(linear_model) # Example 8: Poisson regression for equidispersed count data # fu_count has variance ~= mean, appropriate for standard Poisson poisson_model <- fit( data = clintrial, outcome = "fu_count", predictors = c("age", "stage", "treatment", "surgery"), model_type = "glm", family = "poisson", labels = clintrial_labels ) print(poisson_model) # Returns rate ratios (RR/aRR) # Example 9: Negative binomial regression for overdispersed counts # ae_count has variance > mean (overdispersed), use negbin or quasipoisson if (requireNamespace("MASS", quietly = TRUE)) { nb_result <- fit( data = clintrial, outcome = "ae_count", predictors = c("age", "treatment", "diabetes", "surgery"), model_type = "negbin", labels = clintrial_labels ) print(nb_result) } # Example 10: Gamma regression for positive continuous outcomes gamma_model <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "treatment", "surgery"), model_type = "glm", family = Gamma(link = "log"), labels = clintrial_labels ) print(gamma_model) # Example 11: Access the 
underlying fitted model result <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "sex", "bmi") ) # Get the model object model_obj <- attr(result, "model") summary(model_obj) # Model diagnostics plot(model_obj) # Predictions preds <- predict(model_obj, type = "response") # Example 12: Access raw numeric data raw_data <- attr(result, "raw_data") print(raw_data) # Contains unformatted coefficients, SEs, CIs, AIC, BIC, etc. # Example 13: Multiple interactions complex_model <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "sex", "treatment", "bmi"), interactions = c("age:treatment", "sex:bmi"), labels = clintrial_labels ) print(complex_model) # Example 14: Customize output columns minimal <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "sex", "treatment"), show_n = FALSE, show_events = FALSE, reference_rows = FALSE ) print(minimal) # Example 15: Different confidence levels ci90 <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "treatment"), conf_level = 0.90 # 90% confidence intervals ) print(ci90) # Example 16: Force coefficient display instead of OR coef_model <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "bmi"), exponentiate = FALSE # Show log odds instead of OR ) print(coef_model) # Example 17: Confidence interval method # Default: profile likelihood CIs for GLM (more accurate) profile_result <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "treatment"), p_digits = 4, conf_method = "profile" ) print(profile_result) # Wald CIs (faster, suitable for simulation or exploratory work) wald_result <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "treatment"), p_digits = 4, conf_method = "wald" ) print(wald_result) # Example 18: Check model quality statistics result <- fit( data = clintrial, outcome = "os_status", predictors = c("age", "sex", "treatment", "stage"), keep_qc_stats = TRUE ) raw <- attr(result, "raw_data") 
cat("AIC:", raw$AIC[1], "\n") cat("BIC:", raw$BIC[1], "\n") cat("C-statistic:", raw$c_statistic[1], "\n") # Example 19: Interaction effects - treatment effect modified by stage interaction_model <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "treatment", "stage"), interactions = c("treatment:stage"), model_type = "coxph", labels = clintrial_labels ) print(interaction_model) # Shows main effects plus all treatment×stage interaction terms # Example 20: Multiple interactions in logistic regression multi_interaction <- fit( data = clintrial, outcome = "readmission_30d", predictors = c("age", "sex", "surgery", "diabetes"), interactions = c("surgery:diabetes", "age:sex"), labels = clintrial_labels ) print(multi_interaction) # Example 21: Quasipoisson for overdispersed count data # Alternative to negative binomial when MASS not available quasi_model <- fit( data = clintrial, outcome = "ae_count", predictors = c("age", "treatment", "diabetes", "surgery"), model_type = "glm", family = "quasipoisson", labels = clintrial_labels ) print(quasi_model) # Adjusts standard errors for overdispersion # Example 22: Quasibinomial for overdispersed binary data quasi_logistic <- fit( data = clintrial, outcome = "any_complication", predictors = c("age", "bmi", "diabetes", "surgery"), model_type = "glm", family = "quasibinomial", labels = clintrial_labels ) print(quasi_logistic) # Example 23: Gamma regression with identity link for additive effects gamma_identity <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "treatment", "surgery", "any_complication"), model_type = "glm", family = Gamma(link = "identity"), labels = clintrial_labels ) print(gamma_identity) # Shows additive effects (coefficients) instead of multiplicative (ratios) # Example 24: Inverse Gaussian regression for highly skewed data inverse_gaussian <- fit( data = clintrial, outcome = "recovery_days", predictors = c("age", "surgery", "pain_score"), model_type = 
"glm", family = inverse.gaussian(link = "log"), labels = clintrial_labels ) print(inverse_gaussian) # Example 25: Linear mixed effects with random intercepts # Accounts for clustering of patients within sites if (requireNamespace("lme4", quietly = TRUE)) { lmer_model <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "treatment", "stage", "(1|site)"), model_type = "lmer", labels = clintrial_labels ) print(lmer_model) } # Example 26: Generalized linear mixed effects (logistic with random effects) if (requireNamespace("lme4", quietly = TRUE)) { glmer_model <- fit( data = clintrial, outcome = "readmission_30d", predictors = c("age", "surgery", "los_days", "(1|site)"), model_type = "glmer", family = "binomial", labels = clintrial_labels ) print(glmer_model) } # Example 27: Cox mixed effects for clustered survival data if (requireNamespace("coxme", quietly = TRUE)) { coxme_model <- fit( data = clintrial, outcome = "Surv(os_months, os_status)", predictors = c("age", "treatment", "stage", "(1|site)"), model_type = "coxme", labels = clintrial_labels ) print(coxme_model) } # Example 28: Random slopes - treatment effect varies by site if (requireNamespace("lme4", quietly = TRUE)) { random_slopes <- fit( data = clintrial, outcome = "los_days", predictors = c("age", "treatment", "stage", "(treatment|site)"), model_type = "lmer", labels = clintrial_labels ) print(random_slopes) } # Example 29: Format a pre-fitted model (model-based workflow) # Useful for models fitted outside of fit() pre_fitted <- glm(os_status ~ age + sex + treatment, family = binomial, data = clintrial) result <- fit(model = pre_fitted, data = clintrial, labels = clintrial_labels) print(result)
Executes a comprehensive regression analysis pipeline that combines univariable screening, automatic/manual variable selection, and multivariable modeling in a single function call. This function is designed to streamline the complete analytical workflow from initial exploration to final adjusted models, with publication-ready formatted output showing both univariable and multivariable results side-by-side if desired.
fullfit(
  data,
  outcome,
  predictors,
  method = "screen",
  multi_predictors = NULL,
  p_threshold = 0.05,
  columns = "both",
  model_type = "glm",
  family = "binomial",
  random = NULL,
  conf_level = 0.95,
  reference_rows = TRUE,
  show_n = TRUE,
  show_events = TRUE,
  digits = 2,
  p_digits = 3,
  labels = NULL,
  metrics = "both",
  return_type = "table",
  keep_models = FALSE,
  exponentiate = NULL,
  conf_method = NULL,
  parallel = TRUE,
  n_cores = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcome |
Character string specifying the outcome variable name. For
time-to-event analysis, use |
predictors |
Character vector of predictor variable names to analyze.
All predictors are tested in univariable models. The subset included in
the multivariable model depends on the |
method |
Character string specifying the variable selection strategy:
|
multi_predictors |
Character vector of predictors to include in the
multivariable model when |
p_threshold |
Numeric p-value threshold for automatic variable
selection when |
columns |
Character string specifying which result columns to display:
|
model_type |
Character string specifying the regression model type:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% CI). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying decimal places for effect estimates. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names, values are
display labels. Default is |
metrics |
Character specification for which statistics to display:
Can also be a character vector: |
return_type |
Character string specifying what to return:
|
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients. Default
is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting.
Set globally with |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for
parallel processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to model fitting functions (e.g.,
|
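The `conf_level`, `conf_method`, and `exponentiate` arguments correspond to standard operations on a fitted model. As a rough illustration only (base R on simulated data; the variable names are hypothetical and this is not the package's internal code), profile-likelihood and Wald intervals for a logistic model can be computed and exponentiated like this:

```r
# Simulated data (hypothetical, for illustration only)
set.seed(42)
n <- 300
dat <- data.frame(age = rnorm(n, 60, 10), trt = rbinom(n, 1, 0.5))
dat$event <- rbinom(n, 1, plogis(-3 + 0.04 * dat$age + 0.5 * dat$trt))

model <- glm(event ~ age + trt, family = binomial, data = dat)

# Profile-likelihood intervals: the confint() method for glm objects
ci_profile <- suppressMessages(confint(model, level = 0.95))

# Wald intervals: estimate +/- z * standard error
ci_wald <- confint.default(model, level = 0.95)

# Exponentiate log odds to obtain odds ratios with Wald limits
or_table <- cbind(OR = exp(coef(model)), exp(ci_wald))
print(round(or_table, 2))
```

The two interval types usually agree closely in large samples and diverge more when events are sparse.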
Analysis Workflow:
The function implements a complete regression analysis pipeline:
Univariable screening: Fits separate models for each predictor (outcome ~ predictor). Each predictor is tested independently to understand crude associations.
Variable selection: Based on the method parameter:
"screen": Automatically selects predictors with univariable
p-value meeting the p_threshold cutoff
"all": Includes all predictors (no selection)
"custom": Uses predictors specified in multi_predictors
Multivariable modeling: Fits a single model with selected predictors (outcome ~ predictor1 + predictor2 + ...). Estimates are adjusted for all other variables in the model.
Output formatting: Combines results into publication-ready table with appropriate effect measures and formatting.
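The four steps above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: the data, the variable names, and the use of `<=` for the threshold comparison are all assumptions.

```r
# Simulated data (hypothetical): only x1 truly affects the outcome
set.seed(1)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.2 + 1.0 * dat$x1))

predictors  <- c("x1", "x2", "x3")
p_threshold <- 0.05

# Step 1: univariable screening, one model per predictor
uni_models <- lapply(predictors, function(v) {
  glm(reformulate(v, response = "y"), family = binomial, data = dat)
})
names(uni_models) <- predictors
uni_or <- vapply(predictors, function(v) exp(coef(uni_models[[v]])[[v]]), numeric(1))
uni_p  <- vapply(predictors, function(v) {
  coef(summary(uni_models[[v]]))[v, "Pr(>|z|)"]
}, numeric(1))

# Step 2: variable selection (assumed comparison: p <= threshold)
selected <- predictors[uni_p <= p_threshold]

# Step 3: one multivariable model with the selected predictors
multi <- glm(reformulate(selected, response = "y"), family = binomial, data = dat)
adj_or <- exp(coef(multi))[selected]

# Step 4: combine crude and adjusted estimates in one table
result <- data.frame(
  variable = predictors,
  uni_or   = round(uni_or, 2),
  uni_p    = signif(uni_p, 3),
  adj_or   = NA_real_,
  row.names = NULL
)
result$adj_or[match(selected, predictors)] <- round(adj_or, 2)
print(result)
```

Predictors that fail screening keep `NA` in the adjusted column, which plays the role of the "-" placeholder in the formatted table. The sketch handles numeric predictors only; factors with several levels produce more than one coefficient per variable.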
Variable Selection Strategies:
"Screen" Method (method = "screen"):
Uses p-value threshold for automatic selection
Liberal thresholds (e.g., 0.20) cast a wide net to avoid missing important predictors
Stricter thresholds (e.g., 0.05) focus on strongly associated predictors
Helps reduce overfitting and multicollinearity
Common in exploratory analyses and when sample size is limited
"All" Method (method = "all"):
No variable selection - includes all predictors
Appropriate when all variables are theoretically important
Risk of overfitting with many predictors relative to sample size
Useful for confirmatory analyses with pre-specified models
"Custom" Method (method = "custom"):
Manual selection based on subject matter knowledge
Runs univariable analysis for all predictors (for comparison)
Includes only specified predictors in multivariable model
Ideal for theory-driven model building
Allows comparison of unadjusted vs adjusted effects for all variables
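To make the difference between the three strategies concrete, the selection step can be written as a small function. `select_predictors()` is a hypothetical helper written for this illustration; it is not part of the package, and the `<=` comparison is an assumption.

```r
# Hypothetical helper: choose multivariable predictors from univariable p-values
select_predictors <- function(uni_p,
                              method = c("screen", "all", "custom"),
                              p_threshold = 0.05,
                              multi_predictors = NULL) {
  method <- match.arg(method)
  switch(method,
    screen = names(uni_p)[uni_p <= p_threshold],
    all    = names(uni_p),
    custom = {
      stopifnot(!is.null(multi_predictors))
      multi_predictors
    }
  )
}

# Made-up univariable p-values for four predictors
uni_p <- c(age = 0.001, sex = 0.30, bmi = 0.15, stage = 0.02)

strict  <- select_predictors(uni_p, "screen", p_threshold = 0.05)
liberal <- select_predictors(uni_p, "screen", p_threshold = 0.20)
everything <- select_predictors(uni_p, "all")
manual  <- select_predictors(uni_p, "custom", multi_predictors = c("age", "sex"))

print(list(strict = strict, liberal = liberal, all = everything, custom = manual))
```

With these made-up p-values, the strict threshold keeps `age` and `stage`, the liberal threshold also admits `bmi`, and the custom method keeps `sex` even though its crude p-value is large.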
Interpreting Results:
When columns = "both" (default), tables show:
Univariable columns: Crude associations, unadjusted for other variables. Labeled as "OR/HR/RR/Coefficient (95% CI)" and "Uni p"
Multivariable columns: Adjusted associations, accounting for all other predictors in the model. Labeled as "aOR/aHR/aRR/Adj. Coefficient (95% CI)" and "Multi p" ("a" = adjusted)
Variables not meeting selection criteria show "-" in multivariable columns
Comparing univariable and multivariable results helps identify:
Confounding: Large changes in effect estimates
Independent effects: Similar univariable and multivariable estimates
Mediation: Attenuated effects in multivariable model
Suppression: Effects that emerge only after adjustment
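As an illustration of the confounding pattern, the following base R sketch simulates an exposure that has no direct effect on the outcome but shares a common cause with it. The data and variable names are hypothetical; a linear model is used only to keep the arithmetic easy to follow.

```r
set.seed(123)
n <- 500
z <- rnorm(n)            # confounder
x <- z + rnorm(n)        # exposure, correlated with the confounder
y <- 2 * z + rnorm(n)    # outcome depends on the confounder only

# Crude (univariable) and adjusted (multivariable) estimates for x
crude    <- coef(lm(y ~ x))[["x"]]
adjusted <- coef(lm(y ~ x + z))[["x"]]

# Relative change in the estimate after adjustment
pct_change <- 100 * (adjusted - crude) / crude

print(round(c(crude = crude, adjusted = adjusted, pct_change = pct_change), 2))
```

The crude estimate is clearly positive, while the adjusted estimate shrinks toward zero, which is the "large change in effect estimates" signature of confounding.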
Sample Size Considerations:
Rule of thumb for multivariable models:
Logistic regression: 10 events per predictor variable
Cox regression: 10 events per predictor variable
Linear regression: 10-20 observations per predictor
Use screening methods to reduce predictor count when these ratios are not met.
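The rule of thumb translates into a quick check before model fitting. `max_predictors()` below is a hypothetical helper for illustration, and the outcome vector is made up.

```r
# Hypothetical helper: largest predictor count supported by the rule of thumb
max_predictors <- function(n_events, events_per_variable = 10) {
  floor(n_events / events_per_variable)
}

# Made-up binary outcome: 85 events among 400 observations
outcome  <- c(rep(1, 85), rep(0, 315))
n_events <- sum(outcome == 1)

candidates <- c("age", "sex", "bmi", "smoking", "hypertension",
                "diabetes", "treatment", "stage", "surgery", "creatinine")

limit <- max_predictors(n_events)
too_many <- length(candidates) > limit

cat("Events:", n_events, "\n")
cat("Predictors supported at 10 events each:", limit, "\n")
cat("Candidate predictors:", length(candidates), "\n")
cat("Screening advisable:", too_many, "\n")
```

Note that a factor with several levels contributes more than one coefficient, so counting variables rather than coefficients is the more lenient reading of the rule.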
The return value depends on the return_type parameter:
When return_type = "table" (default): A data.table with S3 class
"fullfit_result" containing:
Character. Predictor name or custom label
Character. Category level for factors, empty for continuous
Integer. Sample sizes (if show_n = TRUE). For
variables included in the multivariable model, reflects the
complete-case sample size from the fitted model (listwise deletion
across all included predictors). For variables not selected into the
multivariable model, reflects the per-variable sample size from the
univariable analysis. This follows STROBE guideline item 12,
which recommends reporting the number of participants included at
each stage of analysis.
Integer. Event counts (if show_events = TRUE).
Same complete-case convention as n: multivariable rows show
events from the fitted model, univariable-only rows show
per-variable counts.
Character. Unadjusted effect
(if columns includes "uni" and metrics includes "effect")
Character. Univariable p-value (if columns includes
"uni" and metrics includes "p")
Character. Adjusted effect
(if columns includes "multi" and metrics includes "effect")
Character. Multivariable p-value (if columns
includes "multi" and metrics includes "p")
When return_type = "model": The fitted multivariable model object
(glm, lm, coxph, etc.).
When return_type = "both": A list with two elements:
The formatted results data.table
The fitted multivariable model object
The table includes the following attributes:
Character. The outcome variable name
Character. The regression model type
Character. The variable selection method used
Character. Which columns were displayed
The multivariable model object (if fitted)
The complete univariable screening results
Integer. Number of predictors in multivariable model
Character vector. Names of predictors that passed univariable screening at the specified p-value threshold
Character vector. Names of variables with p < 0.05 in the multivariable model (or univariable if multivariable was not fitted)
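The complete-case convention described for the sample size column can be reproduced in base R. This sketch uses the built-in `airquality` data, which contains missing values, and is only meant to show why univariable and multivariable sample sizes can differ.

```r
data(airquality)
predictors <- c("Solar.R", "Wind")

# Per-variable sample size: each univariable model drops only the rows
# that are missing for the outcome or for that one predictor
uni_n <- vapply(predictors, function(v) {
  nobs(lm(reformulate(v, response = "Ozone"), data = airquality))
}, numeric(1))

# Multivariable sample size: listwise deletion across all included variables
multi   <- lm(Ozone ~ Solar.R + Wind, data = airquality)
multi_n <- nobs(multi)

print(uni_n)
print(multi_n)
```

The multivariable count can never exceed any of the univariable counts, because every row it uses must be complete for all predictors at once.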
uniscreen for univariable screening only,
fit for fitting a single multivariable model,
compfit for comparing multiple models,
desctable for descriptive statistics
Other regression functions:
compfit(),
fit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
# Load example data
data(clintrial)
data(clintrial_labels)

# Example 1: Basic screening with p < 0.05 threshold
result1 <- fullfit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "bmi", "smoking", "hypertension",
                 "diabetes", "treatment", "stage"),
  method = "screen",
  p_threshold = 0.05,
  labels = clintrial_labels
)
print(result1)
# Shows both univariable and multivariable results
# Only significant univariable predictors in multivariable model

# Example 2: Include all predictors (no selection)
result2 <- fullfit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "stage"),
  method = "all",
  labels = clintrial_labels
)
print(result2)

# Example 3: Custom variable selection
result3 <- fullfit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "bmi", "smoking", "treatment", "stage"),
  method = "custom",
  multi_predictors = c("age", "treatment", "stage"),
  labels = clintrial_labels
)
print(result3)
# Univariable for all, multivariable for selected only

# Example 4: Cox regression with screening
library(survival)
cox_result <- fullfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  predictors = c("age", "sex", "treatment", "stage"),
  model_type = "coxph",
  method = "screen",
  p_threshold = 0.10,
  labels = clintrial_labels
)
print(cox_result)

# Example 5: Linear regression without screening
linear_result <- fullfit(
  data = clintrial,
  outcome = "bmi",
  predictors = c("age", "sex", "smoking", "creatinine"),
  model_type = "lm",
  method = "all",
  labels = clintrial_labels
)
print(linear_result)

# Example 6: Poisson regression for count outcomes
poisson_result <- fullfit(
  data = clintrial,
  outcome = "fu_count",
  predictors = c("age", "stage", "treatment", "surgery"),
  model_type = "glm",
  family = "poisson",
  method = "all",
  labels = clintrial_labels
)
print(poisson_result)

# Example 7: Show only multivariable results
multi_only <- fullfit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "stage"),
  method = "all",
  columns = "multi",
  labels = clintrial_labels
)
print(multi_only)

# Example 8: Return both table and model object
both <- fullfit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "stage"),
  method = "all",
  return_type = "both"
)
print(both$table)
summary(both$model)

# Example 9: Keep univariable models for diagnostics
with_models <- fullfit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "bmi", "creatinine"),
  keep_models = TRUE
)
uni_results <- attr(with_models, "uni_results")
uni_models <- attr(uni_results, "models")
summary(uni_models[["age"]])

# Example 10: Linear mixed effects with site clustering
if (requireNamespace("lme4", quietly = TRUE)) {
  lmer_result <- fullfit(
    data = clintrial,
    outcome = "los_days",
    predictors = c("age", "treatment", "surgery", "stage"),
    random = "(1|site)",
    model_type = "lmer",
    method = "all",
    labels = clintrial_labels
  )
  print(lmer_result)
}
Generates a publication-ready forest plot that combines a formatted data table with a graphical representation of effect estimates (odds ratios, risk ratios, or coefficients) from a generalized linear model. The plot integrates variable names, group levels, sample sizes, effect estimates with confidence intervals, p-values, and model diagnostics in a single comprehensive visualization designed for manuscripts and presentations.
glmforest(
  x,
  data = NULL,
  title = "Generalized Linear Model",
  effect_label = NULL,
  digits = 2,
  p_digits = 3,
  conf_level = 0.95,
  font_size = 1,
  annot_size = 3.88,
  header_size = 5.82,
  title_size = 23.28,
  plot_width = NULL,
  plot_height = NULL,
  table_width = 0.6,
  show_n = TRUE,
  show_events = TRUE,
  indent_groups = FALSE,
  condense_table = FALSE,
  bold_variables = FALSE,
  center_padding = 4,
  zebra_stripes = TRUE,
  ref_label = "reference",
  labels = NULL,
  color = NULL,
  exponentiate = NULL,
  qc_footer = TRUE,
  units = "in",
  number_format = NULL
)
x |
Either a fitted GLM object (class |
data |
Data frame or data.table containing the original data used to
fit the model. If |
title |
Character string specifying the plot title displayed at the top.
Default is |
effect_label |
Character string for the effect measure label on the
forest plot axis. If |
digits |
Integer specifying the number of decimal places for effect estimates and confidence intervals in the data table. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Values > 1 increase all fonts proportionally, values < 1 decrease them. Default is 1.0. Useful for adjusting readability across different output sizes. |
annot_size |
Numeric value controlling the relative font size for
data annotations (variable names, values in table cells). Default is 3.88.
Adjust relative to |
header_size |
Numeric value controlling the relative font size for column headers ("Variable", "Group", "n", etc.). Default is 5.82. Headers are typically larger than annotations for hierarchy. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. The title is typically the largest text element. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of
total plot width allocated to the data table (left side). The forest plot
occupies |
show_n |
Logical. If |
show_events |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying the horizontal spacing (in character units) between the data table and forest plot. Increase for more separation, decrease to fit more content. Default is 4. |
zebra_stripes |
Logical. If |
ref_label |
Character string to display for reference categories of
factor variables. Typically shown in place of effect estimates.
Default is |
labels |
Named character vector or list providing custom display
labels for variables. Names should match variable names in the model,
values are the labels to display. Example:
|
color |
Character string specifying the color for effect estimate point
markers in the forest plot. Use hex codes or R color names. Default is
Gaussian with log link), and |
exponentiate |
Logical. If |
qc_footer |
Logical. If |
units |
Character string specifying the units for plot dimensions.
Options: |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
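The effect of a two-element separator vector such as the one accepted by `number_format` can be illustrated with base R's `formatC()`. `format_number()` is a hypothetical helper, and the ordering of the vector (thousands separator first, decimal separator second) is an assumption made for this sketch rather than documented package behavior.

```r
# Hypothetical helper: marks = c(thousands separator, decimal separator)
format_number <- function(x, marks = c(",", "."), digits = 2) {
  formatC(x, format = "f", digits = digits,
          big.mark = marks[1], decimal.mark = marks[2])
}

x <- 1234567.891

us_style <- format_number(x, marks = c(",", "."))
eu_style <- format_number(x, marks = c(".", ","))

print(us_style)
print(eu_style)
```

Keeping the separators in a single vector makes it easy to pass one setting through every formatting call in a table or plot.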
Plot Components:
The forest plot consists of several integrated components:
Title: Centered at top, describes the analysis
Data Table (left side): Contains columns for:
Variable: Predictor names (or custom labels)
Group: Factor levels (optional, hidden when indenting)
n: Sample sizes by group (optional)
Events: Event counts by group (optional)
Effect (95% CI); p-value: Formatted estimates with p-values
Forest Plot (right side): Graphical display with:
Point estimates (squares sized by sample size)
95% confidence intervals (error bars)
Reference line (at OR/RR = 1 or coefficient = 0)
Log scale for odds/risk ratios
Labeled axis
Model Statistics (footer): Summary of:
Observations analyzed (with percentage of total data)
Model family (Binomial, Poisson, etc.)
Deviance statistics
Pseudo-R² (McFadden)
AIC
Automatic Effect Measure Selection:
When effect_label = NULL and exponentiate = NULL, the function
intelligently selects the appropriate effect measure:
Logistic regression (family = binomial(link = "logit")):
Odds Ratios (OR)
Log-link models (link = "log"): Risk Ratios (RR)
or Rate Ratios
Other exponential families: exp(coefficient)
Identity link: Raw coefficients
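The selection rules above can be sketched in base R. This is only an illustration of the mapping from family and link to a label; `effect_label_for()` is a hypothetical helper, not a summata function, and the package's actual logic may differ.

```r
# Hypothetical helper (not part of summata): map a fitted glm to an effect label
effect_label_for <- function(model) {
  fam <- stats::family(model)
  if (fam$family == "binomial" && fam$link == "logit") {
    "OR"                  # logistic regression: odds ratios
  } else if (fam$link == "log") {
    "RR"                  # log link: risk or rate ratios
  } else if (fam$link == "identity") {
    "Coefficient"         # identity link: raw coefficients
  } else {
    "exp(coefficient)"    # any other link: exponentiated coefficient
  }
}

logit_fit   <- glm(am ~ wt, data = mtcars, family = binomial)
poisson_fit <- glm(carb ~ wt, data = mtcars, family = poisson)
gauss_fit   <- glm(mpg ~ wt, data = mtcars, family = gaussian)

effect_labels <- c(
  effect_label_for(logit_fit),
  effect_label_for(poisson_fit),
  effect_label_for(gauss_fit)
)
effect_labels
```

Supplying effect_label or exponentiate explicitly bypasses this kind of automatic choice.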
Reference Categories:
For factor variables, the first level (determined by factor ordering or alphabetically for character variables) serves as the reference category:
Displayed with the ref_label instead of an estimate
No confidence interval or p-value shown
Visually aligned with other categories
When condense_table = TRUE, reference-only variables may be
omitted entirely
Layout Optimization:
The function automatically optimizes layout based on content:
Calculates appropriate axis ranges to accommodate all confidence intervals
Selects meaningful tick marks on log or linear scales
Sizes point markers proportional to sample size (larger = more data)
Adjusts table width based on variable name lengths when table_width = NULL
Recommends overall dimensions based on number of rows
Visual Grouping Options:
Three display modes are available:
Standard (indent_groups = FALSE,
condense_table = FALSE):
Separate "Variable" and "Group" columns, all categories shown
Indented (indent_groups = TRUE,
condense_table = FALSE):
Hierarchical display with groups indented under variables
Condensed (condense_table = TRUE):
Binary variables shown in single rows, automatically indented
Zebra Striping:
When zebra_stripes = TRUE, alternating variables (not individual rows)
receive light gray backgrounds. This helps visually group all levels of a
factor variable together, making the plot easier to read especially with
many multi-level factors.
Model Statistics Display:
The footer shows key diagnostic information:
Observations analyzed: Total N and percentage of original data (accounting for missing values)
Null/Residual Deviance: Model fit improvement
Pseudo-R²: McFadden R² = 1 − (log L₁ / log L₀), where L₁ is the likelihood of the fitted model and L₀ that of the null (intercept-only) model
AIC: For model comparison (lower is better)
For logistic regression, concordance (C-statistic/AUC) may also be displayed if available.
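As a rough check on the footer values, the same quantities can be computed by hand from a fitted glm. The sketch below uses the built-in mtcars data and base R only; it illustrates the McFadden definition and is not the code summata runs internally.

```r
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
null <- update(fit, . ~ 1)          # intercept-only model on the same data

ll_model <- as.numeric(logLik(fit))
ll_null  <- as.numeric(logLik(null))
mcfadden_r2 <- 1 - ll_model / ll_null

# For 0/1 outcomes the saturated log-likelihood is 0, so the deviances
# stored on the fit give the same value
r2_from_deviance <- 1 - fit$deviance / fit$null.deviance

footer_stats <- c(
  null_deviance     = fit$null.deviance,
  residual_deviance = fit$deviance,
  mcfadden_r2       = mcfadden_r2,
  aic               = AIC(fit)
)
footer_stats
```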
Saving Plots:
Use ggplot2::ggsave() with recommended dimensions:
p <- glmforest(model, data)
dims <- attr(p, "rec_dims")
ggplot2::ggsave("forest.pdf", p, width = dims$width, height = dims$height)
Or specify custom dimensions:
ggplot2::ggsave("forest.png", p, width = 12, height = 8, dpi = 300)
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
width: Numeric. Recommended plot width in specified units
height: Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
autoforest for automatic model detection,
coxforest for Cox proportional hazards forest plots,
lmforest for linear model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
glm for fitting GLMs,
fit for regression modeling
Other visualization functions:
autoforest(),
coxforest(),
lmforest(),
multiforest(),
uniforest()
data(clintrial)
data(clintrial_labels)

# Create example model
model1 <- glm(os_status ~ age + sex + bmi + treatment,
              data = clintrial, family = binomial)

# Example 1: Basic logistic regression forest plot
p <- glmforest(model1, data = clintrial)

old_width <- options(width = 180)

# Example 2: With custom variable labels
plot2 <- glmforest(
  x = model1,
  data = clintrial,
  title = "Risk Factors for Mortality",
  labels = clintrial_labels
)

# Example 3: Indented layout with formatting options
plot3 <- glmforest(
  x = model1,
  data = clintrial,
  indent_groups = TRUE,
  zebra_stripes = TRUE,
  color = "#D62728",
  labels = clintrial_labels
)

# Example 4: Condensed layout for many binary variables
model4 <- glm(os_status ~ age + sex + smoking + hypertension + diabetes + surgery,
              data = clintrial, family = binomial)
plot4 <- glmforest(
  x = model4,
  data = clintrial,
  condense_table = TRUE,
  labels = clintrial_labels
)
# Binary variables shown in single rows

# Example 5: Poisson regression for count data
model5 <- glm(ae_count ~ age + treatment + diabetes + surgery,
              data = clintrial, family = poisson)
plot5 <- glmforest(
  x = model5,
  data = clintrial,
  title = "Rate Ratios for Adverse Events",
  labels = clintrial_labels
)

# Example 6: Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "forest.pdf"), plot5,
                width = dims$width, height = dims$height)

options(old_width)
Generates a publication-ready forest plot that combines a formatted data table
with a graphical representation of regression coefficients from a linear model.
The plot integrates variable names, group levels, sample sizes, coefficients
with confidence intervals, p-values, and model diagnostics (R²,
F-statistic, AIC) in a single comprehensive visualization designed for
manuscripts and presentations.
lmforest(
  x,
  data = NULL,
  title = "Linear Model",
  effect_label = "Coefficient",
  digits = 2,
  p_digits = 3,
  conf_level = 0.95,
  font_size = 1,
  annot_size = 3.88,
  header_size = 5.82,
  title_size = 23.28,
  plot_width = NULL,
  plot_height = NULL,
  table_width = 0.6,
  show_n = TRUE,
  indent_groups = FALSE,
  condense_table = FALSE,
  bold_variables = FALSE,
  center_padding = 4,
  zebra_stripes = TRUE,
  ref_label = "reference",
  labels = NULL,
  units = "in",
  color = "#5A8F5A",
  qc_footer = TRUE,
  number_format = NULL
)
x |
Either a fitted linear model object (class |
data |
Data frame or data.table containing the original data used to
fit the model. If |
title |
Character string specifying the plot title displayed at the top.
Default is |
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
digits |
Integer specifying the number of decimal places for coefficients and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6. |
show_n |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
ref_label |
Character string to display for reference categories of
factor variables. Default is |
labels |
Named character vector providing custom display labels for
variables. Example: |
units |
Character string specifying units for plot dimensions:
|
color |
Character string specifying the color for coefficient point
estimates in the forest plot. Default is |
qc_footer |
Logical. If |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Linear Model-Specific Features:
The linear model forest plot differs from logistic and Cox plots in several ways:
Coefficients: Raw regression coefficients shown (not exponentiated)
Reference line: At coefficient = 0 (not at 1)
Linear scale: Forest plot uses linear scale (not log scale)
No events column: Only sample sizes shown (no event counts)
R² statistics: Model fit assessed by R² and adjusted R²
F-test: Overall model significance from F-statistic
Plot Components:
Title: Centered at top
Data Table (left): Contains:
Variable: Predictor names
Group: Factor levels (if applicable)
n: Sample sizes by group
Coefficient (95% CI); p-value: Raw coefficients with CIs and p-values
Forest Plot (right):
Point estimates (squares sized by sample size)
95% confidence intervals (error bars)
Reference line at coefficient = 0
Linear scale
Model Statistics (footer):
Observations analyzed (with percentage of total data)
R² and adjusted R²
F-statistic with degrees of freedom and p-value
AIC
Interpreting Coefficients:
Linear regression coefficients represent the change in the outcome variable for a one-unit change in the predictor:
Continuous predictors: Coefficient = change in Y per unit of X
Binary predictors: Coefficient = difference in Y between groups
Factor predictors: Coefficients = differences from reference category
Sign matters: Positive = increase in Y, Negative = decrease in Y
Zero crossing: A CI that includes zero means the effect is not statistically significant at the chosen confidence level
Example: If the coefficient for "age" is 0.50 when predicting BMI,
BMI is estimated to increase by 0.50 kg/m² for each additional year of age, holding the other predictors constant.
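The interpretation rules above can be verified numerically on any fitted lm. The following sketch uses the built-in mtcars data (not the package's clintrial data) and compares predictions that differ by one unit in a single predictor.

```r
fit <- lm(mpg ~ wt + am, data = mtcars)   # am is a 0/1 indicator

# Continuous predictor: change in predicted Y for a one-unit increase in wt,
# holding am fixed
base_row <- data.frame(wt = 3, am = 0)
plus_one <- data.frame(wt = 4, am = 0)
delta_wt <- unname(predict(fit, plus_one) - predict(fit, base_row))

# Binary predictor: adjusted difference in Y between the two groups
other_grp <- data.frame(wt = 3, am = 1)
delta_am  <- unname(predict(fit, other_grp) - predict(fit, base_row))

# A 95% CI that excludes zero corresponds to p < 0.05 for that coefficient
ci <- confint(fit, level = 0.95)
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
```

Here delta_wt and delta_am reproduce the corresponding entries of coef(fit), which is what the forest plot displays for each row.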
Model Fit Statistics:
The footer displays key diagnostics:
R²: Proportion of variance explained (0 to 1)
0.0-0.3: Weak explanatory power
0.3-0.5: Moderate
0.5-0.7: Good
> 0.7: Strong (rare in social/biological sciences)
Adjusted R²: R² penalized for number of predictors
Always ≤ R²
Preferred for model comparison
Accounts for model complexity
F-statistic: Tests null hypothesis that all coefficients = 0
Degrees of freedom: df1 = # predictors, df2 = # observations - # predictors - 1
Significant p-value indicates model explains variance better than intercept-only
AIC: For model comparison (lower is better)
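For readers who want to cross-check the footer, these diagnostics are all available from summary() of the fitted model. A minimal sketch with the built-in mtcars data, assuming an ordinary lm with an intercept:

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s   <- summary(fit)

n <- nobs(fit)
p <- length(coef(fit)) - 1                 # predictors, excluding the intercept

r2     <- s$r.squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalizes extra predictors

f_stat <- s$fstatistic                     # named vector: value, numdf, dendf
f_p    <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
             lower.tail = FALSE)

fit_stats <- c(r2 = r2, adj_r2 = adj_r2, f = f_stat[["value"]],
               p = f_p, aic = AIC(fit))
fit_stats
```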
Assumptions:
Linear regression assumes:
Linearity of relationships
Independence of observations
Homoscedasticity (constant variance)
Normality of residuals
No multicollinearity
Check assumptions using:
plot(model) for diagnostic plots
car::vif(model) for multicollinearity
lmtest::bptest(model) for heteroscedasticity
shapiro.test(residuals(model)) for normality
Reference Categories:
For factor variables:
First level is the reference (coefficient = 0)
Other levels show difference from reference
Reference displayed with ref_label
Relevel factors before modeling if needed:
factor(x, levels = c("desired_ref", ...))
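Base R's relevel() is an alternative to rebuilding the factor with an explicit levels vector. The sketch below uses mtcars to show how the reference level changes which coefficients appear; the column name cyl8ref is made up for the illustration.

```r
dat <- mtcars
dat$cyl <- factor(dat$cyl)                  # levels "4", "6", "8"; "4" is the reference
fit_default <- lm(mpg ~ cyl, data = dat)

dat$cyl8ref <- relevel(dat$cyl, ref = "8")  # make "8" the reference instead
fit_releveled <- lm(mpg ~ cyl8ref, data = dat)

names(coef(fit_default))     # "(Intercept)" "cyl6" "cyl8"
names(coef(fit_releveled))   # "(Intercept)" "cyl8ref4" "cyl8ref6"
```

With a single factor predictor, the intercept is the mean outcome in the reference group, so releveling changes the intercept as well as the contrasts.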
Sample Size Reporting:
The "n" column shows:
For continuous variables: Total observations with non-missing data
For factor variables: Number of observations in each category
Footer shows total observations analyzed and percentage of original data (accounting for missing values)
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
width: Numeric. Recommended plot width in specified units
height: Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
autoforest for automatic model detection,
glmforest for logistic/GLM forest plots,
coxforest for Cox model forest plots,
uniforest for univariable screening forest plots,
multiforest for multi-outcome forest plots,
lm for fitting linear models,
fit for regression modeling
Other visualization functions:
autoforest(),
coxforest(),
glmforest(),
multiforest(),
uniforest()
data(clintrial)
data(clintrial_labels)

# Create example model
model1 <- lm(bmi ~ age + sex + smoking, data = clintrial)

# Example 1: Basic linear model forest plot
p <- lmforest(model1, data = clintrial)

old_width <- options(width = 180)

# Example 2: With custom labels and title
plot2 <- lmforest(
  x = model1,
  data = clintrial,
  title = "Predictors of Body Mass Index",
  effect_label = "Change in BMI (kg/m^2)",
  labels = clintrial_labels
)

# Example 3: Comprehensive model with indented layout
model3 <- lm(
  bmi ~ age + sex + smoking + hypertension + diabetes + creatinine,
  data = clintrial
)
plot3 <- lmforest(
  x = model3,
  data = clintrial,
  labels = clintrial_labels,
  indent_groups = TRUE,
  zebra_stripes = TRUE
)

# Example 4: Condensed layout
plot4 <- lmforest(
  x = model3,
  data = clintrial,
  condense_table = TRUE,
  labels = clintrial_labels
)

# Example 5: Different outcome (hemoglobin)
model5 <- lm(
  hemoglobin ~ age + sex + bmi + smoking + creatinine,
  data = clintrial
)
plot5 <- lmforest(
  x = model5,
  data = clintrial,
  title = "Predictors of Baseline Hemoglobin",
  effect_label = "Change in Hemoglobin (g/dL)",
  labels = clintrial_labels
)

# Example 6: Save with recommended dimensions
dims <- attr(plot5, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "linear_forest.pdf"), plot5,
                width = dims$width, height = dims$height)

options(old_width)
Extracts coefficients, confidence intervals, and comprehensive model statistics from fitted regression models and converts them to a standardized data.table format suitable for further analysis or publication. This is a core utility function frequently used internally by other summata regression functions, although it can be used as a standalone function as well.
m2dt(
  data,
  model,
  conf_level = 0.95,
  keep_qc_stats = TRUE,
  include_intercept = TRUE,
  terms_to_exclude = NULL,
  reference_rows = TRUE,
  reference_label = "reference",
  skip_counts = FALSE,
  conf_method = NULL
)
data |
Data frame or data.table containing the dataset used to fit the model. Required for computing group-level sample sizes and event counts. |
model |
Fitted model object. Supported classes include:
|
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% CI). |
keep_qc_stats |
Logical. If |
include_intercept |
Logical. If |
terms_to_exclude |
Character vector of term names to exclude from output.
Useful for removing specific unwanted parameters (e.g., nuisance variables,
spline terms). Default is |
reference_rows |
Logical. If |
reference_label |
Character string used to label reference category rows
in the output. Appears in the |
skip_counts |
Logical. If |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting.
Set globally with |
This function is the core extraction utility used by fit() and other
regression functions. It handles the complexities of different model classes
and provides a consistent output format suitable for tables and forest plots.
Model Type Detection: The function automatically detects model type and applies appropriate:
Effect measure naming (OR, HR, RR, Coefficient)
Confidence interval calculation (see below)
Event counting for binary/survival outcomes
Confidence Interval Methods:
The CI method is selected per model class using stats::confint()
dispatch:
GLM/negative binomial: Profile likelihood via
MASS::confint.glm(), except quasi-families which use Wald
Linear models: Exact t-distribution via
confint.lm()
Cox PH: Wald intervals (coefficient ± z × SE)
Mixed-effects models: Wald intervals
Falls back to Wald on profiling failure.
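To make the distinction concrete, the sketch below computes Wald intervals for a glm and t-based intervals for an lm by hand, using base R and the built-in mtcars data. It illustrates the formulas only and is not the package's internal code; calling confint() on the glm would profile the likelihood instead of using the Wald formula.

```r
# Wald interval for a GLM: coefficient +/- z * SE on the link scale
glm_fit <- glm(am ~ wt, data = mtcars, family = binomial)
est <- coef(glm_fit)
se  <- sqrt(diag(vcov(glm_fit)))
z   <- qnorm(0.975)
wald_ci <- cbind(lower = est - z * se, upper = est + z * se)

# Exponentiate the bounds to report the interval on the odds-ratio scale
or_ci_wt <- exp(wald_ci["wt", ])

# Linear model: the same construction with a t quantile on the residual df
lm_fit <- lm(mpg ~ wt, data = mtcars)
t_crit <- qt(0.975, df = df.residual(lm_fit))
lm_se  <- sqrt(diag(vcov(lm_fit)))
t_ci   <- cbind(lower = coef(lm_fit) - t_crit * lm_se,
                upper = coef(lm_fit) + t_crit * lm_se)
```

The hand-built wald_ci matches confint.default(glm_fit), and t_ci matches confint(lm_fit).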
Mixed Effects Models: For lme4 models (glmer, lmer), the function extracts fixed effects only. Random effects variance components are not included in the output table, as they represent clustering structure rather than predictor effects.
A data.table containing extracted model information with the
following standard columns:
Character. Either "Univariable" (unadjusted model with single predictor) or "Multivariable" (adjusted model with multiple predictors)
Character. Type of regression (e.g., "Logistic", "Linear", "Cox PH", "Poisson", etc.)
Character. Variable name (for factor variables, the base variable name without the level)
Character. Group/level name for factor variables; empty string for continuous variables
Integer. Total sample size used in the model
Integer. Sample size for this specific variable level (factor variables only)
Integer. Total number of events in the model (for survival and logistic models)
Integer. Number of events for this specific variable level (for survival and logistic models with factor variables)
Numeric. Raw regression coefficient (log odds, log hazard, etc.)
Numeric. Standard error of the coefficient
Numeric. Effect estimate - column name depends on model type:
OR for logistic regression (odds ratio)
HR for Cox models (hazard ratio)
RR for Poisson regression (rate/risk ratio)
Coefficient for linear models or other GLMs
Numeric. Lower bound of confidence interval for effect estimate
Numeric. Upper bound of confidence interval for effect estimate
Numeric. Test statistic (z-value for GLM/Cox, t-value for LM)
Numeric. p-value for coefficient test
Character. Significance markers: *** (p < 0.001), **
(p < 0.01), * (p < 0.05), . (p < 0.10).
Logical. Binary indicator: TRUE if p < 0.05,
FALSE otherwise
Character. Contains reference_label for reference
category rows when reference_rows = TRUE, empty string otherwise
fit for the main regression interface,
glmforest, coxforest, lmforest for
forest plot visualization
# Load example data
data(clintrial)

# Example 1: Extract from logistic regression
glm_model <- glm(os_status ~ age + sex + treatment,
                 data = clintrial, family = binomial)
glm_result <- m2dt(clintrial, glm_model)
glm_result

# Example 2: Extract from linear model
lm_model <- lm(los_days ~ age + sex + surgery, data = clintrial)
lm_result <- m2dt(clintrial, lm_model)
lm_result

# Example 3: Cox proportional hazards model
library(survival)
cox_model <- coxph(Surv(os_months, os_status) ~ age + sex + stage,
                   data = clintrial)
cox_result <- m2dt(clintrial, cox_model)
cox_result

# Example 4: Exclude intercept for cleaner tables
clean_result <- m2dt(clintrial, glm_model, include_intercept = FALSE)
clean_result

# Example 5: Change confidence level
result_90ci <- m2dt(clintrial, glm_model, conf_level = 0.90)
result_90ci
Performs regression analyses of a single predictor (exposure) across multiple outcomes. This function is designed for studies where a single exposure variable is tested against multiple endpoints, such as complication screening, biomarker associations, or phenome-wide association studies. Returns publication-ready formatted results with optional covariate adjustment. Supports interactions, mixed-effects models, stratification, and clustered standard errors.
multifit(
  data,
  outcomes,
  predictor,
  covariates = NULL,
  interactions = NULL,
  random = NULL,
  strata = NULL,
  cluster = NULL,
  model_type = "glm",
  family = "binomial",
  columns = "adjusted",
  p_threshold = 1,
  conf_level = 0.95,
  show_n = TRUE,
  show_events = TRUE,
  digits = 2,
  p_digits = 3,
  labels = NULL,
  predictor_label = NULL,
  include_predictor = TRUE,
  keep_models = FALSE,
  exponentiate = NULL,
  conf_method = NULL,
  parallel = TRUE,
  n_cores = NULL,
  number_format = NULL,
  verbose = NULL,
  ...
)
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcomes |
Character vector of outcome variable names to analyze. Each
outcome is tested in its own model with the predictor. For time-to-event
analysis, use |
predictor |
Character string specifying the predictor (exposure) variable name. This variable is tested against each outcome. Can be continuous or categorical (factor). |
covariates |
Optional character vector of covariate variable names to
include in adjusted models. When specified, models are fit as
|
interactions |
Optional character vector of interaction terms to include
in adjusted models, using colon notation (e.g., |
random |
Optional character string specifying random effects formula for
mixed effects models (e.g., |
strata |
Optional character string naming the stratification variable for
Cox or conditional logistic models. Creates separate baseline hazards for
each stratum. Default is |
cluster |
Optional character string naming the clustering variable for
Cox models. Computes robust clustered standard errors. Default is |
model_type |
Character string specifying the type of regression model to fit. Options include:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
columns |
Character string specifying which result columns to display when
both unadjusted and adjusted models are fit (i.e., when
Ignored when |
p_threshold |
Numeric value between 0 and 1 specifying a p-value threshold for filtering results. Only outcomes with p-value less than or equal to the threshold are included in the output. Default is 1 (no filtering; all outcomes returned). |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Can include labels for outcomes, predictors, and
covariates. Names should match variable names, values are the display labels.
Labels are applied to: (1) outcome names in the Outcome column, (2) predictor
variable name when displayed, and (3) variable names in formatted interaction
terms. Variables not in |
predictor_label |
Optional character string providing a custom display
label for the predictor variable. Takes precedence over |
include_predictor |
Logical. If |
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients (display
OR/HR/RR instead of log odds/log hazards). Default is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting.
Set globally with |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for
parallel processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting functions. |
Analysis Approach:
The function implements a multivariate (multi-outcome) screening workflow that inverts the typical regression paradigm:
For each outcome in outcomes, fits a separate model with the
predictor as the main exposure
If covariates specified, fits adjusted model:
outcome ~ predictor + covariates + interactions
Extracts only the predictor effect(s) from each model, ignoring covariate coefficients
Combines results into a single table for comparison across outcomes
Optionally filters by p-value threshold
This is conceptually opposite to uniscreen(), which tests multiple
predictors against a single outcome. Use multifit() when you have one
exposure of interest and want to screen across multiple endpoints.
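The workflow described above can be approximated in a few lines of base R. This is a simplified sketch, not the multifit() implementation: screen_outcomes() is a hypothetical helper, the data are simulated, and it handles only a continuous predictor with binary outcomes.

```r
set.seed(1)
n <- 300
dat <- data.frame(
  exposure = rnorm(n),
  age      = rnorm(n, mean = 60, sd = 10)
)
# Three simulated binary outcomes with different dependence on the exposure
dat$outcome_a <- rbinom(n, 1, plogis(-1 + 0.8 * dat$exposure))
dat$outcome_b <- rbinom(n, 1, plogis(-1 + 0.02 * (dat$age - 60)))
dat$outcome_c <- rbinom(n, 1, plogis(-0.5 - 0.5 * dat$exposure))

# Hypothetical helper: one model per outcome, keeping only the predictor row
screen_outcomes <- function(data, outcomes, predictor, covariates = NULL) {
  rows <- lapply(outcomes, function(outcome) {
    f   <- reformulate(c(predictor, covariates), response = outcome)
    fit <- glm(f, data = data, family = binomial)
    est <- summary(fit)$coefficients[predictor, ]   # covariate rows are ignored
    data.frame(
      outcome = outcome,
      OR      = exp(est[["Estimate"]]),
      p_value = est[["Pr(>|z|)"]]
    )
  })
  do.call(rbind, rows)                              # one combined table
}

res <- screen_outcomes(dat, c("outcome_a", "outcome_b", "outcome_c"),
                       predictor = "exposure", covariates = "age")
res
```

Each row of res corresponds to one outcome, which is the inverse of a screening table with one row per predictor.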
When to Use Multivariate Regression Analysis:
Complication screening: Test one exposure (e.g., operative time, BMI, biomarker level) against multiple postoperative complications
Treatment effects: Test one treatment against multiple efficacy and safety endpoints simultaneously
Biomarker studies: Test one biomarker against multiple clinical outcomes to understand its prognostic value
Phenome-wide association studies (PheWAS): Test genetic variants or exposures against many phenotypes
Risk factor profiling: Understand how one risk factor relates to a spectrum of outcomes
Handling Categorical Predictors:
When the predictor is a factor variable with multiple levels:
Each non-reference level gets its own row for each outcome
Reference category is determined by factor level ordering
The Predictor column shows "Variable (Level)" format (e.g., "Treatment (Drug A)", "Treatment (Drug B)")
For binary variables with affirmative non-reference levels (Yes, 1, True, Present, Positive, +), shows just "Variable" (e.g., "Diabetes" instead of "Diabetes (Yes)")
Effect estimates compare each level to the reference
Adjusted vs. Unadjusted Results:
When covariates is specified, the function fits both models but only
extracts predictor effects:
columns = "adjusted": Reports only covariate-adjusted effects.
Column labeled "aOR/aHR," etc.
columns = "unadjusted": Reports only crude effects. Column
labeled "OR/HR," etc.
columns = "both": Reports both side-by-side. Useful for
identifying confounding (large change in effect) or independent effects
(similar estimates)
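To see why a side-by-side display helps, consider simulated data in which a confounder drives both exposure and outcome while the exposure itself has no effect. The sketch below uses base R only; variable names are illustrative and the numbers depend on the random seed.

```r
set.seed(42)
n <- 500
confounder <- rnorm(n)
exposure   <- rbinom(n, 1, plogis(confounder))        # exposure depends on the confounder
outcome    <- rbinom(n, 1, plogis(-1 + confounder))   # outcome depends only on the confounder
dat <- data.frame(outcome, exposure, confounder)

crude    <- glm(outcome ~ exposure, data = dat, family = binomial)
adjusted <- glm(outcome ~ exposure + confounder, data = dat, family = binomial)

or_table <- data.frame(
  model = c("unadjusted", "adjusted"),
  OR    = exp(c(coef(crude)[["exposure"]], coef(adjusted)[["exposure"]]))
)
or_table
```

In a setup like this the crude estimate tends to sit further from 1 than the adjusted one, which is the pattern the "both" layout is meant to expose.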
Interaction Terms:
When interactions includes terms involving the predictor:
Main effect of predictor is always reported
Interaction effects are extracted and displayed with formatted names
Format: Variable (Level) × Variable (Level) using multiplication sign notation
Useful for testing effect modification (e.g., does treatment effect differ by sex?)
Mixed-Effects Models:
For clustered or hierarchical data (e.g., patients within hospitals):
Use model_type = "glmer" with random = "(1|cluster)" for
random intercept models
Nested random effects: random = "(1|site/patient)"
Crossed random effects: random = "(1|site) + (1|doctor)"
For survival outcomes, use model_type = "coxme"
Stratification and Clustering (Cox models):
For Cox proportional hazards models:
strata: Creates separate baseline hazards for each stratum level.
Use when hazards are non-proportional across strata but stratum effects do
not need to be estimated
cluster: Computes robust (sandwich) standard errors accounting
for within-cluster correlation. Alternative to mixed effects when only
robust SEs are needed
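For orientation, this is roughly what the two options correspond to in a direct survival::coxph() call. The sketch uses the lung dataset that ships with the survival package; treating sex as the stratum and inst as the cluster is purely illustrative, and how multifit() builds its formula internally is not shown in this documentation.

```r
library(survival)

# strata(sex): separate baseline hazard per sex, no coefficient estimated for sex
# cluster(inst): robust (sandwich) standard errors grouped by institution
fit <- coxph(Surv(time, status) ~ age + strata(sex) + cluster(inst),
             data = lung)

coefs <- summary(fit)$coefficients
rownames(coefs)   # only the age term appears
colnames(coefs)   # includes a robust SE column next to the model-based SE
```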
Filtering based on p-value:
The p_threshold parameter filters results after fitting all models:
Only outcomes with p less than or equal to the threshold are retained in output
For factor predictors, outcome is kept if any level is significant
Useful for focusing on significant associations in exploratory analyses
Default is 1 (no filtering) - recommended for confirmatory analyses
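The rule for factor predictors (keep the outcome if any level is significant) can be illustrated on a small results table. The data frame below is made up for the example, and the filtering code is a sketch rather than the package's own.

```r
# Made-up results: three outcomes, two non-reference predictor levels each
results <- data.frame(
  outcome   = rep(c("infection", "bleeding", "readmission"), each = 2),
  predictor = rep(c("Treatment (Drug A)", "Treatment (Drug B)"), times = 3),
  p_value   = c(0.20, 0.01, 0.40, 0.60, 0.03, 0.04)
)

p_threshold <- 0.05

# Smallest p-value within each outcome, repeated on every row of that outcome
min_p <- ave(results$p_value, results$outcome, FUN = min)

# Keep every row of an outcome as soon as one of its levels passes
kept <- results[min_p <= p_threshold, ]
kept
```

Both infection rows are retained even though only Drug B passes the threshold, while bleeding is dropped entirely.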
Outcome Homogeneity:
All outcomes in a single multifit() call should be of the same type
(all binary, all continuous, or all survival). Mixing outcome types produces
tables with incompatible effect measures (e.g., odds ratios alongside regression
coefficients), which can mislead readers. The function validates outcome
compatibility and issues a warning when mixed types are detected.
For analyses involving multiple outcome types, run separate multifit()
calls for each type:
# Binary outcomes
binary_results <- multifit(data, outcomes = c("death", "readmission"),
predictor = "treatment", model_type = "glm")
# Continuous outcomes
continuous_results <- multifit(data, outcomes = c("los_days", "cost"),
predictor = "treatment", model_type = "lm")
Effect Measures by Model Type:
Logistic (model_type = "glm", family = "binomial"):
Odds ratios (OR/aOR)
Cox (model_type = "coxph"): Hazard ratios (HR/aHR)
Poisson (model_type = "glm", family = "poisson"):
Rate ratios (RR/aRR)
Linear (model_type = "lm"): Coefficient estimates
Mixed effects: Same as fixed-effects counterparts
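As a rough illustration of where these effect measures come from, the base R sketch below fits a logistic, a Poisson, and a linear model to simulated data and exponentiates the coefficients of the first two. The data and variable names are hypothetical and the code does not reproduce the package's formatting.

```r
# Simulated data (hypothetical) with one binary predictor x
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)
dat <- data.frame(
  x      = x,
  binary = rbinom(n, 1, plogis(-0.5 + 0.7 * x)),
  count  = rpois(n, exp(0.2 + 0.4 * x)),
  cont   = rnorm(n, mean = 10 + 2 * x)
)

fit_logit <- glm(binary ~ x, family = binomial, data = dat)
fit_pois  <- glm(count ~ x, family = poisson, data = dat)
fit_lm    <- lm(cont ~ x, data = dat)

# Ratio measures are exponentiated coefficients; the linear model
# coefficient is reported on its original scale
effects <- c(
  OR   = exp(coef(fit_logit)[["x"]]),
  RR   = exp(coef(fit_pois)[["x"]]),
  beta = coef(fit_lm)[["x"]]
)
print(round(effects, 2))
```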
Memory and Performance:
parallel = TRUE (default) uses multiple cores for faster fitting
keep_models = FALSE (default) discards model objects to save memory
For many outcomes, parallel processing provides substantial speedup
Set keep_models = TRUE only when you need model diagnostics
A data.table with S3 class "multifit_result" containing formatted
multivariate regression results. The table structure includes:
Character. Outcome variable name or custom label
Character. For factor predictors: formatted as "Variable (Level)" showing the level being compared to reference. For binary variables where the non-reference level is an affirmative value (Yes, 1, True, Present, Positive, +), shows just "Variable". For continuous predictors: the variable name. For interactions: the formatted interaction term (e.g., "Treatment (Drug A) × Sex (Male)")
Integer. Sample size used in the model (if show_n = TRUE)
Integer. Number of events (if show_events = TRUE)
Character. Unadjusted effect
estimate with CI (if columns = "unadjusted" or "both")
Character. Adjusted
effect estimate with CI (if columns = "adjusted" or "both")
Character. Formatted p-value(s). Column
names depend on columns setting
The returned object includes the following attributes accessible via attr():
raw_data: data.table. Unformatted numeric results with separate columns for effect estimates, standard errors, confidence intervals, and p-values. Suitable for custom analysis or visualization
models: list (if keep_models = TRUE). Named list of fitted
model objects, with outcome names as list names. Each element contains
$unadjusted and/or $adjusted models depending on settings
Character. The predictor variable name
Character vector. The outcome variable names
Character vector or NULL. The covariate variable names
Character vector or NULL. The interaction terms
Character or NULL. The random effects formula
Character or NULL. The stratification variable
Character or NULL. The clustering variable
Character. The regression model type used
Character. Which columns were displayed
Character. "multi_outcome" to identify analysis type
Character vector. Names of outcomes with p < 0.05 for the predictor (uses adjusted p-values when available)
uniscreen for screening multiple predictors against one outcome,
multiforest for creating forest plots from multifit results,
fit for single-outcome regression with full coefficient output,
fullfit for complete univariable-to-multivariable workflow
Other regression functions:
compfit(),
fit(),
fullfit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result(),
uniscreen()
# Load example data
data(clintrial)
data(clintrial_labels)

# Example 1: Basic multivariate analysis (unadjusted)
# Test treatment effect on multiple binary outcomes
result1 <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  labels = clintrial_labels,
  parallel = FALSE
)
print(result1)
# Shows odds ratios comparing Drug A and Drug B to Control

# Example 2: Adjusted analysis with covariates
# Adjust for age, sex, and disease stage
result2 <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  labels = clintrial_labels,
  parallel = FALSE
)
print(result2)
# Shows adjusted odds ratios (aOR)

# Example 3: Compare unadjusted and adjusted results
result3 <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  columns = "both",
  labels = clintrial_labels,
  parallel = FALSE
)
print(result3)
# Useful for identifying confounding effects

# Example 4: Continuous predictor across outcomes
# Test age effect on multiple outcomes
result4 <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "age",
  covariates = c("sex", "treatment", "stage"),
  labels = clintrial_labels,
  parallel = FALSE
)
print(result4)
# One row per outcome for continuous predictor

# Example 5: Cox regression for survival outcomes
library(survival)
cox_result <- multifit(
  data = clintrial,
  outcomes = c("Surv(pfs_months, pfs_status)",
               "Surv(os_months, os_status)"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  model_type = "coxph",
  labels = clintrial_labels,
  parallel = FALSE
)
print(cox_result)
# Returns hazard ratios (HR/aHR)

# Example 6: Cox with stratification by site
cox_strat <- multifit(
  data = clintrial,
  outcomes = c("Surv(os_months, os_status)"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  strata = "site",
  model_type = "coxph",
  labels = clintrial_labels,
  parallel = FALSE
)
print(cox_strat)

# Example 7: Cox with clustered standard errors
cox_cluster <- multifit(
  data = clintrial,
  outcomes = c("Surv(os_months, os_status)"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  cluster = "site",
  model_type = "coxph",
  labels = clintrial_labels,
  parallel = FALSE
)
print(cox_cluster)

# Example 8: Interaction between predictor and covariate
# Test if treatment effect differs by sex
result_int <- multifit(
  data = clintrial,
  outcomes = c("surgery", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  interactions = c("treatment:sex"),
  labels = clintrial_labels,
  parallel = FALSE
)
print(result_int)
# Shows main effects and interaction terms with × notation

# Example 9: Linear model for continuous outcomes
linear_result <- multifit(
  data = clintrial,
  outcomes = c("los_days", "biomarker_x"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  model_type = "lm",
  labels = clintrial_labels,
  parallel = FALSE
)
print(linear_result)
# Returns coefficient estimates, not ratios

# Example 10: Poisson regression for equidispersed count outcomes
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_result <- multifit(
  data = clintrial,
  outcomes = c("fu_count"),
  predictor = "treatment",
  covariates = c("age", "stage"),
  model_type = "glm",
  family = "poisson",
  labels = clintrial_labels,
  parallel = FALSE
)
print(poisson_result)
# Returns rate ratios (RR)
# For overdispersed counts (ae_count), use model_type = "negbin" instead

# Example 11: Filter to significant results only
sig_results <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "stage",
  p_threshold = 0.05,
  labels = clintrial_labels,
  parallel = FALSE
)
print(sig_results)
# Only outcomes with significant associations shown

# Example 12: Custom outcome labels
result_labeled <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  labels = c(
    surgery = "Surgical Resection",
    pfs_status = "Disease Progression",
    os_status = "Death",
    treatment = "Treatment Group"
  ),
  parallel = FALSE
)
print(result_labeled)

# Example 13: Keep models for diagnostics
result_models <- multifit(
  data = clintrial,
  outcomes = c("surgery", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex"),
  keep_models = TRUE,
  parallel = FALSE
)
# Access stored models
models <- attr(result_models, "models")
names(models)
# Get adjusted model for surgery outcome
surgery_model <- models$surgery$adjusted
summary(surgery_model)

# Example 14: Access raw numeric data
result <- multifit(
  data = clintrial,
  outcomes = c("surgery", "os_status"),
  predictor = "age",
  parallel = FALSE
)
# Get unformatted results for custom analysis
raw_data <- attr(result, "raw_data")
print(raw_data)
# Contains exp_coef, ci_lower, ci_upper, p_value, etc.

# Example 15: Hide sample size and event columns
result_minimal <- multifit(
  data = clintrial,
  outcomes = c("surgery", "os_status"),
  predictor = "treatment",
  show_n = FALSE,
  show_events = FALSE,
  parallel = FALSE
)
print(result_minimal)

# Example 16: Customize decimal places
result_digits <- multifit(
  data = clintrial,
  outcomes = c("surgery", "os_status"),
  predictor = "age",
  digits = 3,
  p_digits = 4,
  parallel = FALSE
)
print(result_digits)

# Example 17: Force coefficient display (no exponentiation)
result_coef <- multifit(
  data = clintrial,
  outcomes = c("surgery"),
  predictor = "age",
  exponentiate = FALSE,
  parallel = FALSE
)
print(result_coef)

# Example 18: Complete publication workflow
final_table <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage", "grade"),
  columns = "both",
  labels = clintrial_labels,
  digits = 2,
  p_digits = 3,
  parallel = FALSE
)
print(final_table)

# Example 19: Gamma regression for positive continuous outcomes
gamma_result <- multifit(
  data = clintrial,
  outcomes = c("los_days", "recovery_days"),
  predictor = "treatment",
  covariates = c("age", "surgery"),
  model_type = "glm",
  family = Gamma(link = "log"),
  labels = clintrial_labels,
  parallel = FALSE
)
print(gamma_result)
# Returns multiplicative effects on positive continuous data

# Example 20: Quasipoisson for overdispersed counts
quasi_result <- multifit(
  data = clintrial,
  outcomes = c("ae_count"),
  predictor = "treatment",
  covariates = c("age", "diabetes"),
  model_type = "glm",
  family = "quasipoisson",
  labels = clintrial_labels,
  parallel = FALSE
)
print(quasi_result)
# Adjusts standard errors for overdispersion

# Example 21: Generalized linear mixed effects (GLMER)
# Test treatment across outcomes with site clustering
if (requireNamespace("lme4", quietly = TRUE)) {
  glmer_result <- suppressWarnings(multifit(
    data = clintrial,
    outcomes = c("surgery", "pfs_status", "os_status"),
    predictor = "treatment",
    covariates = c("age", "sex"),
    random = "(1|site)",
    model_type = "glmer",
    family = "binomial",
    labels = clintrial_labels,
    parallel = FALSE
  ))
  print(glmer_result)
}

# Example 22: Cox mixed effects with random site effects
if (requireNamespace("coxme", quietly = TRUE)) {
  coxme_result <- multifit(
    data = clintrial,
    outcomes = c("Surv(pfs_months, pfs_status)",
                 "Surv(os_months, os_status)"),
    predictor = "treatment",
    covariates = c("age", "sex", "stage"),
    random = "(1|site)",
    model_type = "coxme",
    labels = clintrial_labels,
    parallel = FALSE
  )
  print(coxme_result)
}

# Example 23: Multiple interactions across outcomes
multi_int <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  interactions = c("treatment:stage", "treatment:sex"),
  labels = clintrial_labels,
  parallel = FALSE
)
print(multi_int)
# Shows how treatment effects vary by stage and sex across outcomes
Generates a publication-ready forest plot from a multifit() output
object. The plot displays effect estimates (OR, HR, RR, or coefficients) with
confidence intervals across multiple outcomes, organized by outcome with the
predictor levels shown for each.
multiforest(
  x,
  title = "Multivariate Analysis",
  effect_label = NULL,
  column = "adjusted",
  digits = 2,
  p_digits = 3,
  conf_level = 0.95,
  font_size = 1,
  annot_size = 3.88,
  header_size = 5.82,
  title_size = 23.28,
  plot_width = NULL,
  plot_height = NULL,
  table_width = 0.6,
  show_n = TRUE,
  show_events = NULL,
  show_predictor = NULL,
  covariates_footer = TRUE,
  indent_predictor = FALSE,
  bold_variables = TRUE,
  center_padding = 4,
  zebra_stripes = TRUE,
  color = NULL,
  null_line = NULL,
  log_scale = NULL,
  labels = NULL,
  units = "in",
  number_format = NULL
)
x |
Multifit result object (data.table with class attributes from
|
title |
Character string specifying the plot title. Default is
"Multivariate Analysis". |
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
column |
Character string specifying which results to plot when
|
digits |
Integer specifying the number of decimal places for effect estimates and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6 (60% table, 40% forest plot). |
show_n |
Logical. If |
show_events |
Logical. If |
show_predictor |
Logical. If |
covariates_footer |
Logical. If |
indent_predictor |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
color |
Character string specifying the color for point estimates in
the forest plot. Default is |
null_line |
Numeric value for the reference line position. Default is
|
log_scale |
Logical. If |
labels |
Named character vector providing custom display labels for
outcomes and variables. Applied to outcome names in the plot.
Default is |
units |
Character string specifying units for plot dimensions:
|
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
Plot Layout:
The forest plot is organized with outcomes as grouping headers and predictor levels (or interaction terms) as rows within each outcome. This provides a clear visual comparison of how a single predictor affects multiple outcomes.
Title: Centered at top
Data Table (left): Contains:
Outcome column (or grouped headers)
Predictor/Group column
n: Sample sizes (optional)
Events: Event counts (optional, for applicable models)
Effect (95% CI); p-value
Forest Plot (right):
Point estimates (squares)
95% confidence intervals
Reference line at null value (1 or 0)
Log scale for ratio measures
Data Source:
The function extracts effect estimates directly from the multifit output
object's raw_data attribute, which contains the numeric values
needed for plotting. This approach is efficient and ensures consistency
with the formatted table output.
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
width: Numeric. Recommended plot width in specified units
height: Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
autoforest for automatic model detection,
multifit for multi-outcome regression analysis,
glmforest for single GLM forest plots,
coxforest for single Cox model forest plots,
lmforest for single linear model forest plots,
uniforest for univariable screening forest plots
Other visualization functions:
autoforest(),
coxforest(),
glmforest(),
lmforest(),
uniforest()
data(clintrial)
data(clintrial_labels)
library(survival)

# Create example multifit result
result <- multifit(
  data = clintrial,
  outcomes = c("surgery", "pfs_status", "os_status"),
  predictor = "treatment",
  covariates = c("age", "sex", "stage"),
  parallel = FALSE
)

# Example 1: Basic multivariate forest plot
p <- multiforest(result)

old_width <- options(width = 180)

# Example 2: With custom title and labels
plot2 <- multiforest(
  result,
  title = "Treatment Effects Across Clinical Outcomes",
  labels = clintrial_labels
)

# Example 3: Customize appearance
plot3 <- multiforest(
  result,
  color = "#E74C3C",
  zebra_stripes = TRUE,
  labels = clintrial_labels
)

# Example 4: Save with recommended dimensions
dims <- attr(plot3, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "multioutcome_forest.pdf"),
                plot3, width = dims$width, height = dims$height)

options(old_width)
Generates comprehensive survival summary tables with survival probabilities at specified time points, median survival times, and optional group comparisons with statistical testing. Designed for creating survival summaries commonly used in clinical and epidemiological research publications.
survtable(
  data,
  outcome,
  by = NULL,
  times = NULL,
  probs = 0.5,
  stats = c("survival", "ci"),
  type = "survival",
  conf_level = 0.95,
  conf_type = "log",
  digits = 0,
  time_digits = 1,
  p_digits = 3,
  percent = TRUE,
  test = TRUE,
  test_type = "logrank",
  total = TRUE,
  total_label = "Total",
  time_unit = NULL,
  time_label = NULL,
  median_label = NULL,
  labels = NULL,
  by_label = NULL,
  na_rm = TRUE,
  number_format = NULL,
  ...
)
data |
Data frame or data.table containing the survival dataset. Automatically converted to a data.table for efficient processing. |
outcome |
Character string or character vector specifying one or more
survival outcomes using |
by |
Character string specifying the column name of the stratifying
variable for group comparisons (e.g., treatment arm, risk group). When
|
times |
Numeric vector of time points at which to estimate survival
probabilities. For example, |
probs |
Numeric vector of survival probabilities for which to estimate
corresponding survival times (quantiles). Values must be between 0 and 1.
For example, |
stats |
Character vector specifying which statistics to display:
Default is |
type |
Character string specifying the type of probability to report:
|
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
conf_type |
Character string specifying the confidence interval type for survival estimates:
|
digits |
Integer specifying the number of decimal places for survival probabilities (as percentages). Default is 0 (whole percentages). |
time_digits |
Integer specifying the number of decimal places for survival time estimates (median, quantiles). Default is 1. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
percent |
Logical. If |
test |
Logical. If |
test_type |
Character string specifying the statistical test for comparing survival curves:
|
total |
Logical or character string controlling the total/overall column:
|
total_label |
Character string for the total/overall row label.
Default is |
time_unit |
Character string specifying the time unit for display
in column headers and labels (e.g., |
time_label |
Character string template for time column headers when
|
median_label |
Character string for the median survival row label.
Default is |
labels |
Named character vector or list providing custom display
labels. For stratified analyses, names should match levels of the
|
by_label |
Character string providing a custom label for the
stratifying variable (used in output attributes and headers).
Default is |
na_rm |
Logical. If |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
... |
Additional arguments passed to
|
Survival Probability Estimation:
Survival probabilities are estimated using the Kaplan-Meier method via
survfit. At each specified time point, the function
reports the estimated probability of surviving beyond that time.
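The underlying estimation step can be sketched with the survival package directly. The example uses the lung dataset shipped with survival (time in days) instead of the package's own example data, and the table layout is an illustrative assumption, not the formatted output of survtable().

```r
library(survival)

# Kaplan-Meier curves by sex
km <- survfit(Surv(time, status) ~ sex, data = lung)

# Survival estimates at 180 and 365 days
at_times <- summary(km, times = c(180, 365))

km_table <- data.frame(
  group    = at_times$strata,
  time     = at_times$time,
  survival = at_times$surv,
  lower    = at_times$lower,
  upper    = at_times$upper,
  n_risk   = at_times$n.risk
)
print(km_table)
```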
Confidence Intervals:
The default "log" transformation for confidence intervals is
recommended as it ensures intervals remain within [0, 1] and has good
statistical properties. The "log-log" transformation is also
commonly used and may perform better in the tails.
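The effect of the interval type can be checked with survfit() itself. This sketch assumes that conf_type corresponds to the conf.type argument of survfit(), which the source does not state explicitly; it uses the lung dataset from the survival package.

```r
library(survival)

# Same Kaplan-Meier estimate, two confidence interval transformations
km_log    <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log")
km_loglog <- survfit(Surv(time, status) ~ 1, data = lung, conf.type = "log-log")

s_log <- summary(km_log, times = 365)
s_ll  <- summary(km_loglog, times = 365)

# The point estimate is unchanged; only the interval limits differ
ci <- rbind(
  "log"     = c(lower = s_log$lower, upper = s_log$upper),
  "log-log" = c(lower = s_ll$lower,  upper = s_ll$upper)
)
print(s_log$surv)
print(ci)
```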
Statistical Testing:
The log-rank test (default) tests the null hypothesis that survival curves are identical across groups. Alternative tests weight different parts of the survival curve:
Log-rank: Equal weights (best for proportional hazards)
Wilcoxon: Weights by number at risk (sensitive to early differences)
Tarone-Ware: Weights by square root of number at risk
Peto-Peto: Modified Wilcoxon weights
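Two of these weightings are available through survdiff() in the survival package via its rho argument: rho = 0 gives the log-rank test and rho = 1 gives the Peto and Peto modification of the Gehan-Wilcoxon test. The sketch below is an illustration on the lung dataset; how test_type maps onto these calls is not shown in the source, and the helper p_from_chisq is hypothetical.

```r
library(survival)

logrank <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 0)
peto    <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 1)

# Hypothetical helper: p-value from the chi-square statistic,
# with degrees of freedom equal to the number of groups minus one
p_from_chisq <- function(fit) {
  df <- length(fit$n) - 1
  pchisq(fit$chisq, df = df, lower.tail = FALSE)
}

p_values <- c(logrank = p_from_chisq(logrank), peto_peto = p_from_chisq(peto))
print(signif(p_values, 3))
```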
Formatting:
All numeric output respects the number_format parameter.
Separators within confidence intervals adapt automatically to avoid
ambiguity:
Survival probabilities: "85% (80%-89%)" (US) or
"85% (80%–89%)" (EU, en-dash separator)
Median survival: "24.5 (21.2-28.9)" (US) or
"24,5 (21,2-28,9)" (EU)
Counts ≥ 1000: "1,234" (US) or
"1.234" (EU)
p-values: "< 0.001" (US) or
"< 0,001" (EU)
A data.table with S3 class "survtable" containing formatted
survival statistics. The table structure depends on parameters:
When times is specified (survival at time points):
Row identifier – stratifying variable levels
Survival statistics at each requested time point
Test p-value (if test = TRUE and
by specified)
When only probs is specified (survival quantiles):
Row identifier – stratifying variable levels
Time to reach each survival probability
Test p-value (if test = TRUE and
by specified)
All numeric output (probabilities, times, counts, p-values)
respects the number_format setting for locale-appropriate
formatting.
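The separator conventions can be reproduced in base R with formatC(), which takes big.mark and decimal.mark arguments. This is a sketch of the idea only: fmt_num() is a hypothetical helper, and the package's own formatting code may differ.

```r
# Format a number with explicit thousands and decimal separators
fmt_num <- function(x, digits = 0, big_mark = ",", decimal_mark = ".") {
  formatC(x, format = "f", digits = digits,
          big.mark = big_mark, decimal.mark = decimal_mark)
}

# US-style: comma for thousands, period for decimals
us <- c(count = fmt_num(1234), median = fmt_num(24.5, digits = 1))

# EU-style: period for thousands, comma for decimals
eu <- c(
  count  = fmt_num(1234, big_mark = ".", decimal_mark = ","),
  median = fmt_num(24.5, digits = 1, big_mark = ".", decimal_mark = ",")
)

print(rbind(us, eu))
```

Because the comma doubles as the decimal mark in the EU style, a comma-separated pair of limits would be ambiguous, which is why the interval separator has to adapt.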
The returned object includes the following attributes:
raw_data - Data.table with unformatted numeric values
survfit_objects - List of survfit objects for each stratum
The stratifying variable name
The time points requested
The probability quantiles requested
Full test result object (if test performed)
desctable for baseline characteristics tables,
fit for regression analysis,
table2pdf for PDF export,
table2docx for Word export,
survfit for underlying survival estimation,
survdiff for survival curve comparison tests
Other descriptive functions:
desctable(),
print.survtable()
# Load example data
data(clintrial)

# Example 1: Survival at specific time points by treatment
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24, 36),
  time_unit = "months"
)

# Example 2: Median survival only
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = NULL,
  probs = 0.5
)

# Example 3: Multiple quantiles (quartiles)
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "stage",
  times = NULL,
  probs = c(0.25, 0.5, 0.75)
)

# Example 4: Both time points and median
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  probs = 0.5,
  time_unit = "months"
)

# Example 5: Cumulative incidence (1 - survival)
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  type = "risk"
)

# Example 6: Include number at risk
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  stats = c("survival", "ci", "n_risk")
)

# Example 7: Overall survival without stratification
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  times = c(12, 24, 36, 48)
)

# Example 8: Without total row
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  total = FALSE
)

# Example 9: Custom labels
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  labels = c("Drug A" = "Treatment A", "Drug B" = "Treatment B"),
  time_unit = "months"
)

# Example 10: Different confidence interval type
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  conf_type = "log-log"
)

# Example 11: Wilcoxon test instead of log-rank
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  test_type = "wilcoxon"
)

# Example 12: Access raw data for custom analysis
result <- survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24)
)
raw <- attr(result, "raw_data")
print(raw)

# Example 13: Access survfit objects for plotting
fits <- attr(result, "survfit_objects")
plot(fits$overall)  # Plot overall survival curve

# Example 14: Multiple survival outcomes stacked
survtable(
  data = clintrial,
  outcome = c("Surv(pfs_months, pfs_status)", "Surv(os_months, os_status)"),
  by = "treatment",
  times = c(12, 24),
  probs = 0.5,
  time_unit = "months",
  total = FALSE,
  labels = c(
    "Surv(pfs_months, pfs_status)" = "Progression-Free Survival",
    "Surv(os_months, os_status)" = "Overall Survival"
  )
)

# Example 15: European number formatting
survtable(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  by = "treatment",
  times = c(12, 24),
  number_format = "eu"
)
Converts a data frame, data.table, or matrix to a fully editable Microsoft Word
document (.docx) using the flextable and officer packages.
Creates publication-ready tables with extensive formatting options including
typography, alignment, colors, and page layout. Tables can be further edited in
Microsoft Word after creation.
table2docx(
  table,
  file,
  caption = NULL,
  font_size = 8,
  font_family = "Arial",
  format_headers = TRUE,
  bold_significant = TRUE,
  bold_variables = FALSE,
  p_threshold = 0.05,
  indent_groups = FALSE,
  condense_table = FALSE,
  condense_quantitative = FALSE,
  zebra_stripes = FALSE,
  dark_header = FALSE,
  paper = "letter",
  orientation = "portrait",
  width = NULL,
  align = NULL,
  return_ft = FALSE,
  ...
)
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output DOCX filename. Must have
|
caption |
Character string. Optional caption displayed above the table
in the Word document. Default is |
font_size |
Numeric. Base font size in points for table content. Default is 8. Typical range: 8-12 points. Headers use slightly larger size. |
font_family |
Character string. Font family name for the table. Must be
a font installed on the system. Default is |
format_headers |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
dark_header |
Logical. If |
paper |
Character string specifying paper size:
|
orientation |
Character string specifying page orientation:
|
width |
Numeric. Table width in inches. If |
align |
Character vector specifying column alignment for each column.
Options: |
return_ft |
Logical. If |
... |
Additional arguments passed to |
Package Requirements:
This function requires:
flextable - For creating formatted tables
officer - For Word document manipulation
Install if needed:
install.packages(c("flextable", "officer"))
Output Features:
The generated Word document contains:
Fully editable table (native Word table, not image)
Professional typography and spacing
Proper page setup (size, orientation, margins)
Caption (if provided) as separate paragraph above table
All formatting preserved but editable
Compatible with Word 2007 and later
Further Customization:
For programmatic customization beyond the built-in options, access the
flextable object:
Method 1: Via attribute (default)
result <- table2docx(table, "output.docx")
ft <- attr(result, "flextable")

# Customize flextable
ft <- flextable::bold(ft, i = 1, j = 1, part = "body")
ft <- flextable::color(ft, i = 2, j = 3, color = "red")

# Re-save if needed
doc <- officer::read_docx()
doc <- flextable::body_add_flextable(doc, ft)
print(doc, target = "customized.docx")
Method 2: Direct return
ft <- table2docx(table, "output.docx", return_ft = TRUE)

# Customize immediately
ft <- flextable::bg(ft, bg = "yellow", part = "header")
ft <- flextable::autofit(ft)

# Save to new document
doc <- officer::read_docx()
doc <- flextable::body_add_flextable(doc, ft)
print(doc, target = "custom.docx")
Page Layout:
The function automatically sets up the Word document with:
Specified paper size and orientation
Standard margins (1 inch by default)
Continuous section (no page breaks before table)
Left-aligned table placement
For landscape orientation:
Automatically swaps page width and height
Applies landscape property to section
Useful for wide tables with many columns
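For readers who want to see what such a page setup looks like when built by hand, the sketch below uses officer and flextable directly. It is an illustration under assumptions (a small built-in data frame, US letter dimensions, 1-inch margins) and is not the code table2docx() runs internally.

```r
out_file <- file.path(tempdir(), "landscape_sketch.docx")

if (requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)) {

  # Small demonstration table
  ft <- flextable::autofit(flextable::flextable(head(mtcars)))

  # Landscape letter page, 1-inch margins, continuous section
  landscape <- officer::prop_section(
    page_size = officer::page_size(width = 8.5, height = 11,
                                   orient = "landscape"),
    page_margins = officer::page_mar(top = 1, bottom = 1, left = 1, right = 1),
    type = "continuous"
  )

  # Add the table, then apply the section properties to the document
  doc <- officer::read_docx()
  doc <- flextable::body_add_flextable(doc, ft)
  doc <- officer::body_set_default_section(doc, landscape)
  print(doc, target = out_file)
}
```

Passing orient = "landscape" to officer::page_size() is what makes the longer dimension the page width.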
Table Width Management:
Width behavior:
width = NULL - Auto-fits to content and page width
width = 6 - Exactly 6 inches wide
Width distributed evenly across columns by default
Can adjust individual column widths in Word after creation
For very wide tables:
Use orientation = "landscape"
Use paper = "legal" together with orientation = "landscape" for extra width (legal paper is longer than letter, not wider)
Reduce font_size
Use condense_table = TRUE
Consider breaking across multiple tables
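The arithmetic behind these recommendations is simple enough to check before exporting. The sketch below computes the usable width for a few paper and orientation combinations, assuming 1-inch margins and standard paper dimensions in inches. usable_width() and paper_sizes are hypothetical names for illustration.

```r
# Standard paper dimensions in inches (short side, long side)
paper_sizes <- list(
  letter = c(8.5, 11),
  a4     = c(8.27, 11.69),
  legal  = c(8.5, 14)
)

# Width left for the table after subtracting left and right margins
usable_width <- function(paper = "letter", orientation = "portrait",
                         margin = 1) {
  dims <- paper_sizes[[paper]]
  page_width <- if (orientation == "landscape") max(dims) else min(dims)
  page_width - 2 * margin
}

widths <- c(
  letter_portrait  = usable_width("letter", "portrait"),
  letter_landscape = usable_width("letter", "landscape"),
  legal_portrait   = usable_width("legal", "portrait"),
  legal_landscape  = usable_width("legal", "landscape")
)
print(widths)

# Even split across columns, e.g. for a 6-column table on landscape letter
print(widths[["letter_landscape"]] / 6)
```

Legal paper adds width only in landscape orientation; in portrait it has the same usable width as letter.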
Typography:
The function applies professional typography:
Column headers: Bold, slightly larger font
Body text: Regular weight, specified font size
Numbers: Right-aligned for easy comparison
Text: Left-aligned for readability
Consistent spacing: Adequate padding in cells
Font family must be installed on the system where Word opens the document. Common cross-platform choices:
Arial - Sans-serif, highly readable
Times New Roman - Serif, traditional
Calibri - Microsoft default, modern
Helvetica - Sans-serif, professional
Zebra Striping:
When zebra_stripes = TRUE:
Alternating variables receive light gray background
All rows of same variable share same shading
Improves visual grouping
Particularly useful for tables with many factor variables
Color can be changed in Word after creation
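Striping by variable rather than by row amounts to numbering the variable groups and shading every other group. The sketch below shows that logic in base R, assuming a layout where the variable name appears only on the first row of each group. The column names and the choice to shade even-numbered groups are assumptions for illustration, not the package's internals.

```r
# Toy table: the variable name appears only on the first row of each group
tbl <- data.frame(
  Variable = c("Age", "Sex", "", "Stage", "", ""),
  Group    = c("-", "Female", "Male", "I", "II", "III"),
  stringsAsFactors = FALSE
)

# A new group starts wherever Variable is non-empty
group_id <- cumsum(tbl$Variable != "")

# Shade every second group so all rows of a variable share one background
tbl$striped <- group_id %% 2 == 0
print(tbl)
```

Rows flagged TRUE would receive the stripe color, so "Sex" and both of its levels are shaded together while "Age" and "Stage" stay plain.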
Dark Header:
When dark_header = TRUE:
Header row: Dark gray/black background
Header text: White for high contrast
Modern, professional appearance
Draws attention to column names
Integration with R Markdown/Quarto:
For R Markdown/Quarto Word output:
# Create flextable for inline display
ft <- table2docx(results, "temp.docx", return_ft = TRUE)

# Display in R Markdown chunk
ft  # Renders in Word output
Or use flextable directly in chunks:
flextable::flextable(results)
Behavior depends on return_ft:
return_ft = FALSE: Invisibly returns a list with components:
file - Path to created file
caption - Caption text (if provided)
The flextable object is accessible via attr(result, "flextable")
return_ft = TRUE: Directly returns the flextable object for immediate further customization
In both cases, creates a .docx file at the specified location.
autotable for automatic format detection,
table2pptx for PowerPoint slides,
table2pdf for PDF output,
table2html for HTML tables,
table2rtf for Rich Text Format,
table2tex for LaTeX output,
flextable for the underlying table object,
read_docx for Word document manipulation
Other export functions:
autotable(),
table2html(),
table2pdf(),
table2pptx(),
table2rtf(),
table2tex()
data(clintrial)
data(clintrial_labels)

# Create example table
results <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "stage"),
  labels = clintrial_labels
)

# Example 1: Basic Word export
if (requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)) {
  table2docx(results, file.path(tempdir(), "results.docx"))
}

old_width <- options(width = 180)

# Example 2: With caption
table2docx(results, file.path(tempdir(), "captioned.docx"),
           caption = "Table 1: Multivariable Logistic Regression Results")

# Example 3: Landscape orientation for wide tables
table2docx(results, file.path(tempdir(), "wide.docx"),
           orientation = "landscape")

# Example 4: Custom font and size
table2docx(results, file.path(tempdir(), "custom_font.docx"),
           font_family = "Times New Roman",
           font_size = 11)

# Example 5: Hierarchical display
table2docx(results, file.path(tempdir(), "indented.docx"),
           indent_groups = TRUE)

# Example 6: Condensed table
table2docx(results, file.path(tempdir(), "condensed.docx"),
           condense_table = TRUE)

# Example 7: With zebra stripes
table2docx(results, file.path(tempdir(), "striped.docx"),
           zebra_stripes = TRUE)

# Example 8: Dark header style
table2docx(results, file.path(tempdir(), "dark.docx"),
           dark_header = TRUE)

# Example 9: A4 paper for international journals
table2docx(results, file.path(tempdir(), "a4.docx"),
           paper = "a4")

# Example 10: Get flextable for customization
result <- table2docx(results, file.path(tempdir(), "base.docx"))
ft <- attr(result, "flextable")

# Customize the flextable
ft <- flextable::bold(ft, i = 1, part = "body")
ft <- flextable::color(ft, j = "p-value", color = "blue")

# Example 11: Direct flextable return
ft <- table2docx(results, file.path(tempdir(), "direct.docx"),
                 return_ft = TRUE)
ft <- flextable::bg(ft, bg = "yellow", part = "header")

# Example 12: Publication-ready table
table2docx(results, file.path(tempdir(), "publication.docx"),
           caption = "Table 2: Adjusted Odds Ratios for Mortality",
           font_family = "Times New Roman",
           font_size = 10,
           indent_groups = TRUE,
           zebra_stripes = FALSE,
           bold_significant = TRUE)

# Example 13: Custom column alignment
table2docx(results, file.path(tempdir(), "aligned.docx"),
           align = c("left", "left", "center", "right", "right"))

# Example 14: Disable significance bolding
table2docx(results, file.path(tempdir(), "no_bold.docx"),
           bold_significant = FALSE)

# Example 15: Stricter significance threshold
table2docx(results, file.path(tempdir(), "strict.docx"),
           bold_significant = TRUE,
           p_threshold = 0.01)

options(old_width)
Converts a data frame, data.table, or matrix to HTML format with optional CSS styling for web display, HTML documents, or embedding in web applications. Generates clean, standards-compliant HTML with professional styling options including responsive design support, color schemes, and interactive features. Requires xtable for export.
table2html(
  table,
  file,
  caption = NULL,
  format_headers = TRUE,
  variable_padding = FALSE,
  bold_significant = TRUE,
  bold_variables = FALSE,
  p_threshold = 0.05,
  indent_groups = FALSE,
  condense_table = FALSE,
  condense_quantitative = FALSE,
  zebra_stripes = FALSE,
  stripe_color = "#EEEEEE",
  dark_header = FALSE,
  include_css = TRUE,
  ...
)
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output HTML filename. Must have
|
caption |
Character string. Optional caption displayed below the table.
Supports basic HTML formatting. Default is |
format_headers |
Logical. If |
variable_padding |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
stripe_color |
Character string. HTML color specification for zebra
stripes. Can use hex codes ( |
dark_header |
Logical. If |
include_css |
Logical. If |
... |
Additional arguments passed to |
Output Format:
The function generates standards-compliant HTML5 markup with:
Semantic <table> structure
Proper <thead> and <tbody> sections
Accessible header cells (<th>)
Clean, readable markup
Optional embedded CSS styling
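The markup structure listed above can be illustrated with a small hand-rolled converter. This is a sketch in base R to show the shape of the output; df_to_html() and escape_html() are hypothetical helpers, and the markup table2html() actually emits (via xtable) will differ in detail.

```r
# Escape the characters that are special in HTML text
escape_html <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Build a <table> with separate <thead> and <tbody> sections
df_to_html <- function(df) {
  header <- paste0("<th scope=\"col\">", escape_html(names(df)), "</th>",
                   collapse = "")
  rows <- vapply(seq_len(nrow(df)), function(i) {
    cells <- vapply(df, function(col) as.character(col[i]), character(1))
    paste0("<tr>",
           paste0("<td>", escape_html(cells), "</td>", collapse = ""),
           "</tr>")
  }, character(1))
  paste(c("<table>", "<thead>", paste0("<tr>", header, "</tr>"), "</thead>",
          "<tbody>", rows, "</tbody>", "</table>"),
        collapse = "\n")
}

demo <- data.frame(
  Variable  = c("Age", "Sex"),
  `p-value` = c("0.012", "< 0.001"),
  check.names = FALSE
)
html <- df_to_html(demo)
cat(html, "\n")
```

Escaping matters for regression tables in particular, since p-values such as "< 0.001" contain a character that would otherwise be read as the start of a tag.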
Standalone vs. Embedded:
Standalone HTML (include_css = TRUE):
Can be opened directly in web browsers
Includes all necessary styling
Self-contained, portable
Suitable for sharing via email or web hosting
Embedded HTML (include_css = FALSE):
For inclusion in existing HTML documents
No CSS included (use parent document's styles)
Smaller file size
Integrates with web frameworks (Shiny, R Markdown, Quarto)
CSS Styling:
When include_css = TRUE, the function applies professional styling:
Table: Border-collapse, sans-serif font (Arial), 20px margin
Cells: 8px vertical × 12px horizontal padding, left-aligned text
Borders: 1px solid #DDD (light gray)
Headers: Bold text, light gray background (#F2F2F2)
Numeric columns: Center-aligned (auto-detected)
Caption: Bold, 1.1em font, positioned below table
With dark_header = TRUE:
Header background: Black (#000000)
Header text: White (#FFFFFF)
Creates high contrast, modern appearance
With zebra_stripes = TRUE:
Alternating variable groups receive background color
Default color: #EEEEEE (light gray)
Applied via CSS class .zebra-stripe
Groups entire variable (all factor levels together)
Hierarchical Display:
The indent_groups option creates visual hierarchy using HTML
non-breaking spaces:
<td><b>Treatment</b></td> <!-- Variable name -->
<td> Control</td> <!-- Indented level -->
<td> Active</td> <!-- Indented level -->
Integration with R Markdown/Quarto:
For R Markdown or Quarto documents:
# Generate HTML fragment (no CSS)
table2html(results, "table.html", include_css = FALSE)
Then include in your document chunk with results='asis':
cat(readLines("table.html"), sep = "\n")
Or directly render without file:
# For inline display
htmltools::HTML(
capture.output(
print(xtable::xtable(results), type = "html")
)
)
Integration with Shiny:
For Shiny applications:
# In server function
output$results_table <- renderUI({
  # Write to a per-call temporary file so concurrent sessions do not collide
  tmp <- tempfile(fileext = ".html")
  table2html(results_data(), tmp, include_css = FALSE)
  HTML(paste(readLines(tmp), collapse = "\n"))
})
# Or use directly with DT package for interactive tables
output$interactive_table <- DT::renderDT({
  results_data()
})
Accessibility:
The generated HTML follows accessibility best practices:
Semantic table structure
Proper header cells (<th>) with scope attributes
Clear visual hierarchy
Adequate color contrast (when using default styles)
Screen reader friendly markup
Invisibly returns NULL. Creates an HTML file at the specified
location that can be opened in web browsers or embedded in HTML documents.
autotable for automatic format detection,
table2pdf for PDF output,
table2tex for LaTeX output,
table2docx for Word documents,
table2pptx for PowerPoint,
table2rtf for Rich Text Format,
fit for regression tables,
desctable for descriptive tables
Other export functions:
autotable(),
table2docx(),
table2pdf(),
table2pptx(),
table2rtf(),
table2tex()
data(clintrial)
data(clintrial_labels)

# Create example table
results <- fit(
  data = clintrial,
  outcome = "os_status",
  predictors = c("age", "sex", "treatment", "stage"),
  labels = clintrial_labels
)

# Example 1: Basic HTML export (standalone)
if (requireNamespace("xtable", quietly = TRUE)) {
  table2html(results, file.path(tempdir(), "results.html"))
}

# Example 2: With caption
table2html(results, file.path(tempdir(), "captioned.html"),
           caption = "Table 1: Multivariable Logistic Regression Results")

# Example 3: For embedding (no CSS)
table2html(results, file.path(tempdir(), "embed.html"),
           include_css = FALSE)
# Include in your HTML document

# Example 4: Hierarchical display
table2html(results, file.path(tempdir(), "indented.html"),
           indent_groups = TRUE)

# Example 5: Condensed table
table2html(results, file.path(tempdir(), "condensed.html"),
           condense_table = TRUE)

# Example 6: With zebra stripes
table2html(results, file.path(tempdir(), "striped.html"),
           zebra_stripes = TRUE,
           stripe_color = "#F0F0F0")

# Example 7: Dark header style
table2html(results, file.path(tempdir(), "dark.html"),
           dark_header = TRUE)

# Example 8: Combination styling
table2html(results, file.path(tempdir(), "styled.html"),
           zebra_stripes = TRUE,
           dark_header = TRUE,
           bold_significant = TRUE)

# Example 9: Custom stripe color
table2html(results, file.path(tempdir(), "blue_stripes.html"),
           zebra_stripes = TRUE,
           stripe_color = "#E3F2FD")  # Light blue

# Example 10: Disable significance bolding
table2html(results, file.path(tempdir(), "no_bold.html"),
           bold_significant = FALSE)

# Example 11: Stricter significance threshold
table2html(results, file.path(tempdir(), "strict.html"),
           bold_significant = TRUE,
           p_threshold = 0.01)

# Example 12: No header formatting
table2html(results, file.path(tempdir(), "raw_headers.html"),
           format_headers = FALSE)

# Example 13: Descriptive statistics table
desc_table <- desctable(clintrial, by = "treatment",
                        variables = c("age", "sex", "bmi"),
                        labels = clintrial_labels)
table2html(desc_table, file.path(tempdir(), "baseline.html"),
           caption = "Table 1: Baseline Characteristics by Treatment Group")

# Example 14: For R Markdown (no CSS, for inline display)
table2html(results, file.path(tempdir(), "rmd_table.html"),
           include_css = FALSE,
           indent_groups = TRUE)
# Then in R Markdown, use a chunk with results='asis' to display inline:
cat(readLines(file.path(tempdir(), "rmd_table.html")), sep = "\n")

# Example 15: Email-friendly version
table2html(results, file.path(tempdir(), "email.html"),
           include_css = TRUE,  # Self-contained
           zebra_stripes = TRUE,
           caption = "Regression Results - See Attached")
# Can be directly included in HTML emails

# Example 16: Publication-ready web version
table2html(results, file.path(tempdir(), "publication.html"),
           caption = "Table 2: Multivariable Analysis of Risk Factors",
           indent_groups = TRUE,
           zebra_stripes = FALSE,  # Clean look
           bold_significant = TRUE,
           dark_header = FALSE)

# Example 17: Modern dark theme
table2html(results, file.path(tempdir(), "dark_theme.html"),
           dark_header = TRUE,
           stripe_color = "#2A2A2A",  # Dark gray stripes
           zebra_stripes = TRUE)

# Example 18: Minimal styling for custom CSS
table2html(results, file.path(tempdir(), "minimal.html"),
           include_css = FALSE,
           format_headers = FALSE,
           bold_significant = FALSE)
# Apply your own CSS classes and styling

# Example 19: Model comparison table
models <- list(
  base = c("age", "sex"),
  full = c("age", "sex", "treatment", "stage")
)
comparison <- compfit(
  data = clintrial,
  outcome = "os_status",
  model_list = models
)
table2html(comparison, file.path(tempdir(), "comparison.html"),
           caption = "Model Comparison Statistics")
Converts a data frame, data.table, or matrix to a professionally formatted PDF document using LaTeX as an intermediate format. Provides extensive control over page layout, typography, and formatting for publication-ready output. Particularly well-suited for tables from regression analyses, descriptive statistics, and model comparisons. Requires xtable for export.
table2pdf(
  table,
  file,
  orientation = "portrait",
  paper = "letter",
  margins = NULL,
  fit_to_page = TRUE,
  font_size = 8,
  caption = NULL,
  caption_size = NULL,
  format_headers = TRUE,
  variable_padding = FALSE,
  cell_padding = "normal",
  bold_significant = TRUE,
  bold_variables = FALSE,
  p_threshold = 0.05,
  align = NULL,
  indent_groups = FALSE,
  condense_table = FALSE,
  condense_quantitative = FALSE,
  zebra_stripes = FALSE,
  stripe_color = "gray!20",
  dark_header = FALSE,
  show_logs = FALSE,
  ...
)
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output PDF filename. Must have
|
orientation |
Character string specifying page orientation:
|
paper |
Character string specifying paper size:
|
margins |
Numeric vector of length 4 specifying margins in inches as
|
fit_to_page |
Logical. If |
font_size |
Numeric. Base font size in points. Default is 8. Smaller values accommodate more content; larger values improve readability. Typical range: 6-12 points. |
caption |
Character string. Optional caption displayed below the table.
Supports LaTeX formatting for multi-line captions, superscripts, italics, etc.
See Details for formatting guidance. Default is |
caption_size |
Numeric. Caption font size in points. If |
format_headers |
Logical. If |
variable_padding |
Logical. If |
cell_padding |
Character string or numeric specifying vertical padding within table cells:
Adjusts |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
align |
Character string or vector specifying column alignment. Options:
If |
indent_groups |
Logical. If |
condense_table |
Logical. If
Significantly reduces table height. Default is |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
stripe_color |
Character string. LaTeX color specification for zebra
stripes. Default is |
dark_header |
Logical. If |
show_logs |
Logical. If |
... |
Additional arguments passed to |
LaTeX Requirements:
This function requires a working LaTeX installation. The function checks for LaTeX availability and provides installation guidance if missing.
Recommended LaTeX distributions:
TinyTeX (lightweight, R-integrated): Install via
tinytex::install_tinytex()
TeX Live (comprehensive, cross-platform)
MiKTeX (Windows)
MacTeX (macOS)
Required LaTeX packages (auto-installed with most distributions):
fontenc, inputenc - Character encoding
array, booktabs, longtable - Table formatting
graphicx - Scaling tables
geometry - Page layout
pdflscape, lscape - Landscape orientation
helvet - Sans-serif fonts
standalone, varwidth - Auto-sizing (for paper = "auto")
float, caption - Floats and captions
xcolor, colortable - Colors (for zebra_stripes or dark_header)
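The requirements above can be checked before calling table2pdf. The following is a minimal sketch in base R, assuming the `kpsewhich` utility shipped with TeX Live and MiKTeX is on the PATH; `check_latex()` is a hypothetical helper and not part of this package.

```r
# Hypothetical helper: reports whether pdflatex is on the PATH and which of
# the listed style files kpsewhich cannot locate (NA if it cannot be checked).
check_latex <- function(packages = c("booktabs", "longtable", "geometry", "xcolor")) {
    has_pdflatex <- unname(nzchar(Sys.which("pdflatex")))
    has_kpsewhich <- unname(nzchar(Sys.which("kpsewhich")))
    missing <- NA_character_
    if (has_pdflatex && has_kpsewhich) {
        found <- vapply(packages, function(p) {
            # kpsewhich prints the path of the style file, or nothing if absent
            out <- suppressWarnings(
                system2("kpsewhich", paste0(p, ".sty"), stdout = TRUE, stderr = FALSE)
            )
            length(out) > 0 && any(nzchar(out))
        }, logical(1))
        missing <- packages[!found]
    }
    list(pdflatex = has_pdflatex, missing = missing)
}

status <- check_latex()
```

If `status$pdflatex` is FALSE, installing one of the distributions listed above is the first step; a non-empty `status$missing` suggests installing those packages through the distribution's package manager.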
Caption Formatting:
Captions support LaTeX commands for rich formatting:
# Multi-line caption with line breaks
# (four backslashes in an R string produce the LaTeX line break \\)
caption = "Table 1: Multivariable Analysis\\\\
OR = odds ratio; CI = confidence interval"
# With superscripts (using LaTeX syntax)
caption = "Table 1: Results\\\\
Adjusted for age and sex\\\\
p-values from Wald tests"
# With special characters (must escape percent signs)
caption = "Results for income (in thousands)"
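Because captions are passed to LaTeX, characters such as `%`, `&`, `#`, and `_` in plain text need a leading backslash. The sketch below shows one way to escape a plain-text fragment before combining it with LaTeX markup; `escape_caption()` is a hypothetical helper, not a function provided by the package, and it should only be applied to text that contains no intentional LaTeX commands.

```r
# Hypothetical helper: escape LaTeX special characters in plain caption text.
# fixed = TRUE makes both the pattern and the replacement literal strings.
escape_caption <- function(x) {
    for (ch in c("%", "&", "#", "_")) {
        x <- gsub(ch, paste0("\\", ch), x, fixed = TRUE)
    }
    x
}

title <- escape_caption("Table 1: Response rate 45% (n_total)")

# Join the escaped title and a footnote with a LaTeX line break (\\ in LaTeX,
# written as four backslashes in an R string)
caption <- paste(title, "OR = odds ratio; CI = confidence interval", sep = "\\\\ ")
```

The resulting `caption` string can then be supplied to the `caption` argument.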
Auto-Sizing (paper = "auto"):
When paper = "auto", the function attempts to create a minimal PDF
sized exactly to the table content:
Using the standalone LaTeX class (cleanest output)
Fallback to pdfcrop utility if standalone unavailable
Fallback to minimal margins if neither available
Table Width Management:
For wide tables that don't fit on the page:
Use orientation = "landscape"
Use fit_to_page = TRUE (default) to auto-scale
Reduce font_size (e.g., 7 or 6)
Consider paper = "auto" for maximum flexibility
Troubleshooting:
If PDF compilation fails:
Check that LaTeX is installed: Run Sys.which("pdflatex")
Set show_logs = TRUE and examine the .log file
Common issues:
Missing LaTeX packages: Install via package manager
Special characters in text: Escape properly
Very wide tables: Use landscape or reduce font size
Caption formatting: Check LaTeX syntax
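In scripts that export many tables, it can help to wrap each export so that a failed compilation is reported without stopping the run. The following is a sketch only: `safe_export()` and `fake_export()` are hypothetical names, and `fake_export()` merely stands in for a real exporter such as table2pdf so the example runs without LaTeX.

```r
# Hypothetical wrapper: run an export function, report errors as messages,
# and confirm that the output file was actually written.
safe_export <- function(export_fun, file, ...) {
    ok <- tryCatch({
        export_fun(file = file, ...)
        TRUE
    }, error = function(e) {
        message("Export failed: ", conditionMessage(e))
        FALSE
    })
    ok && file.exists(file)
}

# Stand-in exporter for demonstration; writes a placeholder file
fake_export <- function(table, file) writeLines("placeholder", file)

out <- file.path(tempdir(), "demo_export.txt")
exported <- safe_export(fake_export, out, table = data.frame(x = 1))

# An exporter that errors is reported and yields FALSE
failed <- safe_export(function(table, file) stop("compile error"),
                      file.path(tempdir(), "never_written.pdf"),
                      table = data.frame(x = 1))
```

With the real function, passing `show_logs = TRUE` through `...` would keep the .log file available for inspection after a failure.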
Invisibly returns NULL. Creates a PDF file at the specified
location. If compilation fails, check the .log file (if
show_logs = TRUE) for error details.
autotable for automatic format detection,
table2tex for LaTeX source files,
table2html for HTML output,
table2docx for Microsoft Word,
table2pptx for PowerPoint,
table2rtf for Rich Text Format,
desctable for descriptive tables,
fit for regression tables
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pptx(),
table2rtf(),
table2tex()
data(clintrial)
data(clintrial_labels)

# Create example table
results <- fit(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "treatment", "stage"),
    labels = clintrial_labels
)

# Test that LaTeX can compile (needed for all PDF examples)
has_latex <- local({
    if (!nzchar(Sys.which("pdflatex"))) return(FALSE)
    test_tex <- file.path(tempdir(), "summata_latex_test.tex")
    writeLines(c("\\documentclass{article}",
                 "\\usepackage{booktabs}",
                 "\\begin{document}", "test", "\\end{document}"),
               test_tex)
    tryCatch(
        system2("pdflatex",
                c("-interaction=nonstopmode",
                  paste0("-output-directory=", tempdir()), test_tex),
                stdout = FALSE, stderr = FALSE),
        error = function(e) 1L) == 0L
})

# Example 1: Basic PDF export
if (has_latex) {
    table2pdf(results, file.path(tempdir(), "basic_results.pdf"))
}

if (has_latex) {

    # Example 2: Landscape orientation for wide tables
    table2pdf(results, file.path(tempdir(), "wide_results.pdf"),
              orientation = "landscape")

    # Example 3: With caption
    table2pdf(results, file.path(tempdir(), "captioned.pdf"),
              caption = "Table 1: Multivariable logistic regression results")

    # Example 4: Multi-line caption with formatting
    table2pdf(results, file.path(tempdir(), "formatted_caption.pdf"),
              caption = "Table 1: Risk Factors for Mortality\\\\ aOR = adjusted odds ratio; CI = confidence interval")

    # Example 5: Auto-sized PDF (no fixed page dimensions)
    table2pdf(results, file.path(tempdir(), "autosize.pdf"),
              paper = "auto")

    # Example 6: A4 paper with custom margins
    table2pdf(results, file.path(tempdir(), "a4_custom.pdf"),
              paper = "a4",
              margins = c(0.75, 0.75, 0.75, 0.75))

    # Example 7: Larger font for readability
    table2pdf(results, file.path(tempdir(), "large_font.pdf"),
              font_size = 11)

    # Example 8: Indented hierarchical display
    table2pdf(results, file.path(tempdir(), "indented.pdf"),
              indent_groups = TRUE)

    # Example 9: Condensed table (reduced height)
    table2pdf(results, file.path(tempdir(), "condensed.pdf"),
              condense_table = TRUE)

    # Example 10: With zebra stripes
    table2pdf(results, file.path(tempdir(), "striped.pdf"),
              zebra_stripes = TRUE,
              stripe_color = "gray!15")

    # Example 11: Dark header style
    table2pdf(results, file.path(tempdir(), "dark_header.pdf"),
              dark_header = TRUE)

    # Example 12: Combination of formatting options
    table2pdf(results, file.path(tempdir(), "publication_ready.pdf"),
              orientation = "portrait",
              paper = "letter",
              font_size = 9,
              caption = "Table 2: Multivariable Analysis\\\\ Model adjusted for age, sex, and clinical factors",
              indent_groups = TRUE,
              zebra_stripes = TRUE,
              bold_significant = TRUE,
              p_threshold = 0.05)

    # Example 13: Adjust cell padding
    table2pdf(results, file.path(tempdir(), "relaxed_padding.pdf"),
              cell_padding = "relaxed")  # More spacious

    # Example 14: No scaling (natural table width)
    table2pdf(results, file.path(tempdir(), "no_scale.pdf"),
              fit_to_page = FALSE,
              font_size = 10)

    # Example 15: Hide significance bolding
    table2pdf(results, file.path(tempdir(), "no_bold.pdf"),
              bold_significant = FALSE)

    # Example 16: Custom column alignment
    table2pdf(results, file.path(tempdir(), "custom_align.pdf"),
              align = c("c", "c", "c", "c", "c", "c", "c"))

    # Example 17: Descriptive statistics table
    desc_table <- desctable(clintrial, by = "treatment",
                            variables = c("age", "sex", "bmi", "stage"),
                            labels = clintrial_labels)
    table2pdf(desc_table, file.path(tempdir(), "descriptive.pdf"),
              caption = "Table 1: Baseline Characteristics by Treatment Group",
              orientation = "landscape")

    # Example 18: Model comparison table
    models <- list(
        base = c("age", "sex"),
        full = c("age", "sex", "bmi", "treatment")
    )
    comparison <- compfit(
        data = clintrial,
        outcome = "os_status",
        model_list = models
    )
    table2pdf(comparison, file.path(tempdir(), "model_comparison.pdf"),
              caption = "Table 3: Model Comparison Statistics")

    # Example 19: Very wide table with aggressive fitting
    wide_model <- fit(
        data = clintrial,
        outcome = "os_status",
        predictors = c("age", "sex", "race", "bmi", "smoking",
                       "hypertension", "diabetes", "treatment", "stage")
    )
    table2pdf(wide_model, file.path(tempdir(), "very_wide.pdf"),
              orientation = "landscape",
              font_size = 7,
              fit_to_page = TRUE,
              condense_table = TRUE)

    # Example 20: With caption size control
    table2pdf(results, file.path(tempdir(), "caption_size.pdf"),
              font_size = 8,
              caption_size = 6,
              caption = "Table 4: Results with Compact Caption\\\\ Smaller caption fits better on constrained pages")

    # Example 21: Troubleshooting - keep logs
    table2pdf(results, file.path(tempdir(), "debug.pdf"),
              show_logs = TRUE)
    # If it fails, check debug.log for error messages

}
Converts a data frame, data.table, or matrix to a Microsoft PowerPoint slide
(.pptx) with a formatted table using the flextable and officer
packages. Creates presentation-ready slides with extensive control over table
formatting, positioning, and layout. Tables can be further edited in PowerPoint
after creation. Ideal for creating data-driven presentations and conference talks.
table2pptx( table, file, caption = NULL, font_size = 10, font_family = "Arial", format_headers = TRUE, bold_significant = TRUE, bold_variables = FALSE, p_threshold = 0.05, indent_groups = FALSE, condense_table = FALSE, condense_quantitative = FALSE, zebra_stripes = FALSE, dark_header = FALSE, width = NULL, align = NULL, template = NULL, layout = "Title and Content", master = "Office Theme", left = 0.5, top = 1.5, return_ft = FALSE, ... )
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output PPTX filename. Must have
|
caption |
Character string. Optional title displayed in the slide's title
placeholder or as text box above the table. Default is |
font_size |
Numeric. Base font size in points for table content. Default is 10. Typical range for presentations: 10-14 points. Larger than print documents for visibility at distance. |
font_family |
Character string. Font family name for the table. Must be
installed on the system. Default is |
format_headers |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
dark_header |
Logical. If |
width |
Numeric. Table width in inches. If |
align |
Character vector specifying column alignment. Options:
|
template |
Character string. Path to custom PPTX template file. If
|
layout |
Character string. Name of slide layout to use from template.
Default is
|
master |
Character string. Name of slide master to use. Default is
|
left |
Numeric. Horizontal position from left edge of slide in inches. Default is 0.5. Standard slide is 10 inches wide. |
top |
Numeric. Vertical position from top edge of slide in inches. Default is 1.5 (leaves room for title). Standard slide is 7.5 inches tall. Adjust based on table size and layout. |
return_ft |
Logical. If |
... |
Additional arguments passed to |
Package Requirements:
Requires:
flextable - Table creation and formatting
officer - PowerPoint manipulation
Install: install.packages(c("flextable", "officer"))
Slide Dimensions:
Standard PowerPoint slide:
Width: 10 inches (25.4 cm)
Height: 7.5 inches (19.05 cm)
Aspect ratio: 4:3 (standard); 16:9 widescreen templates use different dimensions, typically 13.33 by 7.5 inches
Safe content area (with margins):
Width: ~9 inches
Height: ~6 inches (accounting for title)
Positioning:
The left and top parameters control table placement:
(0, 0) = Top-left corner of slide
Default (0.5, 1.5) = Standard position with title room
Center: left = (10 - table_width) / 2
When caption is provided:
Attempts to use title placeholder (if layout supports)
Falls back to text box above table
Automatically adjusts table position downward
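The centering rule above is simple arithmetic on the slide and table widths. A minimal sketch follows, with `center_left()` as a hypothetical helper name and the 10-inch standard slide width taken as the default:

```r
# Hypothetical helper: horizontal offset (in inches) that centers a table
# of the given width on a slide.
center_left <- function(table_width, slide_width = 10) {
    stopifnot(table_width > 0, table_width <= slide_width)
    (slide_width - table_width) / 2
}

left_standard <- center_left(8)                        # 8-inch table on a 10-inch slide
left_wide <- center_left(9, slide_width = 13.333)      # same rule on a wider slide
```

The result can be supplied as `left` together with a matching `width`, for example `left = center_left(8), width = 8`.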
Slide Layouts:
Different layouts serve different purposes:
Title and Content (default):
Has title and content placeholders
Caption goes in title area
Table in content area
Most common for data slides
Blank:
No predefined areas
Maximum flexibility
Use absolute positioning (left, top)
Good for custom layouts
Title Only:
Title area only
Large space for table
Good for data-heavy slides
Custom Templates:
Use organizational or conference templates:
table2pptx(table, "branded.pptx",
template = "company_template.pptx",
layout = "Content Layout", # Name from template
master = "Company Theme") # Name from template
To find layout and master names in template:
pres <- officer::read_pptx("template.pptx")
officer::layout_summary(pres)
Multiple Slides:
Creating presentations with multiple tables:
# Each call creates new presentation - combine after
table2pptx(table1, "slide1.pptx", caption = "Results Part 1")
table2pptx(table2, "slide2.pptx", caption = "Results Part 2")
# Then manually combine in PowerPoint, or:
# Use officer to create multi-slide presentation
pres <- officer::read_pptx()
# Add first table
ft1 <- table2pptx(table1, "temp1.pptx", return_ft = TRUE)
pres <- officer::add_slide(pres)
pres <- officer::ph_with(pres, ft1,
location = officer::ph_location(left = 0.5, top = 1.5))
# Add second table
ft2 <- table2pptx(table2, "temp2.pptx", return_ft = TRUE)
pres <- officer::add_slide(pres)
pres <- officer::ph_with(pres, ft2,
location = officer::ph_location(left = 0.5, top = 1.5))
print(pres, target = "combined.pptx")
Further Customization:
Access the flextable object for advanced formatting:
ft <- table2pptx(table, "base.pptx", return_ft = TRUE)
# Customize
ft <- flextable::color(ft, j = "p-value", color = "red")
ft <- flextable::bg(ft, i = 1, bg = "yellow")
ft <- flextable::bold(ft, i = ~ estimate > 0, j = "estimate")
# Save to new slide
pres <- officer::read_pptx()
pres <- officer::add_slide(pres)
pres <- officer::ph_with(pres, ft,
location = officer::ph_location(left = 0.5, top = 1.5))
print(pres, target = "custom.pptx")
Behavior depends on return_ft:
return_ft = FALSE: Invisibly returns a list with:
file - Path to created file
caption - Caption/title text
layout - Layout name used
master - Master name used
template - Template path (if provided)
position - List with left and top coordinates
Flextable accessible via attr(result, "flextable")
return_ft = TRUE: Directly returns the flextable object
Always creates a .pptx file at the specified location.
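Since the return value has two shapes, downstream code may want a single way to reach the flextable. The sketch below uses mock objects so that it runs without flextable installed; `get_flextable()` is a hypothetical helper, and the mock list only imitates the documented structure (a list carrying a "flextable" attribute).

```r
# Hypothetical helper: return the flextable whether the exporter was called
# with return_ft = TRUE (object itself) or FALSE (list with an attribute).
get_flextable <- function(x) {
    if (inherits(x, "flextable")) return(x)
    attr(x, "flextable", exact = TRUE)
}

# Mock objects standing in for real return values
mock_ft <- structure(list(), class = "flextable")
mock_result <- structure(list(file = "slide.pptx"), flextable = mock_ft)

ft_direct <- get_flextable(mock_ft)
ft_from_list <- get_flextable(mock_result)
```

Both calls yield the same object, so later customization code does not need to know which `return_ft` setting was used.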
autotable for automatic format detection,
table2docx for Word documents,
table2pdf for PDF output,
table2html for HTML tables,
table2rtf for Rich Text Format,
table2tex for LaTeX output,
flextable for table customization,
read_pptx for PowerPoint manipulation
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pdf(),
table2rtf(),
table2tex()
# Create example data
data(clintrial)
data(clintrial_labels)
tbl <- desctable(clintrial, by = "treatment",
                 variables = c("age", "sex"),
                 labels = clintrial_labels)

# Basic PowerPoint export
if (requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)) {
    table2pptx(tbl, file.path(tempdir(), "example.pptx"))
}

old_width <- options(width = 180)

# Load data
data(clintrial)
data(clintrial_labels)

# Create regression table
results <- fit(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "treatment"),
    labels = clintrial_labels
)

# Example 1: Basic PowerPoint slide
table2pptx(results, file.path(tempdir(), "results.pptx"))

# Example 2: With title
table2pptx(results, file.path(tempdir(), "titled.pptx"),
           caption = "Multivariable Regression Results")

# Example 3: Larger font for visibility
table2pptx(results, file.path(tempdir(), "large_font.pptx"),
           font_size = 12,
           caption = "Main Findings")

# Example 4: Condensed for slide space
table2pptx(results, file.path(tempdir(), "condensed.pptx"),
           condense_table = TRUE,
           caption = "Key Results")

# Example 5: Dark header for emphasis
table2pptx(results, file.path(tempdir(), "dark.pptx"),
           dark_header = TRUE,
           caption = "Risk Factors")

# Example 6: With zebra stripes
table2pptx(results, file.path(tempdir(), "striped.pptx"),
           zebra_stripes = TRUE)

# Example 7: Blank layout with custom positioning
table2pptx(results, file.path(tempdir(), "blank.pptx"),
           layout = "Blank",
           left = 1,
           top = 1.5,
           width = 8)

# Example 8: Get flextable for customization
ft <- table2pptx(results, file.path(tempdir(), "base.pptx"),
                 return_ft = TRUE)

# Customize the returned flextable object
ft <- flextable::color(ft, j = "p-value", color = "darkred")

# Example 9: Presentation-optimized table
table2pptx(results, file.path(tempdir(), "presentation.pptx"),
           caption = "Main Analysis Results",
           font_size = 11,
           condense_table = TRUE,
           zebra_stripes = TRUE,
           dark_header = TRUE,
           bold_significant = TRUE)

# Example 10: Descriptive statistics slide
desc <- desctable(
    data = clintrial,
    by = "treatment",
    variables = c("age", "sex", "bmi"),
    labels = clintrial_labels
)
table2pptx(desc, file.path(tempdir(), "baseline.pptx"),
           caption = "Baseline Characteristics",
           font_size = 10)

# Example 11: Conference presentation style
table2pptx(results, file.path(tempdir(), "conference.pptx"),
           caption = "Study Outcomes",
           font_family = "Calibri",
           font_size = 14,  # Large for big rooms
           dark_header = TRUE,
           condense_table = TRUE)

options(old_width)
Converts a data frame, data.table, or matrix to a Rich Text Format (.rtf)
document using the flextable and officer packages. Creates
widely compatible tables with extensive formatting options. RTF files can be
opened and edited in Microsoft Word, LibreOffice, WordPad, and many other word
processors. Particularly useful for regulatory submissions, cross-platform
compatibility, and when maximum editability is required.
table2rtf( table, file, caption = NULL, font_size = 8, font_family = "Arial", format_headers = TRUE, bold_significant = TRUE, bold_variables = FALSE, p_threshold = 0.05, indent_groups = FALSE, condense_table = FALSE, condense_quantitative = FALSE, zebra_stripes = FALSE, dark_header = FALSE, paper = "letter", orientation = "portrait", width = NULL, align = NULL, return_ft = FALSE, ... )
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output RTF filename. Must have
|
caption |
Character string. Optional caption displayed above the table
in the RTF document. Default is |
font_size |
Numeric. Base font size in points for table content. Default is 8. Typical range: 8-12 points. Headers use slightly larger size. |
font_family |
Character string. Font family name for the table. Must be
a font installed on the system. Default is |
format_headers |
Logical. If |
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
zebra_stripes |
Logical. If |
dark_header |
Logical. If |
paper |
Character string specifying paper size:
|
orientation |
Character string specifying page orientation:
|
width |
Numeric. Table width in inches. If |
align |
Character vector specifying column alignment for each column.
Options: |
return_ft |
Logical. If |
... |
Additional arguments (currently unused, reserved for future extensions). |
Package Requirements:
This function requires:
flextable - For creating formatted tables
officer - For RTF document generation
Install if needed:
install.packages(c("flextable", "officer"))
RTF Format Advantages:
RTF (Rich Text Format) is a universal document format with several advantages:
Maximum compatibility - Opens in virtually all word processors
Cross-platform - Works on Windows, Mac, Linux without conversion
Fully editable - Native text format, not embedded objects
Lightweight - Smaller file sizes than DOCX
Regulatory compliance - Widely accepted for submissions (FDA, EMA)
Long-term accessibility - Simple text-based format
Version control friendly - Text-based, works with diff tools
Applications that can open RTF files:
Microsoft Word (Windows, Mac)
LibreOffice Writer
Apache OpenOffice Writer
WordPad (Windows built-in)
TextEdit (Mac built-in)
Google Docs (with import)
Pages (Mac)
Many other word processors
Output Features:
The generated RTF document contains:
Fully editable table (native RTF table, not image)
Professional typography and spacing
Proper page setup (size, orientation, margins)
Caption (if provided) as separate paragraph above table
All formatting preserved but editable
Compatible with RTF 1.5 specification
Further Customization:
For programmatic customization beyond the built-in options, access the
flextable object:
Method 1: Via attribute (default)
result <- table2rtf(table, "output.rtf")
ft <- attr(result, "flextable")
# Customize flextable
ft <- flextable::bold(ft, i = 1, j = 1, part = "body")
ft <- flextable::color(ft, i = 2, j = 3, color = "red")
# Re-save if needed
flextable::save_as_rtf(ft, path = "customized.rtf")
Method 2: Direct return
ft <- table2rtf(table, "output.rtf", return_ft = TRUE)
# Customize immediately
ft <- flextable::bg(ft, bg = "yellow", part = "header")
ft <- flextable::autofit(ft)
# Save to new file
flextable::save_as_rtf(ft, path = "custom.rtf")
Page Layout:
The function automatically sets up the RTF document with:
Specified paper size and orientation
Standard margins (1 inch by default)
Table positioned at document start
Left-aligned table placement
For landscape orientation:
Automatically swaps page dimensions
Applies landscape property
Useful for wide tables with many columns
Table Width Management:
Width behavior:
width = NULL - Auto-fits to content and page width
width = 6 - Exactly 6 inches wide
Width distributed evenly across columns by default
Can adjust individual column widths in word processor after creation
For very wide tables:
Use orientation = "landscape"
Use paper = "legal" for extra width
Reduce font_size
Use condense_table = TRUE
Consider breaking across multiple tables
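To judge whether a table will fit, it helps to know the printable width for a given paper size and orientation. The sketch below assumes the 1-inch margins mentioned under Page Layout and uses nominal page dimensions in inches; `usable_width()` is a hypothetical helper, and the three paper names are the ones that appear in this documentation.

```r
# Hypothetical helper: printable width in inches for a paper size and
# orientation, assuming equal left and right margins.
usable_width <- function(paper = c("letter", "a4", "legal"),
                         orientation = c("portrait", "landscape"),
                         margin = 1) {
    paper <- match.arg(paper)
    orientation <- match.arg(orientation)
    # Nominal page sizes as c(short side, long side), in inches
    dims <- list(letter = c(8.5, 11),
                 a4 = c(8.27, 11.69),
                 legal = c(8.5, 14))[[paper]]
    page_width <- if (orientation == "landscape") dims[2] else dims[1]
    page_width - 2 * margin
}

w_letter <- usable_width("letter")                   # portrait letter
w_landscape <- usable_width("letter", "landscape")   # landscape letter
w_legal <- usable_width("legal", "landscape")        # landscape legal
```

Under these assumptions, legal paper only adds width in landscape orientation, because its short side matches letter paper.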
Typography:
The function applies professional typography:
Column headers: Bold, slightly larger font
Body text: Regular weight, specified font size
Numbers: Right-aligned for easy comparison
Text: Left-aligned for readability
Consistent spacing: Adequate padding in cells
Statistical notation: Italicized appropriately
Behavior depends on return_ft:
return_ft = FALSE: Invisibly returns a list with components:
file - Path to created file
caption - Caption text (if provided)
The flextable object is accessible via attr(result, "flextable")
return_ft = TRUE: Directly returns the flextable object for immediate further customization
In both cases, creates a .rtf file at the specified location.
autotable for automatic format detection,
table2docx for Word documents,
table2pptx for PowerPoint slides,
table2pdf for PDF output,
table2html for HTML tables,
table2tex for LaTeX output,
flextable for the underlying table object,
save_as_rtf for direct RTF export
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pdf(),
table2pptx(),
table2tex()
data(clintrial)
data(clintrial_labels)

# Create example table
results <- fit(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "treatment", "stage"),
    labels = clintrial_labels
)

# Example 1: Basic RTF export
if (requireNamespace("flextable", quietly = TRUE)) {
    table2rtf(results, file.path(tempdir(), "results.rtf"))
}

old_width <- options(width = 180)

# Example 2: With caption
table2rtf(results, file.path(tempdir(), "captioned.rtf"),
          caption = "Table 1: Multivariable Logistic Regression Results")

# Example 3: Landscape orientation for wide tables
table2rtf(results, file.path(tempdir(), "wide.rtf"),
          orientation = "landscape")

# Example 4: Custom font and size
table2rtf(results, file.path(tempdir(), "custom_font.rtf"),
          font_family = "Times New Roman",
          font_size = 11)

# Example 5: Hierarchical display
table2rtf(results, file.path(tempdir(), "indented.rtf"),
          indent_groups = TRUE)

# Example 6: Condensed table
table2rtf(results, file.path(tempdir(), "condensed.rtf"),
          condense_table = TRUE)

# Example 7: With zebra stripes
table2rtf(results, file.path(tempdir(), "striped.rtf"),
          zebra_stripes = TRUE)

# Example 8: Dark header style
table2rtf(results, file.path(tempdir(), "dark.rtf"),
          dark_header = TRUE)

# Example 9: A4 paper for international submissions
table2rtf(results, file.path(tempdir(), "a4.rtf"),
          paper = "a4")

# Example 10: Get flextable for customization
result <- table2rtf(results, file.path(tempdir(), "base.rtf"))
ft <- attr(result, "flextable")

# Customize the flextable
ft <- flextable::bold(ft, i = 1, part = "body")
ft <- flextable::color(ft, j = "p-value", color = "blue")

# Re-save
flextable::save_as_rtf(ft, path = file.path(tempdir(), "customized.rtf"))

# Example 11: Direct flextable return
ft <- table2rtf(results, file.path(tempdir(), "direct.rtf"),
                return_ft = TRUE)
ft <- flextable::bg(ft, bg = "yellow", part = "header")

# Example 12: Regulatory submission table
table2rtf(results, file.path(tempdir(), "submission.rtf"),
          caption = "Table 2: Adjusted Odds Ratios for Mortality",
          font_family = "Times New Roman",
          font_size = 10,
          indent_groups = TRUE,
          zebra_stripes = FALSE,
          bold_significant = TRUE)

# Example 13: Custom column alignment
table2rtf(results, file.path(tempdir(), "aligned.rtf"),
          align = c("left", "left", "center", "right", "right"))

# Example 14: Disable significance bolding
table2rtf(results, file.path(tempdir(), "no_bold.rtf"),
          bold_significant = FALSE)

# Example 15: Stricter significance threshold
table2rtf(results, file.path(tempdir(), "strict.rtf"),
          bold_significant = TRUE,
          p_threshold = 0.01)

# Example 16: Descriptive statistics for baseline characteristics
desc <- desctable(clintrial, by = "treatment",
                  variables = c("age", "sex", "bmi", "stage"),
                  labels = clintrial_labels)
table2rtf(desc, file.path(tempdir(), "baseline.rtf"),
          caption = "Table 1: Baseline Patient Characteristics",
          zebra_stripes = TRUE)

# Example 17: Clinical trial efficacy table
table2rtf(results, file.path(tempdir(), "efficacy.rtf"),
          caption = "Table 3: Primary Efficacy Analysis - Intent to Treat Population",
          font_family = "Courier New",  # Monospace for alignment
          paper = "letter",
          orientation = "landscape",
          condense_table = TRUE)

options(old_width)
Converts a data frame, data.table, or matrix to LaTeX source code suitable for
inclusion in LaTeX documents. Generates publication-quality table markup with
extensive formatting options, including booktabs styling, color schemes, and
hierarchical displays. The output file can be pulled directly into a LaTeX
manuscript with \input{} or \include{}. Requires the xtable package for export.
table2tex(
    table,
    file,
    format_headers = TRUE,
    variable_padding = FALSE,
    cell_padding = "normal",
    bold_significant = TRUE,
    bold_variables = FALSE,
    p_threshold = 0.05,
    align = NULL,
    indent_groups = FALSE,
    condense_table = FALSE,
    condense_quantitative = FALSE,
    booktabs = FALSE,
    zebra_stripes = FALSE,
    stripe_color = "gray!20",
    dark_header = FALSE,
    caption = NULL,
    caption_size = NULL,
    label = NULL,
    show_logs = FALSE,
    ...
)
table |
Data frame, data.table, or matrix to export. Can be output from
|
file |
Character string specifying the output |
format_headers |
Logical. If |
variable_padding |
Logical. If |
cell_padding |
Character string or numeric. Vertical padding within cells:
|
bold_significant |
Logical. If |
bold_variables |
Logical. If |
p_threshold |
Numeric. Threshold for bold p-value formatting. Only
used when |
align |
Character string or vector specifying column alignment:
If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
condense_quantitative |
Logical. If |
booktabs |
Logical. If |
zebra_stripes |
Logical. If |
stripe_color |
Character string. LaTeX color specification for zebra
stripes (e.g., |
dark_header |
Logical. If |
caption |
Character string. Table caption for LaTeX caption command.
Supports multi-line captions using double backslash. Default is |
caption_size |
Numeric. Caption font size in points. If |
label |
Character string. LaTeX label for cross-references.
Example: |
show_logs |
Logical. If |
... |
Additional arguments passed to |
Output Format:
The function generates a standalone LaTeX tabular environment that can be:
Included in documents with \input command
Embedded in table/figure environments
Used in manuscript classes (article, report, etc.)
The output includes:
Complete tabular environment with proper alignment
Horizontal rules (\hline or booktabs rules)
Column headers with optional formatting
Data rows with automatic escaping of special characters
Optional caption and label commands
Required LaTeX Packages:
Add these to your LaTeX document preamble:
Always required:
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{array}
\usepackage{graphicx}
Optional (based on parameters):
\usepackage{booktabs}
\usepackage[table]{xcolor}
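As a rough illustration of the package lists above, the following base R sketch assembles a preamble and writes a small wrapper document that pulls in an exported table. The helper `latex_preamble()` and the file names `wrapper.tex` and `results.tex` are hypothetical and are not part of the package.

```r
# Hypothetical helper: collect the \usepackage lines listed above.
latex_preamble <- function(booktabs = FALSE, colors = FALSE) {
  pkgs <- c(
    "\\usepackage[T1]{fontenc}",
    "\\usepackage[utf8]{inputenc}",
    "\\usepackage{array}",
    "\\usepackage{graphicx}"
  )
  # Optional packages depend on which table2tex() options were used
  if (booktabs) pkgs <- c(pkgs, "\\usepackage{booktabs}")
  if (colors) pkgs <- c(pkgs, "\\usepackage[table]{xcolor}")
  pkgs
}

# Assemble a minimal wrapper document around an assumed results.tex
wrapper <- c(
  "\\documentclass{article}",
  latex_preamble(booktabs = TRUE, colors = TRUE),
  "\\begin{document}",
  "\\input{results.tex}",
  "\\end{document}"
)

wrapper_path <- file.path(tempdir(), "wrapper.tex")
writeLines(wrapper, wrapper_path)
```

Setting `colors = TRUE` corresponds to tables exported with `zebra_stripes = TRUE` or `dark_header = TRUE`, both of which need xcolor with the table option.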
Booktabs Style:
When booktabs = TRUE, the table uses publication-quality rules:
\toprule - Heavy rule at top
\midrule - Medium rule below headers
\bottomrule - Heavy rule at bottom
No vertical rules (booktabs style)
Better spacing around rules
This is the preferred style for most academic journals.
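To make the difference between the two rule styles concrete, here is a minimal base R sketch that builds a tabular environment by hand. It is not how table2tex() is implemented; `make_tabular()` is a hypothetical helper, the cell values are made up, and cells are assumed to be pre-formatted strings that need no escaping.

```r
# Hypothetical helper: build LaTeX tabular lines from a data frame of strings.
make_tabular <- function(df, booktabs = FALSE) {
  # Choose the rule commands according to the requested style
  rules <- if (booktabs) {
    c(top = "\\toprule", mid = "\\midrule", bottom = "\\bottomrule")
  } else {
    c(top = "\\hline", mid = "\\hline", bottom = "\\hline")
  }
  # Join cells with " & " and terminate the row with a double backslash
  row_line <- function(cells) paste0(paste(cells, collapse = " & "), " \\\\")
  body <- unname(apply(as.matrix(df), 1, row_line))
  c(
    paste0("\\begin{tabular}{", strrep("l", ncol(df)), "}"),
    rules[["top"]],
    row_line(names(df)),
    rules[["mid"]],
    body,
    rules[["bottom"]],
    "\\end{tabular}"
  )
}

# Made-up cell values, for illustration only
df <- data.frame(Variable = c("Age", "Sex"), OR = c("1.02", "0.85"))
tex_lines <- make_tabular(df, booktabs = TRUE)
writeLines(tex_lines)
```

With `booktabs = FALSE` the same helper emits `\hline` in all three positions, which is the only structural difference between the two styles in this sketch.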
Color Features:
Zebra Stripes: Creates alternating background colors for visual grouping:
zebra_stripes = TRUE
stripe_color = "gray!20"  # 20% gray
stripe_color = "blue!10"  # 10% blue
Dark Header: Creates high-contrast header row:
dark_header = TRUE # Black background, white text
Both require the xcolor package with table option in your document.
Integration with LaTeX Documents:
Basic inclusion:
\begin{table}[htbp]
\centering
\caption{Regression Results}
\label{tab:regression}
\input{results.tex}
\end{table}
With resizing:
\begin{table}[htbp]
\centering
\caption{Results}
\resizebox{\textwidth}{!}{\input{results.tex}}
\end{table}
Landscape orientation:
\usepackage{pdflscape} % in the document preamble
\begin{landscape}
\begin{table}[htbp]
\centering
\input{wide_results.tex}
\end{table}
\end{landscape}
Caption Formatting:
Captions in the caption parameter are written as LaTeX comments in
the output file for reference. For actual LaTeX captions, wrap the table
in a table environment (see examples above).
Special Characters:
The function automatically escapes LaTeX special characters in your data:
Ampersand, percent, dollar sign, hash, underscore
Left and right braces
Tilde and caret (using textasciitilde and textasciicircum)
Variable names and labels should not include these characters unless intentionally using LaTeX commands.
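The escaping described above can be sketched in base R as follows. This is an illustration of the idea only, not the package's actual routine; `escape_latex()` is a hypothetical helper and it covers just the characters listed above.

```r
# Hypothetical helper: escape LaTeX special characters in a character vector.
escape_latex <- function(x) {
  # Braces go first so the braces introduced by later replacements survive
  x <- gsub("{", "\\{", x, fixed = TRUE)
  x <- gsub("}", "\\}", x, fixed = TRUE)
  # Characters that only need a leading backslash
  for (ch in c("&", "%", "$", "#", "_")) {
    x <- gsub(ch, paste0("\\", ch), x, fixed = TRUE)
  }
  # Tilde and caret need dedicated text commands
  x <- gsub("~", "\\textasciitilde{}", x, fixed = TRUE)
  x <- gsub("^", "\\textasciicircum{}", x, fixed = TRUE)
  x
}

escaped <- escape_latex(c("50% & $5_a", "a~b^c", "{x}"))
writeLines(escaped)
```

The order of the replacements matters: escaping braces after the tilde and caret step would corrupt the `{}` that terminates `\textasciitilde` and `\textasciicircum`.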
Invisibly returns NULL. Creates a .tex file at the specified
location containing a LaTeX tabular environment.
autotable for automatic format detection,
table2pdf for direct PDF output,
table2html for HTML tables,
table2docx for Word documents,
table2pptx for PowerPoint,
table2rtf for Rich Text Format,
fit for regression tables,
desctable for descriptive tables
Other export functions:
autotable(),
table2docx(),
table2html(),
table2pdf(),
table2pptx(),
table2rtf()
data(clintrial)
data(clintrial_labels)

# Create example table
results <- fit(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "treatment", "stage"),
    labels = clintrial_labels
)

# Example 1: Basic LaTeX export
if (requireNamespace("xtable", quietly = TRUE)) {
    table2tex(results, file.path(tempdir(), "basic.tex"))
}

# Example 2: With booktabs for publication
table2tex(results, file.path(tempdir(), "publication.tex"),
          booktabs = TRUE,
          caption = "Multivariable logistic regression results",
          label = "tab:regression")

# Example 3: Multi-line caption with abbreviations
table2tex(results, file.path(tempdir(), "detailed.tex"),
          booktabs = TRUE,
          caption = "Table 1: Risk Factors for Mortality\\\\ aOR = adjusted odds ratio; CI = confidence interval\\\\ Model adjusted for age, sex, treatment, and disease stage",
          label = "tab:mortality")

# Example 4: Hierarchical display with indentation
table2tex(results, file.path(tempdir(), "indented.tex"),
          indent_groups = TRUE,
          booktabs = TRUE)

# Example 5: Condensed table (reduced height)
table2tex(results, file.path(tempdir(), "condensed.tex"),
          condense_table = TRUE,
          booktabs = TRUE)

# Example 6: With zebra stripes
table2tex(results, file.path(tempdir(), "striped.tex"),
          zebra_stripes = TRUE,
          stripe_color = "gray!15",
          booktabs = TRUE)
# Remember to add \usepackage[table]{xcolor} to the LaTeX document

# Example 7: Dark header style
table2tex(results, file.path(tempdir(), "dark_header.tex"),
          dark_header = TRUE,
          booktabs = TRUE)
# Requires \usepackage[table]{xcolor}

# Example 8: Custom cell padding
table2tex(results, file.path(tempdir(), "relaxed.tex"),
          cell_padding = "relaxed",
          booktabs = TRUE)

# Example 9: Custom column alignment (auto-detected by default)
table2tex(results, file.path(tempdir(), "custom_align.tex"),
          align = c("c", "c", "c", "c", "c", "c", "c"))

# Example 10: No header formatting (keep original names)
table2tex(results, file.path(tempdir(), "raw_headers.tex"),
          format_headers = FALSE)

# Example 11: Disable significance bolding
table2tex(results, file.path(tempdir(), "no_bold.tex"),
          bold_significant = FALSE,
          booktabs = TRUE)

# Example 12: Stricter significance threshold
table2tex(results, file.path(tempdir(), "strict_sig.tex"),
          bold_significant = TRUE,
          p_threshold = 0.01,  # Bold only if p < 0.01
          booktabs = TRUE)

# Example 13: With caption size control
table2tex(results, file.path(tempdir(), "caption_size.tex"),
          caption_size = 6,
          caption = "Table 1 - Results with Compact Caption\\\\ Smaller caption fits better on constrained pages")

# Example 14: Complete publication-ready table
table2tex(results, file.path(tempdir(), "final_table1.tex"),
          booktabs = TRUE,
          caption = "Table 1: Multivariable Analysis of Mortality Risk Factors",
          label = "tab:main_results",
          indent_groups = TRUE,
          zebra_stripes = FALSE,  # Many journals prefer no stripes
          bold_significant = TRUE,
          cell_padding = "normal")

# Example 15: Descriptive statistics table
desc_table <- desctable(clintrial, by = "treatment",
                        variables = c("age", "sex", "bmi"),
                        labels = clintrial_labels)
table2tex(desc_table, file.path(tempdir(), "table1_descriptive.tex"),
          booktabs = TRUE,
          caption = "Table 1: Baseline Characteristics",
          label = "tab:baseline")

# Example 16: Model comparison table
models <- list(
    base = c("age", "sex"),
    full = c("age", "sex", "treatment", "stage")
)
comparison <- compfit(
    data = clintrial,
    outcome = "os_status",
    model_list = models
)
table2tex(comparison, file.path(tempdir(), "model_comparison.tex"),
          booktabs = TRUE,
          caption = "Model Comparison Statistics",
          label = "tab:models")
Generates a publication-ready forest plot from a uniscreen() output
object. The plot displays effect estimates (OR, HR, RR, or coefficients) with
confidence intervals for each predictor tested in univariable analysis against
a single outcome.
uniforest(
    x,
    title = "Univariable Screening",
    effect_label = NULL,
    digits = 2,
    p_digits = 3,
    conf_level = 0.95,
    font_size = 1,
    annot_size = 3.88,
    header_size = 5.82,
    title_size = 23.28,
    plot_width = NULL,
    plot_height = NULL,
    table_width = 0.6,
    show_n = TRUE,
    show_events = NULL,
    indent_groups = FALSE,
    condense_table = FALSE,
    bold_variables = FALSE,
    center_padding = 4,
    zebra_stripes = TRUE,
    color = NULL,
    null_line = NULL,
    log_scale = NULL,
    labels = NULL,
    show_footer = TRUE,
    units = "in",
    number_format = NULL
)
x |
Univariable screen result object (data.table with class attributes
from |
title |
Character string specifying the plot title. Default is
|
effect_label |
Character string for the effect measure label on the
forest plot axis. Default is |
digits |
Integer specifying the number of decimal places for effect estimates and confidence intervals. Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
conf_level |
Numeric confidence level for confidence intervals. Must be
between 0 and 1. Default is 0.95 (95% confidence intervals). The CI
percentage is automatically displayed in column headers (e.g., "90% CI"
when |
font_size |
Numeric multiplier controlling the base font size for all text elements. Default is 1.0. |
annot_size |
Numeric value controlling the relative font size for data annotations. Default is 3.88. |
header_size |
Numeric value controlling the relative font size for column headers. Default is 5.82. |
title_size |
Numeric value controlling the relative font size for the main plot title. Default is 23.28. |
plot_width |
Numeric value specifying the intended output width in
specified |
plot_height |
Numeric value specifying the intended output height in
specified |
table_width |
Numeric value between 0 and 1 specifying the proportion of total plot width allocated to the data table. Default is 0.6 (60% table, 40% forest plot). |
show_n |
Logical. If |
show_events |
Logical. If |
indent_groups |
Logical. If |
condense_table |
Logical. If |
bold_variables |
Logical. If |
center_padding |
Numeric value specifying horizontal spacing between table and forest plot. Default is 4. |
zebra_stripes |
Logical. If |
color |
Character string specifying the color for point estimates in
the forest plot. Default is |
null_line |
Numeric value for the reference line position. Default is
|
log_scale |
Logical. If |
labels |
Named character vector providing custom display labels for
variables. Applied to predictor names in the plot.
Default is |
show_footer |
Logical. If |
units |
Character string specifying units for plot dimensions:
|
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
The forest plot displays univariable (unadjusted) associations between each predictor and the outcome. This is useful for:
Visualizing results of initial variable screening
Identifying potential predictors for multivariable modeling
Presenting crude associations alongside adjusted results
Quick visual assessment of effect sizes and significance
The plot automatically handles:
Different effect types (OR, HR, RR, coefficients) with appropriate axis scaling (log vs linear)
Factor variables with multiple levels (grouped under variable name)
Continuous variables (single row per predictor)
Reference categories for categorical variables
A ggplot object containing the complete forest plot. The plot
can be:
Displayed directly: print(plot)
Saved to file: ggsave("forest.pdf", plot, width = 12, height = 8)
Further customized with ggplot2 functions
The returned object includes an attribute "rec_dims"
accessible via attr(plot, "rec_dims"), which is a list
containing:
width: Numeric. Recommended plot width in specified units
height: Numeric. Recommended plot height in specified units
These recommendations are automatically calculated based on the number of
variables, text sizes, and layout parameters, and are printed to console
if plot_width or plot_height are not specified.
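When scripting exports, it can help to read the "rec_dims" attribute defensively. The sketch below uses a hypothetical helper, `get_plot_dims()`, and a stand-in object in place of a real plot; the fallback of 12 by 8 simply mirrors the ggsave() call shown above and is an assumption, not a package default.

```r
# Hypothetical helper: read "rec_dims", falling back to fixed dimensions.
get_plot_dims <- function(plot, default = list(width = 12, height = 8)) {
  dims <- attr(plot, "rec_dims")
  if (is.null(dims)) {
    return(default)
  }
  list(width = dims$width, height = dims$height)
}

# Stand-in object carrying the attribute (a real call would return a ggplot)
fake_plot <- structure(list(), rec_dims = list(width = 10.5, height = 6.2))

dims <- get_plot_dims(fake_plot)
fallback <- get_plot_dims(list())
```

The resulting `dims$width` and `dims$height` can then be passed to `ggplot2::ggsave()` in the same way as in the examples for this function.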
autoforest for automatic model detection,
uniscreen for generating univariable screening results,
multiforest for multi-outcome forest plots,
coxforest, glmforest, lmforest for
single-model forest plots
Other visualization functions:
autoforest(),
coxforest(),
glmforest(),
lmforest(),
multiforest()
data(clintrial)
data(clintrial_labels)

# Create example uniscreen result
uni_results <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "smoking", "treatment", "stage"),
    labels = clintrial_labels,
    parallel = FALSE
)

# Example 1: Basic univariable forest plot
p <- uniforest(uni_results, title = "Univariable Associations with Mortality")

old_width <- options(width = 180)

# Example 2: Survival analysis
library(survival)
surv_results <- uniscreen(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    predictors = c("age", "sex", "treatment", "stage"),
    model_type = "coxph",
    labels = clintrial_labels,
    parallel = FALSE
)
p2 <- uniforest(surv_results, title = "Univariable Survival Analysis")

# Example 3: Linear regression
lm_results <- uniscreen(
    data = clintrial,
    outcome = "los_days",
    predictors = c("age", "sex", "surgery", "diabetes"),
    model_type = "lm",
    labels = clintrial_labels,
    parallel = FALSE
)
p3 <- uniforest(lm_results, title = "Predictors of Length of Stay")

# Example 4: Customize appearance
p4 <- uniforest(
    uni_results,
    title = "Crude Associations with Mortality",
    color = "#E74C3C",
    indent_groups = TRUE,
    zebra_stripes = TRUE,
    bold_variables = TRUE
)

# Example 5: Save with recommended dimensions
dims <- attr(p4, "rec_dims")
ggplot2::ggsave(file.path(tempdir(), "univariable_forest.pdf"), p4,
                width = dims$width, height = dims$height)

options(old_width)
Performs comprehensive univariable (unadjusted) regression analyses by fitting separate models for each predictor against a single outcome. This function is designed for initial variable screening, hypothesis generation, and understanding crude associations before multivariable modeling. Returns publication-ready formatted results with optional p-value filtering.
uniscreen(
    data,
    outcome,
    predictors,
    model_type = "glm",
    family = "binomial",
    random = NULL,
    p_threshold = 0.05,
    conf_level = 0.95,
    reference_rows = TRUE,
    show_n = TRUE,
    show_events = TRUE,
    digits = 2,
    p_digits = 3,
    labels = NULL,
    keep_models = FALSE,
    exponentiate = NULL,
    conf_method = NULL,
    parallel = TRUE,
    n_cores = NULL,
    number_format = NULL,
    verbose = NULL,
    ...
)
data |
Data frame or data.table containing the analysis dataset. The function automatically converts data frames to data.tables for efficient processing. |
outcome |
Character string specifying the outcome variable name. For
survival analysis, use |
predictors |
Character vector of predictor variable names to screen. Each predictor is tested independently in its own univariable model. Can include continuous, categorical (factor), or binary variables. |
model_type |
Character string specifying the type of regression model to fit. Options include:
|
family |
For GLM and GLMER models, specifies the error distribution and link function. Can be a character string, a family function, or a family object. Ignored for non-GLM/GLMER models. Binary/Binomial outcomes:
Count outcomes:
Continuous outcomes:
Positive continuous outcomes:
For negative binomial regression (overdispersed counts), use
See |
random |
Character string specifying the random-effects formula for
mixed-effects models ( |
p_threshold |
Numeric value between 0 and 1 specifying the p-value threshold used to count significant predictors in the printed summary. All predictors are always included in the output table. Default is 0.05. |
conf_level |
Numeric confidence level for confidence intervals. Must be between 0 and 1. Default is 0.95 (95% confidence intervals). |
reference_rows |
Logical. If |
show_n |
Logical. If |
show_events |
Logical. If |
digits |
Integer specifying the number of decimal places for effect estimates (OR, HR, RR, coefficients). Default is 2. |
p_digits |
Integer specifying the number of decimal places for
p-values. Values smaller than |
labels |
Named character vector or list providing custom display
labels for variables. Names should match predictor names, values are the
display labels. Predictors not in |
keep_models |
Logical. If |
exponentiate |
Logical. Whether to exponentiate coefficients (display
OR/HR/RR instead of log odds/log hazards). Default is |
conf_method |
Character string controlling the confidence interval method.
If
Cox and mixed-effects models use Wald intervals regardless of this setting.
Set globally with |
parallel |
Logical. If |
n_cores |
Integer specifying the number of CPU cores to use for parallel
processing. Default is |
number_format |
Character string or two-element character vector controlling thousand and decimal separators in formatted output. Named presets:
Or provide a custom two-element vector When
options(summata.number_format = "eu")
|
verbose |
Logical. If |
... |
Additional arguments passed to the underlying model fitting functions
( |
Analysis Approach:
The function implements a comprehensive univariable screening workflow:
For each predictor in predictors, fits a separate model:
outcome ~ predictor
Extracts coefficients, confidence intervals, and p-values from each model
Combines results into a single table for easy comparison
Formats output for publication with appropriate effect measures
Each predictor is tested independently - these are crude (unadjusted) associations that do not account for confounding or interaction effects.
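The workflow above can be approximated in a few lines of base R. The following is a minimal sketch and not the package's implementation: `screen_univariable()` is a hypothetical helper, it handles only continuous predictors in a logistic model, it uses Wald intervals, and it runs on the built-in `mtcars` data rather than `clintrial`.

```r
# Hypothetical helper: fit outcome ~ predictor separately for each predictor.
screen_univariable <- function(data, outcome, predictors, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # normal quantile for Wald intervals
  rows <- lapply(predictors, function(p) {
    model <- glm(reformulate(p, response = outcome),
                 data = data, family = binomial())
    est <- coef(summary(model))[2, ]  # row 2 is the predictor's slope
    data.frame(
      predictor = p,
      or = exp(est[["Estimate"]]),
      lower = exp(est[["Estimate"]] - z * est[["Std. Error"]]),
      upper = exp(est[["Estimate"]] + z * est[["Std. Error"]]),
      p_value = est[["Pr(>|z|)"]]
    )
  })
  # Combine the per-predictor rows into one comparison table
  do.call(rbind, rows)
}

res <- screen_univariable(mtcars, outcome = "am",
                          predictors = c("wt", "hp", "qsec"))
print(res)
```

Each row comes from its own model, so the odds ratios are crude estimates and none of them is adjusted for the other predictors.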
When to Use Univariable Screening:
Initial variable selection: Identify predictors associated with the outcome before building multivariable models
Hypothesis generation: Explore potential associations in exploratory analyses
Understanding crude associations: Report unadjusted effects alongside adjusted estimates
Variable reduction: Use p-value thresholds (e.g., p < 0.20) to identify candidates for multivariable modeling
Checking multicollinearity: Compare univariable and multivariable effects to identify potential collinearity
Threshold for p-values:
The p_threshold parameter controls the significance threshold used
in the printed summary to count how many predictors are significant. All
predictors are always included in the output table regardless of this setting.
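The distinction between counting and filtering can be shown with a small base R sketch. The table below uses made-up p-values purely for illustration, and the column names `predictor` and `p_value` are assumptions rather than the package's actual raw-data column names.

```r
# Made-up screening results, for illustration only
raw <- data.frame(
  predictor = c("age", "sex", "bmi", "stage"),
  p_value = c(0.003, 0.410, 0.150, 0.048)
)

p_threshold <- 0.20

# Predictors below the threshold become candidates for multivariable modeling
candidates <- raw$predictor[raw$p_value < p_threshold]
n_significant <- length(candidates)

# The table itself is not filtered: every predictor is still reported
n_reported <- nrow(raw)
```

Here three predictors fall below the threshold, while all four rows remain in the table.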
Effect Measures by Model Type:
Logistic regression (model_type = "glm",
family = "binomial"): Odds ratios (OR)
Cox regression (model_type = "coxph"): Hazard ratios (HR)
Poisson regression (model_type = "glm",
family = "poisson"): Rate/risk ratios (RR)
Negative binomial (model_type = "negbin"): Rate ratios (RR)
Linear regression (model_type = "lm" or GLM with
identity link): Raw coefficient estimates
Gamma regression (model_type = "glm",
family = "Gamma"): Multiplicative effects (with default log link)
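For the ratio-type measures above, the reported value is the exponentiated model coefficient. A minimal base R sketch with a Poisson model on the built-in `warpbreaks` data (not the package's `clintrial` data) shows the relationship; with a single binary predictor, the rate ratio equals the ratio of the two group means.

```r
# Poisson regression of break counts on wool type (built-in warpbreaks data)
model <- glm(breaks ~ wool, data = warpbreaks, family = poisson())

# The coefficient is on the log scale; exponentiating gives the rate ratio
log_rr <- coef(model)[["woolB"]]
rate_ratio <- exp(log_rr)

# Check against the ratio of group means (wool B relative to wool A)
group_means <- tapply(warpbreaks$breaks, warpbreaks$wool, mean)
mean_ratio <- group_means[["B"]] / group_means[["A"]]
```

Linear models are the exception in the list above: their coefficients are reported as fitted, without exponentiation.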
Memory Considerations:
When keep_models = FALSE (default), fitted models are discarded after
extracting results to conserve memory. Set keep_models = TRUE only when
the following are needed:
Model diagnostic plots
Predictions from individual models
Additional model statistics not extracted by default
Further analysis of specific models
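The memory trade-off can be seen directly in base R by comparing the size of stored model objects with the size of their extracted coefficient tables. This sketch uses `glm()` on the built-in `mtcars` data and does not reproduce how the package stores its results.

```r
predictors <- c("wt", "hp", "qsec")

# Keep the full fitted models (analogous to keep_models = TRUE)
models <- lapply(predictors, function(p) {
  glm(reformulate(p, response = "am"), data = mtcars, family = binomial())
})
names(models) <- predictors

# Keep only the coefficient tables (analogous to discarding the models)
extracted <- lapply(models, function(m) coef(summary(m)))

size_models <- as.numeric(object.size(models))
size_extracted <- as.numeric(object.size(extracted))
```

Fitted models carry the model frame, residuals, and other components, so `size_models` is much larger than `size_extracted`, and the gap grows with the number of rows and predictors.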
A data.table with S3 class "uniscreen_result" containing formatted
univariable screening results. The table structure includes:
Character. Predictor name or custom label (from labels)
Character. For factor variables: category level. For continuous variables: typically empty or descriptive statistic label
Integer. Sample size used in the model (if show_n = TRUE)
Integer. Sample size for this specific factor level (factor variables only)
Integer. Total number of events in the model for survival
or logistic regression (if show_events = TRUE)
Integer. Number of events for this specific factor level (factor variables only)
Character. Formatted effect estimate with confidence interval. Column name depends on model type: "OR (95% CI)" for logistic, "HR (95% CI)" for survival, "RR (95% CI)" for counts, "Coefficient (95% CI)" for linear models
Character. Formatted p-value from the Wald test
The returned object includes the following attributes accessible via attr():
data.table. Unformatted numeric results with separate columns for coefficients, standard errors, confidence interval bounds, etc. Suitable for further statistical analysis or custom formatting
List (if keep_models = TRUE). Named list of fitted
model objects, with predictor names as list names. Access specific models
via attr(result, "models")[["predictor_name"]]
Character. The outcome variable name used
Character. The regression model type used
Character. Always "Univariable" for screening results
Character. Always "univariable" to identify the analysis type
Numeric. The p-value threshold used for significance
significant: Character vector. Names of predictors with p-value below the screening threshold, suitable for passing directly to downstream modeling functions
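The pattern of a display table that carries its metadata as attributes can be sketched in base R as follows. This is a hypothetical mock-up, not the package's implementation: the p-values are made up, a plain data.frame stands in for the data.table, and the column names are illustrative. Only the attribute names "raw_data" and "significant" are taken from the documentation above.

```r
# Hypothetical numeric results for three predictors (made-up values)
raw <- data.frame(
    predictor = c("age", "sex", "stage"),
    p_value = c(0.003, 0.410, 0.048),
    stringsAsFactors = FALSE
)
p_threshold <- 0.05

# Formatted display table (character columns)
result <- data.frame(
    Variable = raw$predictor,
    `p-value` = sprintf("%.3f", raw$p_value),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Attach the numeric results and the names passing the threshold as attributes
attr(result, "raw_data") <- raw
attr(result, "significant") <- raw$predictor[raw$p_value < p_threshold]

# Retrieve them later with attr()
print(attr(result, "significant"))
print(attr(result, "raw_data")$p_value)
```

Keeping the numeric values alongside the formatted strings means downstream code never has to parse text such as "0.003" back into numbers.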
fit for fitting a single multivariable model,
fullfit for complete univariable-to-multivariable workflow,
compfit for comparing multiple models,
m2dt for converting individual models to tables
Other regression functions:
compfit(),
fit(),
fullfit(),
multifit(),
print.compfit_result(),
print.fit_result(),
print.fullfit_result(),
print.multifit_result(),
print.uniscreen_result()
# Load example data
data(clintrial)
data(clintrial_labels)

# Example 1: Basic logistic regression screening
screen1 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "bmi", "smoking", "hypertension"),
    model_type = "glm",
    family = "binomial",
    parallel = FALSE
)
print(screen1)

# Example 2: With custom variable labels
screen2 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "bmi", "treatment"),
    labels = clintrial_labels,
    parallel = FALSE
)
print(screen2)

# Example 3: Filter by p-value threshold
# Only keep predictors with p < 0.20 (common for screening)
screen3 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "bmi", "smoking", "hypertension",
                   "diabetes", "stage"),
    p_threshold = 0.20,
    labels = clintrial_labels,
    parallel = FALSE
)
print(screen3)  # Only significant predictors are shown

# Example 4: Cox proportional hazards screening
library(survival)
cox_screen <- uniscreen(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    predictors = c("age", "sex", "treatment", "stage", "grade"),
    model_type = "coxph",
    labels = clintrial_labels,
    parallel = FALSE
)
print(cox_screen)  # Returns hazard ratios (HR) instead of odds ratios

# Example 5: Keep models for diagnostics
screen5 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "bmi", "creatinine"),
    keep_models = TRUE,
    parallel = FALSE
)
# Access stored models
models <- attr(screen5, "models")
summary(models[["age"]])
plot(models[["age"]])  # Diagnostic plots

# Example 6: Linear regression screening
linear_screen <- uniscreen(
    data = clintrial,
    outcome = "bmi",
    predictors = c("age", "sex", "smoking", "creatinine", "hemoglobin"),
    model_type = "lm",
    labels = clintrial_labels,
    parallel = FALSE
)
print(linear_screen)

# Example 7: Poisson regression for equidispersed count outcomes
# fu_count has variance ~= mean, appropriate for standard Poisson
poisson_screen <- uniscreen(
    data = clintrial,
    outcome = "fu_count",
    predictors = c("age", "stage", "treatment", "surgery"),
    model_type = "glm",
    family = "poisson",
    labels = clintrial_labels,
    parallel = FALSE
)
print(poisson_screen)  # Returns rate ratios (RR)

# Example 8: Negative binomial for overdispersed counts
# ae_count has variance > mean (overdispersed), use negbin
if (requireNamespace("MASS", quietly = TRUE)) {
    nb_screen <- uniscreen(
        data = clintrial,
        outcome = "ae_count",
        predictors = c("age", "treatment", "diabetes", "surgery"),
        model_type = "negbin",
        labels = clintrial_labels,
        parallel = FALSE
    )
    print(nb_screen)
}

# Example 9: Gamma regression for positive continuous outcomes (e.g., costs)
gamma_screen <- uniscreen(
    data = clintrial,
    outcome = "los_days",
    predictors = c("age", "sex", "treatment", "surgery"),
    model_type = "glm",
    family = Gamma(link = "log"),
    labels = clintrial_labels,
    parallel = FALSE
)
print(gamma_screen)

# Example 10: Hide reference rows for factor variables
screen10 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("treatment", "stage", "grade"),
    reference_rows = FALSE,
    parallel = FALSE
)
print(screen10)  # Reference categories not shown

# Example 11: Customize decimal places
screen11 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "bmi", "creatinine"),
    digits = 3,    # 3 decimal places for OR
    p_digits = 4   # 4 decimal places for p-values
)
print(screen11)

# Example 12: Hide sample size and event columns
screen12 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "bmi"),
    show_n = FALSE,
    show_events = FALSE,
    parallel = FALSE
)
print(screen12)

# Example 13: Access raw numeric data
screen13 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "treatment"),
    parallel = FALSE
)
raw_data <- attr(screen13, "raw_data")
print(raw_data)  # Contains unformatted coefficients, SEs, CIs, etc.

# Example 14: Force coefficient display instead of OR
screen14 <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "bmi"),
    model_type = "glm",
    family = "binomial",
    parallel = FALSE,
    exponentiate = FALSE  # Show log odds instead of OR
)
print(screen14)

# Example 15: Screening with weights
screen15 <- uniscreen(
    data = clintrial,
    outcome = "Surv(os_months, os_status)",
    predictors = c("age", "sex", "bmi"),
    model_type = "coxph",
    weights = runif(nrow(clintrial), min = 0.5, max = 2),  # Random numbers for example
    parallel = FALSE
)

# Example 16: Strict significance filter (p < 0.05)
sig_only <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "bmi", "smoking", "hypertension",
                   "diabetes", "ecog", "treatment", "stage", "grade"),
    p_threshold = 0.05,
    labels = clintrial_labels,
    parallel = FALSE
)
# Check how many predictors passed the filter
n_significant <- length(unique(sig_only$Variable[sig_only$Variable != ""]))
cat("Significant predictors:", n_significant, "\n")

# Example 17: Complete workflow - screen then use in multivariable
# Step 1: Screen with liberal threshold
candidates <- uniscreen(
    data = clintrial,
    outcome = "os_status",
    predictors = c("age", "sex", "bmi", "smoking", "hypertension",
                   "diabetes", "treatment", "stage", "grade"),
    p_threshold = 0.20,
    parallel = FALSE
)
# Step 2: Extract significant predictor names
sig_predictors <- attr(candidates, "significant")
# Step 3: Fit multivariable model with selected predictors
multi_model <- fit(
    data = clintrial,
    outcome = "os_status",
    predictors = sig_predictors,
    labels = clintrial_labels
)
print(multi_model)

# Example 18: Mixed-effects logistic regression (glmer)
# Accounts for clustering by site
if (requireNamespace("lme4", quietly = TRUE)) {
    glmer_screen <- uniscreen(
        data = clintrial,
        outcome = "os_status",
        predictors = c("age", "sex", "treatment", "stage"),
        model_type = "glmer",
        random = "(1|site)",
        family = "binomial",
        labels = clintrial_labels,
        parallel = FALSE
    )
    print(glmer_screen)
}

# Example 19: Mixed-effects linear regression (lmer)
if (requireNamespace("lme4", quietly = TRUE)) {
    lmer_screen <- uniscreen(
        data = clintrial,
        outcome = "biomarker_x",
        predictors = c("age", "sex", "treatment", "smoking"),
        model_type = "lmer",
        random = "(1|site)",
        labels = clintrial_labels,
        parallel = FALSE
    )
    print(lmer_screen)
}

# Example 20: Mixed-effects Cox model (coxme)
if (requireNamespace("coxme", quietly = TRUE)) {
    coxme_screen <- uniscreen(
        data = clintrial,
        outcome = "Surv(os_months, os_status)",
        predictors = c("age", "sex", "treatment", "stage"),
        model_type = "coxme",
        random = "(1|site)",
        labels = clintrial_labels,
        parallel = FALSE
    )
    print(coxme_screen)
}

# Example 21: Mixed-effects with nested random effects
# Patients nested within sites
if (requireNamespace("lme4", quietly = TRUE)) {
    nested_screen <- uniscreen(
        data = clintrial,
        outcome = "os_status",
        predictors = c("age", "treatment"),
        model_type = "glmer",
        random = "(1|site/patient_id)",
        family = "binomial",
        parallel = FALSE
    )
}

# Example 22: Quasipoisson for overdispersed count data
# Alternative to negative binomial when MASS not available
quasi_screen <- uniscreen(
    data = clintrial,
    outcome = "ae_count",
    predictors = c("age", "treatment", "diabetes", "surgery", "stage"),
    model_type = "glm",
    family = "quasipoisson",
    labels = clintrial_labels,
    parallel = FALSE
)
print(quasi_screen)  # Adjusts standard errors for overdispersion

# Example 23: Quasibinomial for overdispersed binary data
quasibin_screen <- uniscreen(
    data = clintrial,
    outcome = "any_complication",
    predictors = c("age", "bmi", "diabetes", "surgery", "ecog"),
    model_type = "glm",
    family = "quasibinomial",
    labels = clintrial_labels,
    parallel = FALSE
)
print(quasibin_screen)

# Example 24: Inverse Gaussian for highly skewed positive data
invgauss_screen <- uniscreen(
    data = clintrial,
    outcome = "recovery_days",
    predictors = c("age", "surgery", "pain_score", "los_days"),
    model_type = "glm",
    family = inverse.gaussian(link = "log"),
    labels = clintrial_labels,
    parallel = FALSE
)
print(invgauss_screen)
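For readers who want to see what a screen-then-fit workflow amounts to conceptually, here is a minimal base-R sketch on simulated data. It is not the package's implementation and uses none of its functions; the data, the variable names (x1, x2, x3, y), and the 0.20 threshold are assumptions chosen for illustration, and only numeric predictors are handled.

```r
# Simulated data (hypothetical): only x1 truly affects the outcome
set.seed(123)
n <- 1000
dat <- data.frame(
    x1 = rnorm(n),
    x2 = rnorm(n),
    x3 = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * dat$x1))

predictors <- c("x1", "x2", "x3")

# Step 1: fit one logistic model per predictor and keep its Wald p-value
p_values <- vapply(predictors, function(v) {
    fit_v <- glm(reformulate(v, response = "y"), data = dat, family = binomial)
    summary(fit_v)$coefficients[v, "Pr(>|z|)"]
}, numeric(1))

# Step 2: retain predictors below a liberal screening threshold
selected <- names(p_values)[p_values < 0.20]

# Step 3: fit a single multivariable model with the retained predictors
multi <- glm(reformulate(selected, response = "y"), data = dat, family = binomial)

print(round(p_values, 4))
print(exp(coef(multi)))  # Adjusted odds ratios (intercept included)
```

Predictors with no true effect can still pass a liberal threshold by chance, which is one reason the multivariable step is fitted and inspected separately.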