Builders

The six top-level "builder" functions construct a fresh SofraTable from a DataFrame (or, for tbl_regression, one or more fitted models) plus some configuration. Every builder returns an immutable object that can be chained through presentational modifiers (.bold_p(), .theme(), .set_caption(), …) and rendered to any supported format.

Most builders also support statistical re-computation modifiers (.add_p(), .add_smd(), …) that rebuild the table with additional columns. Exception: tbl_uvregression bakes p-values and confidence intervals in at build time. Use the conf_level= and digits= constructor arguments to control formatting; calling .add_p() on a tbl_uvregression result raises NotImplementedError with an explanatory message.
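
The immutable, chainable behaviour described above can be pictured with a small standard-library sketch. MiniTable and its p_values_baked_in flag are hypothetical stand-ins invented for illustration; they are not part of pysofra, and the real SofraTable may be implemented quite differently.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for SofraTable (illustration only)."""

    rows: Tuple[Tuple[str, str], ...]
    caption: Optional[str] = None
    bold: bool = False
    p_values_baked_in: bool = False  # assumed flag, mimics tbl_uvregression

    def set_caption(self, text: str) -> "MiniTable":
        # Presentational modifier: return a modified copy, never mutate self.
        return replace(self, caption=text)

    def bold_p(self) -> "MiniTable":
        return replace(self, bold=True)

    def add_p(self) -> "MiniTable":
        # Statistical modifier: rebuild with an extra row, or refuse when
        # p-values were already computed at build time.
        if self.p_values_baked_in:
            raise NotImplementedError(
                "p-values are fixed at build time; rebuild with other arguments"
            )
        return replace(self, rows=self.rows + (("p-value", "(recomputed)"),))


base = MiniTable(rows=(("age", "mean (SD)"),))
styled = base.set_caption("Table 1").bold_p()  # base itself is unchanged
```

Because every modifier returns a new object, a partially styled table can be reused as the starting point for several variants without the variants affecting one another.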

tbl_one

tbl_one(data: DataFrame, *, by: str | None = None, variables: list[str] | None = None, labels: dict[str, str] | None = None, types: dict[str, VarKind] | None = None, nonnormal: list[str] | None = None, tests: dict[str, str] | None = None, weights: str | None = None, design: SurveyDesign | None = None, digits: int = 2, pct_digits: int = 1, missing: str = 'ifany', include_missing: bool | None = None) -> SofraTable

Build a Table 1.

Parameters:

Name Type Description Default
data DataFrame

Source dataframe.

required
by str | None

Optional column name to stratify on. If omitted, a single Overall column is produced.

None
variables list[str] | None

Explicit list of variables to include. Defaults to all columns other than by.

None
labels dict[str, str] | None

Mapping of column name → display label.

None
types dict[str, VarKind] | None

Override automatic variable typing on a per-column basis.

None
nonnormal list[str] | None

Continuous variables that should be summarised as median (Q1, Q3) and tested with rank-based tests.

None
tests dict[str, str] | None

Per-variable statistical test overrides, e.g. {'age': 'wilcoxon', 'race': 'fisher'}. See pysofra.summary.tests.available_tests() for the registry.

None
weights str | None

Column name carrying non-negative sampling weights (integer counts, inverse-probability weights, or raking weights). When supplied, continuous summaries become weighted means / SDs and categorical summaries become weighted proportions. The variance formula matches R's Hmisc::wtd.var (same as gtsummary). The weights column is excluded from the variable list automatically.

None
design SurveyDesign | None

A SurveyDesign describing a complex sampling structure (weights plus optional strata, clusters, and FPC). When provided, variance estimates use Taylor linearisation for design-correct standard errors. If both weights and design are passed, design takes precedence.

None
digits int

Decimal places for continuous summaries.

2
pct_digits int

Decimal places for percentages.

1
missing str

"ifany" (default): include a Missing row only when the variable has missing data; "always": always include a Missing row; "never": never include one.

'ifany'
include_missing bool | None

Deprecated alias for missing. True maps to "ifany", False to "never".

None
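
As a rough illustration of what the weights argument implies for summaries, the sketch below computes a weighted mean / SD and weighted proportions with the standard library only. It assumes the frequency-weight form of the variance with a sum(w) - 1 denominator, which is my reading of the default behaviour of Hmisc::wtd.var; the function names are hypothetical, and the library's own treatment of non-integer weights may differ.

```python
import math


def weighted_mean_sd(values, weights):
    """Weighted mean and SD using a (sum of weights - 1) denominator."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    sw = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sw
    # Assumed variance form: sum(w * (x - mean)^2) / (sum(w) - 1).
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / (sw - 1)
    return mean, math.sqrt(var)


def weighted_proportions(levels, weights):
    """Share of the total weight carried by each categorical level."""
    totals = {}
    for level, w in zip(levels, weights):
        totals[level] = totals.get(level, 0.0) + w
    grand = sum(totals.values())
    return {level: total / grand for level, total in totals.items()}


mean, sd = weighted_mean_sd([1, 2, 3], [1, 1, 2])
shares = weighted_proportions(["a", "b", "a"], [1, 1, 2])
```

With integer weights this reproduces the ordinary sample statistics of the expanded data: weights [1, 1, 2] on values [1, 2, 3] give the same mean and SD as the unweighted list [1, 2, 3, 3].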

tbl_summary

tbl_summary(data: DataFrame, *, by: str | None = None, variables: list[str] | None = None, labels: dict[str, str] | None = None, types: dict[str, VarKind] | None = None, nonnormal: list[str] | None = None, digits: int = 2, pct_digits: int = 1, missing: str = 'ifany') -> SofraTable

Build a general descriptive summary table.

See pysofra.tbl_one for parameter documentation. The two functions share an engine; the names are kept separate because the intent differs (Table 1 baseline vs. arbitrary descriptive summary) and their defaults may diverge further in future releases.
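
Since tbl_summary shares the missing= argument with tbl_one, the three documented values can be read as a simple decision rule. The helper below is a hypothetical sketch of that rule, treating None as the missing marker; it is not the engine's actual code.

```python
def include_missing_row(values, missing="ifany"):
    """Decide whether a 'Missing' row should be shown for one variable."""
    n_missing = sum(v is None for v in values)
    if missing == "always":
        return True
    if missing == "never":
        return False
    if missing == "ifany":
        # Only show the row when at least one value is actually missing.
        return n_missing > 0
    raise ValueError(f"unknown missing policy: {missing!r}")
```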

tbl_cross

tbl_cross(data: Any, *, row: str, column: str, cell: str = 'n_col_pct', margins: bool = True, digits: int = 1, labels: dict[str, str] | None = None) -> SofraTable

Cross-tabulate row against column.

Parameters:

Name Type Description Default
data Any

Source dataframe.

required
row str

Variable name for the rows.

required
column str

Variable name for the columns.

required
cell str

How to display each interior cell. See module docstring.

'n_col_pct'
margins bool

Include row / column / grand totals.

True
digits int

Decimal places for the percent.

1
labels dict[str, str] | None

Optional mapping of level → display label, applied to both row and column labels.

None
Notes

The returned SofraTable carries a rebuild closure over the source data, so the statistical modifiers .add_p() and .add_overall() work directly:

  • .add_p() re-runs the cross-tab and appends a p-value footnote based on the auto-selected categorical test (Fisher's exact for 2×2, Pearson χ² otherwise).
  • .add_overall() sets margins=True so the row, column, and grand totals are rendered (a no-op when margins are already on, which is the default).
  • .add_smd() raises NotImplementedError: SMD is a between-group effect size for a single variable and is undefined on a contingency table. Use tbl_one for SMD between two arms.
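
The cell and test-selection behaviour can be sketched in plain Python. This is a minimal illustration under assumptions: the "n (pct%)" cell format is my guess at what 'n_col_pct' renders, "chisq" is a hypothetical label for Pearson's test, and the function names are invented here.

```python
from collections import Counter


def crosstab_col_pct(rows, cols, digits=1):
    """Count (row, column) pairs and format each cell as 'n (column %)'."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    col_totals = {c: sum(counts[(r, c)] for r in row_levels) for c in col_levels}
    cells = {}
    for r in row_levels:
        for c in col_levels:
            n = counts[(r, c)]
            pct = 100 * n / col_totals[c]  # percentage within the column
            cells[(r, c)] = f"{n} ({pct:.{digits}f}%)"
    return cells


def pick_test(n_row_levels, n_col_levels):
    """Fisher's exact for a 2x2 table, Pearson chi-square otherwise."""
    return "fisher" if (n_row_levels, n_col_levels) == (2, 2) else "chisq"


cells = crosstab_col_pct(["a", "a", "b", "b", "b"], ["x", "y", "x", "x", "y"])
```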

tbl_survival

tbl_survival(data: Any, *, time: str, event: str, by: str | None = None, times: list[float] | tuple[float, ...] | None = None, times_label: str | None = None, conf_level: float = 0.95, digits: int = 2, pct_digits: int = 1, labels: dict[str, str] | None = None, show_logrank: bool = True, weights: str | None = None) -> SofraTable

Build a Kaplan–Meier summary table.

Parameters:

Name Type Description Default
data Any

Source dataframe (pandas or polars).

required
time str

Column carrying follow-up time.

required
event str

Column carrying the event indicator (1 = event, 0 = censored).

required
by str | None

Optional stratification column. Without it, a single "Overall" column is produced.

None
times list[float] | tuple[float, ...] | None

Optional list of follow-up times at which to report survival probability and N at risk. For example [12, 24, 36] for 1/2/3-year survival in a months-scaled study.

None
times_label str | None

Unit label appended to each times header (e.g. "months" → "S(12 months)"). Defaults to bare numbers.

None
conf_level float

Confidence level for the median survival CI.

0.95
digits int

Decimal places for survival probabilities and median.

2
pct_digits int

Decimal places for survival percentages.

1
labels dict[str, str] | None

Optional mapping from group level → display label.

None
show_logrank bool

Whether to compute and footnote the multi-group log-rank test.

True
weights str | None

Optional column carrying per-row sampling/frequency weights. When supplied, the Kaplan–Meier estimator is fit with the weights= kwarg of lifelines.KaplanMeierFitter (a weighted product-limit estimator). N totals / events / censored report weighted sums. The log-rank test currently uses unweighted ranks regardless — lifelines does not expose a weighted log-rank — and a footnote flags this when weights are active.

None
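
For readers unfamiliar with the weighted product-limit estimator, the following standard-library sketch shows the idea: at each event time the survival probability is multiplied by one minus the weighted events divided by the weighted number at risk. It is an illustration only, with hypothetical function names; the library delegates the real fit to lifelines, whose numerical details may differ.

```python
def kaplan_meier(times, events, weights=None):
    """Return [(t, S(t))] at each distinct event time (product-limit)."""
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = sum(w for _, _, w in data)  # weighted number at risk
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0.0        # weighted events at time t
        removed = 0.0  # weighted events + censorings leaving the risk set
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event == 1:
                d += w
            removed += w
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The survival_at helper mirrors what the times argument asks for: the estimated survival probability read off the step function at each requested follow-up time.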

tbl_regression

tbl_regression(model: Any | list[Any], *, exponentiate: bool | None = None, conf_level: float = 0.95, digits: int = 2, labels: dict[str, str] | None = None, intercept: bool = False, estimate_label: str | None = None, model_labels: list[str] | None = None, design: Any = None, data: Any = None) -> SofraTable

Build a regression results table.

Parameters:

Name Type Description Default
model Any | list[Any]

A fitted model, or a list of fitted models for a multi-model side-by-side table.

required
exponentiate bool | None

If True, exponentiate point estimates and CI bounds (ORs / HRs / IRRs). None (default) auto-selects: True for models whose coefficients are on a log scale (Logit / Poisson / Cox / Weibull AFT), False otherwise.

None
conf_level float

Confidence level for the CI column (default 95%).

0.95
digits int

Decimal places for estimates and CI bounds.

2
labels dict[str, str] | None

Mapping from coefficient name → display label. Shared across all models in a multi-model table.

None
intercept bool

Whether to include the intercept row.

False
estimate_label str | None

Custom header label for the estimate column. Defaults to OR / HR / IRR / β / Estimate based on the detected model family.

None
model_labels list[str] | None

For multi-model tables, the spanning-header label for each model (defaults to Model 1, Model 2, ...).

None
design Any

Optional pysofra.SurveyDesign. When provided, the fit is re-summarised with cluster-robust standard errors (Taylor linearisation matching survey::svyglm to first order). The data argument is required for statsmodels models when a design with cluster columns is given.

None
data Any

Source dataframe — needed only when design= references columns that the fitted model didn't already see.

None
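
The effect of exponentiate, conf_level, and digits on a single coefficient can be sketched as follows. This assumes a Wald (normal-approximation) interval, which may not be what every supported model family uses, and the family labels and function names are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical family labels; the library detects the family from the model.
LOG_SCALE_FAMILIES = {"logit", "poisson", "cox", "weibull_aft"}


def resolve_exponentiate(exponentiate, family):
    """None means auto-select based on the model family."""
    if exponentiate is None:
        return family in LOG_SCALE_FAMILIES
    return exponentiate


def format_estimate(beta, se, *, exponentiate, conf_level=0.95, digits=2):
    """Format 'estimate (lower, upper)' from a coefficient and its SE."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    lower, upper = beta - z * se, beta + z * se  # Wald interval, assumed
    if exponentiate:
        # Exponentiate the bounds themselves, not the standard error.
        beta, lower, upper = math.exp(beta), math.exp(lower), math.exp(upper)
    return f"{beta:.{digits}f} ({lower:.{digits}f}, {upper:.{digits}f})"


row = format_estimate(
    math.log(2), 0.2, exponentiate=resolve_exponentiate(None, "logit")
)
```

Note that the interval is computed on the coefficient scale and then transformed, so the exponentiated interval is not symmetric around the point estimate.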

tbl_uvregression

tbl_uvregression(data: Any, *, outcome: str, predictors: list[str] | None = None, method: Callable[..., Any] | str = 'OLS', method_kwargs: dict[str, Any] | None = None, adjust_for: list[str] | None = None, exponentiate: bool | None = None, conf_level: float = 0.95, digits: int = 2, labels: dict[str, str] | None = None) -> SofraTable

Univariable regression — one model per predictor.
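
The "one model per predictor" idea can be illustrated with a tiny standard-library sketch that fits a separate simple least-squares line for each predictor. Here data is a plain dict of column name to list of numbers and both function names are hypothetical; the library itself fits statsmodels-style models and also handles categorical predictors and adjust_for covariates, which this sketch does not.

```python
def simple_ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def univariable_slopes(data, outcome, predictors=None):
    """Fit one single-predictor model per column and collect the slopes."""
    if predictors is None:
        # Default mirrors the documented behaviour: every column but outcome.
        predictors = [col for col in data if col != outcome]
    return {p: simple_ols_slope(data[p], data[outcome]) for p in predictors}


slopes = univariable_slopes(
    {"y": [1, 2, 3, 4], "a": [0, 1, 2, 3], "b": [3, 2, 1, 0]}, outcome="y"
)
```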

Parameters:

Name Type Description Default
data Any

Source dataframe (pandas or polars).

required
outcome str

Column name of the response variable.

required
predictors list[str] | None

Predictor columns. Defaults to every column except outcome and any adjust_for covariates (numeric and categorical).

None
method Callable[..., Any] | str

Either a callable that takes (y, X) and returns a fitted statsmodels-style results object, or one of the string aliases "OLS", "Logit", "Poisson", "GLM".

'OLS'
method_kwargs dict[str, Any] | None

Extra keyword arguments forwarded to the model class.

None
adjust_for list[str] | None

Optional list of covariates included in every univariable fit (matching gtsummary's include argument). Adjustment covariates are themselves dummy-encoded if categorical.

None
exponentiate bool | None

If True, exponentiate point estimates and CI bounds. None (default) auto-selects based on the model family.

None
conf_level float

Confidence level for the CI column.

0.95
digits int

Decimal places for estimates and CI bounds.

2
labels dict[str, str] | None

Mapping from predictor name → display label. Applied to the group-header row for categorical predictors.

None
Notes

For a categorical predictor with K levels the result has K + 1 rows: a header naming the variable, plus K indented rows (the reference level rendered as "— ref", and one row per non-reference level with its estimate / CI / p-value).
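
The row layout for a categorical predictor, a header followed by one indented row per level, can be sketched like this. The function name, the four-space indent, the plain "ref" marker, and the choice of the first level as reference are all assumptions made for illustration.

```python
def categorical_rows(label, levels, estimates, reference=None):
    """Build display rows: one header, then one indented row per level."""
    reference = levels[0] if reference is None else reference
    rows = [(label, "")]  # header row names the variable, no estimate
    for level in levels:
        if level == reference:
            rows.append((f"    {level}", "ref"))
        else:
            rows.append((f"    {level}", estimates[level]))
    return rows


rows = categorical_rows(
    "Stage", ["I", "II", "III"], {"II": "est (CI)", "III": "est (CI)"}
)
```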