Builders

The six top-level "builder" functions construct a fresh SofraTable from a DataFrame (or, for tbl_regression, one or more fitted models) plus some configuration. Every builder returns an immutable SofraTable: presentational modifiers (.bold_p(), .theme(), .set_caption(), …) can be chained on it, and the result can be rendered to any supported format.

Most builders also support statistical re-computation modifiers (.add_p(), .add_smd(), …) that rebuild the table with additional columns. Exception: tbl_uvregression bakes p-values and confidence intervals in at build time. Use the conf_level= and digits= constructor arguments to control formatting; calling .add_p() on a tbl_uvregression result raises NotImplementedError with an explanatory message.
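To make the difference between presentational modifiers and rebuilding statistical modifiers concrete, here is a minimal sketch of the pattern: an immutable table whose presentational modifiers return modified copies, and whose statistical modifier calls a rebuild closure (the mechanism the tbl_cross notes describe). `MiniTable` and `build_demo` are hypothetical names for illustration only; this is not pysofra's implementation of SofraTable.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for SofraTable: an immutable table description."""

    rows: Tuple[Tuple[str, str], ...]
    caption: Optional[str] = None
    bold_pvalues: bool = False
    # Rebuild closure that re-runs the builder with extra statistics;
    # None means the statistics were fixed at build time.
    rebuild: Optional[Callable[..., "MiniTable"]] = None

    def set_caption(self, text: str) -> "MiniTable":
        # Presentational modifier: returns a modified copy, never mutates self.
        return replace(self, caption=text)

    def bold_p(self) -> "MiniTable":
        return replace(self, bold_pvalues=True)

    def add_p(self) -> "MiniTable":
        # Statistical modifier: recompute rows through the rebuild closure.
        if self.rebuild is None:
            raise NotImplementedError("p-values were computed at build time for this table")
        rebuilt = self.rebuild(add_p=True)
        # Carry presentational settings over to the rebuilt table.
        return replace(rebuilt, caption=self.caption, bold_pvalues=self.bold_pvalues)


def build_demo(add_p: bool = False) -> MiniTable:
    """Hypothetical builder; the row contents are placeholders."""
    rows: Tuple[Tuple[str, str], ...] = (("Age", "<summary>"),)
    if add_p:
        rows += (("p-value", "<p>"),)
    return MiniTable(rows=rows, rebuild=build_demo)
```

In this sketch, keeping presentation fields separate from the rebuild closure lets a statistical modifier recompute rows without discarding a caption or styling chosen earlier in the chain, while a table built with `rebuild=None` behaves like tbl_uvregression and refuses `.add_p()`.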

tbl_one

tbl_one(data: DataFrame, *, by: str | None = None, variables: list[str] | None = None, labels: dict[str, str] | None = None, types: dict[str, VarKind] | None = None, nonnormal: list[str] | None = None, tests: dict[str, str] | None = None, weights: str | None = None, design: SurveyDesign | None = None, digits: int = 2, pct_digits: int = 1, missing: str = 'ifany', include_missing: bool | None = None) -> SofraTable

Build a Table 1.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Source dataframe.	required
`by`	`str \| None`	Optional column name to stratify on. If omitted, a single `Overall` column is produced.	`None`
`variables`	`list[str] \| None`	Explicit list of variables to include. Defaults to all columns other than `by`.	`None`
`labels`	`dict[str, str] \| None`	Mapping of column name → display label.	`None`
`types`	`dict[str, VarKind] \| None`	Override automatic variable typing on a per-column basis.	`None`
`nonnormal`	`list[str] \| None`	Continuous variables that should be summarised as `median (Q1, Q3)` and tested with rank-based tests.	`None`
`tests`	`dict[str, str] \| None`	Per-variable statistical test overrides, e.g. `{'age': 'wilcoxon', 'race': 'fisher'}`. See `pysofra.summary.tests.available_tests` for the registry of supported test names.	`None`
`weights`	`str \| None`	Column name carrying non-negative sampling weights (integer counts, inverse-probability weights, or raking weights). When supplied, continuous summaries become weighted means / SDs and categorical summaries become weighted proportions. The variance formula matches R's `Hmisc::wtd.var` (same as gtsummary). The weights column is excluded from the variable list automatically.	`None`
`design`	`SurveyDesign \| None`	A `SurveyDesign` describing a complex sampling structure (weights plus optional strata, clusters, and FPC). When provided, variance estimates use Taylor linearisation for design-correct standard errors. If both `weights` and `design` are passed, `design` takes precedence.	`None`
`digits`	`int`	Decimal places for continuous summaries.	`2`
`pct_digits`	`int`	Decimal places for percentages.	`1`
`missing`	`str`	`"ifany"` (default): include a Missing row only when there is missing data; `"always"`: always include a Missing row; `"never"`: never include one.	`'ifany'`
`include_missing`	`bool \| None`	Deprecated alias for `missing`. `True` maps to `"ifany"`, `False` to `"never"`.	`None`
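The weighted summaries described for `weights` can be sketched in plain Python. This assumes the `Hmisc::wtd.var` default (`normwt = FALSE`, unbiased method), under which the variance is Σw(x − x̄)² / (Σw − 1); with integer frequency weights that equals the ordinary sample variance of the expanded data. `weighted_mean_sd` and `weighted_proportion` are hypothetical helpers, not pysofra functions.

```python
import math
from typing import Hashable, Sequence, Tuple


def weighted_mean_sd(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and SD, assuming Hmisc::wtd.var's default
    (normwt = FALSE): var = sum(w * (x - mean)**2) / (sum(w) - 1)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative")
    sw = sum(w)
    if sw <= 1:
        raise ValueError("sum of weights must exceed 1 for this variance formula")
    mean = sum(wi * xi for xi, wi in zip(x, w)) / sw
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / (sw - 1)
    return mean, math.sqrt(var)


def weighted_proportion(values: Sequence[Hashable], w: Sequence[float], level: Hashable) -> float:
    """Share of the total weight carried by rows equal to `level`."""
    return sum(wi for v, wi in zip(values, w) if v == level) / sum(w)
```

Passing all-ones weights reduces both helpers to the unweighted sample mean, SD, and proportion.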

tbl_summary

tbl_summary(data: DataFrame, *, by: str | None = None, variables: list[str] | None = None, labels: dict[str, str] | None = None, types: dict[str, VarKind] | None = None, nonnormal: list[str] | None = None, digits: int = 2, pct_digits: int = 1, missing: str = 'ifany') -> SofraTable

Build a general descriptive summary table.

See pysofra.tbl_one for parameter documentation; tbl_summary accepts a subset of those parameters (it has no tests, weights, design, or include_missing argument). The two functions share an engine; the names exist separately because the intent differs (Table 1 baseline vs. arbitrary descriptive summary) and we may diverge their defaults further in future releases.

tbl_cross

tbl_cross(data: Any, *, row: str, column: str, cell: str = 'n_col_pct', margins: bool = True, digits: int = 1, labels: dict[str, str] | None = None) -> SofraTable

Cross-tabulate row against column.

Parameters:

Name	Type	Description	Default
`data`	`Any`	Source dataframe.	required
`row`	`str`	Variable name for the rows.	required
`column`	`str`	Variable name for the columns.	required
`cell`	`str`	How to display each interior cell. See module docstring.	`'n_col_pct'`
`margins`	`bool`	Include row / column / grand totals.	`True`
`digits`	`int`	Decimal places for the percent.	`1`
`labels`	`dict[str, str] \| None`	Optional mapping of level → display label, applied to both row and column labels.	`None`
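As an illustration of the default `cell='n_col_pct'` layout, here is a stdlib sketch that renders each interior cell as a count with its column percentage and optionally appends Total margins. The `"n (p%)"` cell format, the sorted level order, and expressing row totals as a share of the grand total are assumptions about the layout, and `crosstab_n_col_pct` is a hypothetical helper; the module docstring remains the reference for the cell options pysofra actually supports.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def crosstab_n_col_pct(
    rows: Sequence[Hashable],
    cols: Sequence[Hashable],
    digits: int = 1,
    margins: bool = True,
) -> Dict[Hashable, Dict[Hashable, str]]:
    """Hypothetical sketch: grid[row_level][col_level] -> "n (p%)", where p
    is the column percentage; margins add a "Total" row and column."""
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    col_totals = {c: sum(counts[(r, c)] for r in row_levels) for c in col_levels}
    grand = len(rows)

    def fmt(n: int, denom: int) -> str:
        return f"{n} ({100 * n / denom:.{digits}f}%)"

    grid: Dict[Hashable, Dict[Hashable, str]] = {}
    for r in row_levels:
        grid[r] = {c: fmt(counts[(r, c)], col_totals[c]) for c in col_levels}
        if margins:
            # Row total, shown as a share of the grand total (an assumption).
            row_n = sum(counts[(r, c)] for c in col_levels)
            grid[r]["Total"] = fmt(row_n, grand)
    if margins:
        grid["Total"] = {c: fmt(col_totals[c], col_totals[c]) for c in col_levels}
        grid["Total"]["Total"] = fmt(grand, grand)
    return grid
```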

Notes

The returned SofraTable carries a rebuild closure over the source data, so the statistical modifiers .add_p() and .add_overall() work directly:

.add_p() re-runs the cross-tab and appends a p-value footnote based on the auto-selected categorical test (Fisher's exact for 2x2, Pearson χ² otherwise).
.add_overall() sets margins=True so the row, column, and grand totals are rendered (a no-op when margins are already on, which is the default).
.add_smd() raises NotImplementedError: SMD is an effect size comparing one variable's distribution between groups and is not defined for a contingency table. Use tbl_one for SMD between two arms.
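The auto-selection rule that .add_p() applies (Fisher's exact test for a 2×2 table, Pearson χ² otherwise) can be sketched with the standard library. The two-sided Fisher p-value below uses the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one. `select_test`, `fisher_exact_2x2`, and `pearson_chi2` are hypothetical names, and pysofra may well delegate these computations to a statistics library instead.

```python
from math import comb
from typing import List, Sequence, Tuple


def select_test(table: Sequence[Sequence[int]]) -> str:
    """Fisher's exact test for a 2x2 table, Pearson chi-square otherwise."""
    is_2x2 = len(table) == 2 and all(len(row) == 2 for row in table)
    return "fisher" if is_2x2 else "chisq"


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table no more likely than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom (no continuity
    correction). Assumes no row or column total is zero."""
    row_tot: List[int] = [sum(row) for row in table]
    col_tot: List[int] = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df
```

Turning the χ² statistic into a p-value requires the χ² survival function with `df` degrees of freedom, which is outside the standard library (for example `scipy.stats.chi2.sf`).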

tbl_survival

tbl_survival(data: Any, *, time: str, event: str, by: str | None = None, times: list[float] | tuple[float, ...] | None = None, times_label: str | None = None, conf_level: float = 0.95, digits: int = 2, pct_digits: int = 1, labels: dict[str, str] | None = None, show_logrank: bool = True, weights: str | None = None) -> SofraTable

Build a Kaplan–Meier summary table.

Parameters:

Name	Type	Description	Default
`data`	`Any`	Source dataframe (pandas or polars).	required
`time`	`str`	Column carrying follow-up time.	required
`event`	`str`	Column carrying the event indicator (1 = event, 0 = censored).	required
`by`	`str \| None`	Optional stratification column. Without it, a single `"Overall"` column is produced.	`None`
`times`	`list[float] \| tuple[float, ...] \| None`	Optional list of follow-up times at which to report survival probability and N at risk. For example, `[12, 24, 36]` gives 1-, 2-, and 3-year survival when follow-up time is measured in months.	`None`
`times_label`	`str \| None`	Unit label appended to each `times` header (e.g. `"months"` → `"S(12 months)"`). Defaults to bare numbers.	`None`
`conf_level`	`float`	Confidence level for the median survival CI.	`0.95`
`digits`	`int`	Decimal places for survival probabilities and the median survival time.	`2`
`pct_digits`	`int`	Decimal places for survival percentages.	`1`
`labels`	`dict[str, str] \| None`	Optional mapping from group level → display label.	`None`
`show_logrank`	`bool`	Whether to compute and footnote the multi-group log-rank test.	`True`
`weights`	`str \| None`	Optional column carrying per-row sampling/frequency weights. When supplied, the Kaplan–Meier estimator is fit with the `weights=` kwarg of `lifelines.KaplanMeierFitter` (a weighted product-limit estimator). N totals / events / censored report weighted sums. The log-rank test currently uses unweighted ranks regardless — lifelines does not expose a weighted log-rank — and a footnote flags this when weights are active.	`None`
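The weighted product-limit estimator mentioned under `weights` multiplies (1 − d_t / n_t) over event times t, where d_t is the summed weight of events at t and n_t is the summed weight still at risk just before t. A plain-Python sketch, including lookups for the `times` columns and the median, follows; the helper names are hypothetical and this is not lifelines' implementation. The median is taken as the first time at which S(t) falls to 0.5 or below, a common convention (implementations differ when S(t) equals 0.5 exactly).

```python
from typing import List, Optional, Sequence, Tuple


def weighted_km(
    time: Sequence[float],
    event: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> List[Tuple[float, float]]:
    """Weighted product-limit estimate as [(t, S(t))] at each event time.
    With weights=None every row counts once (the ordinary Kaplan-Meier)."""
    if weights is None:
        weights = [1.0] * len(time)
    data = sorted(zip(time, event, weights))
    at_risk = float(sum(weights))
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0.0   # weighted events at time t
        leaving = 0.0  # weighted rows (events and censorings) leaving at t
        while i < len(data) and data[i][0] == t:
            _, e, w = data[i]
            deaths += w * e
            leaving += w
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Rows censored at t were still at risk at t; drop them afterwards.
        at_risk -= leaving
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: S at the last event time <= t (1.0 before any)."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


def number_at_risk(time: Sequence[float], weights: Sequence[float], t: float) -> float:
    """Weighted number of rows whose follow-up reaches time t."""
    return sum(w for ti, w in zip(time, weights) if ti >= t)


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First event time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With integer weights, the weighted curve matches the unweighted curve of the data with each row repeated `w` times, which is a convenient sanity check for frequency weights.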

tbl_regression

tbl_regression(model: Any | list[Any], *, exponentiate: bool | None = None, conf_level: float = 0.95, digits: int = 2, labels: dict[str, str] | None = None, intercept: bool = False, estimate_label: str | None = None, model_labels: list[str] | None = None, design: Any = None, data: Any = None) -> SofraTable

Build a regression results table.

Parameters:

Name	Type	Description	Default
`model`	`Any \| list[Any]`	A fitted model, or a list of fitted models for a multi-model side-by-side table.	required
`exponentiate`	`bool \| None`	If `True`, exponentiate point estimates and CI bounds (ORs / HRs / IRRs). `None` (default) auto-selects: `True` for models whose coefficients are on a log scale (Logit / Poisson / Cox / Weibull AFT), `False` otherwise.	`None`
`conf_level`	`float`	Confidence level for the CI column (default 95%).	`0.95`
`digits`	`int`	Decimal places for estimates and CI bounds.	`2`
`labels`	`dict[str, str] \| None`	Mapping from coefficient name → display label. Shared across all models in a multi-model table.	`None`
`intercept`	`bool`	Whether to include the intercept row.	`False`
`estimate_label`	`str \| None`	Custom header label for the estimate column. Defaults to `OR` / `HR` / `IRR` / `β` / `Estimate` based on the detected model family.	`None`
`model_labels`	`list[str] \| None`	For multi-model tables, the spanning-header label for each model (defaults to `Model 1`, `Model 2`, ...).	`None`
`design`	`Any`	Optional `pysofra.SurveyDesign`. When provided, the fit is re-summarised with cluster-robust standard errors (Taylor linearisation matching `survey::svyglm` to first order). The `data` argument is required for statsmodels models when a design with cluster columns is given.	`None`
`data`	`Any`	Source dataframe — needed only when `design=` references columns that the fitted model didn't already see.	`None`
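Exponentiation applies to the CI bounds as well as the point estimate: the interval is formed on the coefficient scale and then transformed, which is why exponentiated CIs are asymmetric around the OR or HR. The sketch below assumes a normal-based Wald interval (statsmodels, for example, uses the t distribution for OLS); the family keys in `LOG_SCALE_FAMILIES` and the `format_estimate` helper are hypothetical, not pysofra's model-detection logic.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple

# Hypothetical family keys whose coefficients live on a log scale.
LOG_SCALE_FAMILIES = {"logit", "poisson", "cox", "weibull_aft"}


def format_estimate(
    beta: float,
    se: float,
    *,
    family: str,
    exponentiate: Optional[bool] = None,
    conf_level: float = 0.95,
    digits: int = 2,
) -> Tuple[str, str]:
    """Return (estimate, "lower, upper") strings using a normal-based Wald CI."""
    if exponentiate is None:
        exponentiate = family in LOG_SCALE_FAMILIES
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    lower, upper = beta - z * se, beta + z * se
    if exponentiate:
        # Transform the bounds, not the SE: the interval is built on the
        # coefficient scale and then mapped through exp().
        beta, lower, upper = math.exp(beta), math.exp(lower), math.exp(upper)
    return f"{beta:.{digits}f}", f"{lower:.{digits}f}, {upper:.{digits}f}"
```

For instance, a log-odds coefficient of 0 with standard error 0.5 becomes an OR of 1.00 with an interval of roughly 0.38 to 2.66, visibly wider above 1 than below it.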

tbl_uvregression

tbl_uvregression(data: Any, *, outcome: str, predictors: list[str] | None = None, method: Callable[..., Any] | str = 'OLS', method_kwargs: dict[str, Any] | None = None, adjust_for: list[str] | None = None, exponentiate: bool | None = None, conf_level: float = 0.95, digits: int = 2, labels: dict[str, str] | None = None) -> SofraTable

Univariable regression — one model per predictor.

Parameters:

Name	Type	Description	Default
`data`	`Any`	Source dataframe (pandas or polars).	required
`outcome`	`str`	Column name of the response variable.	required
`predictors`	`list[str] \| None`	Predictor columns. Defaults to every column except `outcome` and any `adjust_for` covariates (numeric and categorical).	`None`
`method`	`Callable[..., Any] \| str`	Either a callable that takes `(y, X)` and returns a fitted statsmodels-style results object, or one of the string aliases `"OLS"`, `"Logit"`, `"Poisson"`, `"GLM"`.	`'OLS'`
`method_kwargs`	`dict[str, Any] \| None`	Extra keyword arguments forwarded to the model class.	`None`
`adjust_for`	`list[str] \| None`	Optional list of covariates included in every univariable fit (comparable to adding covariates to gtsummary's `formula` template, e.g. `"{y} ~ {x} + age"`). Adjustment covariates are themselves dummy-encoded if categorical.	`None`
`exponentiate`	`bool \| None`	If `True`, exponentiate point estimates and CI bounds. `None` (default) auto-selects based on the model family.	`None`
`conf_level`	`float`	Confidence level for the CI column.	`0.95`
`digits`	`int`	Decimal places for estimates and CI bounds.	`2`
`labels`	`dict[str, str] \| None`	Mapping from predictor name → display label. Applied to the group-header row for categorical predictors.	`None`

Notes

For a categorical predictor with K levels, the result has K + 1 rows: a header row naming the variable, followed by K indented level rows (the reference level rendered as — ref, and one row per non-reference level with its estimate / CI / p-value).
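The row layout for a categorical predictor (a header row plus one indented row per level) can be sketched as a small function. `categorical_rows` is hypothetical; the plain `"ref"` string stands in for the reference-level marker, and the four-space indent is an assumed rendering of the indentation. The optional `label` mirrors how `labels` applies to the group-header row.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, str, str, str]  # (label, estimate, CI, p-value)
REF_MARK = "ref"  # placeholder for the reference-level marker


def categorical_rows(
    variable: str,
    levels: Sequence[str],
    reference: str,
    estimates: Dict[str, Tuple[str, str, str]],
    label: Optional[str] = None,
) -> List[Row]:
    """Header row for the variable, then one indented row per level."""
    rows: List[Row] = [(label or variable, "", "", "")]
    for level in levels:
        if level == reference:
            # The reference level has no coefficient of its own.
            rows.append((f"    {level}", REF_MARK, "", ""))
        else:
            est, ci, p = estimates[level]
            rows.append((f"    {level}", est, ci, p))
    return rows
```

In this sketch a K-level predictor contributes K − 1 model coefficients but K + 1 table rows, because the header and the reference level each take a row without an estimate.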