Builders
The six top-level "builder" functions each construct a fresh
SofraTable from a DataFrame (or, for tbl_regression, from one or
more fitted models) plus some configuration. Every builder returns an
immutable object that can be chained through presentational modifiers
(.bold_p(), .theme(), .set_caption(), …) and rendered to any supported
format.
Most builders also support statistical re-computation modifiers
(.add_p(), .add_smd(), …) that rebuild the table with additional
columns. The exception is tbl_uvregression, which computes p-values
and confidence intervals at build time. Use its conf_level= and
digits= constructor arguments to control the interval level and the
number formatting; calling .add_p() on a tbl_uvregression result
raises NotImplementedError with an explanatory message.
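
The chaining contract described above can be illustrated with a small standard-library sketch. It is not the SofraTable implementation: MiniTable, its fields, and the baked flag are hypothetical names used only to show how a modifier can return a new object rather than mutate the receiver, and how a builder that fixes its p-values at build time might refuse .add_p().

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for an immutable table object (not SofraTable)."""

    rows: tuple[tuple[str, str], ...]
    caption: str = ""
    has_p: bool = False
    baked: bool = False  # True when p-values were fixed at build time

    def set_caption(self, text: str) -> MiniTable:
        # Presentational modifier: returns a copy, never mutates self.
        return replace(self, caption=text)

    def add_p(self) -> MiniTable:
        # Statistical modifier: refuses when p-values were baked in already.
        if self.baked:
            raise NotImplementedError(
                "p-values are fixed at build time; set conf_level/digits when building"
            )
        return replace(self, has_p=True)


base = MiniTable(rows=(("age", "52.1 (9.8)"),))
styled = base.add_p().set_caption("Table 1")

# The original object is untouched; only the chained result carries the changes.
print(base.caption == "", styled.caption, styled.has_p)
```

Because every modifier returns a new object, an intermediate table can be kept and reused as the starting point for several differently styled outputs.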
tbl_one
tbl_one(data: DataFrame, *, by: str | None = None, variables: list[str] | None = None, labels: dict[str, str] | None = None, types: dict[str, VarKind] | None = None, nonnormal: list[str] | None = None, tests: dict[str, str] | None = None, weights: str | None = None, design: SurveyDesign | None = None, digits: int = 2, pct_digits: int = 1, missing: str = 'ifany', include_missing: bool | None = None) -> SofraTable
Build a Table 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Source dataframe. | required |
| `by` | `str \| None` | Optional column name to stratify on. If omitted, a single | `None` |
| `variables` | `list[str] \| None` | Explicit list of variables to include. Defaults to all columns other than | `None` |
| `labels` | `dict[str, str] \| None` | Mapping of column name → display label. | `None` |
| `types` | `dict[str, VarKind] \| None` | Override automatic variable typing on a per-column basis. | `None` |
| `nonnormal` | `list[str] \| None` | Continuous variables that should be summarised as | `None` |
| `tests` | `dict[str, str] \| None` | Per-variable statistical test overrides, e.g. | `None` |
| `weights` | `str \| None` | Column name carrying non-negative sampling weights (integer counts, inverse-probability weights, or raking weights). When supplied, continuous summaries become weighted means / SDs and categorical summaries become weighted proportions. The variance formula matches R's | `None` |
| `design` | `SurveyDesign \| None` | A :class: | `None` |
| `digits` | `int` | Decimal places for continuous summaries. | `2` |
| `pct_digits` | `int` | Decimal places for percentages. | `1` |
| `missing` | `str` |  | `'ifany'` |
| `include_missing` | `bool \| None` | Deprecated alias for | `None` |
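
The effect of a weights column on the summaries can be sketched with plain Python. This is an illustration of the arithmetic only, not the library's code: the helper names are hypothetical, and the weighted SD is left out on purpose because the variance convention is not spelled out here.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_w = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_w


def weighted_proportions(levels, weights):
    """Share of the total weight carried by each category level."""
    total_w = sum(weights)
    shares = {}
    for level, w in zip(levels, weights):
        shares[level] = shares.get(level, 0.0) + w
    return {level: w / total_w for level, w in shares.items()}


ages = [40.0, 50.0, 60.0]
sex = ["F", "M", "F"]
w = [1.0, 2.0, 1.0]

print(weighted_mean(ages, w))         # 50.0
print(weighted_proportions(sex, w))   # {'F': 0.5, 'M': 0.5}
```

With unit weights both helpers reduce to the ordinary mean and the ordinary proportions, which is a quick sanity check for any weighted summary.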
tbl_summary
tbl_summary(data: DataFrame, *, by: str | None = None, variables: list[str] | None = None, labels: dict[str, str] | None = None, types: dict[str, VarKind] | None = None, nonnormal: list[str] | None = None, digits: int = 2, pct_digits: int = 1, missing: str = 'ifany') -> SofraTable
Build a general descriptive summary table.
See pysofra.tbl_one for parameter documentation; tbl_summary accepts
the same parameters except tests, weights, design, and
include_missing. The two functions share an engine. They are named
separately because their intent differs (a baseline Table 1 versus an
arbitrary descriptive summary), and their defaults may diverge
further in future releases.
tbl_cross
tbl_cross(data: Any, *, row: str, column: str, cell: str = 'n_col_pct', margins: bool = True, digits: int = 1, labels: dict[str, str] | None = None) -> SofraTable
Cross-tabulate row against column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | Source dataframe. | required |
| `row` | `str` | Variable name for the rows. | required |
| `column` | `str` | Variable name for the columns. | required |
| `cell` | `str` | How to display each interior cell. See module docstring. | `'n_col_pct'` |
| `margins` | `bool` | Include row / column / grand totals. | `True` |
| `digits` | `int` | Decimal places for the percent. | `1` |
| `labels` | `dict[str, str] \| None` | Optional mapping of level → display label, applied to both row and column labels. | `None` |
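
As a rough illustration of an interior cell, the sketch below formats each cell as a count followed by a column percentage. That reading of the 'n_col_pct' default is an assumption based on its name (the authoritative list of cell styles is in the module docstring), and crosstab_n_col_pct is a hypothetical helper, not part of the library.

```python
from collections import Counter


def crosstab_n_col_pct(rows, cols, digits=1):
    """Return {(row_level, col_level): 'n (pct%)'} with column totals as denominators."""
    counts = Counter(zip(rows, cols))
    col_totals = Counter(cols)
    cells = {}
    for r in sorted(set(rows)):
        for c in sorted(set(cols)):
            n = counts[(r, c)]
            pct = 100.0 * n / col_totals[c]
            cells[(r, c)] = f"{n} ({pct:.{digits}f}%)"
    return cells


smoker = ["yes", "no", "no", "yes", "no", "no"]
arm = ["A", "A", "A", "B", "B", "B"]
cells = crosstab_n_col_pct(smoker, arm)

print(cells[("yes", "A")])  # 1 (33.3%)
print(cells[("no", "B")])   # 2 (66.7%)
```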
Notes
The returned SofraTable carries a rebuild closure over the
source data so the statistical modifiers .add_p() and
.add_overall() work directly:
- .add_p() re-runs the cross-tab and appends a p-value footnote based on the auto-selected categorical test (Fisher's exact for 2x2, Pearson χ² otherwise).
- .add_overall() toggles margins=True so the row, column, and grand totals are rendered (no-op when margins are already on, which is the default).
- .add_smd() raises NotImplementedError: SMD is a between-group effect size on a single distribution and is undefined on a contingency table. Use tbl_one for SMD between two arms.
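
The test-selection rule in the first bullet can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's code: all three function names are hypothetical, the Fisher p-value uses the common "sum of tables no more likely than the observed one" two-sided definition, and the χ² helper stops at the statistic and degrees of freedom rather than computing a p-value.

```python
from math import comb


def pick_test(table):
    """Fisher's exact for 2x2 tables, Pearson chi-squared otherwise."""
    is_2x2 = len(table) == 2 and all(len(row) == 2 for row in table)
    return "fisher" if is_2x2 else "chisq"


def fisher_exact_2x2(table):
    """Two-sided p-value: total probability of all tables with the same
    margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def pearson_chi2(table):
    """Pearson statistic sum((O - E)^2 / E) and its degrees of freedom."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


print(pick_test([[3, 1], [1, 3]]), round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))
print(pick_test([[10, 20, 30], [20, 20, 20]]), pearson_chi2([[10, 20, 30], [20, 20, 20]]))
```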
tbl_survival
tbl_survival(data: Any, *, time: str, event: str, by: str | None = None, times: list[float] | tuple[float, ...] | None = None, times_label: str | None = None, conf_level: float = 0.95, digits: int = 2, pct_digits: int = 1, labels: dict[str, str] | None = None, show_logrank: bool = True, weights: str | None = None) -> SofraTable
Build a Kaplan–Meier summary table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | Source dataframe (pandas or polars). | required |
| `time` | `str` | Column carrying follow-up time. | required |
| `event` | `str` | Column carrying the event indicator (1 = event, 0 = censored). | required |
| `by` | `str \| None` | Optional stratification column. Without it, a single | `None` |
| `times` | `list[float] \| tuple[float, ...] \| None` | Optional list of follow-up times at which to report survival probability and N at risk. For example | `None` |
| `times_label` | `str \| None` | Unit label appended to each | `None` |
| `conf_level` | `float` | Confidence level for the median survival CI. | `0.95` |
| `digits` | `int` | Decimal places for survival probabilities and median. | `2` |
| `pct_digits` | `int` | Decimal places for survival percentages. | `1` |
| `labels` | `dict[str, str] \| None` | Optional mapping from group level → display label. | `None` |
| `show_logrank` | `bool` | Whether to compute and footnote the multi-group log-rank test. | `True` |
| `weights` | `str \| None` | Optional column carrying per-row sampling/frequency weights. When supplied, the Kaplan–Meier estimator is fit with the | `None` |
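
For readers who want to see what the time and event columns feed into, here is a minimal unweighted Kaplan–Meier sketch in plain Python. It is not how the library fits the estimator: the function names are hypothetical, and weights, confidence intervals, and the log-rank test are all left out.

```python
def kaplan_meier(times, events):
    """Return [(t, n_at_risk, survival)] at each distinct event time.

    S(t) is the running product of (1 - d_i / n_i), where d_i is the number
    of events at time t_i and n_i the number still at risk just before t_i.
    """
    data = list(zip(times, events))
    surv, steps = 1.0, []
    for t in sorted({u for u, e in data if e == 1}):
        n_at_risk = sum(1 for u, _ in data if u >= t)
        d = sum(1 for u, e in data if u == t and e == 1)
        surv *= 1 - d / n_at_risk
        steps.append((t, n_at_risk, surv))
    return steps


def survival_at(steps, t):
    """Step-function lookup: survival after the last event time <= t."""
    s = 1.0
    for time, _, surv in steps:
        if time <= t:
            s = surv
    return s


follow_up = [2, 3, 3, 5, 8, 8, 10]
event = [1, 1, 0, 1, 0, 1, 0]  # 1 = event, 0 = censored
steps = kaplan_meier(follow_up, event)

for t, n, s in steps:
    print(t, n, round(s, 3))
print(round(survival_at(steps, 6), 3))
```

Censored rows never trigger a drop in the curve, but they do count toward the number at risk until their follow-up ends, which is why the denominators shrink between event times.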
tbl_regression
tbl_regression(model: Any | list[Any], *, exponentiate: bool | None = None, conf_level: float = 0.95, digits: int = 2, labels: dict[str, str] | None = None, intercept: bool = False, estimate_label: str | None = None, model_labels: list[str] | None = None, design: Any = None, data: Any = None) -> SofraTable
Build a regression results table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `Any \| list[Any]` | A fitted model, or a list of fitted models for a multi-model side-by-side table. | required |
| `exponentiate` | `bool \| None` | If | `None` |
| `conf_level` | `float` | Confidence level for the CI column (default 95%). | `0.95` |
| `digits` | `int` | Decimal places for estimates and CI bounds. | `2` |
| `labels` | `dict[str, str] \| None` | Mapping from coefficient name → display label. Shared across all models in a multi-model table. | `None` |
| `intercept` | `bool` | Whether to include the intercept row. | `False` |
| `estimate_label` | `str \| None` | Custom header label for the estimate column. Defaults to | `None` |
| `model_labels` | `list[str] \| None` | For multi-model tables, the spanning-header label for each model (defaults to | `None` |
| `design` | `Any` | Optional :class: | `None` |
| `data` | `Any` | Source dataframe, needed only when | `None` |
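
The interplay of exponentiate, conf_level, and digits can be illustrated with a short sketch. It assumes a Wald interval based on the normal quantile, which may differ from what a given fitted model reports (for example, a t-based interval for OLS), and format_estimate is a hypothetical helper rather than a library function.

```python
from math import exp
from statistics import NormalDist


def format_estimate(beta, se, *, exponentiate=False, conf_level=0.95, digits=2):
    """Wald interval beta +/- z * se, optionally mapped through exp()."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    lo, hi = beta - z * se, beta + z * se
    if exponentiate:
        # Transform the point estimate and both bounds together,
        # e.g. log-odds -> odds ratio for a logistic model.
        beta, lo, hi = exp(beta), exp(lo), exp(hi)
    return f"{beta:.{digits}f} ({lo:.{digits}f}, {hi:.{digits}f})"


print(format_estimate(0.6931, 0.25))                     # 0.69 (0.20, 1.18)
print(format_estimate(0.6931, 0.25, exponentiate=True))  # 2.00 (1.23, 3.26)
```

The interval is built on the coefficient scale and only then exponentiated, so the transformed interval is asymmetric around the point estimate.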
tbl_uvregression
tbl_uvregression(data: Any, *, outcome: str, predictors: list[str] | None = None, method: Callable[..., Any] | str = 'OLS', method_kwargs: dict[str, Any] | None = None, adjust_for: list[str] | None = None, exponentiate: bool | None = None, conf_level: float = 0.95, digits: int = 2, labels: dict[str, str] | None = None) -> SofraTable
Univariable regression — one model per predictor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | Source dataframe (pandas or polars). | required |
| `outcome` | `str` | Column name of the response variable. | required |
| `predictors` | `list[str] \| None` | Predictor columns. Defaults to every column except | `None` |
| `method` | `Callable[..., Any] \| str` | Either a callable that takes | `'OLS'` |
| `method_kwargs` | `dict[str, Any] \| None` | Extra keyword arguments forwarded to the model class. | `None` |
| `adjust_for` | `list[str] \| None` | Optional list of covariates included in every univariable fit (matching | `None` |
| `exponentiate` | `bool \| None` | If | `None` |
| `conf_level` | `float` | Confidence level for the CI column. | `0.95` |
| `digits` | `int` | Decimal places for estimates and CI bounds. | `2` |
| `labels` | `dict[str, str] \| None` | Mapping from predictor name → display label. Applied to the group-header row for categorical predictors. | `None` |
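
The "one model per predictor" idea can be sketched without any modelling library. The example below fits a separate one-predictor least-squares line for each column; it is an illustration only. The helper names are hypothetical, the data is a plain dict of lists rather than a dataframe, and defaulting predictors to every column other than the outcome is an assumption made for this sketch.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of a one-predictor least-squares fit: S_xy / S_xx."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def univariable_slopes(data, outcome, predictors=None):
    """Fit outcome ~ predictor separately for each predictor column."""
    if predictors is None:
        # Assumption for this sketch: use every column except the outcome.
        predictors = [name for name in data if name != outcome]
    return {name: ols_slope(data[name], data[outcome]) for name in predictors}


data = {
    "sbp": [120.0, 130.0, 140.0, 150.0],
    "age": [40.0, 50.0, 60.0, 70.0],
    "bmi": [22.0, 30.0, 24.0, 28.0],
}
slopes = univariable_slopes(data, outcome="sbp")
print(slopes)  # {'age': 1.0, 'bmi': 1.5}
```

Each slope comes from its own fit, so the estimates are unadjusted for the other predictors; this is what distinguishes a univariable table from a single multivariable model.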
Notes
For a categorical predictor with K levels the result has
K + 1 rows: a header naming the variable, plus K indented
rows (the reference level rendered as — ref, and one row
per non-reference level with its estimate / CI / p-value).
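
The row layout for a categorical predictor can be made concrete with a small sketch. categorical_rows is a hypothetical helper, the estimate strings are made-up placeholders, and treating the first listed level as the reference is an assumption of this example.

```python
def categorical_rows(variable, levels, estimates):
    """Header row plus one indented row per level.

    Assumes the first entry in `levels` is the reference level.
    """
    rows = [(variable, "")]
    reference, *others = levels
    rows.append((f"  {reference}", "— ref"))
    for level in others:
        rows.append((f"  {level}", estimates[level]))
    return rows


rows = categorical_rows(
    "stage",
    ["I", "II", "III"],
    {"II": "1.40 (0.90, 2.10)", "III": "2.30 (1.50, 3.60)"},  # placeholder values
)
for label, value in rows:
    print(f"{label:<8}{value}")
```

With three levels the sketch yields four rows: the header, the reference row, and two estimate rows.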