Validated at machine precision

Most reporting packages test that code runs. PySofra tests that numbers are correct.

On every push to main, a 54-step audit notebook runs against the 2017–18 US National Health and Nutrition Examination Survey (NHANES) and asserts 52 numerical contracts against independent reference implementations such as R's `gtsummary` and `svyglm`.

| Quantity | Reference | Observed error |
|---|---|---|
| Weighted mean (Table 1 cell) | `gtsummary` display value | 3.3 × 10⁻¹⁵ (relative) |
| Weighted SD (Table 1 cell) | `gtsummary` display value | 3.8 × 10⁻¹⁵ (relative) |
| Weighted proportion (4 variables) | `gtsummary` display value | 4.6 × 10⁻¹⁵ (relative) |
| Survey regression SE (6 coefficients) | R `svyglm` | < 0.8% (relative) |
| MI pooled point estimate | Rubin (1987) | < 10⁻¹⁴ (absolute) |
| KM survival probability | lifelines | < 10⁻¹⁵ (absolute) |

Weighted means and SDs agree with the R reference to within a few units of double-precision rounding error (relative error below 4 × 10⁻¹⁵), rather than to an approximate tolerance. Nominal 95% CIs attain 94.2% and 93.8% empirical coverage in a 1,000-replicate Monte Carlo study.
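
As a rough illustration of what a numerical contract of this kind checks, the sketch below computes a weighted mean in double precision and asserts that it matches an independently computed reference within a relative tolerance. Everything in it is hypothetical: the data, the `rel_error` helper, and the `1e-14` tolerance are not taken from the PySofra audit notebook, and the reference here is exact rational arithmetic rather than R.

```python
from fractions import Fraction

import numpy as np


def rel_error(observed: float, reference: float) -> float:
    """Relative error |observed - reference| / |reference|."""
    return abs(observed - reference) / abs(reference)


# Hypothetical values and survey weights (not NHANES data).
values = np.array([61.2, 74.5, 58.9, 80.1, 69.3])
weights = np.array([1520.0, 980.5, 2210.25, 1105.0, 1760.75])

# Quantity under test: weighted mean computed in double precision.
observed = float(np.sum(weights * values) / np.sum(weights))

# Independent reference: the same quantity in exact rational arithmetic,
# rounded to a float only at the very end.
num = sum(Fraction(w) * Fraction(v) for w, v in zip(weights.tolist(), values.tolist()))
den = sum(Fraction(w) for w in weights.tolist())
reference = float(num / den)

# The contract: fail loudly if the two values drift apart.
err = rel_error(observed, reference)
assert err < 1e-14, f"weighted mean drifted from reference: {err:.2e}"
```

A check like this fails when the arithmetic changes, even if the code still runs, which is the distinction this section draws.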

The pre-executed audit notebook is readable without installing anything. See AUDITOR.md for the single-command reproduction recipe.

Features

Six table builders

`tbl_one`, `tbl_summary`, `tbl_cross`, `tbl_regression`, `tbl_uvregression`, and `tbl_survival` cover the full clinical-reporting stack in one coherent API.

Seven output formats

HTML · Markdown · LaTeX · DOCX · PPTX · XLSX · PNG from a single immutable `SofraTable` object. Output is byte-identical across processes.

Survey weights

`SurveyDesign(weights=, strata=, cluster=)` is accepted by every builder. Taylor-linearised sandwich SEs, Rao–Scott adjusted test statistics.

Multiple-imputation pooling

`ps.pool(fits)` applies Rubin's rules and returns an object accepted by `tbl_regression`, with no manual bookkeeping.

Safety diagnostics

`with_safety_warnings()` embeds separation, PH-violation, and sparse-cell footnotes directly into DOCX, HTML, and LaTeX output.

Auto-dispatched tests

Welch, Wilcoxon, ANOVA, Kruskal–Wallis, Fisher, χ², and Rao–Scott tests are selected per row by variable type and can be overridden per variable.
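
For readers unfamiliar with the pooling step listed under multiple-imputation pooling, here is a minimal standalone sketch of Rubin's rules in NumPy. It is not PySofra's implementation: `pool_rubin`, its dictionary return value, and the input numbers are hypothetical, and this page does not show what `ps.pool` returns.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = q.mean()                 # pooled point estimate
    u_bar = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = u_bar + (1.0 + 1.0 / m) * b  # total variance
    # Rubin's (1987) degrees of freedom; infinite when all estimates coincide.
    if b > 0:
        df = (m - 1) * (1.0 + u_bar / ((1.0 + 1.0 / m) * b)) ** 2
    else:
        df = np.inf
    return {"estimate": q_bar, "se": np.sqrt(t), "df": df}


# Hypothetical coefficient and standard error from five imputed-data fits.
coefs = [0.42, 0.45, 0.40, 0.44, 0.43]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]
pooled = pool_rubin(coefs, [s ** 2 for s in ses])
print(pooled)
```

The pooled estimate is the plain average of the per-imputation estimates. The pooled standard error is larger than the average per-imputation one, because the between-imputation variance is added in.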

Quick example

import pysofra as ps

tbl = (
    ps.tbl_one(df, by="arm",
               labels={"age": "Age (years)", "bmi": "BMI (kg/m²)"},
               nonnormal=["bmi"])
      .add_p()
      .add_smd()
      .add_overall()
      .theme("clinical")
)

tbl                          # renders in Jupyter / VS Code / Colab
tbl.to_docx("table1.docx")   # publication-quality Word document
tbl.to_latex()               # LaTeX fragment, ready for manuscript

See the full quickstart or the API reference.