Scope & known limitations

A short, honest list of where PySofra makes a deliberate approximation, exposes a known gap, or simply does not cover a case. Each item is paired with the renderer-level signal a user sees, the recommended workaround, and the audit step in the case-study notebook that quantifies the gap.

This page exists so that PySofra users (and their reviewers) never encounter a limitation through an unlabelled numeric discrepancy — the limitation should be visible on the rendered table itself.

1. Rao–Scott design-based chi-square is first-order


Where	`tbl_one(design=...)` p-values for categorical variables
What	PySofra uses the first-order Rao–Scott adjustment (Kish DEFF) for design-based chi-square. R `survey::svychisq` uses the second-order (generalised-DEFF / eigen-decomposition) adjustment.
Observed	On NHANES 2017-2018 (moderate clustering) the first-order p-values differ from R `svychisq` by 57–69 % on individual variables. Under high intra-cluster correlation — designs where the outcome is nearly constant within PSUs — the Kish DEFF approximation is blind to the clustering structure and can be off by an order of magnitude or more (empirically: ×10–×20 vs the second-order p-value). The underlying Pearson chi-square statistic matches R exactly in all cases.
User signal	Table 1 p-values for categorical variables under `design=` carry a footnote naming the approximation.
Workaround	Compute the chi-square statistic via PySofra (matches R exactly), then run `survey::svychisq()` in R for the second-order p-value if a publication requires that exact match.
Audit step	jss_case_study Step 38 (quantified gap + Table-1 linkage).
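
A minimal sketch of what a first-order correction of this kind looks like, assuming "Kish DEFF" in the table above means Kish's weight-based design effect, n·Σw²/(Σw)². The function names `kish_deff`, `chi2_sf`, and `first_order_p` are hypothetical and are not PySofra's API. The sketch is meant to show why the approximation cannot see clustering: only the weights enter the formula, never the PSU identifiers.

```python
import math

def kish_deff(weights):
    """Kish's weight-based design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 1 and df = 2.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def first_order_p(pearson_x2, df, weights):
    """Deflate the Pearson statistic by one scalar design effect."""
    return chi2_sf(pearson_x2 / kish_deff(weights), df)
```

With equal weights `kish_deff` returns 1 and the p-value reduces to the ordinary Pearson one. Two designs with identical weights but very different intra-cluster correlation receive the same correction, which is consistent with the large gaps reported under high clustering.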

2. Weighted Kaplan–Meier CIs use Greenwood


Where	`tbl_survival(weights=...)` median-survival and S(t) CIs
What	PySofra delegates to `lifelines.KaplanMeierFitter`, which uses the Greenwood variance. Greenwood is exact for integer (frequency) weights, but biased (too narrow) for non-integer (sampling, propensity, IPTW) weights. The KM point estimates remain unbiased under any weights.
User signal	When `weights=` resolves to non-integer values, `tbl_survival` emits one `UserWarning` and attaches a matching table footnote naming the Greenwood approximation. Integer weights stay silent.
Workaround	For design-grade weighted-survival CIs, bootstrap-resample units (or PSUs) and report empirical-percentile CIs.
Audit step	jss_case_study Step 27 (pinned CI-bias warning + footnote as a contract).
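
The bootstrap workaround can be sketched as follows. This is an illustration under stated assumptions, not PySofra or lifelines code: it uses a small pure-Python weighted Kaplan–Meier estimator so the example has no dependencies, resamples individual units rather than PSUs, assumes strictly positive weights, and the names `weighted_km` and `bootstrap_km_ci` are hypothetical.

```python
import random

def weighted_km(times, events, weights, t0):
    """Weighted Kaplan-Meier estimate of S(t0); weights must be positive."""
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= t0})
    for t in event_times:
        # Weighted number at risk just before t, and weighted events at t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, e, w in zip(times, events, weights)
                   if e and ti == t)
        surv *= 1.0 - died / at_risk
    return surv

def bootstrap_km_ci(times, events, weights, t0,
                    n_boot=500, level=0.95, seed=0):
    """Empirical-percentile CI for S(t0) from unit-level resampling."""
    rng = random.Random(seed)
    n = len(times)
    draws = []
    for _ in range(n_boot):
        # Resample units with replacement, keeping each unit's weight.
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(weighted_km([times[i] for i in idx],
                                 [events[i] for i in idx],
                                 [weights[i] for i in idx], t0))
    draws.sort()
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (n_boot - 1))]
    upper = draws[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper
```

For a clustered design, the same loop would draw whole PSUs instead of single units, so that within-cluster dependence is carried into every replicate.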

3. scikit-learn estimators expose point estimates only


Where	`tbl_regression(sklearn_fit)` for `LogisticRegression`, `LinearRegression`, etc.
What	scikit-learn does not natively expose standard errors, confidence intervals, or p-values on fitted estimators. PySofra renders the point estimates faithfully and leaves the CI / p-value columns blank.
User signal	The rendered table carries a footnote naming the source family (e.g., `LogisticRegression (scikit-learn)`) and stating "point estimates only — the source fitter does not expose standard errors, confidence intervals, or p-values".
Workaround	Refit the same model with `statsmodels` (`sm.Logit`, `sm.GLM`, `sm.OLS`) when inferential output is required. PySofra then auto-extracts CI + p from the statsmodels result via the same `tbl_regression()` entry point — no other code changes needed.
Audit step	jss_case_study Step 53 (no-inference footnote pinned as a contract); unit test `test_sklearn_table_carries_no_inference_footnote` in `tests/test_regressions.py`.

What is not a limitation we intend to fix

Some properties are deliberate, not gaps:

`SofraTable` is frozen / immutable. Modifier methods are copy-on-write. This is the foundation of the cross-backend consistency proof (`tests/test_cross_backend_consistency.py`) and is fixed by the API stability contract.
PySofra does not fit models on the user's behalf for `tbl_regression`. The function accepts a fitted statsmodels / lifelines / sklearn result and extracts the right quantities. A user fitting their own model preserves all model diagnostics that PySofra cannot reproduce (BIC, fit warnings, residual plots).
`tbl_survival` re-derives CIs from `lifelines.fit(alpha=)` and does not patch lifelines' variance formula. Doing so would make PySofra a fork of lifelines rather than a thin reporting layer over it.
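
The frozen, copy-on-write pattern in the first item above can be illustrated with a frozen dataclass. This is a generic sketch of the pattern only: `MiniTable` and `add_footnote` are hypothetical names, and nothing here is taken from `SofraTable`'s actual implementation.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class MiniTable:
    """Hypothetical stand-in for an immutable table object."""
    rows: Tuple[Tuple[str, str], ...]
    footnotes: Tuple[str, ...] = ()

    def add_footnote(self, text: str) -> "MiniTable":
        # Copy-on-write: build a new instance and leave `self` untouched.
        return replace(self, footnotes=self.footnotes + (text,))

base = MiniTable(rows=(("age", "52.1"),))
noted = base.add_footnote("Greenwood variance")
```

Because every modifier returns a new object, two renderers given the same instance are guaranteed to see the same content, which is what makes a cross-backend comparison meaningful.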

Anything else? Open an issue on the repository tagged `limitation` and we will either fix it or add it to this page.