Scope & known limitations¶
A short, honest list of where PySofra makes a deliberate approximation, exposes a known gap, or simply does not cover a case. Each item is paired with the renderer-level signal a user sees, the recommended workaround, and the audit step in the case-study notebook that quantifies the gap.
This page exists so that PySofra users (and their reviewers) never encounter a limitation through an unlabelled numeric discrepancy — the limitation should be visible on the rendered table itself.
1. Rao–Scott design-based chi-square is first-order¶
| Where | tbl_one(design=...) p-values for categorical variables |
| What | PySofra uses the first-order Rao–Scott adjustment (Kish DEFF) for design-based chi-square. R survey::svychisq uses the second-order (generalised-DEFF / eigen-decomposition) adjustment. |
| Observed | On NHANES 2017-2018 (moderate clustering) the first-order p-values differ from R svychisq by 57–69 % on individual variables. Under high intra-cluster correlation — designs where the outcome is nearly constant within PSUs — the Kish DEFF approximation is blind to the clustering structure and can be off by an order of magnitude or more (empirically: ×10–×20 vs the second-order p-value). The underlying Pearson chi-square statistic matches R exactly in all cases. |
| User signal | Table 1 p-values for categorical variables under design= carry a footnote naming the approximation. |
| Workaround | Compute the chi-square statistic via PySofra (matches R exactly), then run survey::svychisq() in R for the second-order p-value if a publication requires that exact match. |
| Audit step | jss_case_study Step 38 (quantified gap + Table-1 linkage). |
2. Weighted Kaplan–Meier CIs use Greenwood¶
| Where | tbl_survival(weights=...) median-survival and S(t) CIs |
| What | PySofra delegates to lifelines.KaplanMeierFitter, which uses the Greenwood variance. Greenwood is exact for integer (frequency) weights, but biased (too narrow) for non-integer (sampling, propensity, IPTW) weights. The KM point estimates remain unbiased under any weights. |
| User signal | When weights= resolves to non-integer values, tbl_survival emits one UserWarning and attaches a matching table footnote naming the Greenwood approximation. Integer weights stay silent. |
| Workaround | For design-grade weighted-survival CIs, bootstrap-resample units (or PSUs) and report empirical-percentile CIs. |
| Audit step | jss_case_study Step 27 (pinned CI-bias warning + footnote as a contract). |
3. scikit-learn estimators expose point estimates only¶
| Where | tbl_regression(sklearn_fit) for LogisticRegression, LinearRegression, etc. |
| What | scikit-learn does not natively expose standard errors, confidence intervals, or p-values on fitted estimators. PySofra renders the point estimates faithfully and leaves the CI / p-value columns blank. |
| User signal | The rendered table carries a footnote naming the source family (e.g., LogisticRegression (scikit-learn)) and stating "point estimates only — the source fitter does not expose standard errors, confidence intervals, or p-values". |
| Workaround | Refit the same model with statsmodels (sm.Logit, sm.GLM, sm.OLS) when inferential output is required. PySofra then auto-extracts CI + p from the statsmodels result via the same tbl_regression() entry point — no other code changes needed. |
| Audit step | jss_case_study Step 53 (no-inference footnote pinned as a contract); unit test test_sklearn_table_carries_no_inference_footnote in tests/test_regressions.py. |
What is not a limitation we intend to fix¶
Some properties are deliberate, not gaps:
SofraTableis frozen / immutable. Modifier methods are copy-on-write. This is the foundation of the cross-backend consistency proof (tests/test_cross_backend_consistency.py) and is fixed by the API stability contract.- PySofra does not fit models on the user's behalf for
tbl_regression. The function accepts a fitted statsmodels / lifelines / sklearn result and extracts the right quantities. A user fitting their own model preserves all model diagnostics that PySofra cannot reproduce (BIC, fit warnings, residual plots). tbl_survivalre-derives CIs fromlifelines.fit(alpha=)and does not patch lifelines' variance formula. Doing so would make PySofra a fork of lifelines rather than a thin reporting layer over it.
Anything else? Open an issue on
the repository tagged
limitation and we will either fix it or add it to this page.