Model architecture
PolicyEngine models tax and benefit systems as a set of country packages wrapped by the policyengine Python interface. The country packages contain the policy rules; this package provides the user-facing calculation and analysis surface.
This page describes the documentation structure we would choose if starting fresh.
Three layers
PolicyEngine documentation should keep three layers separate:
| Layer | Owns | Should answer |
|---|---|---|
| Methodology | Authored prose | Why the model is structured this way, how concepts flow, where assumptions enter |
| Program pages | Authored prose plus generated links | What a program does, who qualifies, how values and household costs are represented |
| Reference | Generated from code and data packages | What variables, parameters, programs, sources, and calibration targets exist in a release |
This split avoids two common failure modes:
- Long-form methodology pages becoming stale variable catalogs.
- Generated reference pages trying to explain modeling choices that need narrative context.
Source of truth
The source of truth should depend on content type:
| Content | Source |
|---|---|
| Formulas, entities, variable metadata | Country model packages such as policyengine-us |
| Parameter values, uprating, legislative references | Country model parameter YAML |
| Microdata construction, imputations, calibration targets | Country data packages such as policyengine-us-data |
| User tutorials, model-wide methodology, program narratives | policyengine.py docs |
The documentation site should not manually copy reference metadata that can be regenerated from a release. It should explain how the generated pieces fit together.
Rules, data, calibration, outputs
A complete model page should be explicit about four pieces:
- Rules: statutory formulas and administrative program rules.
- Data: the household or person records to which rules are applied.
- Calibration: adjustments that align the data and model outputs with external targets.
- Outputs: the resource, budget, poverty, inequality, and distributional concepts returned to users.
For example, a program page should not stop at eligibility. It should say how benefit value is represented, whether household-paid costs are modeled, what data inputs are required, and how the program enters aggregate output concepts.
What belongs in generated reference
Generated reference pages should include:
- variable name, entity, period, unit, label, and documentation
adds,subtracts, anddefined_forrelationships- source file path and source line
- parameter value history and references
- program coverage metadata
- calibration target source, vintage, unit, and current model fit
The existing reference generator is a prototype for the variable and program parts. Parameter and data-lineage generation should follow the same pattern.