US tax-benefit model

The US tax-benefit model implements the United States federal tax and benefit system using PolicyEngine US as the underlying calculation engine.

Entity structure

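The US model uses a multi-level entity hierarchy. Each person belongs to a tax unit, an SPM unit, a family, and a marital unit, and all of these nest within a household: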

household
    ├── tax_unit (federal tax filing unit)
    ├── spm_unit (Supplemental Poverty Measure unit)
    ├── family (Census definition)
    └── marital_unit (married couple or single person)
            └── person
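
As a rough illustration of this hierarchy (plain Python, not the PolicyEngine API), each person record can carry one ID per group entity. The household below is hypothetical, and `count_entities` is an illustrative helper rather than a library function.

```python
# Group entities each person links to (names taken from the hierarchy above).
GROUP_ENTITIES = ["household", "tax_unit", "spm_unit", "family", "marital_unit"]

# Hypothetical household: two married adults (persons 0 and 1) and two children.
# The spouses share marital unit 0; each child forms a marital unit of their own.
people = [
    {"person_id": 0, "household": 0, "tax_unit": 0, "spm_unit": 0, "family": 0, "marital_unit": 0},
    {"person_id": 1, "household": 0, "tax_unit": 0, "spm_unit": 0, "family": 0, "marital_unit": 0},
    {"person_id": 2, "household": 0, "tax_unit": 0, "spm_unit": 0, "family": 0, "marital_unit": 1},
    {"person_id": 3, "household": 0, "tax_unit": 0, "spm_unit": 0, "family": 0, "marital_unit": 2},
]


def count_entities(people):
    """Count the distinct units of each group entity referenced by the people."""
    return {
        entity: len({person[entity] for person in people})
        for entity in GROUP_ENTITIES
    }


entity_counts = count_entities(people)
print(entity_counts)
```

In this sketch the four people share one household, tax unit, SPM unit, and family, but span three marital units.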

Person

Individual people with demographic and income characteristics.

Key variables:

Tax unit

The federal tax filing unit: a single filer or a married couple filing jointly, together with any dependents.

Key variables:

SPM unit

The Supplemental Poverty Measure unit used for SNAP and other means-tested benefits.

Key variables:

Family

Census definition of family (related individuals).

Key variables:

Marital unit

Married couple or single person.

Key variables:

Household

The residence unit.

Key variables:

Required fields:

Using the US model

Loading representative data

from policyengine.tax_benefit_models.us import PolicyEngineUSDataset

dataset = PolicyEngineUSDataset(
    name="Enhanced CPS 2024",
    description="Enhanced Current Population Survey microdata",
    filepath="./data/enhanced_cps_2024_year_2024.h5",
    year=2024,
)

print(f"People: {len(dataset.data.person):,}")
print(f"Tax units: {len(dataset.data.tax_unit):,}")
print(f"SPM units: {len(dataset.data.spm_unit):,}")
print(f"Households: {len(dataset.data.household):,}")

Creating custom scenarios

import pandas as pd
from microdf import MicroDataFrame
from policyengine.tax_benefit_models.us import USYearData

# Married couple with 2 children
person_df = MicroDataFrame(
    pd.DataFrame({
        "person_id": [0, 1, 2, 3],
        "person_household_id": [0, 0, 0, 0],
        "person_tax_unit_id": [0, 0, 0, 0],
        "person_spm_unit_id": [0, 0, 0, 0],
        "person_family_id": [0, 0, 0, 0],
        "person_marital_unit_id": [0, 0, 1, 2],
        "age": [35, 33, 8, 5],
        "employment_income": [60000, 40000, 0, 0],
        "person_weight": [1.0, 1.0, 1.0, 1.0],
    }),
    weights="person_weight"
)

tax_unit_df = MicroDataFrame(
    pd.DataFrame({
        "tax_unit_id": [0],
        "tax_unit_weight": [1.0],
    }),
    weights="tax_unit_weight"
)

spm_unit_df = MicroDataFrame(
    pd.DataFrame({
        "spm_unit_id": [0],
        "spm_unit_weight": [1.0],
    }),
    weights="spm_unit_weight"
)

family_df = MicroDataFrame(
    pd.DataFrame({
        "family_id": [0],
        "family_weight": [1.0],
    }),
    weights="family_weight"
)

marital_unit_df = MicroDataFrame(
    pd.DataFrame({
        "marital_unit_id": [0, 1, 2],
        "marital_unit_weight": [1.0, 1.0, 1.0],
    }),
    weights="marital_unit_weight"
)

household_df = MicroDataFrame(
    pd.DataFrame({
        "household_id": [0],
        "household_weight": [1.0],
        "state_code": ["CA"],
    }),
    weights="household_weight"
)

dataset = PolicyEngineUSDataset(
    name="Married couple scenario",
    description="Two adults, two children",
    filepath="./married_couple.h5",
    year=2024,
    data=USYearData(
        person=person_df,
        tax_unit=tax_unit_df,
        spm_unit=spm_unit_df,
        family=family_df,
        marital_unit=marital_unit_df,
        household=household_df,
    )
)

Running a simulation

from policyengine.core import Simulation
from policyengine.tax_benefit_models.us import us_latest

simulation = Simulation(
    dataset=dataset,
    tax_benefit_model_version=us_latest,
)
simulation.run()

# Check results
output = simulation.output_dataset.data
print(output.household[["household_net_income", "household_benefits", "household_tax"]])

Key parameters

Income tax

Payroll tax

Child Tax Credit

Earned Income Tax Credit

SNAP

Common policy reforms

Increasing standard deduction

import datetime

from policyengine.core import Policy, Parameter, ParameterValue
from policyengine.tax_benefit_models.us import us_latest

parameter = Parameter(
    name="gov.irs.income.standard_deduction.single",
    tax_benefit_model_version=us_latest,
    description="Standard deduction (single)",
    data_type=float,
)

policy = Policy(
    name="Increase standard deduction to $20,000",
    description="Raises single standard deduction from $14,600 to $20,000",
    parameter_values=[
        ParameterValue(
            parameter=parameter,
            start_date=datetime.date(2024, 1, 1),
            end_date=datetime.date(2024, 12, 31),
            value=20000,
        )
    ],
)
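
To get a feel for the size of this reform, the following back-of-the-envelope sketch computes the tax change for one hypothetical single filer. It is not how PolicyEngine US calculates tax: it assumes wage income only, no credits or other deductions, and uses just the first three 2024 single-filer brackets. The function name `simple_income_tax` and the $50,000 wage are illustrative.

```python
# First three 2024 federal brackets for a single filer: (upper bound, rate).
BRACKETS_SINGLE_2024 = [(11_600, 0.10), (47_150, 0.12), (100_525, 0.22)]


def simple_income_tax(wages, standard_deduction):
    """Tax on wages after the standard deduction, using only the brackets above."""
    taxable = max(wages - standard_deduction, 0)
    if taxable > BRACKETS_SINGLE_2024[-1][0]:
        raise ValueError("sketch only covers taxable income up to $100,525")
    tax, lower = 0.0, 0
    for upper, rate in BRACKETS_SINGLE_2024:
        if taxable <= lower:
            break
        # Tax the slice of income that falls inside this bracket.
        tax += (min(taxable, upper) - lower) * rate
        lower = upper
    return tax


wages = 50_000  # hypothetical single filer
baseline_tax = simple_income_tax(wages, standard_deduction=14_600)
reform_tax = simple_income_tax(wages, standard_deduction=20_000)
saving = baseline_tax - reform_tax
print(f"Baseline: ${baseline_tax:,.0f}  Reform: ${reform_tax:,.0f}  Saving: ${saving:,.0f}")
```

Under these simplifying assumptions the filer stays in the 12% bracket, so the saving is 12% of the extra $5,400 deduction, or $648. A full simulation would also capture interactions with credits and state taxes.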

Expanding Child Tax Credit

parameter = Parameter(
    name="gov.irs.credits.ctc.amount.base",
    tax_benefit_model_version=us_latest,
    description="Base CTC amount",
    data_type=float,
)

policy = Policy(
    name="Increase CTC to $3,000",
    description="Expands CTC from $2,000 to $3,000 per child",
    parameter_values=[
        ParameterValue(
            parameter=parameter,
            start_date=datetime.date(2024, 1, 1),
            end_date=datetime.date(2024, 12, 31),
            value=3000,
        )
    ],
)

Making CTC fully refundable

parameter = Parameter(
    name="gov.irs.credits.ctc.refundable.amount.max",
    tax_benefit_model_version=us_latest,
    description="Maximum refundable CTC",
    data_type=float,
)

policy = Policy(
    name="Fully refundable CTC",
    description="Makes entire $2,000 CTC refundable",
    parameter_values=[
        ParameterValue(
            parameter=parameter,
            start_date=datetime.date(2024, 1, 1),
            end_date=datetime.date(2024, 12, 31),
            value=2000,  # Match base amount
        )
    ],
)

State variations

The US model includes state-level variations for:

State codes

Use two-letter state codes (e.g., “CA”, “NY”, “TX”). All 50 states plus DC are supported.

Entity mapping considerations

The US model’s complex entity structure requires careful attention to entity mapping:

Person → Household

When mapping person-level variables (like ssi) to household level, values are summed across all household members:

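# Assumed import path; adjust if your policyengine version exposes these elsewhere
from policyengine.outputs.aggregate import Aggregate, AggregateType

agg = Aggregate(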
    simulation=simulation,
    variable="ssi",  # Person-level
    entity="household",  # Aggregate to household
    aggregate_type=AggregateType.SUM,
)
# Result: Total SSI for all persons in each household
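
Conceptually, this person-to-household mapping is a grouped sum over the join key. The plain-Python sketch below illustrates the idea with made-up SSI amounts; `sum_to_group` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict


def sum_to_group(values, group_ids):
    """Sum member-level values into one total per group ID."""
    totals = defaultdict(float)
    for value, group_id in zip(values, group_ids):
        totals[group_id] += value
    return dict(totals)


# Hypothetical data: five people across three households.
ssi = [0.0, 9_000.0, 0.0, 4_500.0, 0.0]
person_household_id = [0, 0, 0, 1, 2]

household_ssi = sum_to_group(ssi, person_household_id)
print(household_ssi)
```

Each household ends up with one value, and the grand total across households equals the grand total across people.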

Tax unit → Household

Tax units nest within households. A household may contain multiple tax units (e.g., an adult child who files their own return):

agg = Aggregate(
    simulation=simulation,
    variable="income_tax",  # Tax unit level
    entity="household",  # Aggregate to household
    aggregate_type=AggregateType.SUM,
)
# Result: Total income tax for all tax units in each household

Household → Person

Household variables are replicated to all household members:

# household_net_income at person level
# Each person in household gets the same household_net_income value
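
The replication step can be pictured as a lookup from each person's household ID to the household value. This is a minimal sketch with invented incomes; `replicate_to_members` is a hypothetical name.

```python
def replicate_to_members(group_values, member_group_ids):
    """Give every member a copy of their group's value."""
    return [group_values[group_id] for group_id in member_group_ids]


# Hypothetical data: household 0 has three members, household 1 has one.
household_net_income = {0: 82_000.0, 1: 31_500.0}
person_household_id = [0, 0, 0, 1]

person_level_income = replicate_to_members(household_net_income, person_household_id)
print(person_level_income)
```

Because the value is copied rather than shared out, summing the person-level column counts each household once per member, so it should not be read as a population total.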

Direct entity mapping

For complex multi-entity scenarios, you can use map_to_entity directly:

# Map SPM unit SNAP benefits to household level
household_snap = dataset.data.map_to_entity(
    source_entity="spm_unit",
    target_entity="household",
    columns=["snap"],
    how="sum"
)

# Split tax unit income equally among persons
person_tax_income = dataset.data.map_to_entity(
    source_entity="tax_unit",
    target_entity="person",
    columns=["taxable_income"],
    how="divide"
)

# Map custom analysis values (custom_values_array is a person-level array you supply)
custom_analysis = dataset.data.map_to_entity(
    source_entity="person",
    target_entity="tax_unit",
    values=custom_values_array,
    how="sum"
)
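
As an illustration of what an equal split means, the sketch below divides each group's value by its member count. It is a plain-Python approximation of the idea, assuming equal shares; `divide_among_members` is a hypothetical helper and the incomes are invented.

```python
from collections import Counter


def divide_among_members(group_values, member_group_ids):
    """Split each group's value equally among that group's members."""
    group_sizes = Counter(member_group_ids)
    return [
        group_values[group_id] / group_sizes[group_id]
        for group_id in member_group_ids
    ]


# Hypothetical data: tax unit 0 has three members, tax unit 1 has one.
taxable_income = {0: 90_000.0, 1: 20_000.0}
person_tax_unit_id = [0, 0, 0, 1]

person_shares = divide_among_members(taxable_income, person_tax_unit_id)
print(person_shares)
```

Unlike replication, an equal split preserves totals: the person-level shares add back up to the tax unit amounts.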

See the Entity mapping section in Core Concepts for full documentation on aggregation methods.

Data sources

The US model can use several data sources:

  1. Current Population Survey (CPS): Census Bureau household survey

    • ~60,000 households

    • Detailed income and demographic data

    • Published annually

  2. Enhanced CPS: Calibrated and enhanced version

    • Uprated to population totals

    • Imputed benefit receipt

    • Multiple projection years

  3. Custom datasets: User-created scenarios

    • Full control over household composition

    • Exact income levels

    • Specific tax filing scenarios
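
Survey-based sources rely on weights: each sampled household stands in for some number of real households. The sketch below shows how weights turn a handful of records into population estimates. The records and helper names are hypothetical and the code is independent of PolicyEngine.

```python
def weighted_total(values, weights):
    """Population total implied by sample values and their weights."""
    return sum(value * weight for value, weight in zip(values, weights))


def weighted_mean(values, weights):
    """Weighted average of the sample values."""
    return weighted_total(values, weights) / sum(weights)


# Three hypothetical sampled households and the number of households each represents.
household_income = [40_000.0, 75_000.0, 20_000.0]
household_weight = [1_200.0, 800.0, 2_000.0]

households_represented = sum(household_weight)
total_income = weighted_total(household_income, household_weight)
mean_income = weighted_mean(household_income, household_weight)
print(f"{households_represented:,.0f} households, total ${total_income:,.0f}, mean ${mean_income:,.0f}")
```

In a custom single-household scenario, a weight of 1.0 simply means the record represents itself.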

Validation

When creating custom datasets, validate:

  1. Entity relationships: All persons link to a valid tax_unit, spm_unit, family, marital_unit, and household

  2. Join key naming: Use person_household_id, person_tax_unit_id, etc.

  3. Weights: Appropriate weights for each entity level

  4. State codes: Valid two-letter codes

  5. Filing status: Tax units should reflect actual filing patterns
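
Several of these checks can be automated before a dataset is built. The following is a minimal sketch over plain lists of dicts rather than MicroDataFrame objects. `validate_dataset` is a hypothetical helper, and treating a missing or negative weight as invalid is an assumption of this sketch. Filing status is not checked here.

```python
# Two-letter codes for the 50 states plus DC.
VALID_STATE_CODES = set(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY".split()
)
GROUP_ENTITIES = ["household", "tax_unit", "spm_unit", "family", "marital_unit"]


def validate_dataset(tables):
    """Return a list of problems found; an empty list means every check passed.

    `tables` maps an entity name to a list of row dicts for that entity.
    """
    problems = []
    # 1-2. Every person must link to an existing unit via person_<entity>_id.
    for entity in GROUP_ENTITIES:
        known_ids = {row[f"{entity}_id"] for row in tables[entity]}
        key = f"person_{entity}_id"
        for person in tables["person"]:
            if person.get(key) not in known_ids:
                problems.append(f"person {person['person_id']}: bad {key}")
    # 3. Every row needs a non-negative weight for its own entity level.
    for entity, rows in tables.items():
        for row in rows:
            weight = row.get(f"{entity}_weight")
            if weight is None or weight < 0:
                problems.append(f"{entity} {row[f'{entity}_id']}: bad weight")
    # 4. Households need a valid two-letter state code.
    for row in tables["household"]:
        if row.get("state_code") not in VALID_STATE_CODES:
            problems.append(f"household {row['household_id']}: bad state_code")
    return problems


# Hypothetical one-person dataset with two deliberate mistakes.
tables = {
    "person": [{
        "person_id": 0, "person_weight": 1.0, "person_household_id": 0,
        "person_tax_unit_id": 7,  # no tax unit with ID 7 exists
        "person_spm_unit_id": 0, "person_family_id": 0, "person_marital_unit_id": 0,
    }],
    "household": [{"household_id": 0, "household_weight": 1.0, "state_code": "XX"}],
    "tax_unit": [{"tax_unit_id": 0, "tax_unit_weight": 1.0}],
    "spm_unit": [{"spm_unit_id": 0, "spm_unit_weight": 1.0}],
    "family": [{"family_id": 0, "family_weight": 1.0}],
    "marital_unit": [{"marital_unit_id": 0, "marital_unit_weight": 1.0}],
}

problems = validate_dataset(tables)
for problem in problems:
    print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to fix a hand-built scenario in a single pass.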

Examples

See working examples in the examples/ directory.
