Integrating Economic Uprating with Demographic Reweighting
Executive Summary
This document outlines an approach for projecting federal income tax revenue through 2100 that combines economic microsimulation with demographic reweighting. By pairing PolicyEngine’s tax modeling with Social Security Administration (SSA) demographic projections, we can isolate and quantify the fiscal impact of population aging while preserving the full complexity of the tax code.
The Challenge
Projecting tax revenue over a 75-year horizon requires simultaneously modeling two distinct but interrelated dynamics:
Economic Evolution: How incomes, prices, and tax parameters change over time
Wage growth and income distribution shifts
Inflation affecting brackets and deductions
Legislative changes and indexing rules
Behavioral responses to tax policy
Demographic Transformation: How the population structure evolves
Baby boom generation aging through retirement
Declining birth rates reducing working-age population
Increasing longevity extending retirement duration
Shifting household composition patterns
Traditional approaches typically sacrifice either economic sophistication (using simplified tax calculations) or demographic realism (holding age distributions constant). Our methodology preserves both.
Run projections using run_household_projection.py:
# Save calibrated datasets as .h5 files for each year
!python ../policyengine_us_data/datasets/cps/long_term/run_household_projection.py 2027 --greg --use-ss --use-payroll --save-h5
TEST_LITE == False
======================================================================
HOUSEHOLD-LEVEL INCOME TAX PROJECTION: 2025-2027
======================================================================
Configuration:
Base year: 2024 (CPS microdata)
Projection: 2025-2027
Calculation level: HOUSEHOLD ONLY (simplified)
Calibration method: GREG
Including Social Security benefits constraint: Yes
Including taxable payroll constraint: Yes
Saving year-specific .h5 files: Yes (to ./projected_datasets/)
Years to process: 3
Estimated time: ~9 minutes
======================================================================
STEP 1: DEMOGRAPHIC PROJECTIONS
======================================================================
Loaded SSA projections: 86 ages x 3 years
Population projections:
2025: 346.6M
2027: 350.9M
======================================================================
STEP 2: BUILDING HOUSEHOLD AGE COMPOSITION
======================================================================
Loaded 21,532 households
Household age matrix shape: (21532, 86)
======================================================================
STEP 3: HOUSEHOLD-LEVEL PROJECTION
======================================================================
Methodology (SIMPLIFIED):
1. PolicyEngine uprates to each projection year
2. Calculate all values at household level (map_to='household')
3. IPF/GREG adjusts weights to match SSA demographics
4. Apply calibrated weights directly (no aggregation needed)
Initial memory usage: 1.13 GB
Year Population Income Tax Baseline Tax Memory
-----------------------------------------------------------------
[DEBUG 2025] SS baseline: $1424.6B, target: $1609.0B
[DEBUG 2025] Payroll baseline: $8950.9B, target: $10621.0B
[DEBUG 2025] SS achieved: $1609.0B (error: -0.0%)
[DEBUG 2025] Payroll achieved: $10621.0B (error: -0.0%)
Saved 2025.h5
2025 346.6M $ 2543.1B $ 1882.9B 4.32GB
[DEBUG 2027] SS baseline: $1495.1B, target: $1799.9B
[DEBUG 2027] Payroll baseline: $9718.4B, target: $11627.0B
[DEBUG 2027] SS achieved: $1799.9B (error: -0.0%)
[DEBUG 2027] Payroll achieved: $11627.0B (error: 0.0%)
Saved 2027.h5
2027 350.9M $ 2873.8B $ 2125.0B 4.62GB
Arguments:
END_YEAR: Target year for projection (default: 2035)
--greg: Use GREG calibration instead of IPF (optional)
--use-ss: Include Social Security benefit totals as calibration target (requires --greg)
--use-payroll: Include taxable payroll as calibration target (requires --greg)
--save-h5: Save year-specific .h5 files to the ./projected_datasets/ directory
Data Sources
The long-term projections use two key SSA datasets:
SSA Population Projections (SSPopJul_TR2024.csv)
Source: SSA 2024 Trustees Report - Single Year Age Demographic Projections
Contains age-specific population projections through 2100
Used for demographic reweighting to match future population structure
Social Security Cost Projections (social_security_aux.csv)
Contains OASDI benefit cost projections in CPI-indexed 2025 dollars
Used as calibration target in GREG method to ensure fiscal consistency
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from policyengine_us_data.storage import STORAGE_FOLDER
TEST_LITE == False
# Load SSA population data
ssa_pop = pd.read_csv(STORAGE_FOLDER / 'SSPopJul_TR2024.csv')
ssa_pop.head()
# Load Social Security auxiliary data
ss_aux = pd.read_csv(STORAGE_FOLDER / 'social_security_aux.csv')
ss_aux.head()
Core Innovation
Our approach operates in two complementary stages:
Stage 1: Economic Uprating
PolicyEngine’s microsimulation engine projects each household’s economic circumstances forward using:
Sophisticated Income Modeling
The system models 17 distinct income categories, each uprated according to its economic fundamentals:
Primary Categories with Specific Projections:
Employment income (wages) - follows CBO wage growth projections
Self-employment income - follows CBO business income projections
Capital gains - follows CBO asset appreciation projections
Interest income - follows CBO interest rate projections
Dividend income - follows CBO corporate profit projections
Pension income - follows CBO retirement income projections
Social Security - follows SSA COLA projections (available through 2100)
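As a rough illustration of category-specific uprating, the sketch below scales each income category by its own compounded growth factor. The growth rates, the `uprate` helper, and the toy households are all hypothetical; the actual pipeline takes its factors from the CBO and SSA projections listed above, and those need not be constant from year to year.

```python
import numpy as np

# Hypothetical annual growth rates, for illustration only (not CBO or SSA figures).
ANNUAL_GROWTH = {
    "employment_income": 0.040,
    "capital_gains": 0.050,
    "social_security": 0.025,
}


def uprate(incomes, base_year, target_year, annual_growth=ANNUAL_GROWTH):
    """Scale each income category by its own compounded growth factor."""
    years = target_year - base_year
    return {
        name: values * (1.0 + annual_growth[name]) ** years
        for name, values in incomes.items()
    }


# Three toy households in the base year.
incomes_2024 = {
    "employment_income": np.array([50_000.0, 0.0, 120_000.0]),
    "capital_gains": np.array([0.0, 2_000.0, 15_000.0]),
    "social_security": np.array([0.0, 24_000.0, 0.0]),
}
incomes_2027 = uprate(incomes_2024, base_year=2024, target_year=2027)
```

Because each category grows at its own rate, the composition of a household's income shifts over time even before any reweighting is applied.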
Stage 2: Demographic Reweighting
We offer two calibration methods for adjusting household weights to match SSA projections:
Method 1: Iterative Proportional Fitting (IPF)
Traditional raking approach using Kullback-Leibler divergence
Iteratively adjusts weights to match marginal distributions
Robust to specification and always produces non-negative weights
Default method for backward compatibility
Method 2: Generalized Regression (GREG) Calibration
Modern calibration using chi-squared distance minimization
Enables simultaneous calibration to categorical AND continuous variables
Direct solution via matrix operations (no iteration needed)
Required for incorporating Social Security benefit constraints
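To make the raking idea concrete, here is a minimal sketch of classic IPF on a toy person-level sample with two categorical margins. The `rake` function, the age-group and region variables, and the targets are hypothetical; the household-level implementation in the projection script may differ in its details.

```python
import numpy as np


def rake(weights, groups_a, groups_b, targets_a, targets_b, n_iter=100, tol=1e-10):
    """Iterative proportional fitting over two categorical margins."""
    w = weights.astype(float).copy()
    for _ in range(n_iter):
        # Scale weights so weighted totals match margin A.
        totals_a = np.bincount(groups_a, weights=w, minlength=len(targets_a))
        w *= (targets_a / totals_a)[groups_a]
        # Scale weights so weighted totals match margin B.
        totals_b = np.bincount(groups_b, weights=w, minlength=len(targets_b))
        w *= (targets_b / totals_b)[groups_b]
        # Margin B is now exact; stop once margin A also holds.
        totals_a = np.bincount(groups_a, weights=w, minlength=len(targets_a))
        if np.max(np.abs(totals_a - targets_a) / targets_a) < tol:
            break
    return w


rng = np.random.default_rng(0)
n = 1_000
age_group = rng.integers(0, 3, size=n)  # hypothetical: 3 age bands
region = rng.integers(0, 2, size=n)     # hypothetical: 2 regions
base_weights = np.full(n, 100.0)

# Both sets of targets must sum to the same population total.
age_targets = np.array([30_000.0, 45_000.0, 25_000.0])
region_targets = np.array([55_000.0, 45_000.0])

raked = rake(base_weights, age_group, region, age_targets, region_targets)
```

Every update is a multiplication by a positive ratio, which is why raked weights stay non-negative when the starting weights are.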
Demonstrating the Calibration Methods
from policyengine_us_data.datasets.cps.long_term.ssa_data import (
    load_ssa_age_projections,
    load_ssa_benefit_projections,
)
# Get SSA population targets for a specific year
year = 2025
age_targets = load_ssa_age_projections(end_year=year)
print(f"\nAge distribution targets for {year}:")
print(f"Shape: {age_targets.shape}")
print(f"Total population: {age_targets[:, 0].sum() / 1e6:.1f}M")
Age distribution targets for 2025:
Shape: (86, 1)
Total population: 346.6M
# Get Social Security benefit target
ss_target = load_ssa_benefit_projections(year)
print(f"\nSocial Security benefit target for {year}: ${ss_target / 1e9:.1f}B")
Social Security benefit target for 2025: $1609.0B
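With age counts and a benefit total as targets, linear GREG calibration can be sketched as a single linear solve. This is an illustrative toy and not the pipeline's implementation: the `greg_calibrate` function, the three age bands, and the synthetic benefit amounts (in thousands of dollars) are all assumptions.

```python
import numpy as np


def greg_calibrate(d, X, targets):
    """Linear (chi-squared distance) calibration: w = d * (1 + X @ lam)."""
    residual = targets - X.T @ d        # gap between targets and current totals
    gram = X.T @ (d[:, None] * X)       # X' D X, with D = diag(d)
    lam = np.linalg.solve(gram, residual)
    return d * (1.0 + X @ lam)


rng = np.random.default_rng(1)
n_households = 500

# Hypothetical household composition: member counts in three age bands.
age_counts = rng.integers(0, 3, size=(n_households, 3)).astype(float)
# Hypothetical benefits in $ thousands, received only by the oldest band.
benefits = age_counts[:, 2] * rng.uniform(15.0, 30.0, size=n_households)

# Calibration matrix: categorical counts AND a continuous variable together.
X = np.column_stack([age_counts, benefits])
d = np.full(n_households, 1_000.0)      # initial weights

# Toy targets: shift each current total by a few percent.
targets = (X.T @ d) * np.array([0.98, 1.01, 1.10, 1.15])

w = greg_calibrate(d, X, targets)
```

The calibrated weights reproduce all four targets at once. Unlike raking, the linear form does not by itself guarantee non-negative weights, so it is worth checking `w.min()` when targets sit far from the baseline totals.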
PWBM Analysis: Eliminating Income Taxes on Social Security Benefits
Source: Eliminating Income Taxes on Social Security Benefits (Penn Wharton Budget Model, February 10, 2025)
Policy Analyzed
The Penn Wharton Budget Model (PWBM) analyzed a policy proposal to permanently eliminate all income taxes on Social Security benefits, effective January 1, 2025.
Key Findings
Budgetary Impact: The policy is projected to reduce federal revenues by $1.45 trillion over the 10-year budget window (2025-2034). Over the long term, it is projected to increase federal debt by 7 percent by 2054, relative to the current baseline.
Macroeconomic Impact: The analysis finds the policy would have negative long-term effects on the economy.
It reduces incentives for households to save for retirement and to work.
This leads to a smaller capital stock (projected to be 4.2% lower by 2054).
The smaller capital stock results in lower average wages (1.8% lower by 2054) and lower GDP (2.1% lower by 2054).
Conventional Distributional Impact: PWBM’s distributional table reports the annual “conventional” effects on household after-tax income.
The largest average dollar tax cuts go to households in the top 20 percent of the income distribution (the 80th to 100th percentiles).
The largest relative gains (as a percentage of income) go to households in the fourth quintile (60-80%), who see a 1.6% increase in after-tax income by 2054.
The dollar amounts shown are in nominal dollars for each specified year, not adjusted to a single base year.
Dynamic (Lifetime) Impact: When analyzing the policy’s effects over a household’s entire lifetime, PWBM finds:
The policy primarily benefits high-income households who are nearing or in retirement.
It negatively impacts all households under the age of 30 and all future generations, who would experience a net welfare loss due to the long-term effects of lower wages and higher federal debt.
PolicyEngine’s Analysis of Eliminating Income Taxes on Social Security Benefits
import sys
import os
import pandas as pd
import numpy as np
import gc
from policyengine_us import Microsimulation
from policyengine_core.reforms import Reform
WHARTON_BENCHMARKS = {
    2026: {
        'First quintile': {'tax_change': 0, 'pct_change': 0.0},
        'Second quintile': {'tax_change': -15, 'pct_change': 0.0},
        'Middle quintile': {'tax_change': -340, 'pct_change': 0.5},
        'Fourth quintile': {'tax_change': -1135, 'pct_change': 1.1},
        '80-90%': {'tax_change': -1625, 'pct_change': 1.0},
        '90-95%': {'tax_change': -1590, 'pct_change': 0.7},
        '95-99%': {'tax_change': -2020, 'pct_change': 0.5},
        '99-99.9%': {'tax_change': -2205, 'pct_change': 0.2},
        'Top 0.1%': {'tax_change': -2450, 'pct_change': 0.0},
    },
    2034: {
        'First quintile': {'tax_change': 0, 'pct_change': 0.0},
        'Second quintile': {'tax_change': -45, 'pct_change': 0.1},
        'Middle quintile': {'tax_change': -615, 'pct_change': 0.8},
        'Fourth quintile': {'tax_change': -1630, 'pct_change': 1.2},
        '80-90%': {'tax_change': -2160, 'pct_change': 1.1},
        '90-95%': {'tax_change': -2160, 'pct_change': 0.7},
        '95-99%': {'tax_change': -2605, 'pct_change': 0.6},
        '99-99.9%': {'tax_change': -2715, 'pct_change': 0.2},
        'Top 0.1%': {'tax_change': -2970, 'pct_change': 0.0},
    },
    2054: {
        'First quintile': {'tax_change': -5, 'pct_change': 0.0},
        'Second quintile': {'tax_change': -275, 'pct_change': 0.3},
        'Middle quintile': {'tax_change': -1730, 'pct_change': 1.3},
        'Fourth quintile': {'tax_change': -3560, 'pct_change': 1.6},
        '80-90%': {'tax_change': -4075, 'pct_change': 1.2},
        '90-95%': {'tax_change': -4385, 'pct_change': 0.9},
        '95-99%': {'tax_change': -4565, 'pct_change': 0.6},
        '99-99.9%': {'tax_change': -4820, 'pct_change': 0.2},
        'Top 0.1%': {'tax_change': -5080, 'pct_change': 0.0},
    },
}
def run_analysis(dataset_path, year, income_rank_var="household_net_income"):
    """Run Option 1 analysis for given dataset and year"""
    option1_reform = Reform.from_dict(
        {
            # Base rate parameters (0-50% bracket)
            "gov.irs.social_security.taxability.rate.base.benefit_cap": {
                "2026-01-01.2100-12-31": 0
            },
            "gov.irs.social_security.taxability.rate.base.excess": {
                "2026-01-01.2100-12-31": 0
            },
            # Additional rate parameters (50-85% bracket)
            "gov.irs.social_security.taxability.rate.additional.benefit_cap": {
                "2026-01-01.2100-12-31": 0
            },
            "gov.irs.social_security.taxability.rate.additional.bracket": {
                "2026-01-01.2100-12-31": 0
            },
            "gov.irs.social_security.taxability.rate.additional.excess": {
                "2026-01-01.2100-12-31": 0
            }
        }, country_id="us"
    )
    reform = Microsimulation(dataset=dataset_path, reform=option1_reform)
    # Get household data
    household_net_income_reform = reform.calculate("household_net_income", period=year, map_to="household")
    household_agi_reform = reform.calculate("adjusted_gross_income", period=year, map_to="household")
    income_tax_reform = reform.calculate("income_tax", period=year, map_to="household")
    del reform
    gc.collect()
    print(f"Loading dataset: {dataset_path}")
    baseline = Microsimulation(dataset=dataset_path)
    household_weight = baseline.calculate("household_weight", period=year)
    household_net_income_baseline = baseline.calculate("household_net_income", period=year, map_to="household")
    household_agi_baseline = baseline.calculate("adjusted_gross_income", period=year, map_to="household")
    income_tax_baseline = baseline.calculate("income_tax", period=year, map_to="household")
    # Calculate changes
    tax_change = income_tax_reform - income_tax_baseline
    income_change_pct = (
        (household_net_income_reform - household_net_income_baseline) / household_net_income_baseline
    ) * 100
    # Create DataFrame
    df = pd.DataFrame({
        'household_net_income': household_net_income_baseline,
        'weight': household_weight,
        'tax_change': tax_change,
        'income_change_pct': income_change_pct,
        'income_rank_var': baseline.calculate(income_rank_var, year, map_to="household")
    })
    # Calculate percentiles
    print(f"Ranking according to quantiles with: {income_rank_var}")
    df['income_percentile'] = df['income_rank_var'].rank(pct=True) * 100

    # Assign income groups
    def assign_income_group(percentile):
        if percentile <= 20:
            return 'First quintile'
        elif percentile <= 40:
            return 'Second quintile'
        elif percentile <= 60:
            return 'Middle quintile'
        elif percentile <= 80:
            return 'Fourth quintile'
        elif percentile <= 90:
            return '80-90%'
        elif percentile <= 95:
            return '90-95%'
        elif percentile <= 99:
            return '95-99%'
        elif percentile <= 99.9:
            return '99-99.9%'
        else:
            return 'Top 0.1%'

    df['income_group'] = df['income_percentile'].apply(assign_income_group)
    # Calculate aggregate revenue
    revenue_impact = (income_tax_reform.sum() - income_tax_baseline.sum()) / 1e9
    # Calculate by group
    results = []
    for group in ['First quintile', 'Second quintile', 'Middle quintile', 'Fourth quintile',
                  '80-90%', '90-95%', '95-99%', '99-99.9%', 'Top 0.1%']:
        group_data = df[df['income_group'] == group]
        if len(group_data) == 0:
            continue
        total_weight = group_data['weight'].sum()
        avg_tax_change = (group_data['tax_change'] * group_data['weight']).sum() / total_weight
        avg_income_change_pct = (group_data['income_change_pct'] * group_data['weight']).sum() / total_weight
        results.append({
            'group': group,
            'pe_tax_change': round(avg_tax_change),
            'pe_pct_change': round(avg_income_change_pct, 1),
        })
    return pd.DataFrame(results), revenue_impact
def generate_comparison_table(pe_results, year):
    """Generate comparison table with Wharton benchmark"""
    if year not in WHARTON_BENCHMARKS:
        print(f"Warning: No Wharton benchmark available for year {year}")
        return pe_results
    wharton_data = WHARTON_BENCHMARKS[year]
    comparison = []
    for _, row in pe_results.iterrows():
        group = row['group']
        wharton = wharton_data.get(group, {'tax_change': None, 'pct_change': None})
        pe_tax = row['pe_tax_change']
        wh_tax = wharton['tax_change']
        comparison.append({
            'Income Group': group,
            'PolicyEngine': f"${pe_tax:,}",
            'Wharton': f"${wh_tax:,}" if wh_tax is not None else 'N/A',
            'Difference': f"${(pe_tax - wh_tax):,}" if wh_tax is not None else 'N/A',
            'PE %': f"{row['pe_pct_change']}%",
            'Wharton %': f"{wharton['pct_change']}%" if wharton['pct_change'] is not None else 'N/A',
        })
    return pd.DataFrame(comparison)
dataset_path = 'hf://policyengine/test/2054.h5'
year = 2054
income_rank_variable = "household_net_income"
print("="*80)
print(f"WHARTON COMPARISON PIPELINE - YEAR {year}")
print("="*80)
print()
# Run analysis
print("Running PolicyEngine analysis...")
pe_results, revenue_impact = run_analysis(dataset_path, year, income_rank_variable)
print(f"✓ Analysis complete")
print(f" Revenue impact: ${revenue_impact:.1f}B")
print()
# Generate comparison table
print("Generating comparison table...")
comparison_table = generate_comparison_table(pe_results, year)
print()
print("="*80)
print(f"COMPARISON TABLE: {year}")
print("="*80)
print()
print("Average Tax Change (per household):")
print(comparison_table[['Income Group', 'PolicyEngine', 'Wharton', 'Difference']].to_string(index=False))
print()
print("Percent Change in Income:")
print(comparison_table[['Income Group', 'PE %', 'Wharton %']].to_string(index=False))
print()
================================================================================
WHARTON COMPARISON PIPELINE - YEAR 2054
================================================================================
Running PolicyEngine analysis...
Loading dataset: hf://policyengine/test/2054.h5
Ranking according to quantiles with: household_net_income
✓ Analysis complete
Revenue impact: $-579.1B
Generating comparison table...
================================================================================
COMPARISON TABLE: 2054
================================================================================
Average Tax Change (per household):
Income Group      PolicyEngine   Wharton   Difference
First quintile            $-95       $-5         $-90
Second quintile        $-1,054     $-275        $-779
Middle quintile        $-2,241   $-1,730        $-511
Fourth quintile        $-4,633   $-3,560      $-1,073
80-90%                 $-6,737   $-4,075      $-2,662
90-95%                $-12,121   $-4,385      $-7,736
95-99%                 $-8,066   $-4,565      $-3,501
99-99.9%               $-7,257   $-4,820      $-2,437
Top 0.1%               $-8,615   $-5,080      $-3,535
Percent Change in Income:
Income Group      PE %                     Wharton %
First quintile    -0.10000000149011612%    0.0%
Second quintile   0.800000011920929%       0.3%
Middle quintile   1.100000023841858%       1.3%
Fourth quintile   1.399999976158142%       1.6%
80-90%            1.399999976158142%       1.2%
90-95%            1.7000000476837158%      0.9%
95-99%            0.699999988079071%       0.6%
99-99.9%          0.30000001192092896%     0.2%
Top 0.1%          0.10000000149011612%     0.0%
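One factor worth checking when comparing against PWBM is how households are assigned to income groups. The script calls `rank(pct=True)` on the ranking column; if that call operates on a plain pandas Series, every sampled household counts equally regardless of its weight. The sketch below, on made-up data, shows how a population-weighted percentile differs. The `weighted_percentile` helper and the toy values are hypothetical and are not part of the analysis code above.

```python
import numpy as np
import pandas as pd


def weighted_percentile(values, weights):
    """Percentile (0-100) of each row by cumulative weight share, ordered by value."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    cumulative = np.cumsum(weights[order])
    percentile = np.empty(len(values))
    percentile[order] = 100.0 * cumulative / cumulative[-1]
    return percentile


# Toy data: the highest-income record represents 7 of 10 households.
toy = pd.DataFrame({
    "income": [10_000, 20_000, 30_000, 40_000],
    "weight": [1.0, 1.0, 1.0, 7.0],
    "tax_change": [0.0, -100.0, -200.0, -300.0],
})
toy["weighted_pct"] = weighted_percentile(toy["income"], toy["weight"])
toy["unweighted_pct"] = toy["income"].rank(pct=True) * 100

# Weighted mean of the tax change, as in the per-group averages above.
weighted_mean = np.average(toy["tax_change"], weights=toy["weight"])
```

In this toy case the third record sits at the 30th percentile by weight but the 75th by simple rank, so the two conventions would place it in different income groups.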