Appendix

Appendix A: Implementation Code

A.1 Quantile Regression Forest Implementation

The following code configures the Quantile Regression Forest used for variable imputation:

from quantile_forest import RandomForestQuantileRegressor

qrf = RandomForestQuantileRegressor(
    n_estimators=100,
    min_samples_leaf=1,
    random_state=0
)
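
Once fitted, a quantile forest can return conditional quantiles for each record (in the `quantile_forest` package, via `qrf.predict(X, quantiles=[...])`). One common way to turn such quantiles into an imputed value is inverse-CDF sampling: draw a uniform quantile level and interpolate between the predicted quantiles. The sketch below illustrates that idea only; the function name `sample_from_quantiles` and the toy numbers are hypothetical, and the source does not state that the pipeline samples in exactly this way.

```python
import random
from bisect import bisect_right


def sample_from_quantiles(levels, values, u):
    """Linearly interpolate a predicted quantile function at level u.

    levels: increasing quantile levels, e.g. [0.1, 0.5, 0.9]
    values: predicted values at those levels (same length as levels)
    u:      quantile level in [0, 1]; levels outside the grid are clamped
    """
    if u <= levels[0]:
        return values[0]
    if u >= levels[-1]:
        return values[-1]
    hi = bisect_right(levels, u)  # first grid level strictly above u
    lo = hi - 1
    frac = (u - levels[lo]) / (levels[hi] - levels[lo])
    return values[lo] + frac * (values[hi] - values[lo])


# Toy quantile predictions for a single record (illustrative numbers only)
levels = [0.1, 0.25, 0.5, 0.75, 0.9]
values = [0.0, 1_000.0, 5_000.0, 12_000.0, 30_000.0]

rng = random.Random(0)  # fixed seed so the draw is reproducible
imputed = sample_from_quantiles(levels, values, rng.random())
```

Sampling a random level, rather than always taking the median, keeps the spread of the imputed variable closer to the spread of the donor data.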

A.2 PyTorch Optimization for Reweighting

The reweighting step uses PyTorch to minimize, by gradient descent, the mean squared relative error between achieved and target totals:

import torch

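# Initialize with the log of the original weights; original_weights is assumed
# to be a 1-D float tensor with strictly positive entries so the log is finite
log_weights = torch.log(original_weights)
log_weights.requires_grad = True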

# Adam optimizer
optimizer = torch.optim.Adam([log_weights], lr=0.1)

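# Optimization loop
for iteration in range(5000):
    # Exponentiate so every weight stays strictly positive
    weights = torch.exp(log_weights)
    # Weighted total for each target (one column of loss_matrix per target)
    achieved = weights @ loss_matrix
    # Mean squared relative error across targets
    relative_errors = (achieved - targets) / targets
    loss = torch.mean(relative_errors ** 2)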
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
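
Written out, the loop above minimizes the objective below. The symbols are introduced here for illustration and do not appear in the source: $u_i$ is the log weight of record $i$, $M_{ij}$ is the entry of `loss_matrix` for record $i$ and target $j$, $t_j$ is the entry of `targets`, $n$ is the number of records, and $m$ is the number of targets.

```latex
w_i = \exp(u_i), \qquad
\mathcal{L}(u) = \frac{1}{m} \sum_{j=1}^{m}
  \left( \frac{\sum_{i=1}^{n} w_i M_{ij} - t_j}{t_j} \right)^{2}
```

Because $w_i = \exp(u_i) > 0$ for any real $u_i$, the optimizer can take unconstrained steps without ever producing a negative weight. Dividing by $t_j$ puts targets of very different magnitudes on a comparable scale, which requires every $t_j$ to be nonzero.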

Appendix B: Tables

Table A1: Complete List of Imputed Variables

Variables Imputed from IRS Public Use File (57 variables)

Income Variables:

Deductions and Adjustments:

Tax Credits:

Qualified Business Income Variables:

The current PUF/calibration pipeline uses the legacy business_is_sstb flag to split these SSTB variables on an all-or-nothing basis. It does not yet infer mixed SSTB and non-SSTB allocations within the same record.
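
As a rough illustration of what an all-or-nothing split means, the sketch below assigns a record's entire amount to either the SSTB or the non-SSTB side depending on a single boolean flag. The function name and return layout are hypothetical and are not taken from the pipeline's code; only the `business_is_sstb` flag name comes from the text above.

```python
from typing import Tuple


def split_by_sstb_flag(amount: float, business_is_sstb: bool) -> Tuple[float, float]:
    """Return (sstb_amount, non_sstb_amount) for one record.

    The whole amount goes to one side; no mixed allocation is inferred.
    """
    if business_is_sstb:
        return amount, 0.0
    return 0.0, amount


# A record flagged as SSTB has all of its amount treated as SSTB
sstb_part, non_sstb_part = split_by_sstb_flag(50_000.0, True)
```

A mixed allocation would instead need a per-record share between 0 and 1, which the current approach does not estimate.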

Other Tax Variables:

PUF Reported/Calculated Tax Outputs Excluded from Donor Imputation

Variables Imputed from Survey of Income and Program Participation (1 variable)

Variables Imputed from Survey of Consumer Finances (3 variables)

Variables Imputed from American Community Survey (2 variables)