Simulating policy
The most common use of a PolicyEngine Core country package is to compute unknown variables for a given household under that country’s law. To do this, we specify the input household using the entities and variables defined in the country package. If you’re not sure which entities to declare, or which variables to specify for which time periods, check the repository of the country package you’re using. This page shows examples of using an existing country model to simulate policy.
Simulating on individual households
Simulating policy on individual households is simple: import Simulation from the country package and pass in the household’s data as a dictionary (the situation argument). Then use Simulation.calculate(variable_name, time_period) to compute a particular variable for a particular time period. The input dictionary has the following nested format:
- Entity plural (e.g. “persons” or “people”, depending on the package)
  - Entity ID: a name you choose for that entity (e.g. “person”)
    - Variable name (e.g. “age”)
      - Time period (e.g. “2019-01”). This level is optional: providing a single value instead assigns it to the default time period.
        - Value (e.g. 30)
    - (Group entities only) The plural of a role (e.g. “members”)
      - The list of entity IDs that fill that role
Hint
You don’t have to pass in group entity plurals if there’s only one group entity (e.g. if everyone is a member of the same household). Just pass in the single entity key (e.g. “household”).
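The sketch below extends the format above to two people who share one household, mixing period-keyed values with single values. The IDs (“alice”, “bob”, “home”), the “households” plural, the “members” role and the “rent” variable are illustrative assumptions, so check your country package for its real names. check_roles is a hypothetical helper, not part of PolicyEngine Core; it only confirms that every ID listed under a role was declared as a person.

```python
# Hypothetical situation: entity plurals, role names and the "rent"
# variable are assumptions; check your country package for the real ones.
SITUATION = {
    "persons": {
        "alice": {
            "age": {"2022-01": 40},          # value keyed by time period
            "salary": {"2022-01": 30_000},
        },
        "bob": {
            "age": 10,                       # single value: default time period
        },
    },
    "households": {
        "home": {
            "members": ["alice", "bob"],     # role plural -> list of person IDs
            "rent": {"2022-01": 900},        # a group-level variable
        },
    },
}


def check_roles(situation: dict, person_plural: str = "persons") -> list[str]:
    """Hypothetical helper: report role entries that name undeclared people."""
    people = set(situation.get(person_plural, {}))
    problems = []
    for plural, groups in situation.items():
        # Skip the person entity itself and any non-entity keys (e.g. "axes").
        if plural == person_plural or not isinstance(groups, dict):
            continue
        for group_id, fields in groups.items():
            for key, value in fields.items():
                # In this format, role entries are the only list-valued fields.
                if isinstance(value, list):
                    problems += [
                        f"{plural}.{group_id}.{key}: unknown person {p!r}"
                        for p in value
                        if p not in people
                    ]
    return problems


print(check_roles(SITUATION))  # → []
```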
```python
from policyengine_core.country_template import Simulation

# One person; single values are assigned to the default time period.
EXAMPLE = dict(
    persons=dict(
        person=dict(
            age=30,
            salary=30_000,
        ),
    ),
)

simulation = Simulation(situation=EXAMPLE)
simulation.calculate("income_tax", "2022-01")
# array([4500.], dtype=float32)
```
Our example calculated €4,500 in income tax for this household in a few lines of code. We could just as easily have asked for any other variable, at any other time period.
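The outputs on this page are consistent with the template’s income tax being a flat 15% of salary (4,500 / 30,000 = 0.15). That rate is inferred from the numbers here, not read from the template’s source, so treat this as an illustrative sketch of a vectorised, array-in/array-out formula rather than the template’s actual code.

```python
import numpy as np

# Assumption: a flat 15% rate, inferred from 4,500 / 30,000 on this page.
# This is an illustration, not the country template's actual formula.
FLAT_RATE = 0.15


def income_tax(salary: np.ndarray) -> np.ndarray:
    """Flat-rate income tax, applied element-wise to an array of salaries."""
    return salary * FLAT_RATE


print(income_tax(np.array([30_000.0])))  # → [4500.]
```

Working on whole arrays rather than single numbers is what lets one calculation cover every person in a simulation at once.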
Simulating over axes
Often, we want to see how a particular variable responds to changes in one or more other variables. For this, we can use axes, a list of lists with the following structure:
- Outer list: perpendicular axes, which vary independently of each other
  - Inner list: parallel axes, which vary together
    - Axis description: a dictionary giving the name of the variable to vary and the min, max and count of its values
```python
# One person whose salary is varied over 10 values from 0 to 100,000.
EXAMPLE = dict(
    persons=dict(
        person=dict(
            age=30,
        ),
    ),
    axes=[[dict(name="salary", count=10, min=0, max=100_000)]],
)

simulation = Simulation(situation=EXAMPLE)
simulation.calculate("income_tax", "2022-01")
# array([    0.    ,  1666.6667,  3333.3335,  5000.    ,  6666.667 ,
#         8333.334 , 10000.    , 11666.668 , 13333.334 , 15000.001 ],
#       dtype=float32)
```
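These ten results are consistent with ten evenly spaced salaries from 0 to 100,000 (a step of 100,000 / 9 ≈ 11,111.11) taxed at a flat 15%, since 0.15 × 11,111.11 ≈ 1,666.67. The sketch below spells out the axes semantics from the list above: parallel axes step together, and perpendicular axes combine into a grid. expand_axes is a hypothetical helper, not PolicyEngine Core’s implementation, and using numpy.linspace for the even spacing is an assumption that matches the output above.

```python
import itertools

import numpy as np


def expand_axes(axes: list[list[dict]]) -> list[dict[str, float]]:
    """Hypothetical helper: turn an axes specification into input points.

    Each inner list is a group of parallel axes, which step together; the
    outer list combines groups as perpendicular dimensions (a grid).
    """
    dimensions = []
    for parallel_group in axes:
        # Assumption: parallel axes in one group share the same count.
        count = parallel_group[0]["count"]
        columns = {
            axis["name"]: np.linspace(axis["min"], axis["max"], axis["count"])
            for axis in parallel_group
        }
        # One dict per step, holding every parallel variable's value at that step.
        steps = [
            {name: float(values[i]) for name, values in columns.items()}
            for i in range(count)
        ]
        dimensions.append(steps)

    points = []
    # Perpendicular groups combine as a Cartesian product.
    for combination in itertools.product(*dimensions):
        point: dict[str, float] = {}
        for step in combination:
            point.update(step)
        points.append(point)
    return points


salary_axis = [[dict(name="salary", count=10, min=0, max=100_000)]]
salaries = [point["salary"] for point in expand_axes(salary_axis)]
print(len(salaries), round(salaries[1], 2))  # → 10 11111.11
```

With two perpendicular axes of counts 3 and 2, expand_axes returns 3 × 2 = 6 points; a single group of parallel axes with count 3 returns 3 points, each setting all of that group’s variables at once.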
```python
import plotly.express as px
from policyengine_core.charts import format_fig, display_fig, BLUE

# Plot income tax against salary for each salary value generated by the axes,
# using the same time period for both variables.
fig = (
    px.line(
        x=simulation.calculate("salary", "2022-01"),
        y=simulation.calculate("income_tax", "2022-01"),
        color_discrete_sequence=[BLUE],
    )
    .update_layout(
        xaxis_title="Salary",
        yaxis_title="Income tax",
        title="Income tax by salary",
        xaxis_tickformat=",.0f",
        yaxis_tickformat=",.0f",
        xaxis_tickprefix="£",
        yaxis_tickprefix="£",
    )
    .update_traces(
        hovertemplate="<b>Salary</b>: £%{x:,.0f}<br><b>Income tax</b>: £%{y:,.0f}",
    )
)
# Apply the package's standard chart formatting.
format_fig(fig)
```
Simulating over populations
As well as the general-purpose Simulation interface, each country package also includes a Microsimulation interface. Microsimulation inherits from WeightedSimulation, so everything that works on a Simulation will also work on a Microsimulation; on top of that, it handles survey weights and dataset loading.
```python
from policyengine_core.country_template import Microsimulation

sim = Microsimulation()
# Weighted total income tax across the dataset, in millions.
sim.calculate("income_tax", "2022-01").sum() / 1e6
# 51.000003242492674
```
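The division by 1e6 reports the result in millions. Conceptually, a population total from survey microdata is each record’s value multiplied by its survey weight, summed over records. The NumPy sketch below shows that arithmetic on made-up numbers; it illustrates weighting in general and is not Microsimulation’s internal code.

```python
import numpy as np

# Made-up survey records: each weight is the number of population units
# the record is assumed to represent.
income_tax = np.array([0.0, 1_500.0, 4_500.0])
weights = np.array([2_000.0, 1_000.0, 500.0])

unweighted_total = income_tax.sum()            # 6,000: the sample only
weighted_total = (income_tax * weights).sum()  # 0 + 1.5m + 2.25m = 3.75m

print(weighted_total / 1e6)  # → 3.75 (in millions)
```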
If you inspect the result of the sim.calculate call, you’ll find it actually returns a MicroSeries (defined by the microdf Python package). This class inherits from pandas.Series and adds a few methods for handling survey weights. The general intuition is that you can treat this weighted array as if it were an array of the full population it represents, and use it as you would any other pandas.Series.
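One way to see this intuition: if every weight were a whole number, repeating each record weight times would produce an explicit population, and its ordinary statistics would match the weighted statistics of the compact data. The pandas sketch below checks that on made-up numbers; it uses plain pandas.Series rather than MicroSeries, so it assumes nothing about microdf’s API.

```python
import pandas as pd

# Made-up records with whole-number weights, for illustration only.
values = pd.Series([10_000.0, 30_000.0, 50_000.0])
weights = pd.Series([3, 1, 1])

# Weighted statistics computed directly on the compact data.
weighted_sum = (values * weights).sum()
weighted_mean = weighted_sum / weights.sum()

# The explicit population: each record repeated `weight` times.
population = values.repeat(weights.to_numpy())

# Ordinary statistics on the population match the weighted ones.
assert population.sum() == weighted_sum
assert population.mean() == weighted_mean
print(weighted_mean)  # → 22000.0
```

Expanding records like this is only practical for small integer weights; weighted methods give the same answers without materialising the population.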
Subsampling simulations
Often, we run simulations over very large datasets (100,000+ records). This can be slow, so we might want to subsample the dataset to speed up the simulation. Simulation.subsample and Microsimulation.subsample do this, returning a new Simulation or Microsimulation object with a smaller dataset.
```python
from policyengine_us import Microsimulation

sim = Microsimulation()
# Total adjusted gross income in 2024, in billions.
sim.calculate("adjusted_gross_income", 2024).sum() / 1e9
# 13996.034939691408

# Keep 10% of the dataset and repeat the calculation.
sim = Microsimulation().subsample(frac=0.1)
sim.calculate("adjusted_gross_income", 2024).sum() / 1e9
# 13891.76442888221
```
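The subsampled total above is within 1% of the full-sample total, which suggests the remaining records are reweighted so that population totals stay comparable. Below is a minimal sketch of one such scheme, assuming weights are simply scaled up by 1 / frac after random sampling. The dataset is synthetic, and subsample here is a hypothetical helper, not necessarily how PolicyEngine Core’s subsample works internally.

```python
import numpy as np
import pandas as pd

# Synthetic dataset: 100,000 records, each with an income and a survey weight.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "income": rng.lognormal(mean=10, sigma=1, size=100_000),
    "weight": np.full(100_000, 1_500.0),
})


def subsample(df: pd.DataFrame, frac: float, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: keep a random fraction of rows, scale weights by 1 / frac."""
    sample = df.sample(frac=frac, random_state=seed).copy()
    sample["weight"] *= 1 / frac  # assumption: uniform reweighting
    return sample


def weighted_total(df: pd.DataFrame) -> float:
    """Sum of income times weight: an estimate of the population total."""
    return (df["income"] * df["weight"]).sum()


small = subsample(data, frac=0.1)
print(len(small))  # → 10000
# Close to 1, but not exactly 1, because the sample is random.
print(weighted_total(small) / weighted_total(data))
```

Because the rows kept are random, the subsampled total drifts slightly from the full total, as in the output above; a smaller frac runs faster but generally drifts more.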