Simulating policy

The most common use case for PolicyEngine Core country packages is computing unknown variables for a given household under a country’s law. To do this, we specify the input household using the entities and variables defined in the country package. If you’re not sure which entities to declare, or which variables to specify for which time periods, check the repository of the country package you’re using. This page shows examples of using an existing country model to simulate policy.

Simulating on individual households

Simulating policy on an individual household takes two steps: import Simulation from the country package and pass in the household’s data as a dictionary, then call Simulation.calculate(variable_name, time_period) to compute a particular variable for a particular time period. The input dictionary uses the following nested format:

Entity plural (e.g. “people”)
- Entity ID (e.g. “person”)
  - Variable name (e.g. “age”)
    - Time period (e.g. “2019-01”). This level is optional; if you provide a bare value instead, it applies to the default time period.
      - Value (e.g. 30)
  - (Only for group entities) The plural of the role (e.g. “members”).
    - The list of entity IDs that fill that role

Hint

You don’t have to pass in group entity plurals if there’s only one group entity (e.g. if everyone is a member of the same household). Just pass in the single entity key (e.g. “household”).
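As a sketch of the nested format described above, the dictionary below spells out the optional time-period level and adds a group entity with a role. The entity, role, and variable names ("persons", "households", "members", "age", "salary") follow the examples in this section and are assumptions; check your country package for the names it actually defines. The helper unknown_members is hypothetical and only checks that every ID listed under a role is declared as a person.

```python
# Illustrative situation only: entity, role, and variable names are assumptions
# taken from the examples in the text, not a verified country-package schema.
situation = {
    "persons": {
        "parent": {
            "age": {"2019-01": 30},        # explicit time period -> value
            "salary": {"2019-01": 2_500},
        },
        "child": {
            "age": 5,                       # bare value: default time period
        },
    },
    "households": {
        "household": {
            "members": ["parent", "child"],  # role plural -> list of entity IDs
        },
    },
}


def unknown_members(situation, person_key="persons", group_key="households",
                    role_key="members"):
    """Return IDs listed under a group role that are not declared as persons."""
    declared = set(situation[person_key])
    missing = []
    for group in situation[group_key].values():
        for person_id in group.get(role_key, []):
            if person_id not in declared:
                missing.append(person_id)
    return missing


print(unknown_members(situation))
```

A check like this can catch a misspelled entity ID before the dictionary is handed to Simulation.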

from policyengine_core.country_template import Simulation

EXAMPLE = dict(
    persons=dict(
        person=dict(
            age=30,
            salary=30_000,
        ),
    ),
)

simulation = Simulation(situation=EXAMPLE)
simulation.calculate("income_tax", "2022-01")

array([4500.], dtype=float32)

Our example calculated €4,500 in income tax for this input household, in three lines of code. We could also have asked for any other variable, at any other time period.

Simulating over axes

Often, we want to see how one variable changes in response to one or more other variables. For this, we can use axes, passed as a list of lists with the following structure:

List of perpendicular axes
- List of parallel axes
  - Axes description: the name of the variable to vary, the min, max and count of values

EXAMPLE = dict(
    persons=dict(
        person=dict(
            age=30,
        ),
    ),
    axes=[[dict(name="salary", count=10, min=0, max=100_000)]],
)

simulation = Simulation(situation=EXAMPLE)
simulation.calculate("income_tax", "2022-01")

array([    0.    ,  1666.6667,  3333.3335,  5000.    ,  6666.667 ,
        8333.334 , 10000.    , 11666.668 , 13333.334 , 15000.001 ],
      dtype=float32)
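To make the axis description concrete, here is a minimal sketch of how min, max and count could expand into input values. It assumes the axis is filled with evenly spaced values that include both endpoints, and it applies a flat 15% rate inferred from the printed output (15,000 on a salary of 100,000). Both are assumptions consistent with the numbers above, not a description of the library's internals; expand_axis and ASSUMED_RATE are hypothetical names.

```python
def expand_axis(min_value, max_value, count):
    """Return `count` evenly spaced values from min_value to max_value inclusive."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [float(min_value)]
    step = (max_value - min_value) / (count - 1)
    return [min_value + i * step for i in range(count)]


# Same axis as in the example: 10 salary values between 0 and 100,000.
salaries = expand_axis(0, 100_000, 10)

# Assumption: a flat rate inferred from the printed output, not read from the model.
ASSUMED_RATE = 0.15
taxes = [round(salary * ASSUMED_RATE, 2) for salary in salaries]

for salary, tax in zip(salaries, taxes):
    print(f"{salary:>12,.2f} -> {tax:>10,.2f}")
```

Under these assumptions the second salary value is about 11,111, which lines up with the second entry of the output array.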

import plotly.express as px
from policyengine_core.charts import format_fig, display_fig, BLUE

fig = (
    px.line(
        # Pass the period explicitly, matching the calculate call above.
        x=simulation.calculate("salary", "2022-01"),
        y=simulation.calculate("income_tax", "2022-01"),
        color_discrete_sequence=[BLUE],
    )
    .update_layout(
        xaxis_title="Salary",
        yaxis_title="Income tax",
        title="Income tax by salary",
        xaxis_tickformat=",.0f",
        yaxis_tickformat=",.0f",
        xaxis_tickprefix="£",
        yaxis_tickprefix="£",
    )
    .update_traces(
        hovertemplate="<b>Salary</b>: £%{x:,.0f}<br><b>Income tax</b>: £%{y:,.0f}",
    )
)

format_fig(fig)

Simulating over populations

As well as a general-purpose Simulation interface, each country package also includes a Microsimulation interface. This inherits from WeightedSimulation, so everything that works on a Simulation also works on a Microsimulation, and it additionally handles survey weights and dataset loading.

from policyengine_core.country_template import Microsimulation

sim = Microsimulation()

sim.calculate("income_tax", "2022-01").sum() / 1e6

51.000003242492674

If you inspect the result of the sim.calculate call, you’ll find it actually returns a MicroSeries (defined by the microdf Python package). This is a class inheriting from pandas.Series, with a few extra methods for handling survey weights. The general intuition is that you can treat this weighted array as if it were an array of the full population it represents, using it as you would any other pandas.Series.
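The intuition behind a weighted series can be sketched in plain Python: each survey record stands in for as many people as its weight, so totals and means use the weights. The values and weights below are made up, and weighted_sum and weighted_mean are hypothetical helpers written for illustration, not microdf's implementation.

```python
# Hypothetical survey records: each value represents `weight` people.
values = [1_000.0, 2_000.0, 0.0]
weights = [500.0, 300.0, 200.0]


def weighted_sum(values, weights):
    """Population total: each record counts once per person it represents."""
    return sum(value * weight for value, weight in zip(values, weights))


def weighted_mean(values, weights):
    """Population average: the weighted total divided by the population size."""
    return weighted_sum(values, weights) / sum(weights)


total = weighted_sum(values, weights)
mean = weighted_mean(values, weights)
print(total, mean)
```

An unweighted sum of the same three records would describe only the sample, not the population it represents.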

Subsampling simulations

Often, we’re running simulations over very large (100,000+) datasets. This can be slow, so we might want to subsample the dataset to speed up the simulation. This can be done by using Simulation.subsample or Microsimulation.subsample, which will return a new Simulation or Microsimulation object with a smaller dataset.

from policyengine_us import Microsimulation

sim = Microsimulation()

sim.calculate("adjusted_gross_income", 2024).sum() / 1e9

13996.034939691408

sim = Microsimulation().subsample(frac=0.1)
sim.calculate("adjusted_gross_income", 2024).sum() / 1e9

13891.76442888221
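The two totals above differ slightly, which is the cost of using a smaller dataset. A quick way to judge whether a subsample is accurate enough is to compute the relative gap between the two results. The sketch below uses only the numbers printed above; relative_difference is a hypothetical helper, and a different subsample may give a different gap.

```python
# Totals printed above, in billions (the sums were divided by 1e9).
full_total = 13996.034939691408       # full dataset
subsample_total = 13891.76442888221   # subsample with frac=0.1


def relative_difference(reference, estimate):
    """Absolute gap between estimate and reference, as a share of the reference."""
    return abs(estimate - reference) / abs(reference)


gap = relative_difference(full_total, subsample_total)
print(f"Subsample differs from the full run by {gap:.2%}")
```

Here the 10% subsample lands within 1% of the full-dataset total.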