
Getting started

In this notebook, we’ll walk through how to use the PolicyEngine.py package to run simulations and produce analyses. We’ll start with a basic UK analysis that doesn’t use a database, then save the resulting objects to a database and load them back.

Running a baseline simulation

To start, let’s run a simulation of the UK and chart the distribution of household income.

import plotly.graph_objects as go

from policyengine.models import (
    Aggregate,
    Simulation,
    policyengine_uk_latest_version,
    policyengine_uk_model,
)
from policyengine.utils.charts import add_fonts, format_figure
from policyengine.utils.datasets import create_uk_dataset

# Load the dataset

uk_dataset = create_uk_dataset()

# Create and run the simulation


sim = Simulation(
    dataset=uk_dataset,
    model=policyengine_uk_model,
    model_version=policyengine_uk_latest_version,
)

sim.run()

# Count households in each income band, using one Aggregate per band

income_ranges = [
    0,
    20000,
    40000,
    60000,
    80000,
    100000,
    150000,
    200000,
    300000,
    500000,
    1_000_000,
]
aggregates = []
for i in range(len(income_ranges) - 1):
    aggregates.append(
        Aggregate(
            entity="household",
            variable_name="hbai_household_net_income",
            aggregate_function="count",
            filter_variable_name="hbai_household_net_income",
            filter_variable_geq=income_ranges[i],
            filter_variable_leq=income_ranges[i + 1],
            simulation=sim,
        )
    )

aggregates = Aggregate.run(aggregates)

# Create the bar chart

fig = go.Figure(
    data=[
        go.Bar(
            x=[f"£{inc:,}" for inc in income_ranges[:-1]],
            y=[agg.value for agg in aggregates],
        )
    ]
)

# Apply formatting

format_figure(
    fig,
    title="The distribution of household income in the UK",
    x_title="Income range",
    y_title="Number of households",
)
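Each Aggregate in the loop above filters on a lower and an upper bound. As a rough illustration of what such a banded count computes, here is a plain-Python sketch that does not use PolicyEngine at all. It assumes the count is weighted by survey weights and that geq and leq mean inclusive bounds (>= and <=); both are assumptions read off the argument names, not confirmed behaviour of the package. The incomes, weights and the count_in_band helper are invented for illustration.

```python
# Toy stand-ins for simulated household net incomes and survey weights.
# These numbers are invented for illustration only.
incomes = [12_000, 20_000, 35_000, 40_000, 75_000, 120_000]
weights = [1.0, 2.0, 1.5, 1.0, 0.5, 1.0]

band_edges = [0, 20_000, 40_000, 60_000, 80_000, 100_000, 150_000]


def count_in_band(lower, upper, inclusive_upper):
    """Weighted count of households with income in [lower, upper] or [lower, upper)."""
    total = 0.0
    for income, weight in zip(incomes, weights):
        below_upper = income <= upper if inclusive_upper else income < upper
        if income >= lower and below_upper:
            total += weight
    return total


# Pair each lower edge with the next edge up: (0, 20000), (20000, 40000), ...
bands = list(zip(band_edges[:-1], band_edges[1:]))

# Assumed reading of geq/leq: both bounds inclusive.
closed_counts = [count_in_band(lo, hi, inclusive_upper=True) for lo, hi in bands]

# Alternative: half-open bands, so each household falls in exactly one band.
half_open_counts = [count_in_band(lo, hi, inclusive_upper=False) for lo, hi in bands]
```

With inclusive bounds on both sides, a household sitting exactly on a band edge is counted in two neighbouring bands: here the closed counts sum to 10.0 while the weights sum to 7.0. The half-open variant counts each household once. If the real filters behave this way, the effect on survey data with continuous incomes is likely to be small, but it is worth keeping in mind when bands share edges.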

The simulation code above introduced a few concepts:

  • The Simulation object, which represents a full run of a microsimulation model and contains all the information (simulated and input) about a set of people or groups. Here it takes three arguments: a Dataset, a Model and a ModelVersion.

  • The Dataset object, which represents a set of people or groups. Here we used a utility function to create this dataset for the UK, but later we will be able to create datasets from scratch or pull them from a database.

  • The Model object, which represents a particular microsimulation model (essentially a function that transforms a dataset into a new dataset). This package defines two models, one for the UK and one for the US. Think of these objects as adapters representing the full microsimulation models. Here, we’ve taken the pre-defined UK model.

  • The ModelVersion object, which represents a particular version of a model. This is useful for tracking changes to the model over time. Here, we used the latest version of the UK model.
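To make the relationship between these four objects concrete, here is a toy sketch in plain Python. The classes ToyDataset, ToyModel and ToySimulation, and the 25% flat tax, are hypothetical stand-ins invented for this illustration; they are not the package's classes, and the real models are far richer. The sketch only shows the idea of a model as a versioned function from one dataset to a new dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Household = Dict[str, float]


@dataclass
class ToyDataset:
    """A set of households, each a mapping of variable name to value."""
    households: List[Household]


@dataclass
class ToyModel:
    """A named, versioned function from one dataset to a new dataset."""
    name: str
    version: str
    calculate: Callable[[ToyDataset], ToyDataset]


def flat_tax_rules(dataset: ToyDataset) -> ToyDataset:
    """Hypothetical rules: a 25% flat tax on gross income."""
    results = []
    for household in dataset.households:
        tax = 0.25 * household["gross_income"]
        net = household["gross_income"] - tax
        # Build a new record so the input dataset is left untouched.
        results.append({**household, "tax": tax, "net_income": net})
    return ToyDataset(households=results)


@dataclass
class ToySimulation:
    """One run of a model over a dataset; holds both input and output."""
    dataset: ToyDataset
    model: ToyModel
    output: Optional[ToyDataset] = None

    def run(self) -> ToyDataset:
        self.output = self.model.calculate(self.dataset)
        return self.output


toy_input = ToyDataset(
    households=[{"gross_income": 10_000.0}, {"gross_income": 50_000.0}]
)
toy_model = ToyModel(name="toy-uk", version="0.0.1", calculate=flat_tax_rules)
toy_sim = ToySimulation(dataset=toy_input, model=toy_model)
toy_sim.run()

net_incomes = [h["net_income"] for h in toy_sim.output.households]
```

Keeping the input dataset unchanged and writing results to a separate output is a design choice of this sketch: it lets the same dataset be reused across several simulations, much as uk_dataset is reused for the baseline and the reform in this notebook.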

Adding a policy reform

Next, we’ll add a policy reform and see how it changes the results.

from datetime import datetime

from policyengine.models import Parameter, ParameterValue, Policy

# Parameter = the parameter to change

personal_allowance = Parameter(
    id="gov.hmrc.income_tax.allowances.personal_allowance.amount",
    model=policyengine_uk_model,
)

# ParameterValue = the value to set the parameter to, and when to start

personal_allowance_value = ParameterValue(
    parameter=personal_allowance,
    start_date=datetime(2029, 1, 1),
    value=20000,
)

# Create a policy to increase the personal allowance to £20,000, taking effect from the start_date set above

policy = Policy(
    name="Increase personal allowance to £20,000",
    description="A policy to increase the personal allowance for income tax to £20,000.",
    parameter_values=[personal_allowance_value],
)

sim_2 = Simulation(
    dataset=uk_dataset,
    model=policyengine_uk_model,
    model_version=policyengine_uk_latest_version,
    policy=policy,  # Pass in the policy here
)

sim_2.run()

# Extract new aggregates for the same household income bands
# (income_ranges is reused from the baseline section above)
aggregates_2 = []
for i in range(len(income_ranges) - 1):
    aggregates_2.append(
        Aggregate(
            entity="household",
            variable_name="hbai_household_net_income",
            aggregate_function="count",
            filter_variable_name="hbai_household_net_income",
            filter_variable_geq=income_ranges[i],
            filter_variable_leq=income_ranges[i + 1],
            simulation=sim_2,
        )
    )

aggregates_2 = Aggregate.run(aggregates_2)

# Create the comparative bar chart
fig = go.Figure(
    data=[
        go.Bar(
            name="Baseline",
            x=[f"£{inc:,}" for inc in income_ranges[:-1]],
            y=[agg.value for agg in aggregates],
        ),
        go.Bar(
            name="Reform",
            x=[f"£{inc:,}" for inc in income_ranges[:-1]],
            y=[agg.value for agg in aggregates_2],
        ),
    ]
)

# Apply formatting
fig = format_figure(
    fig,
    title="The distribution of household income in the UK",
    x_title="Income range",
    y_title="Number of households",
)

add_fonts()

fig
Loading...

In the above example, we created a Policy object, which represents a particular policy reform. This object contains a list of ParameterValue objects, which represent changes to specific parameters in the model. Here, we changed the personal allowance for income tax to £20,000.
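The idea of a parameter value that applies from a start date can be sketched in plain Python. This is only an illustration of the concept under stated assumptions: the source does not describe how the package resolves values internally, and the value_at helper, the timeline layout and the baseline entry are all invented for this example.

```python
from datetime import date

# Illustrative timelines of (start_date, value) pairs.
# The baseline entry is a placeholder chosen for this sketch.
baseline_values = [(date(2021, 4, 6), 12_570)]
reform_values = [(date(2029, 1, 1), 20_000)]


def value_at(when, baseline, reform=()):
    """Return the value in force on `when`.

    A reform entry overrides the baseline from its start date onwards;
    before that date, the baseline value applies.
    """

    def latest(entries):
        # Keep entries that have already started, then take the most recent.
        started = [entry for entry in entries if entry[0] <= when]
        if not started:
            return None
        return max(started, key=lambda entry: entry[0])[1]

    reform_value = latest(reform)
    return reform_value if reform_value is not None else latest(baseline)


before_reform = value_at(date(2028, 6, 1), baseline_values, reform_values)
after_reform = value_at(date(2029, 6, 1), baseline_values, reform_values)
baseline_only = value_at(date(2029, 6, 1), baseline_values)
```

In this sketch, dates before the reform's start date still resolve to the baseline value, and the baseline timeline itself is never modified. That mirrors the way the notebook passes the same model to both simulations and supplies the reform separately as a Policy.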

Bringing in a database

Now we can save these objects to a database and load them back again. This is useful for tracking different simulations and policy reforms over time.

from policyengine.database import Database

database = Database("postgresql://postgres:postgres@127.0.0.1:54322/postgres")

# These two calls are only needed the first time you set up a new database.
# Warning: reset() drops and recreates all tables, so any existing data is lost.
database.reset()
# Register the model, model version, parameters, baseline parameter values and variables
database.register_model_version(policyengine_uk_latest_version)

database.set(uk_dataset)
database.set(policy)

for pv in policy.parameter_values:
    database.set(pv)
database.set(sim)
database.set(sim_2)
for agg in aggregates:
    database.set(agg)
for agg in aggregates_2:
    database.set(agg)

# Load the policy back from the database by its id
database.get(Policy, id=policy.id)
Policy(id='26f30afa-77b9-4435-812c-071873e25400', name='Increase personal allowance to £20,000', description='A policy to increase the personal allowance for income tax to £20,000.', parameter_values=[], simulation_modifier=None, created_at=datetime.datetime(2025, 9, 20, 12, 36, 27, 162725), updated_at=datetime.datetime(2025, 9, 20, 12, 36, 27, 162729))