In this notebook, we’ll walk through how to use the PolicyEngine.py package to run simulations and produce analyses. We’ll start with a basic analysis of the UK that doesn’t use a database, and then move on to saving objects to a database and loading them back.
Running a baseline simulation
To start, let’s run through a simulation of the UK, and create a chart of the distribution of household income.
import plotly.graph_objects as go

from policyengine.models import (
    Aggregate,
    Simulation,
    policyengine_uk_latest_version,
    policyengine_uk_model,
)
from policyengine.utils.charts import add_fonts, format_figure
from policyengine.utils.datasets import create_uk_dataset

# Load the dataset
uk_dataset = create_uk_dataset()

# Create and run the simulation
sim = Simulation(
    dataset=uk_dataset,
    model=policyengine_uk_model,
    model_version=policyengine_uk_latest_version,
)
sim.run()

# Extract aggregates for household income ranges
income_ranges = [
    0,
    20000,
    40000,
    60000,
    80000,
    100000,
    150000,
    200000,
    300000,
    500000,
    1_000_000,
]
aggregates = []
for i in range(len(income_ranges) - 1):
    aggregates.append(
        Aggregate(
            entity="household",
            variable_name="hbai_household_net_income",
            aggregate_function="count",
            filter_variable_name="hbai_household_net_income",
            filter_variable_geq=income_ranges[i],
            filter_variable_leq=income_ranges[i + 1],
            simulation=sim,
        )
    )
aggregates = Aggregate.run(aggregates)

# Create the bar chart
fig = go.Figure(
    data=[
        go.Bar(
            x=[f"£{inc:,}" for inc in income_ranges[:-1]],
            y=[agg.value for agg in aggregates],
        )
    ]
)

# Apply formatting
format_figure(
    fig,
    title="The distribution of household income in the UK",
    x_title="Income range",
    y_title="Number of households",
)

In this example, we introduced a few concepts:

- The Simulation object, which represents a full run of a microsimulation model and contains all the information (simulated and input) about a set of people or groups. Here it takes a few arguments: a Dataset, a Model and a ModelVersion.
- The Dataset object, which represents a set of people or groups. Here we used a utility function to create this dataset for the UK, but later we will be able to create datasets from scratch or pull them from a database.
- The Model object, which represents a particular microsimulation model (essentially defined as a function transforming a dataset into a new dataset). This package defines two models, one for the UK and one for the US. Think of these objects as adapters representing the full microsimulation models. Here, we’ve taken the pre-defined UK model.
- The ModelVersion object, which represents a particular version of a model. This is useful for tracking changes to the model over time. Here, we used the latest version of the UK model.
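To make the relationship between these four concepts concrete, here is a minimal sketch in plain Python. It does not use the PolicyEngine.py classes: ToyDataset, ToyModel, ToySimulation and flat_tax are hypothetical names, and the tax rule and its numbers are invented purely for illustration. It only shows the idea of a model as a versioned function from one dataset to another, and a simulation as the pairing of a dataset with a model.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for a Dataset: one gross income per household."""

    gross_income: tuple
    net_income: Optional[tuple] = None  # filled in once a model has run


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for a Model plus its version label."""

    name: str
    version: str
    transform: Callable[[ToyDataset], ToyDataset]


def flat_tax(dataset: ToyDataset) -> ToyDataset:
    """Tax income above a 10,000 allowance at 20% (illustrative numbers only)."""
    net = tuple(g - 0.2 * max(g - 10_000, 0) for g in dataset.gross_income)
    # Return a new dataset rather than mutating the input
    return replace(dataset, net_income=net)


@dataclass
class ToySimulation:
    """Hypothetical stand-in for a Simulation: a dataset plus the model to run on it."""

    dataset: ToyDataset
    model: ToyModel
    output: Optional[ToyDataset] = None

    def run(self) -> None:
        self.output = self.model.transform(self.dataset)


toy_sim = ToySimulation(
    dataset=ToyDataset(gross_income=(8_000.0, 30_000.0, 60_000.0)),
    model=ToyModel(name="toy", version="0.1", transform=flat_tax),
)
toy_sim.run()
print(toy_sim.output.net_income)
```

Keeping the input dataset unchanged and storing the result separately mirrors the description above of a simulation holding both input and simulated information.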
Adding a policy reform
Next, we’ll add in a policy reform, and see how that changes the results.
from datetime import datetime

from policyengine.models import Parameter, ParameterValue, Policy

# Parameter = the parameter to change
personal_allowance = Parameter(
    id="gov.hmrc.income_tax.allowances.personal_allowance.amount",
    model=policyengine_uk_model,
)

# ParameterValue = the value to set the parameter to, and when to start
personal_allowance_value = ParameterValue(
    parameter=personal_allowance,
    start_date=datetime(2029, 1, 1),
    value=20000,
)

# Create a policy to increase the personal allowance to £20,000 from 2029-30
policy = Policy(
    name="Increase personal allowance to £20,000",
    description="A policy to increase the personal allowance for income tax to £20,000.",
    parameter_values=[personal_allowance_value],
)

sim_2 = Simulation(
    dataset=uk_dataset,
    model=policyengine_uk_model,
    model_version=policyengine_uk_latest_version,
    policy=policy,  # Pass in the policy here
)
sim_2.run()

# Extract new aggregates for household income ranges
income_ranges = [
    0,
    20000,
    40000,
    60000,
    80000,
    100000,
    150000,
    200000,
    300000,
    500000,
    1_000_000,
]
aggregates_2 = []
for i in range(len(income_ranges) - 1):
    aggregates_2.append(
        Aggregate(
            entity="household",
            variable_name="hbai_household_net_income",
            aggregate_function="count",
            filter_variable_name="hbai_household_net_income",
            filter_variable_geq=income_ranges[i],
            filter_variable_leq=income_ranges[i + 1],
            simulation=sim_2,
        )
    )
aggregates_2 = Aggregate.run(aggregates_2)

# Create the comparative bar chart
fig = go.Figure(
    data=[
        go.Bar(
            name="Baseline",
            x=[f"£{inc:,}" for inc in income_ranges[:-1]],
            y=[agg.value for agg in aggregates],
        ),
        go.Bar(
            name="Reform",
            x=[f"£{inc:,}" for inc in income_ranges[:-1]],
            y=[agg.value for agg in aggregates_2],
        ),
    ]
)

# Apply formatting
fig = format_figure(
    fig,
    title="The distribution of household income in the UK",
    x_title="Income range",
    y_title="Number of households",
)
add_fonts()
fig

In the above example, we created a Policy object, which represents a particular policy reform. It contains a list of ParameterValue objects, each of which changes a specific model parameter from a given start date. Here, we raised the income tax personal allowance to £20,000 from 2029.
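One way to picture how a dated parameter value overrides a baseline is the sketch below. It is a simplified illustration, not the package's implementation: ToyParameterValue and value_at are hypothetical names, the baseline entry is an illustrative figure, and the rule that any active reform value wins over the baseline is an assumption made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToyParameterValue:
    """Hypothetical stand-in for ParameterValue: a value that applies from start_date."""

    parameter_id: str
    start_date: datetime
    value: float


def value_at(
    parameter_id: str,
    when: datetime,
    baseline: Sequence[ToyParameterValue],
    reform: Sequence[ToyParameterValue] = (),
) -> float:
    """Return the value in force at `when` (assumption: reform entries beat baseline ones)."""

    def latest(entries: Sequence[ToyParameterValue]) -> Optional[ToyParameterValue]:
        # Keep entries for this parameter that have already started, then take the newest
        active = [
            e for e in entries
            if e.parameter_id == parameter_id and e.start_date <= when
        ]
        return max(active, key=lambda e: e.start_date, default=None)

    chosen = latest(reform) or latest(baseline)
    if chosen is None:
        raise KeyError(f"No value for {parameter_id} at {when:%Y-%m-%d}")
    return chosen.value


PA = "gov.hmrc.income_tax.allowances.personal_allowance.amount"
toy_baseline = [ToyParameterValue(PA, datetime(2021, 1, 1), 12_570)]  # illustrative figure
toy_reform = [ToyParameterValue(PA, datetime(2029, 1, 1), 20_000)]

before = value_at(PA, datetime(2028, 6, 1), toy_baseline, toy_reform)
after = value_at(PA, datetime(2029, 6, 1), toy_baseline, toy_reform)
print(before, after)
```

In this sketch the reform has no effect on dates before its start date, which is why a simulation of an earlier year would be expected to match the baseline.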
Bringing in a database
Now, we can upload these objects to a database, and then load them back out again. This is useful for tracking different simulations and policy reforms over time.
from policyengine.database import Database

database = Database("postgresql://postgres:postgres@127.0.0.1:54322/postgres")

# These two calls are not usually needed, but you should use them the first time you set up a new database
database.reset()  # Drop and recreate all tables
# Add in the model, model version, parameters and baseline parameter values and variables
database.register_model_version(policyengine_uk_latest_version)

database.set(uk_dataset)
database.set(policy)
for pv in policy.parameter_values:
    database.set(pv)
database.set(sim)
database.set(sim_2)
for agg in aggregates:
    database.set(agg)
for agg in aggregates_2:
    database.set(agg)

database.get(Policy, id=policy.id)

Policy(id='26f30afa-77b9-4435-812c-071873e25400', name='Increase personal allowance to £20,000', description='A policy to increase the personal allowance for income tax to £20,000.', parameter_values=[], simulation_modifier=None, created_at=datetime.datetime(2025, 9, 20, 12, 36, 27, 162725), updated_at=datetime.datetime(2025, 9, 20, 12, 36, 27, 162729))
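The set and get calls above follow a save-then-load-by-id pattern. The sketch below imitates that pattern with the standard library only, so it can be run without a PostgreSQL server. It is not how policyengine.database works internally: ToyDatabase and ToyPolicy are hypothetical names, and storing each object as a JSON payload in an in-memory SQLite table is an assumption chosen for brevity.

```python
import json
import sqlite3
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for a storable object with a generated id."""

    name: str
    description: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ToyDatabase:
    """Hypothetical store that saves objects by (class name, id) and loads them back."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS objects ("
            "kind TEXT, id TEXT, payload TEXT, PRIMARY KEY (kind, id))"
        )

    def set(self, obj) -> None:
        # Insert the object, or overwrite the stored copy if the id already exists
        self.conn.execute(
            "INSERT OR REPLACE INTO objects (kind, id, payload) VALUES (?, ?, ?)",
            (type(obj).__name__, obj.id, json.dumps(asdict(obj))),
        )
        self.conn.commit()

    def get(self, cls, id: str):
        row = self.conn.execute(
            "SELECT payload FROM objects WHERE kind = ? AND id = ?",
            (cls.__name__, id),
        ).fetchone()
        # Rebuild the object from its stored fields, or return None if it is missing
        return cls(**json.loads(row[0])) if row else None


toy_db = ToyDatabase()
toy_policy = ToyPolicy(
    name="Increase personal allowance to £20,000",
    description="Toy example",
)
toy_db.set(toy_policy)
loaded = toy_db.get(ToyPolicy, id=toy_policy.id)
print(loaded == toy_policy)
```

Keying each record on an id generated when the object is created is what allows the same object to be saved repeatedly and fetched later without creating duplicates.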