Simulations
The policyengine_core.simulations module contains the definition of Simulation, the single most important class in the repo. Simulations combine the logic of a country package with data, and can use the country logic and parameters to calculate the values of unknown variables. The class SimulationBuilder can create Simulations from a variety of inputs: JSON descriptions or dataset arrays.
Simulation
- class policyengine_core.simulations.simulation.Simulation(tax_benefit_system: TaxBenefitSystem = None, populations: Dict[str, Population] = None, situation: dict = None, dataset: Union[str, Type[Dataset]] = None, reform: Reform = None, trace: bool = False)[source]#
Bases:
object
Represents a simulation, and handles the calculation logic
- baseline: Simulation = None#
The baseline simulation, if this simulation is a reform.
- build_from_populations(populations: Dict[str, Population]) None [source]#
This method of initialisation requires the populations to be pre-initialised.
- Parameters:
populations (Dict[str, Population]) – A dictionary of populations, indexed by entity key.
- calculate(variable_name: str, period: Period = None, map_to: str = None, decode_enums: bool = False) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] [source]#
Calculate variable_name for period.
- Parameters:
variable_name (str) – The name of the variable to calculate.
period (Period) – The period to calculate the variable for.
map_to (str) – The key of the entity to map the result to. If None, the result is returned as is.
decode_enums (bool) – If True, the result is decoded from an array of integers to an array of strings.
- Returns:
The calculated variable.
- Return type:
ArrayLike
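The decode_enums option can be pictured with a small standalone sketch. It assumes that enum values are stored as integer indices in definition order and that decoding yields member names; the HousingStatus enum and the decode helper are hypothetical and not part of the library.

```python
from enum import Enum


class HousingStatus(Enum):
    # Hypothetical enum, used for illustration only.
    owner = "Owner"
    tenant = "Tenant"
    free_lodger = "Free lodger"


def decode(indices, enum_class):
    """Map integer indices (definition order) to enum member names."""
    members = list(enum_class)
    return [members[i].name for i in indices]


encoded = [1, 0, 2, 1]
decoded = decode(encoded, HousingStatus)
print(decoded)
```

With decode_enums left at False, a caller would receive something like the encoded list; with True, something like the decoded list of strings.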
- calculate_add(variable_name: str, period: Period = None, decode_enums: bool = False) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] [source]#
- calculate_dataframe(variable_names: List[str], period: Period = None, map_to: str = None) DataFrame [source]#
Calculate variable_names for period.
- Parameters:
variable_names (List[str]) – A list of variable names to calculate.
period (Period) – The period to calculate for.
- Returns:
A dataframe containing the calculated variables.
- Return type:
pd.DataFrame
- calculate_divide(variable_name: str, period: Period = None, decode_enums: bool = False) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] [source]#
- calculate_output(variable_name: str, period: Period = None) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] [source]#
Calculate the value of a variable using the calculate_output attribute of the variable.
- check_macro_cache(variable_name: str, period: str) bool [source]#
Check whether the variable can have a cached value.
- clone(debug: bool = False, trace: bool = False, clone_tax_benefit_system: bool = True) Simulation [source]#
Copy the simulation just enough to be able to run the copy without modifying the original simulation
- property data_storage_dir: str#
Temporary folder used to store intermediate calculation data in case the memory is saturated
- default_calculation_period: str = None#
The default period to calculate for if none is provided.
- default_input_period: str = None#
The default period to use when inputting variables.
- default_role: str = None#
The default role to assign people to groups if none is provided.
- default_tax_benefit_system: Type[TaxBenefitSystem] = None#
The default tax-benefit system class to use if none is provided.
- default_tax_benefit_system_instance: TaxBenefitSystem = None#
The default tax-benefit system instance to use if none is provided. This requires that the tax-benefit system is initialised when importing a country package. This will slow down the import, but may speed up individual simulations.
- delete_arrays(variable: str, period: Period = None) None [source]#
Delete a variable’s value for a given period
- Parameters:
variable – the variable whose values should be deleted
period – the period for which the value should be deleted
Example:
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_array('age', '2018-05')
array([13, 14], dtype=int32)
>>> simulation.delete_arrays('age', '2018-05')
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
>>> simulation.get_array('age', '2018-05') is None
True
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.delete_arrays('age')
>>> simulation.get_array('age', '2018-04') is None
True
>>> simulation.get_array('age', '2018-05') is None
True
- derivative(variable: str, wrt: str, period: Period = None, delta: float = 1) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] [source]#
Compute the derivative of a variable w.r.t another variable.
- Parameters:
variable (str) – The variable to differentiate.
wrt (str) – The variable to differentiate with respect to.
period (Period) – The period for which to compute the derivative.
delta (float) – The infinitesimal to use for the derivative.
- Returns:
The derivative.
- Return type:
ArrayLike
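As a rough illustration of what a derivative with a finite delta means, the sketch below uses a forward difference. This is an assumption: the source does not say which difference scheme the method uses, and forward_difference and income_tax are hypothetical names.

```python
def forward_difference(formula, inputs, delta=1.0):
    """Approximate d(formula)/d(input) element-wise with step `delta`."""
    base = [formula(x) for x in inputs]
    bumped = [formula(x + delta) for x in inputs]
    return [(b - a) / delta for a, b in zip(base, bumped)]


def income_tax(earnings):
    # Hypothetical schedule: 20% on earnings above a 10,000 allowance.
    return 0.2 * max(earnings - 10_000, 0)


earnings = [5_000, 20_000, 50_000]
marginal_rates = forward_difference(income_tax, earnings, delta=1.0)
print(marginal_rates)
```

Read this way, the derivative of a tax variable with respect to an earnings variable is a marginal tax rate per person, and delta is the size of the earnings bump.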
- extract_person(index: int = 0, exclude_entities: tuple = ('state',)) dict [source]#
Extract a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Parameters:
index (int) – The index of the person to extract.
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- get_array(variable_name: str, period: Period) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] [source]#
Return the value of variable_name for period, if this value is already in the cache (if it has been set as an input or previously calculated).
Unlike calculate(), this method does not trigger calculations and does not use any formula.
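The difference between a cache lookup and a calculation can be sketched with a toy class. ToySimulation is a hypothetical stand-in written for this example; it is not the library's Simulation and only mirrors the method names described here.

```python
class ToySimulation:
    """Hypothetical stand-in, not the library's Simulation class."""

    def __init__(self, formulas):
        self.formulas = formulas  # variable name -> function(sim, period)
        self.cache = {}           # (variable name, period) -> value

    def set_input(self, name, period, value):
        self.cache[(name, period)] = value

    def get_array(self, name, period):
        # Cache lookup only: never runs a formula.
        return self.cache.get((name, period))

    def calculate(self, name, period):
        # Runs the formula on a cache miss, then stores the result.
        key = (name, period)
        if key not in self.cache:
            self.cache[key] = self.formulas[name](self, period)
        return self.cache[key]


def flat_tax(sim, period):
    # Hypothetical formula: a 20% flat tax on salary, in whole units.
    return [salary // 5 for salary in sim.calculate("salary", period)]


sim = ToySimulation({"tax": flat_tax})
sim.set_input("salary", "2018-04", [1000, 2000])
before = sim.get_array("tax", "2018-04")    # nothing cached yet
computed = sim.calculate("tax", "2018-04")  # triggers the formula
after = sim.get_array("tax", "2018-04")     # now served from the cache
```

In this toy version, get_array returns None until either set_input or calculate has populated the cache for that variable and period.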
- get_branch(name: str = 'branch', clone_system: bool = False) Simulation [source]#
Create a clone of this simulation, whose calculations are traced in the original.
- Parameters:
name (str, optional) – Name of the branch. Defaults to “branch”.
clone_system (bool, optional) – Whether to clone the tax-benefit system. Use this if you’re changing policy parameters. Defaults to False.
- Returns:
The cloned simulation.
- Return type:
Simulation
- get_holder(variable_name: str) Holder [source]#
Get the Holder associated with the variable variable_name for the simulation.
- get_known_periods(variable: str) List[Period] [source]#
Get the list of a variable’s known periods, i.e. the periods for which a value has been initialized.
- Parameters:
variable – the variable whose known periods are returned
Example:
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_known_periods('age')
[Period((u'month', Instant((2018, 5, 1)), 1)), Period((u'month', Instant((2018, 4, 1)), 1))]
- get_memory_usage(variables: List[str] = None) dict [source]#
Get data about the virtual memory usage of the simulation
- get_population(plural: str = None) Population [source]#
- get_variable_population(variable_name: str) Population [source]#
- is_over_dataset: bool = False#
Whether this simulation is built over a dataset.
- macro_cache_read: bool = False#
Whether to read from the macro cache.
- macro_cache_write: bool = False#
Whether to write to the macro cache.
- map_result(values: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], source_entity: str, target_entity: str, how: str = None)[source]#
Maps values from one entity to another.
- Parameters:
values (np.array) – The values in their original position.
source_entity (str) – The source entity key.
target_entity (str) – The target entity key.
how (str, optional) – A function to use when mapping. Defaults to None.
- Raises:
ValueError – If an invalid (dis)aggregation function is passed.
- Returns:
The mapped values.
- Return type:
np.array
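Mapping between entities can be pictured as aggregation (persons to a group) or broadcasting (a group to its persons). The sketch below is a plain-Python illustration under assumptions: the helper names, the "sum", "max" and "min" options and the id layout are hypothetical and are not the library's implementation.

```python
from collections import defaultdict


def map_to_group(person_values, person_group_ids, how="sum"):
    """Aggregate person-level values to one value per group."""
    reducers = {"sum": sum, "max": max, "min": min}
    if how not in reducers:
        raise ValueError(f"Invalid aggregation function: {how}")
    grouped = defaultdict(list)
    for value, group_id in zip(person_values, person_group_ids):
        grouped[group_id].append(value)
    return {gid: reducers[how](vals) for gid, vals in grouped.items()}


def map_to_persons(group_values, person_group_ids):
    """Broadcast one value per group back to each of its members."""
    return [group_values[group_id] for group_id in person_group_ids]


salaries = [30_000, 20_000, 45_000]
household_of_person = [0, 0, 1]  # persons 0 and 1 share household 0
household_income = map_to_group(salaries, household_of_person)
per_person = map_to_persons(household_income, household_of_person)
```

Raising ValueError for an unknown how value mirrors the documented behaviour for an invalid (dis)aggregation function.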
- sample_person() dict [source]#
Sample a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- set_input(variable_name: str, period: Period, value: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) None [source]#
Set a variable’s value for a given period
- Parameters:
variable_name – the variable to be set
value – the input value for the variable
period – the period for which the value is set
Example:
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
If a set_input property has been set for the variable, this method may accept inputs for periods not matching the definition_period of the variable. To read more about this, check the documentation.
- start_instant: str = None#
The earliest data input instant of the simulation.
- subsample(n=None, frac=None, seed=None, time_period=None) Simulation [source]#
Quantize the simulation to a smaller size by sampling households.
- Parameters:
n (int, optional) – The number of households to sample. Defaults to 10_000.
frac (float, optional) – The fraction of households to sample. Defaults to None.
seed (int, optional) – The key used to seed the random number generator. Defaults to the dataset name.
time_period (str, optional) – Sample households based on their weight in this time period. Defaults to the default calculation period.
- Returns:
The quantized simulation.
- Return type:
Simulation
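Weight-based household sampling with a seed can be sketched as follows. This is a minimal illustration that assumes sampling with replacement and probability proportional to weight; the source does not state the exact scheme, and sample_households is a hypothetical helper.

```python
import random


def sample_households(household_ids, weights, n, seed=0):
    """Draw `n` household ids with probability proportional to weight."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(household_ids, weights=weights, k=n)


ids = ["h1", "h2", "h3"]
weights = [10.0, 1.0, 1.0]
sample = sample_households(ids, weights, n=5, seed=42)
repeat = sample_households(ids, weights, n=5, seed=42)
```

Because the generator is seeded, sample and repeat are identical, which is the property a seed argument is meant to provide.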
- to_input_dataframe() DataFrame [source]#
Exports a DataFrame which can be loaded back to a new Simulation to reproduce the same results.
- Returns:
The DataFrame containing the input values.
- Return type:
pd.DataFrame
- property trace: bool#
Microsimulation
- class policyengine_core.simulations.microsimulation.Microsimulation(tax_benefit_system: TaxBenefitSystem = None, populations: Dict[str, Population] = None, situation: dict = None, dataset: Union[str, Type[Dataset]] = None, reform: Reform = None, trace: bool = False)[source]#
Bases:
Simulation
A Simulation whose entities use weights to represent larger populations.
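To make the role of weights concrete, the sketch below computes a weighted total and mean by hand. It is only an illustration of the idea that each sampled record stands for several real units; the helper names and figures are hypothetical, and the library's own weighting machinery is not shown.

```python
def weighted_sum(values, weights):
    """Population total when each record represents `weight` units."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Population mean under the same weights."""
    return weighted_sum(values, weights) / sum(weights)


incomes = [20_000, 50_000, 80_000]  # three sampled households
weights = [1_000, 500, 100]         # households each record stands for
total = weighted_sum(incomes, weights)
mean = weighted_mean(incomes, weights)
print(total, mean)  # 53000000 33125.0
```

The unweighted mean of the three incomes would be 50,000; the weighted figure is lower because the low-income record represents the most households.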
- baseline: Simulation = None#
The baseline simulation, if this simulation is a reform.
- branches: Dict[str, Simulation]#
- build_from_dataset() None #
Build a simulation from a dataset.
- build_from_populations(populations: Dict[str, Population]) None #
This method of initialisation requires the populations to be pre-initialised.
- Parameters:
populations (Dict[str, Population]) – A dictionary of populations, indexed by entity key.
- calculate(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True, decode_enums: bool = True) MicroSeries [source]#
Calculate variable_name for period.
- Parameters:
variable_name (str) – The name of the variable to calculate.
period (Period) – The period to calculate the variable for.
map_to (str) – The key of the entity to map the result to. If None, the result is returned as is.
decode_enums (bool) – If True, the result is decoded from an array of integers to an array of strings.
- Returns:
The calculated variable.
- Return type:
MicroSeries
- calculate_add(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True) MicroSeries [source]#
- calculate_dataframe(variable_names: list, period: Period = None, map_to: str = None, use_weights: bool = True) MicroDataFrame [source]#
Calculate variable_names for period.
- Parameters:
variable_names (List[str]) – A list of variable names to calculate.
period (Period) – The period to calculate for.
- Returns:
A dataframe containing the calculated variables.
- Return type:
MicroDataFrame
- calculate_divide(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True) MicroSeries [source]#
- calculate_output(variable_name: str, period: Period = None) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] #
Calculate the value of a variable using the calculate_output attribute of the variable.
- check_macro_cache(variable_name: str, period: str) bool #
Check whether the variable can have a cached value.
- clone(debug: bool = False, trace: bool = False, clone_tax_benefit_system: bool = True) Simulation #
Copy the simulation just enough to be able to run the copy without modifying the original simulation
- create_shortcuts() None #
- property data_storage_dir: str#
Temporary folder used to store intermediate calculation data in case the memory is saturated
- debug: bool#
- default_calculation_period: str = None#
The default period to calculate for if none is provided.
- default_input_period: str = None#
The default period to use when inputting variables.
- default_role: str = None#
The default role to assign people to groups if none is provided.
- default_tax_benefit_system: Type['TaxBenefitSystem'] = None#
The default tax-benefit system class to use if none is provided.
- default_tax_benefit_system_instance: TaxBenefitSystem = None#
The default tax-benefit system instance to use if none is provided. This requires that the tax-benefit system is initialised when importing a country package. This will slow down the import, but may speed up individual simulations.
- delete_arrays(variable: str, period: Period = None) None #
Delete a variable’s value for a given period
- Parameters:
variable – the variable whose values should be deleted
period – the period for which the value should be deleted
Example:
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_array('age', '2018-05')
array([13, 14], dtype=int32)
>>> simulation.delete_arrays('age', '2018-05')
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
>>> simulation.get_array('age', '2018-05') is None
True
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.delete_arrays('age')
>>> simulation.get_array('age', '2018-04') is None
True
>>> simulation.get_array('age', '2018-05') is None
True
- derivative(variable: str, wrt: str, period: Period = None, delta: float = 1) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] #
Compute the derivative of a variable w.r.t another variable.
- Parameters:
variable (str) – The variable to differentiate.
wrt (str) – The variable to differentiate with respect to.
period (Period) – The period for which to compute the derivative.
delta (float) – The infinitesimal to use for the derivative.
- Returns:
The derivative.
- Return type:
ArrayLike
- describe_entities() dict #
- extract_person(index: int = 0, exclude_entities: tuple = ('state',)) dict #
Extract a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Parameters:
index (int) – The index of the person to extract.
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- get_array(variable_name: str, period: Period) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]] #
Return the value of variable_name for period, if this value is already in the cache (if it has been set as an input or previously calculated).
Unlike calculate(), this method does not trigger calculations and does not use any formula.
- get_branch(name: str = 'branch', clone_system: bool = False) Simulation #
Create a clone of this simulation, whose calculations are traced in the original.
- Parameters:
name (str, optional) – Name of the branch. Defaults to “branch”.
clone_system (bool, optional) – Whether to clone the tax-benefit system. Use this if you’re changing policy parameters. Defaults to False.
- Returns:
The cloned simulation.
- Return type:
Simulation
- get_holder(variable_name: str) Holder #
Get the Holder associated with the variable variable_name for the simulation.
- get_known_periods(variable: str) List[Period] #
Get the list of a variable’s known periods, i.e. the periods for which a value has been initialized.
- Parameters:
variable – the variable whose known periods are returned
Example:
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_known_periods('age')
[Period((u'month', Instant((2018, 5, 1)), 1)), Period((u'month', Instant((2018, 4, 1)), 1))]
- get_memory_usage(variables: List[str] = None) dict #
Get data about the virtual memory usage of the simulation
- get_population(plural: str = None) Population #
- get_variable_population(variable_name: str) Population #
- invalidate_spiral_variables(variable: str) None #
- is_over_dataset: bool = False#
Whether this simulation is built over a dataset.
- link_to_entities_instances() None #
- macro_cache_read: bool = False#
Whether to read from the macro cache.
- macro_cache_write: bool = False#
Whether to write to the macro cache.
- map_result(values: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], source_entity: str, target_entity: str, how: str = None)#
Maps values from one entity to another.
- Parameters:
values (np.array) – The values in their original position.
source_entity (str) – The source entity key.
target_entity (str) – The target entity key.
how (str, optional) – A function to use when mapping. Defaults to None.
- Raises:
ValueError – If an invalid (dis)aggregation function is passed.
- Returns:
The mapped values.
- Return type:
np.array
- max_spiral_loops: int#
- memory_config: MemoryConfig#
- opt_out_cache: bool#
- purge_cache_of_invalid_values() None #
- sample_person() dict #
Sample a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- set_input(variable_name: str, period: Period, value: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) None #
Set a variable’s value for a given period
- Parameters:
variable_name – the variable to be set
value – the input value for the variable
period – the period for which the value is set
Example:
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
If a set_input property has been set for the variable, this method may accept inputs for periods not matching the definition_period of the variable. To read more about this, check the documentation.
- start_instant: str = None#
The earliest data input instant of the simulation.
- subsample(n=None, frac=None, seed=None, time_period=None) Simulation #
Quantize the simulation to a smaller size by sampling households.
- Parameters:
n (int, optional) – The number of households to sample. Defaults to 10_000.
frac (float, optional) – The fraction of households to sample. Defaults to None.
seed (int, optional) – The key used to seed the random number generator. Defaults to the dataset name.
time_period (str, optional) – Sample households based on their weight in this time period. Defaults to the default calculation period.
- Returns:
The quantized simulation.
- Return type:
Simulation
- to_input_dataframe() DataFrame #
Exports a DataFrame which can be loaded back to a new Simulation to reproduce the same results.
- Returns:
The DataFrame containing the input values.
- Return type:
pd.DataFrame
- property trace: bool#
- tracer: SimpleTracer#
SimulationBuilder
- class policyengine_core.simulations.simulation_builder.SimulationBuilder[source]#
Bases:
object
- add_default_group_entity(persons_ids: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], entity: Entity) None [source]#
- add_group_entity(persons_plural: str, persons_ids: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], entity: Entity, instances_json: dict) None [source]#
Add all instances of one of the model’s entities as described in instances_json.
- add_person_entity(entity: Entity, instances_json: dict) List[int] [source]#
Add the simulation’s instances of the persons entity as described in instances_json.
- build(tax_benefit_system: TaxBenefitSystem) Simulation [source]#
- build_default_simulation(tax_benefit_system: TaxBenefitSystem, count: int = 1, simulation: Simulation = None) Simulation [source]#
Build a simulation where:
There are count persons.
There are count instances of each group entity, each containing one person.
Every person has, in each entity, the first role.
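The default structure described above can be written out as plain data. This is a sketch under assumptions: default_structure, the dictionary layout and the role names are hypothetical and do not reflect how the builder stores populations internally.

```python
def default_structure(count, group_roles):
    """Return person ids plus, per group entity, one instance per person."""
    persons = list(range(count))
    groups = {}
    for entity_key, roles in group_roles.items():
        first_role = roles[0]  # every person gets the entity's first role
        groups[entity_key] = [
            {"members": [person], "roles": [first_role]} for person in persons
        ]
    return persons, groups


# Hypothetical entity with two roles; only the first one is ever assigned.
persons, groups = default_structure(3, {"household": ["parent", "child"]})
```

With count set to 3 this yields three persons and three single-person households, each member holding the first role.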
- build_from_dict(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) Simulation [source]#
Build a simulation from input_dict. Optionally overwrites an existing simulation.
This method uses build_from_entities if entities are fully specified, or build_from_variables if not.
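The choice between the two builders can be sketched as a simple check on the top-level keys. The criterion used here (any key matching a plural entity name means entities are specified) is an assumption made for illustration, and choose_builder is a hypothetical helper.

```python
def choose_builder(input_dict, entity_plurals):
    """Return the name of the builder a dict of this shape would need."""
    if any(key in entity_plurals for key in input_dict):
        return "build_from_entities"
    return "build_from_variables"


entity_plurals = {"persons", "households"}  # hypothetical entity names

with_entities = choose_builder(
    {"persons": {"Javier": {}}, "households": {"household": {"parents": ["Javier"]}}},
    entity_plurals,
)
variables_only = choose_builder({"salary": {"2016-10": 12000}}, entity_plurals)
```

A dict keyed by variable names, like the salary example, falls through to the variables-only path.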
- build_from_entities(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) Simulation [source]#
Build a simulation from a Python dict input_dict fully specifying entities.
Example:
>>> simulation_builder.build_from_entities(
...     tax_benefit_system,
...     {
...         'persons': {'Javier': {'salary': {'2018-11': 2000}}},
...         'households': {'household': {'parents': ['Javier']}},
...     },
... )
- build_from_variables(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) Simulation [source]#
Build a simulation from a Python dict input_dict describing variable values without explicitly specifying entities.
This method uses build_default_simulation to infer an entity structure.
Example:
>>> simulation_builder.build_from_variables(
...     tax_benefit_system,
...     {'salary': {'2016-10': 12000}},
... )
- check_persons_to_allocate(persons_plural, entity_plural, persons_ids, person_id, entity_id, role_id, persons_to_allocate, index)[source]#
- create_entities(tax_benefit_system: TaxBenefitSystem) None [source]#
- declare_entity(entity_singular: str, entity_ids: Iterable) Population [source]#
- explicit_singular_entities(tax_benefit_system: TaxBenefitSystem, input_dict: dict) dict [source]#
Preprocess input_dict to make explicit the entities defined using the single-entity shortcut.
Example:
>>> simulation_builder.explicit_singular_entities(
...     tax_benefit_system,
...     {'persons': {'Javier': {}}, 'household': {'parents': ['Javier']}},
... )
{'persons': {'Javier': {}}, 'households': {'household': {'parents': ['Javier']}}}
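The single-entity shortcut can be expanded with a few lines of dictionary handling. The sketch below assumes a known mapping from singular to plural entity keys; make_entities_explicit and the mapping are hypothetical and stand in for whatever the tax-benefit system provides.

```python
def make_entities_explicit(input_dict, singular_to_plural):
    """Rewrite singular entity keys into the plural, per-instance form."""
    result = {}
    for key, value in input_dict.items():
        if key in singular_to_plural:
            # Nest the lone instance under the plural key, named after the singular.
            result[singular_to_plural[key]] = {key: value}
        else:
            result[key] = value
    return result


shortcut = {"persons": {"Javier": {}}, "household": {"parents": ["Javier"]}}
explicit = make_entities_explicit(shortcut, {"household": "households"})
print(explicit)
```

Keys that are already plural, such as persons here, pass through unchanged.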
- join_with_persons(group_population: GroupPopulation, persons_group_assignment: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], roles: Iterable[str]) None [source]#