Simulations#
The policyengine_core.simulations module contains the definition of Simulation, the single most important class in the repo. Simulations combine the logic of a country package with data, and use the country logic and parameters to calculate the values of unknown variables. The class SimulationBuilder can create Simulations from a variety of inputs: JSON descriptions or dataset arrays.
Simulation#
- class policyengine_core.simulations.simulation.Simulation(tax_benefit_system: TaxBenefitSystem = None, populations: Dict[str, Population] = None, situation: dict = None, dataset: Union[str, Type[Dataset]] = None, reform: Reform = None, trace: bool = False)[source]#
Bases: object
Represents a simulation and handles the calculation logic.
- baseline: Simulation = None#
The baseline simulation, if this simulation is a reform.
- build_from_populations(populations: Dict[str, Population]) None[source]#
This method of initialisation requires the populations to be pre-initialised.
- Parameters:
populations (Dict[str, Population]) – A dictionary of populations, indexed by entity key.
- calculate(variable_name: str, period: Period = None, map_to: str = None, decode_enums: bool = False) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]][source]#
Calculate variable_name for period.
- Parameters:
variable_name (str) – The name of the variable to calculate.
period (Period) – The period to calculate the variable for.
map_to (str) – The key of the entity to map the result to. If None, the result is returned as is.
decode_enums (bool) – If True, the result is decoded from an array of integers to an array of strings.
- Returns:
The calculated variable.
- Return type:
ArrayLike
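The decode_enums flag described above can be pictured with a small standalone sketch. It does not use policyengine_core: the HousingStatus enum and the decode helper are hypothetical, and decoding to member names (rather than member values) is an assumption, since the text only says integers become strings.

```python
from enum import Enum


class HousingStatus(Enum):  # hypothetical enum, for illustration only
    OWNER = "owner"
    TENANT = "tenant"
    FREE_LODGER = "free_lodger"


def decode(indices, enum_type):
    """Turn integer positions into the names of the matching enum members."""
    members = list(enum_type)  # members in definition order
    return [members[i].name for i in indices]


encoded = [1, 0, 1, 2]  # what an undecoded result might look like
decoded = decode(encoded, HousingStatus)
print(decoded)
```

Keeping the integer form is cheaper for large arrays; decoding is mainly useful when the result is read by a person.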
- calculate_add(variable_name: str, period: Period = None, decode_enums: bool = False) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]][source]#
- calculate_dataframe(variable_names: List[str], period: Period = None, map_to: str = None) DataFrame[source]#
Calculate variable_names for period.
- Parameters:
variable_names (List[str]) – A list of variable names to calculate.
period (Period) – The period to calculate for.
- Returns:
A dataframe containing the calculated variables.
- Return type:
pd.DataFrame
- calculate_divide(variable_name: str, period: Period = None, decode_enums: bool = False) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]][source]#
- calculate_output(variable_name: str, period: Period = None) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]][source]#
Calculate the value of a variable using the calculate_output attribute of the variable.
- check_macro_cache(variable_name: str, period: str) bool[source]#
Check whether the variable can have a cached value.
- clone(debug: bool = False, trace: bool = False, clone_tax_benefit_system: bool = True) Simulation[source]#
Copy the simulation just enough to be able to run the copy without modifying the original simulation
- property data_storage_dir: str#
Temporary folder used to store intermediate calculation data in case the memory is saturated
- default_calculation_period: str = None#
The default period to calculate for if none is provided.
- default_input_period: str = None#
The default period to use when inputting variables.
- default_role: str = None#
The default role to assign people to groups if none is provided.
- default_tax_benefit_system: Type[TaxBenefitSystem] = None#
The default tax-benefit system class to use if none is provided.
- default_tax_benefit_system_instance: TaxBenefitSystem = None#
The default tax-benefit system instance to use if none is provided. This requires that the tax-benefit system is initialised when importing a country package. This will slow down the import, but may speed up individual simulations.
- delete_arrays(variable: str, period: Period = None) None[source]#
Delete a variable’s value for a given period
- Parameters:
variable – the variable whose value is deleted
period – the period for which the value should be deleted
Example:
>>> from policyengine_core.simulations import Simulation
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_array('age', '2018-05')
array([13, 14], dtype=int32)
>>> simulation.delete_arrays('age', '2018-05')
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
>>> simulation.get_array('age', '2018-05') is None
True
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.delete_arrays('age')
>>> simulation.get_array('age', '2018-04') is None
True
>>> simulation.get_array('age', '2018-05') is None
True
- derivative(variable: str, wrt: str, period: Period = None, delta: float = 1) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]][source]#
Compute the derivative of a variable w.r.t another variable.
- Parameters:
variable (str) – The variable to differentiate.
wrt (str) – The variable to differentiate with respect to.
period (Period) – The period for which to compute the derivative.
delta (float) – The infinitesimal to use for the derivative.
- Returns:
The derivative.
- Return type:
ArrayLike
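The delta parameter suggests a finite-difference estimate. The sketch below is a standalone illustration of that idea, not the library's code: it assumes a forward difference, and income_tax is a hypothetical formula invented for the example.

```python
def income_tax(income):
    """Hypothetical formula: 20% tax on income above a 10,000 allowance."""
    return [0.2 * max(x - 10_000, 0) for x in income]


def derivative(formula, values, delta=1.0):
    """Forward-difference estimate of d(formula)/d(values), element by element."""
    base = formula(values)
    bumped = formula([v + delta for v in values])
    return [(b - a) / delta for a, b in zip(base, bumped)]


# Marginal tax rate for three people with different incomes.
rates = derivative(income_tax, [5_000, 20_000, 50_000])
print(rates)
```

With delta = 1 the estimate reads as "change in the output per extra unit of the input", which is how marginal rates are usually quoted.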
- extract_person(index: int = 0, exclude_entities: tuple = ('state',)) dict[source]#
Extract a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Parameters:
index (int) – The index of the person to extract.
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- get_array(variable_name: str, period: Period) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]][source]#
Return the value of variable_name for period, if this value is already in the cache (if it has been set as an input or previously calculated).
Unlike calculate(), this method does not trigger calculations and does not use any formula.
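The difference between a cache lookup and a calculation can be sketched with a toy class. TinySimulation and is_adult are hypothetical names; the real Simulation stores values in holders and handles periods and entities, none of which is modelled here.

```python
class TinySimulation:
    """Toy stand-in for a simulation; not the library's implementation."""

    def __init__(self, formulas):
        self.formulas = formulas  # variable name -> function(sim, period)
        self.cache = {}           # (variable name, period) -> value

    def set_input(self, variable_name, period, value):
        self.cache[(variable_name, period)] = value

    def get_array(self, variable_name, period):
        # Cache lookup only: returns None when nothing is stored.
        return self.cache.get((variable_name, period))

    def calculate(self, variable_name, period):
        key = (variable_name, period)
        if key not in self.cache:
            # Cache miss: run the formula and remember the result.
            self.cache[key] = self.formulas[variable_name](self, period)
        return self.cache[key]


def is_adult(sim, period):
    return [age >= 18 for age in sim.calculate("age", period)]


sim = TinySimulation({"is_adult": is_adult})
sim.set_input("age", "2018-04", [12, 40])

before = sim.get_array("is_adult", "2018-04")    # not computed yet
computed = sim.calculate("is_adult", "2018-04")  # triggers the formula
after = sim.get_array("is_adult", "2018-04")     # now served from the cache
```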
- get_branch(name: str = 'branch', clone_system: bool = False) Simulation[source]#
Create a clone of this simulation, whose calculations are traced in the original.
- Parameters:
name (str, optional) – Name of the branch. Defaults to “branch”.
clone_system (bool, optional) – Whether to clone the tax-benefit system. Use this if you’re changing policy parameters. Defaults to False.
- Returns:
The cloned simulation.
- Return type:
Simulation
- get_holder(variable_name: str) Holder[source]#
Get the Holder associated with the variable variable_name for the simulation.
- get_known_periods(variable: str) List[Period][source]#
Get the list of a variable’s known periods, i.e. the periods for which a value has been initialised.
- Parameters:
variable – the variable whose known periods are returned
Example:
>>> from policyengine_core.simulations import Simulation
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_known_periods('age')
[Period((u'month', Instant((2018, 5, 1)), 1)), Period((u'month', Instant((2018, 4, 1)), 1))]
- get_memory_usage(variables: List[str] = None) dict[source]#
Get data about the virtual memory usage of the simulation
- get_population(plural: str = None) Population[source]#
- get_variable_population(variable_name: str) Population[source]#
- is_over_dataset: bool = False#
Whether this simulation is built over a dataset.
- macro_cache_read: bool = False#
Whether to read from the macro cache.
- macro_cache_write: bool = False#
Whether to write to the macro cache.
- map_result(values: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], source_entity: str, target_entity: str, how: str = None)[source]#
Maps values from one entity to another.
- Parameters:
values (ArrayLike) – The values in their original position.
source_entity (str) – The source entity key.
target_entity (str) – The target entity key.
how (str, optional) – A function to use when mapping. Defaults to None.
- Raises:
ValueError – If an invalid (dis)aggregation function is passed.
- Returns:
The mapped values.
- Return type:
np.array
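Mapping between entities can be pictured as aggregation in one direction and projection in the other. This is a standalone sketch under assumptions: the helper names are hypothetical, and "sum" and "max" are example values for how, which the text above does not enumerate.

```python
def map_to_group(person_values, person_group, n_groups, how="sum"):
    """Aggregate person-level values to the group each person belongs to."""
    buckets = [[] for _ in range(n_groups)]
    for value, group in zip(person_values, person_group):
        buckets[group].append(value)
    if how == "sum":
        return [sum(b) for b in buckets]
    if how == "max":
        return [max(b) for b in buckets]
    raise ValueError(f"Unknown aggregation function: {how!r}")


def map_to_persons(group_values, person_group):
    """Project group-level values back onto each member of the group."""
    return [group_values[group] for group in person_group]


person_household = [0, 0, 1]  # persons 0 and 1 share household 0
income = [1_000, 500, 2_000]

household_income = map_to_group(income, person_household, n_groups=2)
income_per_person = map_to_persons(household_income, person_household)
```

The ValueError mirrors the documented behaviour for an invalid (dis)aggregation function.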
- sample_person() dict[source]#
Sample a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- set_input(variable_name: str, period: Period, value: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) None[source]#
Set a variable’s value for a given period
- Parameters:
variable_name – the variable to be set
value – the input value for the variable
period – the period for which the value is set
Example:
>>> from policyengine_core.simulations import Simulation
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
If a set_input property has been set for the variable, this method may accept inputs for periods not matching the definition_period of the variable. To read more about this, check the documentation.
- start_instant: str = None#
The earliest data input instant of the simulation.
- subsample(n=None, frac=None, seed=None, time_period=None) Simulation[source]#
Quantize the simulation to a smaller size by sampling households.
- Parameters:
n (int, optional) – The number of households to sample. Defaults to 10_000.
frac (float, optional) – The fraction of households to sample. Defaults to None.
seed (int, optional) – The key used to seed the random number generator. Defaults to the dataset name.
time_period (str, optional) – Sample households based on their weight in this time period. Defaults to the default calculation period.
- Returns:
The quantized simulation.
- Return type:
Simulation
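Weight-based household sampling with a fixed seed can be sketched with the standard library alone. This is an illustration, not the method's actual algorithm: subsample_households is a hypothetical helper, and drawing with replacement is an assumption.

```python
import random


def subsample_households(household_ids, weights, n, seed=0):
    """Draw n households with replacement, with probability proportional to weight."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.choices(household_ids, weights=weights, k=n)


ids = ["h1", "h2", "h3", "h4"]
weights = [100.0, 0.0, 300.0, 600.0]  # h2 has zero weight

first = subsample_households(ids, weights, n=3, seed=42)
second = subsample_households(ids, weights, n=3, seed=42)
```

Reusing the same seed reproduces the same sample, which keeps results comparable between runs.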
- to_input_dataframe() DataFrame[source]#
Exports a DataFrame which can be loaded back to a new Simulation to reproduce the same results.
- Returns:
The DataFrame containing the input values.
- Return type:
pd.DataFrame
- property trace: bool#
Microsimulation#
- class policyengine_core.simulations.microsimulation.Microsimulation(tax_benefit_system: TaxBenefitSystem = None, populations: Dict[str, Population] = None, situation: dict = None, dataset: Union[str, Type[Dataset]] = None, reform: Reform = None, trace: bool = False)[source]#
Bases: Simulation
A Simulation whose entities use weights to represent larger populations.
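The role of weights can be shown with plain arithmetic. In this standalone sketch (hypothetical helper names and made-up numbers), each sampled household stands for a number of real households, so population totals and means are weighted.

```python
def weighted_sum(values, weights):
    """Population total: each record counts as `weight` real-world units."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Population average, giving heavier records more influence."""
    return weighted_sum(values, weights) / sum(weights)


benefit = [0.0, 1_200.0, 300.0]               # per sampled household
household_weight = [2_000.0, 500.0, 1_500.0]  # households each record represents

total = weighted_sum(benefit, household_weight)
mean = weighted_mean(benefit, household_weight)
```

An unweighted mean of the same three records would be 500, which overstates the population average because the household with the largest weight receives nothing.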
- baseline: Simulation = None#
The baseline simulation, if this simulation is a reform.
- branches: Dict[str, Simulation]#
- build_from_dataset() None#
Build a simulation from a dataset.
- build_from_populations(populations: Dict[str, Population]) None#
This method of initialisation requires the populations to be pre-initialised.
- Parameters:
populations (Dict[str, Population]) – A dictionary of populations, indexed by entity key.
- calculate(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True, decode_enums: bool = True) MicroSeries[source]#
Calculate variable_name for period.
- Parameters:
variable_name (str) – The name of the variable to calculate.
period (Period) – The period to calculate the variable for.
map_to (str) – The key of the entity to map the result to. If None, the result is returned as is.
decode_enums (bool) – If True, the result is decoded from an array of integers to an array of strings.
- Returns:
The calculated variable.
- Return type:
MicroSeries
- calculate_add(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True) MicroSeries[source]#
- calculate_dataframe(variable_names: list, period: Period = None, map_to: str = None, use_weights: bool = True) MicroDataFrame[source]#
Calculate variable_names for period.
- Parameters:
variable_names (List[str]) – A list of variable names to calculate.
period (Period) – The period to calculate for.
- Returns:
A dataframe containing the calculated variables.
- Return type:
MicroDataFrame
- calculate_divide(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True) MicroSeries[source]#
- calculate_output(variable_name: str, period: Period = None) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]#
Calculate the value of a variable using the calculate_output attribute of the variable.
- check_macro_cache(variable_name: str, period: str) bool#
Check whether the variable can have a cached value.
- clone(debug: bool = False, trace: bool = False, clone_tax_benefit_system: bool = True) Simulation#
Copy the simulation just enough to be able to run the copy without modifying the original simulation
- create_shortcuts() None#
- property data_storage_dir: str#
Temporary folder used to store intermediate calculation data in case the memory is saturated
- debug: bool#
- default_calculation_period: str = None#
The default period to calculate for if none is provided.
- default_input_period: str = None#
The default period to use when inputting variables.
- default_role: str = None#
The default role to assign people to groups if none is provided.
- default_tax_benefit_system: Type['TaxBenefitSystem'] = None#
The default tax-benefit system class to use if none is provided.
- default_tax_benefit_system_instance: TaxBenefitSystem = None#
The default tax-benefit system instance to use if none is provided. This requires that the tax-benefit system is initialised when importing a country package. This will slow down the import, but may speed up individual simulations.
- delete_arrays(variable: str, period: Period = None) None#
Delete a variable’s value for a given period
- Parameters:
variable – the variable whose value is deleted
period – the period for which the value should be deleted
Example:
>>> from policyengine_core.simulations import Simulation
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_array('age', '2018-05')
array([13, 14], dtype=int32)
>>> simulation.delete_arrays('age', '2018-05')
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
>>> simulation.get_array('age', '2018-05') is None
True
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.delete_arrays('age')
>>> simulation.get_array('age', '2018-04') is None
True
>>> simulation.get_array('age', '2018-05') is None
True
- derivative(variable: str, wrt: str, period: Period = None, delta: float = 1) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]#
Compute the derivative of a variable w.r.t another variable.
- Parameters:
variable (str) – The variable to differentiate.
wrt (str) – The variable to differentiate with respect to.
period (Period) – The period for which to compute the derivative.
delta (float) – The infinitesimal to use for the derivative.
- Returns:
The derivative.
- Return type:
ArrayLike
- describe_entities() dict#
- extract_person(index: int = 0, exclude_entities: tuple = ('state',)) dict#
Extract a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Parameters:
index (int) – The index of the person to extract.
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- get_array(variable_name: str, period: Period) Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]#
Return the value of variable_name for period, if this value is already in the cache (if it has been set as an input or previously calculated).
Unlike calculate(), this method does not trigger calculations and does not use any formula.
- get_branch(name: str = 'branch', clone_system: bool = False) Simulation#
Create a clone of this simulation, whose calculations are traced in the original.
- Parameters:
name (str, optional) – Name of the branch. Defaults to “branch”.
clone_system (bool, optional) – Whether to clone the tax-benefit system. Use this if you’re changing policy parameters. Defaults to False.
- Returns:
The cloned simulation.
- Return type:
Simulation
- get_holder(variable_name: str) Holder#
Get the Holder associated with the variable variable_name for the simulation.
- get_known_periods(variable: str) List[Period]#
Get the list of a variable’s known periods, i.e. the periods for which a value has been initialised.
- Parameters:
variable – the variable whose known periods are returned
Example:
>>> from policyengine_core.simulations import Simulation
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_known_periods('age')
[Period((u'month', Instant((2018, 5, 1)), 1)), Period((u'month', Instant((2018, 4, 1)), 1))]
- get_memory_usage(variables: List[str] = None) dict#
Get data about the virtual memory usage of the simulation
- get_population(plural: str = None) Population#
- get_variable_population(variable_name: str) Population#
- invalidate_spiral_variables(variable: str) None#
- is_over_dataset: bool = False#
Whether this simulation is built over a dataset.
- link_to_entities_instances() None#
- macro_cache_read: bool = False#
Whether to read from the macro cache.
- macro_cache_write: bool = False#
Whether to write to the macro cache.
- map_result(values: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], source_entity: str, target_entity: str, how: str = None)#
Maps values from one entity to another.
- Parameters:
values (ArrayLike) – The values in their original position.
source_entity (str) – The source entity key.
target_entity (str) – The target entity key.
how (str, optional) – A function to use when mapping. Defaults to None.
- Raises:
ValueError – If an invalid (dis)aggregation function is passed.
- Returns:
The mapped values.
- Return type:
np.array
- max_spiral_loops: int#
- memory_config: MemoryConfig#
- opt_out_cache: bool#
- purge_cache_of_invalid_values() None#
- sample_person() dict#
Sample a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).
- Returns:
A dictionary containing the person’s values.
- Return type:
dict
- set_input(variable_name: str, period: Period, value: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]) None#
Set a variable’s value for a given period
- Parameters:
variable_name – the variable to be set
value – the input value for the variable
period – the period for which the value is set
Example:
>>> from policyengine_core.simulations import Simulation
>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
If a set_input property has been set for the variable, this method may accept inputs for periods not matching the definition_period of the variable. To read more about this, check the documentation.
- start_instant: str = None#
The earliest data input instant of the simulation.
- subsample(n=None, frac=None, seed=None, time_period=None) Simulation#
Quantize the simulation to a smaller size by sampling households.
- Parameters:
n (int, optional) – The number of households to sample. Defaults to 10_000.
frac (float, optional) – The fraction of households to sample. Defaults to None.
seed (int, optional) – The key used to seed the random number generator. Defaults to the dataset name.
time_period (str, optional) – Sample households based on their weight in this time period. Defaults to the default calculation period.
- Returns:
The quantized simulation.
- Return type:
Simulation
- to_input_dataframe() DataFrame#
Exports a DataFrame which can be loaded back to a new Simulation to reproduce the same results.
- Returns:
The DataFrame containing the input values.
- Return type:
pd.DataFrame
- property trace: bool#
- tracer: SimpleTracer#
SimulationBuilder#
- class policyengine_core.simulations.simulation_builder.SimulationBuilder[source]#
Bases: object
- add_default_group_entity(persons_ids: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], entity: Entity) None[source]#
- add_group_entity(persons_plural: str, persons_ids: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], entity: Entity, instances_json: dict) None[source]#
Add all instances of one of the model’s entities as described in instances_json.
- add_person_entity(entity: Entity, instances_json: dict) List[int][source]#
Add the simulation’s instances of the persons entity as described in instances_json.
- build(tax_benefit_system: TaxBenefitSystem) Simulation[source]#
- build_default_simulation(tax_benefit_system: TaxBenefitSystem, count: int = 1, simulation: Simulation = None) Simulation[source]#
Build a simulation where:
  - There are count persons.
  - There are count instances of each group entity, each containing one person.
  - Every person has, in each entity, the first role.
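The default structure described for build_default_simulation can be written out as plain data. The sketch below is standalone and hypothetical: default_structure, the "household" entity and its role names are invented for illustration, and the real builder produces populations rather than a dict.

```python
def default_structure(count, group_entities):
    """Describe `count` persons, each alone in one instance of every group entity.

    `group_entities` maps a (hypothetical) entity key to its ordered role names;
    the first role in each list is the one every person receives.
    """
    persons = list(range(count))
    groups = {}
    for entity, roles in group_entities.items():
        groups[entity] = [
            {"id": i, "members": {roles[0]: [person]}}
            for i, person in enumerate(persons)
        ]
    return {"persons": persons, "groups": groups}


structure = default_structure(2, {"household": ["parent", "child"]})
print(structure)
```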
- build_from_dict(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) Simulation[source]#
Build a simulation from input_dict. Optionally overwrites an existing simulation.
This method uses build_from_entities if entities are fully specified, or build_from_variables if not.
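The choice between the two build methods can be sketched as a check on the top-level keys of the input. This is a guess at the rule, not the library's code: choose_builder is a hypothetical helper, and treating any key that matches an entity plural as a sign of an entity-based description is an assumption.

```python
def choose_builder(input_dict, entity_plurals):
    """Return the name of the builder method this sketch would dispatch to.

    Assumption: the input is entity-based when any top-level key is the
    plural name of an entity; otherwise it is read as variable values.
    """
    if any(key in entity_plurals for key in input_dict):
        return "build_from_entities"
    return "build_from_variables"


plurals = {"persons", "households"}  # hypothetical entity plurals

by_entities = choose_builder(
    {
        "persons": {"Javier": {"salary": {"2018-11": 2000}}},
        "households": {"household": {"parents": ["Javier"]}},
    },
    plurals,
)
by_variables = choose_builder({"salary": {"2016-10": 12000}}, plurals)
```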
- build_from_entities(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) Simulation[source]#
Build a simulation from a Python dict input_dict fully specifying entities.
Examples:
>>> simulation_builder.build_from_entities(
...     tax_benefit_system,
...     {
...         'persons': {'Javier': {'salary': {'2018-11': 2000}}},
...         'households': {'household': {'parents': ['Javier']}},
...     },
... )
- build_from_variables(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) Simulation[source]#
Build a simulation from a Python dict input_dict describing variable values without making entities explicit.
This method uses build_default_simulation to infer an entity structure.
Example:
>>> simulation_builder.build_from_variables(
...     tax_benefit_system,
...     {'salary': {'2016-10': 12000}},
... )
- check_persons_to_allocate(persons_plural, entity_plural, persons_ids, person_id, entity_id, role_id, persons_to_allocate, index)[source]#
- create_entities(tax_benefit_system: TaxBenefitSystem) None[source]#
- declare_entity(entity_singular: str, entity_ids: Iterable) Population[source]#
- explicit_singular_entities(tax_benefit_system: TaxBenefitSystem, input_dict: dict) dict[source]#
Preprocess input_dict to make explicit the entities defined using the single-entity shortcut.
Example:
>>> simulation_builder.explicit_singular_entities(
...     tax_benefit_system,
...     {'persons': {'Javier': {}}, 'household': {'parents': ['Javier']}},
... )
{'persons': {'Javier': {}}, 'households': {'household': {'parents': ['Javier']}}}
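The transformation in the example above can be reproduced with a few lines of plain Python. This standalone sketch takes a hypothetical singular_to_plural mapping in place of the tax-benefit system, which is where the real method would look up entity names.

```python
def explicit_singular_entities(input_dict, singular_to_plural):
    """Rewrite single-entity shortcuts into the plural, id-keyed form."""
    result = {}
    for key, value in input_dict.items():
        if key in singular_to_plural:
            # Shortcut: one unnamed instance, keyed by the singular name.
            result[singular_to_plural[key]] = {key: value}
        else:
            result[key] = value
    return result


situation = {"persons": {"Javier": {}}, "household": {"parents": ["Javier"]}}
expanded = explicit_singular_entities(situation, {"household": "households"})
print(expanded)
```

Keys that are already plural, such as persons here, pass through unchanged.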
- join_with_persons(group_population: GroupPopulation, persons_group_assignment: Union[Buffer, _SupportsArray[dtype[Any]], _NestedSequence[_SupportsArray[dtype[Any]]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]], roles: Iterable[str]) None[source]#