Simulations

The policyengine_core.simulations module contains the definition of Simulation, the single most important class in the repo. A Simulation combines the logic of a country package with data, and uses that logic and its parameters to calculate the values of unknown variables. The SimulationBuilder class can create Simulations from a variety of inputs, such as JSON descriptions or dataset arrays.

Simulation

class policyengine_core.simulations.simulation.Simulation(tax_benefit_system: TaxBenefitSystem = None, populations: Dict[str, Population] = None, situation: dict = None, dataset: Union[str, Type[Dataset]] = None, reform: Reform = None, trace: bool = False)[source]#

Bases: object

Represents a simulation, and handles the calculation logic

apply_reform(reform: Union[tuple, Reform])[source]#

baseline: Simulation = None#: The baseline simulation, if this simulation is a reform.

build_from_dataset() → None[source]#: Build a simulation from a dataset.

build_from_populations(populations: Dict[str, Population]) → None[source]#

This method of initialisation requires the populations to be pre-initialised.

Parameters:: populations (Dict[str, Population]) – A dictionary of populations, indexed by entity key.

calculate(variable_name: str, period: Period = None, map_to: str = None, decode_enums: bool = False) → ArrayLike[source]#

Calculate variable_name for period.

Parameters:

variable_name (str) – The name of the variable to calculate.
period (Period) – The period to calculate the variable for.
map_to (str) – The key of the entity to map the result to. If None, the result is returned as is.
decode_enums (bool) – If True, the result is decoded from an array of integers to an array of strings.

Returns:

The calculated variable.

Return type:

ArrayLike
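To make the decode_enums description concrete, here is a minimal sketch of decoding an array of integers into an array of strings. The HousingStatus enum and the decode helper are hypothetical stand-ins written for this illustration; they are not part of policyengine_core, and the library's actual encoding may differ.

```python
from enum import Enum


class HousingStatus(Enum):
    # Hypothetical enum; the declaration order defines the integer index.
    owner = "Owner"
    tenant = "Tenant"
    free_lodger = "Free lodger"


def decode(indices):
    """Map integer indices back to the names of the enum members."""
    members = list(HousingStatus)
    return [members[i].name for i in indices]


encoded = [1, 0, 1, 2]      # what an undecoded result might look like
decoded = decode(encoded)   # what a decoded result might look like
print(decoded)
```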

calculate_add(variable_name: str, period: Period = None, decode_enums: bool = False) → ArrayLike[source]#

calculate_dataframe(variable_names: List[str], period: Period = None, map_to: str = None) → DataFrame[source]#

Calculate variable_names for period.

Parameters:

variable_names (List[str]) – A list of variable names to calculate.
period (Period) – The period to calculate for.

Returns:

A dataframe containing the calculated variables.

Return type:

pd.DataFrame

calculate_divide(variable_name: str, period: Period = None, decode_enums: bool = False) → ArrayLike[source]#

calculate_output(variable_name: str, period: Period = None) → ArrayLike[source]#: Calculate the value of a variable using the calculate_output attribute of the variable.

check_macro_cache(variable_name: str, period: str) → bool[source]#: Check whether the variable can have a cached value.

clone(debug: bool = False, trace: bool = False, clone_tax_benefit_system: bool = True) → Simulation[source]#: Copy the simulation just enough to be able to run the copy without modifying the original simulation

create_shortcuts() → None[source]#

property data_storage_dir: str#: Temporary folder used to store intermediate calculation data in case the memory is saturated

datasets: List[Dataset] = []#: The list of datasets available for this simulation.

default_calculation_period: str = None#: The default period to calculate for if none is provided.

default_dataset: Dataset = None#: The default dataset class to use if none is provided.

default_input_period: str = None#: The default period to use when inputting variables.

default_role: str = None#: The default role to assign people to groups if none is provided.

default_tax_benefit_system: Type[TaxBenefitSystem] = None#: The default tax-benefit system class to use if none is provided.

default_tax_benefit_system_instance: TaxBenefitSystem = None#: The default tax-benefit system instance to use if none is provided. This requires that the tax-benefit system is initialised when importing a country package. This will slow down the import, but may speed up individual simulations.

delete_arrays(variable: str, period: Period = None) → None[source]#

Delete a variable’s value for a given period

Parameters:

variable – the variable whose values should be deleted
period – the period for which the value should be deleted

Example:

>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_array('age', '2018-05')
array([13, 14], dtype=int32)
>>> simulation.delete_arrays('age', '2018-05')
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
>>> simulation.get_array('age', '2018-05') is None
True
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.delete_arrays('age')
>>> simulation.get_array('age', '2018-04') is None
True
>>> simulation.get_array('age', '2018-05') is None
True

derivative(variable: str, wrt: str, period: Period = None, delta: float = 1) → ArrayLike[source]#

Compute the derivative of a variable w.r.t another variable.

Parameters:

variable (str) – The variable to differentiate.
wrt (str) – The variable to differentiate with respect to.
period (Period) – The period for which to compute the derivative.
delta (float) – The step size to use when approximating the derivative.

Returns:

The derivative.

Return type:

ArrayLike
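The delta parameter suggests that the derivative is approximated numerically rather than symbolically. The sketch below shows one common scheme, a forward difference, under that assumption; the source does not confirm which scheme the library uses. The finite_difference helper and the tax function are hypothetical and exist only for this illustration.

```python
def finite_difference(func, x, delta=1.0):
    """Approximate d(func)/dx at x with a forward difference of step delta."""
    return (func(x + delta) - func(x)) / delta


def tax(income):
    # Hypothetical rule: a flat 20% tax on income above a 10,000 allowance.
    return 0.2 * max(income - 10_000, 0)


# Change in tax per extra unit of income, evaluated at an income of 30,000.
marginal_rate = finite_difference(tax, 30_000, delta=1)
print(round(marginal_rate, 6))
```

A larger delta measures the average slope over a wider interval, which matters when the underlying rule has thresholds or cliffs.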

describe_entities() → dict[source]#

extract_person(index: int = 0, exclude_entities: tuple = ('state',)) → dict[source]#

Extract a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).

Parameters:: index (int) – The index of the person to extract.
Returns:: A dictionary containing the person’s values.
Return type:: dict

get_array(variable_name: str, period: Period) → ArrayLike[source]#

Return the value of variable_name for period, if this value is already in the cache (if it has been set as an input or previously calculated).

Unlike calculate(), this method does not trigger calculations and does not use any formula.
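The lookup-only behaviour described here, together with the set_input and delete_arrays semantics shown in the earlier example, can be pictured with a toy value store. ToyCache is a hypothetical class written for this illustration and does not reflect how policyengine_core stores values internally.

```python
class ToyCache:
    """Hypothetical stand-in for a simulation's per-variable value store."""

    def __init__(self):
        # Maps (variable_name, period) to the stored list of values.
        self._values = {}

    def set_input(self, variable_name, period, value):
        self._values[(variable_name, period)] = list(value)

    def get_array(self, variable_name, period):
        # Pure lookup: returns None when nothing is stored, never computes.
        return self._values.get((variable_name, period))

    def delete_arrays(self, variable_name, period=None):
        # With period=None, every stored period of the variable is dropped.
        for key in list(self._values):
            if key[0] == variable_name and period in (None, key[1]):
                del self._values[key]


cache = ToyCache()
cache.set_input("age", "2018-04", [12, 14])
print(cache.get_array("age", "2018-04"))  # stored value
print(cache.get_array("age", "2018-05"))  # nothing stored for this period
```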

get_branch(name: str = 'branch', clone_system: bool = False) → Simulation[source]#

Create a clone of this simulation, whose calculations are traced in the original.

Parameters:

name (str, optional) – Name of the branch. Defaults to “branch”.
clone_system (bool, optional) – Whether to clone the tax-benefit system. Use this if you’re changing policy parameters. Defaults to False.

Returns:

The cloned simulation.

Return type:

Simulation

get_entity(plural: str = None) → Entity[source]#

get_holder(variable_name: str) → Holder[source]#: Get the Holder associated with the variable variable_name for the simulation

get_known_periods(variable: str) → List[Period][source]#

Get a list of the variable’s known periods, i.e. the periods for which a value has been initialized.

Parameters:: variable – the variable whose known periods are returned

Example:

>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_known_periods('age')
[Period((u'month', Instant((2018, 5, 1)), 1)), Period((u'month', Instant((2018, 4, 1)), 1))]

get_memory_usage(variables: List[str] = None) → dict[source]#: Get data about the virtual memory usage of the simulation

get_population(plural: str = None) → Population[source]#

get_variable_population(variable_name: str) → Population[source]#

invalidate_cache_entry(variable: str, period: Period) → None[source]#

invalidate_spiral_variables(variable: str) → None[source]#

is_over_dataset: bool = False#: Whether this simulation is built over a dataset.

link_to_entities_instances() → None[source]#

macro_cache_read: bool = False#: Whether to read from the macro cache.

macro_cache_write: bool = False#: Whether to write to the macro cache.

map_result(values: ArrayLike, source_entity: str, target_entity: str, how: str = None)[source]#

Maps values from one entity to another.

Parameters:

values (ArrayLike) – The values in their original position.
source_entity (str) – The source entity key.
target_entity (str) – The target entity key.
how (str, optional) – A function to use when mapping. Defaults to None.

Raises:

ValueError – If an invalid (dis)aggregation function is passed.

Returns:

The mapped values.

Return type:

np.array
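Mapping between entities generally means either aggregating person-level values up to a group, or copying group-level values down to each member. The sketch below illustrates both directions in plain Python. The aggregate and project helpers, and the membership list, are hypothetical; they are not the library's implementation, and the set of aggregation functions the library accepts for how is not listed in this reference.

```python
def aggregate(values, membership, n_groups, how=sum):
    """Combine person-level values into one value per group."""
    buckets = [[] for _ in range(n_groups)]
    for value, group in zip(values, membership):
        buckets[group].append(value)
    return [how(bucket) for bucket in buckets]


def project(group_values, membership):
    """Copy each group's value onto every person in that group."""
    return [group_values[group] for group in membership]


person_income = [100, 50, 30]
person_household = [0, 0, 1]  # index of the household each person belongs to

household_income = aggregate(person_income, person_household, n_groups=2)
income_per_person = project(household_income, person_household)
print(household_income, income_per_person)
```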

purge_cache_of_invalid_values() → None[source]#

sample_person() → dict[source]#

Sample a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).

Returns:: A dictionary containing the person’s values.
Return type:: dict

set_input(variable_name: str, period: Period, value: ArrayLike) → None[source]#

Set a variable’s value for a given period

Parameters:

variable_name – the variable to be set
value – the input value for the variable
period – the period for which the value is set

Example:

>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)

If a set_input property has been set for the variable, this method may accept inputs for periods not matching the definition_period of the variable. To read more about this, check the documentation.

start_instant: str = None#: The earliest data input instant of the simulation.

subsample(n=None, frac=None, seed=None, time_period=None) → Simulation[source]#

Quantize the simulation to a smaller size by sampling households.

Parameters:

n (int, optional) – The number of households to sample. Defaults to 10_000.
frac (float, optional) – The fraction of households to sample. Defaults to None.
seed (int, optional) – The key used to seed the random number generator. Defaults to the dataset name.
time_period (str, optional) – Sample households based on their weight in this time period. Defaults to the default calculation period.

Returns:

The quantized simulation.

Return type:

Simulation
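The parameters above describe a seeded, weight-based sample of households. A minimal sketch of that general idea, using only the standard library, follows. It samples with replacement via random.choices, which is an assumption made for brevity; the source does not state how the library draws its sample. The sample_households helper and its inputs are hypothetical.

```python
import random


def sample_households(household_ids, weights, n, seed=0):
    """Draw n household ids, each with probability proportional to its weight."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(household_ids, weights=weights, k=n)


ids = ["h1", "h2", "h3", "h4"]
weights = [1_000.0, 250.0, 4_000.0, 750.0]

sample = sample_households(ids, weights, n=3, seed=42)
print(sample)
```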

to_input_dataframe() → DataFrame[source]#

Exports a DataFrame which can be loaded back to a new Simulation to reproduce the same results.

Returns:: The DataFrame containing the input values.
Return type:: pd.DataFrame

property trace: bool#

Microsimulation

class policyengine_core.simulations.microsimulation.Microsimulation(tax_benefit_system: TaxBenefitSystem = None, populations: Dict[str, Population] = None, situation: dict = None, dataset: Union[str, Type[Dataset]] = None, reform: Reform = None, trace: bool = False)[source]#

Bases: Simulation

A Simulation whose entities use weights to represent larger populations.
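To illustrate what it means for weights to represent a larger population, the sketch below computes a weighted total and a weighted mean from a three-record sample. The figures and helper names are invented for illustration and do not come from any PolicyEngine dataset or API.

```python
def weighted_sum(values, weights):
    """Population total: each record counts as many times as its weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Population average implied by the weighted sample."""
    return weighted_sum(values, weights) / sum(weights)


incomes = [20_000, 50_000, 120_000]  # one value per sampled household
weights = [1_500, 1_000, 500]        # households each record stands for

total_income = weighted_sum(incomes, weights)
mean_income = weighted_mean(incomes, weights)
print(total_income, round(mean_income, 2))
```

Here three records stand for 3,000 households, so the unweighted mean of the sample would overstate the average income.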

apply_reform(reform: Union[tuple, Reform])#

baseline: Simulation = None#: The baseline simulation, if this simulation is a reform.

branches: Dict[str, Simulation]#

build_from_dataset() → None#: Build a simulation from a dataset.

build_from_populations(populations: Dict[str, Population]) → None#

This method of initialisation requires the populations to be pre-initialised.

Parameters:: populations (Dict[str, Population]) – A dictionary of populations, indexed by entity key.

calculate(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True, decode_enums: bool = True) → MicroSeries[source]#

Calculate variable_name for period.

Parameters:

variable_name (str) – The name of the variable to calculate.
period (Period) – The period to calculate the variable for.
map_to (str) – The key of the entity to map the result to. If None, the result is returned as is.
decode_enums (bool) – If True, the result is decoded from an array of integers to an array of strings.

Returns:

The calculated variable.

Return type:

MicroSeries

calculate_add(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True) → MicroSeries[source]#

calculate_dataframe(variable_names: list, period: Period = None, map_to: str = None, use_weights: bool = True) → MicroDataFrame[source]#

Calculate variable_names for period.

Parameters:

variable_names (List[str]) – A list of variable names to calculate.
period (Period) – The period to calculate for.

Returns:

A dataframe containing the calculated variables.

Return type:

MicroDataFrame

calculate_divide(variable_name: str, period: Period = None, map_to: str = None, use_weights: bool = True) → MicroSeries[source]#

calculate_output(variable_name: str, period: Period = None) → ArrayLike#: Calculate the value of a variable using the calculate_output attribute of the variable.

check_macro_cache(variable_name: str, period: str) → bool#: Check whether the variable can have a cached value.

clone(debug: bool = False, trace: bool = False, clone_tax_benefit_system: bool = True) → Simulation#: Copy the simulation just enough to be able to run the copy without modifying the original simulation

create_shortcuts() → None#

property data_storage_dir: str#: Temporary folder used to store intermediate calculation data in case the memory is saturated

dataset: Dataset#

datasets: List[Dataset] = []#: The list of datasets available for this simulation.

debug: bool#

default_calculation_period: str = None#: The default period to calculate for if none is provided.

default_dataset: Dataset = None#: The default dataset class to use if none is provided.

default_input_period: str = None#: The default period to use when inputting variables.

default_role: str = None#: The default role to assign people to groups if none is provided.

default_tax_benefit_system: Type['TaxBenefitSystem'] = None#: The default tax-benefit system class to use if none is provided.

default_tax_benefit_system_instance: TaxBenefitSystem = None#: The default tax-benefit system instance to use if none is provided. This requires that the tax-benefit system is initialised when importing a country package. This will slow down the import, but may speed up individual simulations.

delete_arrays(variable: str, period: Period = None) → None#

Delete a variable’s value for a given period

Parameters:

variable – the variable whose values should be deleted
period – the period for which the value should be deleted

Example:

>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_array('age', '2018-05')
array([13, 14], dtype=int32)
>>> simulation.delete_arrays('age', '2018-05')
>>> simulation.get_array('age', '2018-04')
array([12, 14], dtype=int32)
>>> simulation.get_array('age', '2018-05') is None
True
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.delete_arrays('age')
>>> simulation.get_array('age', '2018-04') is None
True
>>> simulation.get_array('age', '2018-05') is None
True

derivative(variable: str, wrt: str, period: Period = None, delta: float = 1) → ArrayLike#

Compute the derivative of a variable w.r.t another variable.

Parameters:

variable (str) – The variable to differentiate.
wrt (str) – The variable to differentiate with respect to.
period (Period) – The period for which to compute the derivative.
delta (float) – The step size to use when approximating the derivative.

Returns:

The derivative.

Return type:

ArrayLike

describe_entities() → dict#

extract_person(index: int = 0, exclude_entities: tuple = ('state',)) → dict#

Extract a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).

Parameters:: index (int) – The index of the person to extract.
Returns:: A dictionary containing the person’s values.
Return type:: dict

get_array(variable_name: str, period: Period) → ArrayLike#

Return the value of variable_name for period, if this value is already in the cache (if it has been set as an input or previously calculated).

Unlike calculate(), this method does not trigger calculations and does not use any formula.

get_branch(name: str = 'branch', clone_system: bool = False) → Simulation#

Create a clone of this simulation, whose calculations are traced in the original.

Parameters:

name (str, optional) – Name of the branch. Defaults to “branch”.
clone_system (bool, optional) – Whether to clone the tax-benefit system. Use this if you’re changing policy parameters. Defaults to False.

Returns:

The cloned simulation.

Return type:

Simulation

get_entity(plural: str = None) → Entity#

get_holder(variable_name: str) → Holder#: Get the Holder associated with the variable variable_name for the simulation

get_known_periods(variable: str) → List[Period]#

Get a list of the variable’s known periods, i.e. the periods for which a value has been initialized.

Parameters:: variable – the variable whose known periods are returned

Example:

>>> from policyengine_core.country_template import CountryTaxBenefitSystem
>>> simulation = Simulation(CountryTaxBenefitSystem())
>>> simulation.set_input('age', '2018-04', [12, 14])
>>> simulation.set_input('age', '2018-05', [13, 14])
>>> simulation.get_known_periods('age')
[Period((u'month', Instant((2018, 5, 1)), 1)), Period((u'month', Instant((2018, 4, 1)), 1))]

get_memory_usage(variables: List[str] = None) → dict#: Get data about the virtual memory usage of the simulation

get_population(plural: str = None) → Population#

get_variable_population(variable_name: str) → Population#

get_weights(variable_name: str, period: Period, map_to: str = None) → Sequence[T][source]#

invalidate_cache_entry(variable: str, period: Period) → None#

invalidate_spiral_variables(variable: str) → None#

is_over_dataset: bool = False#: Whether this simulation is built over a dataset.

link_to_entities_instances() → None#

macro_cache_read: bool = False#: Whether to read from the macro cache.

macro_cache_write: bool = False#: Whether to write to the macro cache.

map_result(values: ArrayLike, source_entity: str, target_entity: str, how: str = None)#

Maps values from one entity to another.

Parameters:

values (ArrayLike) – The values in their original position.
source_entity (str) – The source entity key.
target_entity (str) – The target entity key.
how (str, optional) – A function to use when mapping. Defaults to None.

Raises:

ValueError – If an invalid (dis)aggregation function is passed.

Returns:

The mapped values.

Return type:

np.array

max_spiral_loops: int#

memory_config: MemoryConfig#

opt_out_cache: bool#

purge_cache_of_invalid_values() → None#

sample_person() → dict#

Sample a person from the simulation. Returns a situation JSON with their inputs (including their containing entities).

Returns:: A dictionary containing the person’s values.
Return type:: dict

set_input(variable_name: str, period: Period, value: ArrayLike) → None#

Set a variable’s value for a given period

Parameters:

variable_name – the variable to be set
value – the input value for the variable
period – the period for which the value is set

start_instant: str = None#: The earliest data input instant of the simulation.

subsample(n=None, frac=None, seed=None, time_period=None) → Simulation#

Quantize the simulation to a smaller size by sampling households.

Parameters:

n (int, optional) – The number of households to sample. Defaults to 10_000.
frac (float, optional) – The fraction of households to sample. Defaults to None.
seed (int, optional) – The key used to seed the random number generator. Defaults to the dataset name.
time_period (str, optional) – Sample households based on their weight in this time period. Defaults to the default calculation period.

Returns:

The quantized simulation.

Return type:

Simulation

to_input_dataframe() → DataFrame#

Exports a DataFrame which can be loaded back to a new Simulation to reproduce the same results.

Returns:: The DataFrame containing the input values.
Return type:: pd.DataFrame

property trace: bool#

tracer: SimpleTracer#

SimulationBuilder

class policyengine_core.simulations.simulation_builder.SimulationBuilder[source]#

Bases: object

add_default_group_entity(persons_ids: ArrayLike, entity: Entity) → None[source]#

add_group_entity(persons_plural: str, persons_ids: ArrayLike, entity: Entity, instances_json: dict) → None[source]#: Add all instances of one of the model’s entities as described in instances_json.

add_parallel_axis(axis)[source]#

add_perpendicular_axis(axis)[source]#

add_person_entity(entity: Entity, instances_json: dict) → List[int][source]#: Add the simulation’s instances of the persons entity as described in instances_json.

add_variable_value(entity, variable, instance_index, instance_id, period_str, value)[source]#

build(tax_benefit_system: TaxBenefitSystem) → Simulation[source]#

build_default_simulation(tax_benefit_system: TaxBenefitSystem, count: int = 1, simulation: Simulation = None) → Simulation[source]#

Build a simulation where:

There are count persons
There are count instances of each group entity, containing one person
Every person has, in each entity, the first role
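The three rules above can be sketched as a plain data structure: one group instance per person, with that person placed in the first role. The default_structure helper, the generated ids, and the role names below are hypothetical and serve only to illustrate the shape; they are not what SimulationBuilder produces internally.

```python
def default_structure(count, group_entities):
    """Build `count` persons and, per group entity, `count` one-person groups.

    group_entities maps a group's plural name to its ordered list of roles;
    every person is assigned the first role in that list.
    """
    persons = [f"person_{i}" for i in range(count)]
    groups = {}
    for plural, roles in group_entities.items():
        first_role = roles[0]
        groups[plural] = {
            f"{plural}_{i}": {first_role: [persons[i]]} for i in range(count)
        }
    return persons, groups


persons, groups = default_structure(2, {"households": ["parents", "children"]})
print(persons)
print(groups)
```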

build_from_dict(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) → Simulation[source]#

Build a simulation from input_dict. Optionally overwrites an existing simulation.

This method uses build_from_entities if entities are fully specified, or build_from_variables if not.

build_from_entities(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) → Simulation[source]#

Build a simulation from a Python dict input_dict fully specifying entities.

Examples:

>>> simulation_builder.build_from_entities(tax_benefit_system, {
    'persons': {'Javier': {'salary': {'2018-11': 2000}}},
    'households': {'household': {'parents': ['Javier']}}
    })
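Because a fully specified input_dict refers to persons by id from within group entities, it can be useful to check those references before building. The following sketch is one way to do so. The unknown_members helper is hypothetical, and it assumes that role memberships are given as lists of person ids, as in the example above.

```python
def unknown_members(situation, persons_key="persons"):
    """List (group id, role, person id) triples that name an undeclared person."""
    known = set(situation.get(persons_key, {}))
    missing = []
    for plural, instances in situation.items():
        if plural == persons_key:
            continue
        for instance_id, fields in instances.items():
            for role, members in fields.items():
                # Only lists are treated as role memberships; dicts keyed by
                # period (variable values) are skipped.
                if isinstance(members, list):
                    missing.extend(
                        (instance_id, role, m) for m in members if m not in known
                    )
    return missing


situation = {
    "persons": {"Javier": {"salary": {"2018-11": 2000}}},
    "households": {"household": {"parents": ["Javier"]}},
}
print(unknown_members(situation))
```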

build_from_variables(tax_benefit_system: TaxBenefitSystem, input_dict: dict, simulation: Simulation = None) → Simulation[source]#

Build a simulation from a Python dict input_dict describing variable values without explicitly specifying entities.

This method uses build_default_simulation to infer an entity structure.

Example:

>>> simulation_builder.build_from_variables(
    tax_benefit_system,
    {'salary': {'2016-10': 12000}}
    )

check_persons_to_allocate(persons_plural, entity_plural, persons_ids, person_id, entity_id, role_id, persons_to_allocate, index)[source]#

create_entities(tax_benefit_system: TaxBenefitSystem) → None[source]#

declare_entity(entity_singular: str, entity_ids: Iterable) → Population[source]#

declare_person_entity(person_singular: str, persons_ids: Iterable) → None[source]#

expand_axes()[source]#

explicit_singular_entities(tax_benefit_system: TaxBenefitSystem, input_dict: dict) → dict[source]#

Preprocess input_dict to make explicit the entities defined using the single-entity shortcut

Example:

>>> simulation_builder.explicit_singular_entities(
    tax_benefit_system,
    {'persons': {'Javier': {}}, 'household': {'parents': ['Javier']}}
    )
{'persons': {'Javier': {}}, 'households': {'household': {'parents': ['Javier']}}}
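The transformation shown in the example can be sketched in a few lines: a singular entity key is replaced by its plural, and its content is nested under an instance id equal to the singular name. The make_plural helper and the singular_to_plural mapping are hypothetical; the real method derives the entity names from the tax-benefit system.

```python
def make_plural(input_dict, singular_to_plural):
    """Rewrite single-entity shortcuts into the explicit plural form."""
    result = {}
    for key, value in input_dict.items():
        if key in singular_to_plural:
            # Nest the lone instance under the plural key, reusing the
            # singular name as the instance id.
            result[singular_to_plural[key]] = {key: value}
        else:
            result[key] = value
    return result


shortcut = {"persons": {"Javier": {}}, "household": {"parents": ["Javier"]}}
explicit = make_plural(shortcut, {"household": "households"})
print(explicit)
```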

finalize_variables_init(population)[source]#

get_count(entity_name)[source]#

get_ids(entity_name)[source]#

get_input(variable: str, period_str: str) → Any[source]#

get_memberships(entity_name)[source]#

get_roles(entity_name)[source]#

get_variable_entity(variable_name)[source]#

init_variable_values(entity, instance_object, instance_id)[source]#

join_with_persons(group_population: GroupPopulation, persons_group_assignment: ArrayLike, roles: Iterable[str]) → None[source]#

nb_persons(entity_singular: str, role: Role = None) → ArrayLike[source]#

raise_period_mismatch(entity, json, e)[source]#

register_variable(variable_name, entity)[source]#

set_default_period(period_str: str) → None[source]#