{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Methodology\n", "\n", "On this page, we'll walk step by step through the process we use to create PolicyEngine's dataset.\n", "* **Family Resources Survey**: we'll start with the FRS, looking at how close it is to reality. To have a concrete starting point, we'll assume benefit payments are as reported in the survey.\n", "* **FRS (+ tax-benefit model)**: we need to make sure that our tax-benefit model isn't doing anything unexpected. If we turn on simulation of taxes and benefits, does anything look unexpected? If not, great: we've turned a household survey into something useful for policy analysis. We'll also take stock here of what we're missing from reality.\n", "* **Wealth and consumption**: the most obvious thing we're missing is wealth and consumption. We'll impute those here.\n", "* **Fine-tuning**: we'll use reweighting to make some final adjustments to make sure our dataset is as close to reality as possible.\n", "* **Validation**: we'll compare our dataset to the UK's official statistics, and see how we're doing.\n", "\n", "## The Family Resources Survey\n", "\n", "First, we'll start with the FRS as-is. Skipping over the technical details of how we actually feed this data into the model (you can find that in `policyengine_uk_data/datasets/frs/`), we need to decide how we're actually going to measure 'close to reality'. We need to define an objective function, and if our final dataset improves it a lot, we can call that a success.\n", " \n", "We'll define this objective function using public statistics that we can generally agree are of high importance to describing the UK household sector. 
These are things that, if the survey gets them wrong, we'd expect to cause inaccuracy in our model, and if we get them all mostly right, we'd expect to have confidence that it's a pretty accurate tax-benefit model.\n", " \n", "For this, we've gone through and collected:\n", " \n", "* **Demographics** from the ONS: ten-year age band populations by region of the UK, national family type populations and national tenure type populations.\n", "* **Incomes** from HMRC: for each of 14 total income bands, the number of people with income, and their combined income, for each of the seven income types that account for over 99% of total income: employment, self-employment, State Pension, private pension, property, savings interest, and dividends.\n", "* **Tax-benefit programs** from the DWP and OBR: statistics on caseloads, expenditures and revenues for all 20 major tax-benefit programs.\n", " \n", "Let's first take a look at the initial FRS, our starting point, and what is generally considered the best dataset to use (it's used mostly unmodified across major tax-benefit models), and see how close it is to reproducing these statistics.\n", " \n", "The table below shows the result, and it's really quite bad! Look at the relative errors.\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "
\n", "
\n", "This is the init_notebook_mode cell from ITables v2.2.1
\n", "(you should not see this message - is your notebook trusted?)\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "
nameestimatetargeterrorabs_errorrel_errorabs_rel_errortype
\n", "\n", "
\n", "
\n", "\n" ], "text/plain": [ " name estimate \\\n", "0 obr/attendance_allowance 3.733651e+09 \n", "1 obr/carers_allowance 2.882300e+09 \n", "2 obr/dla 4.081635e+09 \n", "3 obr/esa 7.209292e+09 \n", "4 obr/esa_contrib 1.777680e+09 \n", ".. ... ... \n", "330 hmrc/property_income_count_income_band_13_12_5... 1.632946e+06 \n", "331 hmrc/savings_interest_income_income_band_13_12... 4.510452e+09 \n", "332 hmrc/savings_interest_income_count_income_band... 1.416795e+07 \n", "333 hmrc/dividend_income_income_band_13_12_570.0_t... 7.626864e+09 \n", "334 hmrc/dividend_income_count_income_band_13_12_5... 2.521638e+06 \n", "\n", " target error abs_error rel_error abs_rel_error \\\n", "0 5.700000e+09 -1.966349e+09 1.966349e+09 -0.344974 0.344974 \n", "1 3.300000e+09 -4.176997e+08 4.176997e+08 -0.126576 0.126576 \n", "2 6.000000e+09 -1.918365e+09 1.918365e+09 -0.319728 0.319728 \n", "3 1.210000e+10 -4.890708e+09 4.890708e+09 -0.404191 0.404191 \n", "4 4.500000e+09 -2.722320e+09 2.722320e+09 -0.604960 0.604960 \n", ".. ... ... ... ... ... \n", "330 2.320606e+06 -6.876600e+05 6.876600e+05 -0.296328 0.296328 \n", "331 2.968650e+09 1.541802e+09 1.541802e+09 0.519361 0.519361 \n", "332 1.154405e+07 2.623900e+06 2.623900e+06 0.227295 0.227295 \n", "333 8.579240e+10 -7.816554e+10 7.816554e+10 -0.911101 0.911101 \n", "334 3.936170e+06 -1.414532e+06 1.414532e+06 -0.359368 0.359368 \n", "\n", " type \n", "0 Tax-benefit \n", "1 Tax-benefit \n", "2 Tax-benefit \n", "3 Tax-benefit \n", "4 Tax-benefit \n", ".. ... 
\n", "330 Income \n", "331 Income \n", "332 Income \n", "333 Income \n", "334 Income \n", "\n", "[335 rows x 8 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from policyengine_uk_data.utils import get_loss_results\n", "from policyengine_uk_data import (\n", " FRS_2022_23,\n", " ExtendedFRS_2022_23,\n", " EnhancedFRS_2022_23,\n", " ReweightedFRS_2022_23,\n", ")\n", "from policyengine_core.model_api import Reform\n", "import plotly.express as px\n", "import pandas as pd\n", "from itables import init_notebook_mode\n", "import itables.options as opt\n", "opt.maxBytes = \"1MB\"\n", "\n", "init_notebook_mode(all_interactive=True)\n", "\n", "def get_loss(dataset, reform, time_period):\n", " loss_results = get_loss_results(dataset, time_period, reform)\n", "\n", " def get_type(name):\n", " if \"hmrc\" in name:\n", " return \"Income\"\n", " if \"ons\" in name:\n", " return \"Demographics\"\n", " if \"obr\" in name:\n", " return \"Tax-benefit\"\n", " return \"Other\"\n", "\n", " loss_results[\"type\"] = loss_results.name.apply(get_type)\n", " return loss_results\n", "\n", "reported_benefits = Reform.from_dict(\n", " {\n", " \"gov.contrib.policyengine.disable_simulated_benefits\": True,\n", " }\n", ")\n", "loss_results = get_loss(\n", " dataset=FRS_2022_23, reform=reported_benefits, time_period=2022\n", ").copy()\n", "\n", "loss_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's easier to understand 'what kind of bad' this is by splitting out the statistics into those three categories. Here's a histogram of the absolute relative errors." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from policyengine.utils.charts import *\n", "add_fonts()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "alignmentgroup": "True", "bingroup": "x", "hovertemplate": "type=Tax-benefit
abs_rel_error=%{x}
count=%{y}", "legendgroup": "Tax-benefit", "marker": { "color": "#4C78A8", "pattern": { "shape": "" } }, "name": "Tax-benefit", "nbinsx": 25, "offsetgroup": "Tax-benefit", "orientation": "v", "showlegend": true, "type": "histogram", "x": [ 0.34497351753614997, 0.1265756757462519, 0.31972754384596763, 0.4041907147617359, 0.6049600312499864, 0.2853141457884297, 0.3274135549752128, 0.11088173329773643, 0.4191183193706456, 0.3659085714285714, 0.3340608333333333, 0.30739916666666667, 0.39464529411764704, 0.1859068, 0.2991657142857143, 0.07785413793103449, 0.7961391666666666, 0.09960225806451613, 0.24767184210526316, 1, 0.14347732747247105, 0.09131168474999743, 0.0637177919773684, 0.43419216978012365, 0.13124905215267785, 1.03247772, 1, 0.07377823377476285, 0.15754280884928384, 0.37521868912226813, 1, 0.08586889421155641, 0.3451732044348676, 0.06318044405405406, 0.27228357654375696, 0.7800295456698996, 0.1324254054072148, 1, 0.00784125, 0.30910951526032315 ], "xaxis": "x", "yaxis": "y" }, { "alignmentgroup": "True", "bingroup": "x", "hovertemplate": "type=Demographics
abs_rel_error=%{x}
count=%{y}", "legendgroup": "Demographics", "marker": { "color": "#F58518", "pattern": { "shape": "" } }, "name": "Demographics", "nbinsx": 25, "offsetgroup": "Demographics", "orientation": "v", "showlegend": true, "type": "histogram", "x": [ 0.025559859154929577, 0.00040575079872204474, 0.013524390243902439, 0.0024970760233918128, 0.020883870967741934, 0.00953887399463807, 0.01688988095238095, 0.0013891050583657587, 0.09569503546099291, 0.022206303724928367, 0.042, 0.04931931166347992, 0.07402355808285946, 0.023370703764320787, 0.009720430107526882, 0.007314975845410628, 0.007434731934731935, 0.07962282398452611, 0.05080597014925373, 0.07791171477079796, 0.013517355371900826, 0.03507009345794392, 0.022773489932885905, 0.006982583454281568, 0.005142348754448398, 0.008465367965367966, 0.08575889328063241, 0.04146524822695036, 0.05105740987983979, 0.05760998650472335, 0.00586625, 0.022960110041265476, 0.015024721878862793, 0.017076804915514592, 0.005548387096774194, 0.08215705128205128, 0.021647909967845658, 0.005761549925484352, 0.020465811965811966, 0.02741825613079019, 0.015170212765957447, 0.009228116710875332, 0.0043712, 0.006776209677419355, 0.08166308243727599, 0.034816939890710386, 0.043408847184986596, 0.06766016713091921, 0.05102576112412178, 0.02469889840881273, 0.01063169897377423, 0.017964788732394366, 0.01304873949579832, 0.06548, 0.1205443279313632, 0.023599043062200956, 0.0464818763326226, 0.030159763313609467, 0.021877777777777777, 0.010556370302474794, 0.006296346414073071, 0.09251804670912951, 0.01422340425531915, 0.018568965517241378, 0.07260675883256529, 0.043858646616541354, 0.04662272089761571, 0.03446005917159763, 0.009982565379825654, 0.013533333333333333, 0.019138662316476346, 0.09416155988857938, 0.048844036697247704, 0.0030576923076923077, 0.009002680965147452, 0.004681122448979592, 0.000664804469273743, 0.001471395881006865, 0.015466321243523316, 0.016930817610062893, 0.08096022727272727, 0.01151509433962264, 0.013593023255813954, 
0.026297496318114875, 0.03878186968838527, 0.015093514328808446, 0.03539146800501882, 0.05295369030390738, 0.002517786561264822, 0.0013065693430656935, 0.028093220338983052, 0.00900408163265306, 0.014248868778280544, 0.010177165354330709, 0.016848360655737703, 0.010407692307692307, 0.01423474178403756, 0.03445098039215686, 0.1198953488372093 ], "xaxis": "x", "yaxis": "y" }, { "alignmentgroup": "True", "bingroup": "x", "hovertemplate": "type=Income
abs_rel_error=%{x}
count=%{y}", "legendgroup": "Income", "marker": { "color": "#E45756", "pattern": { "shape": "" } }, "name": "Income", "nbinsx": 25, "offsetgroup": "Income", "orientation": "v", "showlegend": true, "type": "histogram", "x": [ 0.4004797438271611, 0.40341418142313823, 0.3351192620126394, 0.4161058261270803, 0.14438445016049928, 0.10252840124279951, 0.23797438722701414, 0.178106243715376, 0.5186115944430726, 0.29825942305351566, 1.4567652443696477, 0.03183803889799751, 0.06734744753269177, 0.7210127454357561, 0.31651236220225987, 0.3286632022851865, 0.25359908213292875, 0.3610102717809437, 0.2253609902791182, 0.16066211060037974, 0.24849022511592575, 0.19723648887556322, 0.5268485592811336, 0.3405786927236505, 0.3796564105307725, 0.08583011715432212, 0.6870272382621185, 0.3306700332342735, 0.036074043259532375, 0.05507747114593405, 0.1295010390235976, 0.2978580465900915, 0.1482454205629078, 0.07790119716920403, 0.11942482769673059, 0.08948943429057277, 0.5313912828333406, 0.32365185255052603, 0.5463152781544124, 0.18355472346812002, 0.8522488602172827, 0.41186044074195766, 0.009575826511717534, 0.022330798960262174, 0.018152497844887632, 0.2137724215911807, 0.05671903367986946, 0.04136430621809064, 0.020176712383563968, 0.032981828477521326, 0.3933035894337455, 0.1554959274292408, 0.6705016045649548, 0.34306242316623414, 0.8776670381752926, 0.36477316095792384, 0.04948382308866023, 0.005926274747684363, 0.2748955486040605, 0.025856544955982105, 0.05861645779174272, 0.009472636815920399, 0.0674988587157871, 0.09100719243071885, 0.5683290127708115, 0.37788556431753584, 0.6599209952409205, 0.3519169402055278, 0.926093468847436, 0.5411489569515877, 0.021913807990249713, 0.027075513767568005, 0.7028186905738795, 0.12355892827231613, 0.0043947064109843, 0.07788879729874215, 0.044763069902689655, 0.03302839517531753, 0.5051594652084163, 0.3410913533638674, 1.3996416132986862, 0.4217758056659471, 0.8938109263266951, 0.3061781975546826, 0.2014107579905191, 0.1312087296882052, 
1.3177206388728102, 0.41757302193739215, 0.001641796095838944, 0.055548219915163316, 0.06931179393716337, 0.014390288378878267, 0.5690920880120897, 0.3000060001200024, 2.0053488866055895, 0.5327175225989137, 0.8838300244961119, 0.2543438684070016, 0.23130324235339317, 0.13788631762127068, 1.1036684328016286, 0.2929200247985121, 0.22426015555054024, 0.10265211883453884, 0.25739055108642755, 0.1654472409939871, 0.6548776859845074, 0.21843709556569715, 0.11561609505371812, 0.9584719449988911, 0.8238162798119074, 0.2921755136510056, 0.018635970973933203, 0.03576509528003329, 0.9351859530130943, 0.42426699448120736, 0.37425958524580105, 0.11674008810572688, 0.34968426788060253, 0.013653837656312139, 0.5921018399257751, 0.10049658182465986, 0.2726359128937982, 0.5879039828349202, 0.8940928033675293, 0.41786588717333734, 0.5185132666081015, 0.3594096932497985, 0.9099574111044325, 0.36029554107246564, 0.48128512987966093, 0.05185389133627019, 0.7687347974140251, 0.41847032688035546, 0.6953032068669253, 0.1976713338246131, 0.2977332285215963, 0.5333202693056003, 0.9123670084028781, 0.2545194825807476, 0.9662612449715495, 0.8233452654788964, 1.9744929427631963, 1.047060721777013, 0.9367520883717267, 0.7770861123462279, 0.8740817696676546, 0.47741703606086017, 0.7731103112118011, 0.06645748675479657, 0.07808059242310535, 0.8882133796539666, 0.909647398254271, 0.28981260311753554, 0.9959892583333434, 0.9740408856051719, 0.13997792447293808, 0.4825510569688284, 1, 1, 1, 1, 0.8544726863824529, 0.8641782831988262, 0.954965388055215, 0.8330539969394447, 0.9959398086533165, 0.8497483037863865, 1, 1, 0.8716177565741325, 0.7557928672174088, 0.7071891843278274, 1.105994787141616, 1, 1, 0.9922833558904304, 0.8975380674541056, 0.9950033873478707, 0.8492068429237947, 1, 1, 0.017088958659321005, 0.0880552621833426, 0.10921737081870772, 0.19660075398326152, 0.1490367015000328, 0.07762594229663043, 0.11599925505202863, 0.12031066997613238, 0.5511785882770285, 0.2963277695567451, 
0.5193614208138149, 0.22729454094628126, 0.9111009430952626, 0.3593676086144653 ], "xaxis": "x", "yaxis": "y" } ], "layout": { "annotations": [ { "showarrow": false, "text": "Source: PolicyEngine UK tax-benefit microsimulation model (version 2.16.0)", "x": 0, "xanchor": "left", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "barmode": "relative", "font": { "color": "black", "family": "Roboto Serif" }, "height": 600, "images": [ { "sizex": 0.15, "sizey": 0.15, "source": "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png", "x": 1.1, "xanchor": "right", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "legend": { "title": { "text": "Category" }, "tracegroupgap": 0 }, "margin": { "b": 120, "l": 120, "r": 120, "t": 120 }, "modebar": { "activecolor": "#F4F4F4", "bgcolor": "#F4F4F4", "color": "#F4F4F4" }, "paper_bgcolor": "#F4F4F4", "plot_bgcolor": "#F4F4F4", "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, 
"#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, 
"#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": 
"#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "white", "showlakes": true, "showland": true, "subunitcolor": "#C8D4E3" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "white", "polar": { "angularaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" }, "bgcolor": "white", "radialaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, 
"yaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "zaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "baxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "bgcolor": "white", "caxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 } } }, "title": { "text": "Distribution of absolute relative errors" }, "width": 800, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "tickformat": ".0%", "title": { "text": "Absolute relative error" }, "zerolinecolor": "#F4F4F4" }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { "text": "Number of variables" }, "zerolinecolor": "#616161" } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = px.histogram(\n", " loss_results,\n", " x=\"abs_rel_error\",\n", " nbins=25,\n", " title=\"Distribution of absolute relative errors\",\n", " labels={\n", " \"value\": \"Absolute relative error\",\n", " \"count\": \"Number of variables\",\n", " },\n", " color=\"type\",\n", " color_discrete_sequence=px.colors.qualitative.T10,\n", ").update_layout(\n", " legend_title=\"Category\",\n", " xaxis_title=\"Absolute relative error\",\n", " yaxis_title=\"Number of variables\",\n", " xaxis_tickformat=\".0%\",)\n", 
"format_fig(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few notes:\n", " \n", "* We're comparing things in the same relevant time period (2022), and only doing a tiny amount of adjustment to the statistics: OBR statistics are taken directly from the latest EFO, ONS statistics are the most recent projections for 2022, and HMRC statistics are uprated from 2021 to 2022 using the same standard uprating factors we use in the model (and it's only one year adjustment).\n", "* Demogaphics look basically fine: that's expected, because the DWP applies an optimisation algorithm to optimise the household weights to be as close as possible to a similar set of demographic statistics. It's a good sign that we use slightly different statistics than it was trained on and get good accuracy.\n", "* Incomes look *not great at all*. We'll take a closer look below to understand why. But the FRS is well-known to under-report income significantly.\n", "* Tax-benefit programs also look *not good*. And this is a concern! Because we're using this dataset to answer questions about tax-benefit programs, and the FRS isn't even providing a good representation of them under baseline law.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "alignmentgroup": "True", "hovertemplate": "_variable=target
band=%{x}
value=%{y}", "legendgroup": "target", "marker": { "color": "#4C78A8", "pattern": { "shape": "" } }, "name": "target", "offsetgroup": "target", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 1617313, 4014377, 7638005, 4812367, 3006948, 2630421, 1271798, 597630, 206654, 141418, 73471, 40063, 20945, 26071355 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=estimate
band=%{x}
value=%{y}", "legendgroup": "estimate", "marker": { "color": "#F58518", "pattern": { "shape": "" } }, "name": "estimate", "offsetgroup": "estimate", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 964866, 2694999, 7217323, 4704903, 2989128, 2559201, 1438669, 680035, 199263, 192245, 133963, 1040, 0, 23775635 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=error
band=%{x}
value=%{y}", "legendgroup": "error", "marker": { "color": "#E45756", "pattern": { "shape": "" } }, "name": "error", "offsetgroup": "error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ -652447, -1319378, -420682, -107464, -17820, -71220, 166871, 82405, -7391, 50827, 60492, -39023, -20945, -2295720 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=rel_error
band=%{x}
value=%{y}", "legendgroup": "rel_error", "marker": { "color": "#72B7B2", "pattern": { "shape": "" } }, "name": "rel_error", "offsetgroup": "rel_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ -0.40341418142313823, -0.3286632022851865, -0.05507747114593405, -0.022330798960262174, -0.005926274747684363, -0.027075513767568005, 0.1312087296882052, 0.13788631762127068, -0.03576509528003329, 0.3594096932497985, 0.8233452654788964, -0.9740408856051719, -1, -0.0880552621833426 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=abs_error
band=%{x}
value=%{y}", "legendgroup": "abs_error", "marker": { "color": "#54A24B", "pattern": { "shape": "" } }, "name": "abs_error", "offsetgroup": "abs_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 652447, 1319378, 420682, 107464, 17820, 71220, 166871, 82405, 7391, 50827, 60492, 39023, 20945, 2295720 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=abs_rel_error
band=%{x}
value=%{y}", "legendgroup": "abs_rel_error", "marker": { "color": "#EECA3B", "pattern": { "shape": "" } }, "name": "abs_rel_error", "offsetgroup": "abs_rel_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 0.40341418142313823, 0.3286632022851865, 0.05507747114593405, 0.022330798960262174, 0.005926274747684363, 0.027075513767568005, 0.1312087296882052, 0.13788631762127068, 0.03576509528003329, 0.3594096932497985, 0.8233452654788964, 0.9740408856051719, 1, 0.0880552621833426 ], "yaxis": "y" } ], "layout": { "annotations": [ { "showarrow": false, "text": "Source: PolicyEngine UK tax-benefit microsimulation model (version 2.16.0)", "x": 0, "xanchor": "left", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "barmode": "group", "font": { "color": "black", "family": "Roboto Serif" }, "height": 600, "images": [ { "sizex": 0.15, "sizey": 0.15, "source": "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png", "x": 1.1, "xanchor": "right", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "legend": { "title": { "text": "Variable" }, "tracegroupgap": 0 }, "margin": { "b": 120, "l": 120, "r": 120, "t": 120 }, "modebar": { "activecolor": "#F4F4F4", "bgcolor": "#F4F4F4", "color": "#F4F4F4" }, "paper_bgcolor": "#F4F4F4", "plot_bgcolor": "#F4F4F4", "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": 
"bar" } ], "barpolar": [ { "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { 
"fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" 
} ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", 
"#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "white", "showlakes": true, "showland": true, "subunitcolor": "#C8D4E3" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "white", "polar": { "angularaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" }, "bgcolor": "white", "radialaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "yaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "zaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "baxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "bgcolor": "white", "caxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 } } }, "title": { "text": "Estimates and ground truth for employment income band counts" }, "width": 800, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { "text": "" }, 
"zerolinecolor": "#F4F4F4" }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { "text": "Value" }, "zerolinecolor": "#616161" } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "incomes = loss_results[loss_results.type == \"Income\"]\n", "incomes[\"band\"] = incomes.name.apply(\n", " lambda x: \"_\".join(x.split(\"band_\")[1].split(\"_\")[1:])\n", ")\n", "incomes[\"count\"] = incomes.name.apply(lambda x: \"count\" in x)\n", "incomes[\"variable\"] = incomes.name.apply(\n", " lambda x: x.split(\"_income_band\")[0].split(\"_count\")[0].split(\"hmrc/\")[-1]\n", ")\n", "\n", "variable = \"employment_income\"\n", "count = True\n", "variable_df = incomes[\n", " (incomes.variable == variable) & (incomes[\"count\"] == count)\n", "]\n", "\n", "fig = px.bar(\n", " variable_df,\n", " x=\"band\",\n", " y=[\n", " \"target\",\n", " \"estimate\",\n", " \"error\",\n", " \"rel_error\",\n", " \"abs_error\",\n", " \"abs_rel_error\",\n", " ],\n", " barmode=\"group\",\n", " color_discrete_sequence=px.colors.qualitative.T10,\n", ")\n", "\n", "fig = fig.update_layout(\n", " title=\"Estimates and ground truth for employment income band counts\",\n", " xaxis_title=\"\",\n", " yaxis_title=\"Value\",\n", " legend_title=\"Variable\",\n", ")\n", "format_fig(fig)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "alignmentgroup": "True", "hovertemplate": "_variable=target
band=%{x}
value=%{y}", "legendgroup": "target", "marker": { "color": "#4C78A8", "pattern": { "shape": "" } }, "name": "target", "offsetgroup": "target", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 87090, 409517, 849159, 670784, 620441, 560827, 310898, 197751, 80052, 63005, 40609, 27414, 18622, 3936170 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=estimate
band=%{x}
value=%{y}", "legendgroup": "estimate", "marker": { "color": "#F58518", "pattern": { "shape": "" } }, "name": "estimate", "offsetgroup": "estimate", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 94007.38010162115, 431263.90768535435, 853037.4832375497, 687032.4892986268, 634663.1030393243, 594413.8105289042, 326867.89287211, 203441.7492004782, 82315.94621220231, 69238.60412976146, 42827.042475283146, 29137.24017497897, 19222.35559424758, 4067469.004550442 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=error
band=%{x}
value=%{y}", "legendgroup": "error", "marker": { "color": "#E45756", "pattern": { "shape": "" } }, "name": "error", "offsetgroup": "error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 6917.380101621151, 21746.907685354352, 3878.4832375496626, 16248.48929862678, 14222.103039324284, 33586.8105289042, 15969.89287211001, 5690.749200478196, 2263.9462122023106, 6233.604129761457, 2218.042475283146, 1723.2401749789715, 600.3555942475796, 131299.0045504421 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=rel_error
band=%{x}
value=%{y}", "legendgroup": "rel_error", "marker": { "color": "#72B7B2", "pattern": { "shape": "" } }, "name": "rel_error", "offsetgroup": "rel_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 0.07942794926651912, 0.05310379712039879, 0.0045674405353410405, 0.024223131885415843, 0.02292257126676716, 0.059888005621883754, 0.05136698490215443, 0.028777347272469906, 0.028280945038254016, 0.09893824505612979, 0.05461948029459346, 0.0628598590128756, 0.03223905027642464, 0.03335704620238508 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=abs_error
band=%{x}
value=%{y}", "legendgroup": "abs_error", "marker": { "color": "#54A24B", "pattern": { "shape": "" } }, "name": "abs_error", "offsetgroup": "abs_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 6917.380101621151, 21746.907685354352, 3878.4832375496626, 16248.48929862678, 14222.103039324284, 33586.8105289042, 15969.89287211001, 5690.749200478196, 2263.9462122023106, 6233.604129761457, 2218.042475283146, 1723.2401749789715, 600.3555942475796, 131299.0045504421 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=abs_rel_error
band=%{x}
value=%{y}", "legendgroup": "abs_rel_error", "marker": { "color": "#EECA3B", "pattern": { "shape": "" } }, "name": "abs_rel_error", "offsetgroup": "abs_rel_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 0.07942794926651912, 0.05310379712039879, 0.0045674405353410405, 0.024223131885415843, 0.02292257126676716, 0.059888005621883754, 0.05136698490215443, 0.028777347272469906, 0.028280945038254016, 0.09893824505612979, 0.05461948029459346, 0.0628598590128756, 0.03223905027642464, 0.03335704620238508 ], "yaxis": "y" } ], "layout": { "annotations": [ { "showarrow": false, "text": "Source: PolicyEngine UK tax-benefit microsimulation model (version 2.16.0)", "x": 0, "xanchor": "left", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "barmode": "group", "font": { "color": "black", "family": "Roboto Serif" }, "height": 600, "images": [ { "sizex": 0.15, "sizey": 0.15, "source": "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png", "x": 1.1, "xanchor": "right", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "legend": { "title": { "text": "Variable" }, "tracegroupgap": 0 }, "margin": { "b": 120, "l": 120, "r": 120, "t": 120 }, "modebar": { "activecolor": "#F4F4F4", "bgcolor": "#F4F4F4", "color": "#F4F4F4" }, "paper_bgcolor": "#F4F4F4", "plot_bgcolor": "#F4F4F4", "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, 
"solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": 
[ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" 
} }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ 
"#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "white", "showlakes": true, "showland": true, "subunitcolor": "#C8D4E3" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "white", "polar": { "angularaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" }, "bgcolor": "white", "radialaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "yaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "zaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "baxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "bgcolor": "white", "caxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 } } }, "title": { "text": "Estimates and ground truth for dividend income band counts" }, "width": 800, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { 
"text": "" }, "zerolinecolor": "#F4F4F4" }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { "text": "Value" }, "zerolinecolor": "#616161" } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "variable = \"dividend_income\"\n", "count = True\n", "variable_df = incomes[\n", " (incomes.variable == variable) & (incomes[\"count\"] == count)\n", "]\n", "\n", "fig = px.bar(\n", " variable_df,\n", " x=\"band\",\n", " y=[\n", " \"target\",\n", " \"estimate\",\n", " \"error\",\n", " \"rel_error\",\n", " \"abs_error\",\n", " \"abs_rel_error\",\n", " ],\n", " barmode=\"group\",\n", " color_discrete_sequence=px.colors.qualitative.T10,\n", ")\n", "\n", "fig = fig.update_layout(\n", " title=\"Estimates and ground truth for dividend income band counts\",\n", " xaxis_title=\"\",\n", " yaxis_title=\"Value\",\n", " legend_title=\"Variable\",\n", ")\n", "format_fig(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a few interesting things here:\n", " \n", "* The FRS over-estimates incomes in the upper-middle of the distribution and under-estimates them in the top of the distribution. The reason for this is probably: the FRS misses out the top completely, and then because of the weight optimisation (which scales up the working-age age groups to hit their population targets), the middle of the distribution is inflated, overcompensating.\n", "* Some income types are severely under-estimated across all bands: notably capital incomes. This probably reflects issues with the survey questionnaire design more than sampling bias.\n", "\n", "OK, so what can we do about it?\n", "\n", "## Simulating benefits\n", "\n", "First, let's turn on the model and check nothing unexpected happens. The table below shows each of our known statistics, and how they changed after replacing reported benefits with simulated benefits." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "
name | abs_error_original | abs_error_simulated | abs_rel_error_original | abs_rel_error_simulated | change_in_abs_rel_error | error_original | error_simulated | estimate_original | estimate_simulated | rel_error_original | rel_error_simulated | target_original | target_simulated | type_original | type_simulated
\n", "\n", "
\n", "\n" ], "text/plain": [ " abs_error_original \\\n", "name \n", "obr/attendance_allowance 1.966349e+09 \n", "obr/carers_allowance 4.176997e+08 \n", "obr/dla 1.918365e+09 \n", "obr/esa 4.890708e+09 \n", "obr/esa_contrib 2.722320e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 6.876600e+05 \n", "hmrc/savings_interest_income_income_band_13_12_... 1.541802e+09 \n", "hmrc/savings_interest_income_count_income_band_... 2.623900e+06 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 7.816554e+10 \n", "hmrc/dividend_income_count_income_band_13_12_57... 1.414532e+06 \n", "\n", " abs_error_simulated \\\n", "name \n", "obr/attendance_allowance 2.079068e+09 \n", "obr/carers_allowance 4.909088e+08 \n", "obr/dla 2.110043e+09 \n", "obr/esa 4.890708e+09 \n", "obr/esa_contrib 2.722320e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 6.844470e+05 \n", "hmrc/savings_interest_income_income_band_13_12_... 1.551142e+09 \n", "hmrc/savings_interest_income_count_income_band_... 2.674657e+06 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 7.815806e+10 \n", "hmrc/dividend_income_count_income_band_13_12_57... 1.410788e+06 \n", "\n", " abs_rel_error_original \\\n", "name \n", "obr/attendance_allowance 0.344974 \n", "obr/carers_allowance 0.126576 \n", "obr/dla 0.319728 \n", "obr/esa 0.404191 \n", "obr/esa_contrib 0.604960 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 0.296328 \n", "hmrc/savings_interest_income_income_band_13_12_... 0.519361 \n", "hmrc/savings_interest_income_count_income_band_... 0.227295 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 0.911101 \n", "hmrc/dividend_income_count_income_band_13_12_57... 0.359368 \n", "\n", " abs_rel_error_simulated \\\n", "name \n", "obr/attendance_allowance 0.364749 \n", "obr/carers_allowance 0.148760 \n", "obr/dla 0.351674 \n", "obr/esa 0.404191 \n", "obr/esa_contrib 0.604960 \n", "... ... 
\n", "hmrc/property_income_count_income_band_13_12_57... 0.294943 \n", "hmrc/savings_interest_income_income_band_13_12_... 0.522508 \n", "hmrc/savings_interest_income_count_income_band_... 0.231691 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 0.911014 \n", "hmrc/dividend_income_count_income_band_13_12_57... 0.358416 \n", "\n", " change_in_abs_rel_error \\\n", "name \n", "obr/attendance_allowance 0.019775 \n", "obr/carers_allowance 0.022185 \n", "obr/dla 0.031946 \n", "obr/esa 0.000000 \n", "obr/esa_contrib 0.000000 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... -0.001385 \n", "hmrc/savings_interest_income_income_band_13_12_... 0.003146 \n", "hmrc/savings_interest_income_count_income_band_... 0.004397 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... -0.000087 \n", "hmrc/dividend_income_count_income_band_13_12_57... -0.000951 \n", "\n", " error_original \\\n", "name \n", "obr/attendance_allowance -1.966349e+09 \n", "obr/carers_allowance -4.176997e+08 \n", "obr/dla -1.918365e+09 \n", "obr/esa -4.890708e+09 \n", "obr/esa_contrib -2.722320e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... -6.876600e+05 \n", "hmrc/savings_interest_income_income_band_13_12_... 1.541802e+09 \n", "hmrc/savings_interest_income_count_income_band_... 2.623900e+06 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... -7.816554e+10 \n", "hmrc/dividend_income_count_income_band_13_12_57... -1.414532e+06 \n", "\n", " error_simulated \\\n", "name \n", "obr/attendance_allowance -2.079068e+09 \n", "obr/carers_allowance -4.909088e+08 \n", "obr/dla -2.110043e+09 \n", "obr/esa -4.890708e+09 \n", "obr/esa_contrib -2.722320e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... -6.844470e+05 \n", "hmrc/savings_interest_income_income_band_13_12_... 1.551142e+09 \n", "hmrc/savings_interest_income_count_income_band_... 2.674657e+06 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 
-7.815806e+10 \n", "hmrc/dividend_income_count_income_band_13_12_57... -1.410788e+06 \n", "\n", " estimate_original \\\n", "name \n", "obr/attendance_allowance 3.733651e+09 \n", "obr/carers_allowance 2.882300e+09 \n", "obr/dla 4.081635e+09 \n", "obr/esa 7.209292e+09 \n", "obr/esa_contrib 1.777680e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 1.632946e+06 \n", "hmrc/savings_interest_income_income_band_13_12_... 4.510452e+09 \n", "hmrc/savings_interest_income_count_income_band_... 1.416795e+07 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 7.626864e+09 \n", "hmrc/dividend_income_count_income_band_13_12_57... 2.521638e+06 \n", "\n", " estimate_simulated \\\n", "name \n", "obr/attendance_allowance 3.620932e+09 \n", "obr/carers_allowance 2.809091e+09 \n", "obr/dla 3.889957e+09 \n", "obr/esa 7.209292e+09 \n", "obr/esa_contrib 1.777680e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 1.636159e+06 \n", "hmrc/savings_interest_income_income_band_13_12_... 4.519792e+09 \n", "hmrc/savings_interest_income_count_income_band_... 1.421871e+07 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 7.634344e+09 \n", "hmrc/dividend_income_count_income_band_13_12_57... 2.525382e+06 \n", "\n", " rel_error_original \\\n", "name \n", "obr/attendance_allowance -0.344974 \n", "obr/carers_allowance -0.126576 \n", "obr/dla -0.319728 \n", "obr/esa -0.404191 \n", "obr/esa_contrib -0.604960 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... -0.296328 \n", "hmrc/savings_interest_income_income_band_13_12_... 0.519361 \n", "hmrc/savings_interest_income_count_income_band_... 0.227295 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... -0.911101 \n", "hmrc/dividend_income_count_income_band_13_12_57... 
-0.359368 \n", "\n", " rel_error_simulated \\\n", "name \n", "obr/attendance_allowance -0.364749 \n", "obr/carers_allowance -0.148760 \n", "obr/dla -0.351674 \n", "obr/esa -0.404191 \n", "obr/esa_contrib -0.604960 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... -0.294943 \n", "hmrc/savings_interest_income_income_band_13_12_... 0.522508 \n", "hmrc/savings_interest_income_count_income_band_... 0.231691 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... -0.911014 \n", "hmrc/dividend_income_count_income_band_13_12_57... -0.358416 \n", "\n", " target_original \\\n", "name \n", "obr/attendance_allowance 5.700000e+09 \n", "obr/carers_allowance 3.300000e+09 \n", "obr/dla 6.000000e+09 \n", "obr/esa 1.210000e+10 \n", "obr/esa_contrib 4.500000e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 2.320606e+06 \n", "hmrc/savings_interest_income_income_band_13_12_... 2.968650e+09 \n", "hmrc/savings_interest_income_count_income_band_... 1.154405e+07 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 8.579240e+10 \n", "hmrc/dividend_income_count_income_band_13_12_57... 3.936170e+06 \n", "\n", " target_simulated \\\n", "name \n", "obr/attendance_allowance 5.700000e+09 \n", "obr/carers_allowance 3.300000e+09 \n", "obr/dla 6.000000e+09 \n", "obr/esa 1.210000e+10 \n", "obr/esa_contrib 4.500000e+09 \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 2.320606e+06 \n", "hmrc/savings_interest_income_income_band_13_12_... 2.968650e+09 \n", "hmrc/savings_interest_income_count_income_band_... 1.154405e+07 \n", "hmrc/dividend_income_income_band_13_12_570.0_to... 8.579240e+10 \n", "hmrc/dividend_income_count_income_band_13_12_57... 3.936170e+06 \n", "\n", " type_original \\\n", "name \n", "obr/attendance_allowance Tax-benefit \n", "obr/carers_allowance Tax-benefit \n", "obr/dla Tax-benefit \n", "obr/esa Tax-benefit \n", "obr/esa_contrib Tax-benefit \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... 
Income \n", "hmrc/savings_interest_income_income_band_13_12_... Income \n", "hmrc/savings_interest_income_count_income_band_... Income \n", "hmrc/dividend_income_income_band_13_12_570.0_to... Income \n", "hmrc/dividend_income_count_income_band_13_12_57... Income \n", "\n", " type_simulated \n", "name \n", "obr/attendance_allowance Tax-benefit \n", "obr/carers_allowance Tax-benefit \n", "obr/dla Tax-benefit \n", "obr/esa Tax-benefit \n", "obr/esa_contrib Tax-benefit \n", "... ... \n", "hmrc/property_income_count_income_band_13_12_57... Income \n", "hmrc/savings_interest_income_income_band_13_12_... Income \n", "hmrc/savings_interest_income_count_income_band_... Income \n", "hmrc/dividend_income_income_band_13_12_570.0_to... Income \n", "hmrc/dividend_income_count_income_band_13_12_57... Income \n", "\n", "[334 rows x 15 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "original_frs_loss = loss_results.copy()\n", "frs_loss = get_loss(FRS_2022_23, None, 2022).copy()\n", "combined_frs_loss = pd.merge(\n", " on=\"name\",\n", " left=original_frs_loss,\n", " right=frs_loss,\n", " suffixes=(\"_original\", \"_simulated\"),\n", ")\n", "combined_frs_loss[\"change_in_abs_rel_error\"] = (\n", " combined_frs_loss[\"abs_rel_error_simulated\"]\n", " - combined_frs_loss[\"abs_rel_error_original\"]\n", ")\n", "# Sort columns\n", "combined_frs_loss.sort_index(axis=1, inplace=True)\n", "combined_frs_loss = combined_frs_loss.set_index(\"name\")\n", "\n", "combined_frs_loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, a few notes:\n", " \n", "* You might be thinking: 'why do some of the HMRC income statistics change?'. That's because of the State Pension, which is simulated in the model. 
The State Pension is a component of total income, so people might move from one income band to another if we adjust their State Pension payments slightly.\n", "* Some of the tax-benefit statistics change, some for the better and some for the worse. This is expected for a variety of reasons: one is that incomes and benefits are often out of sync with each other in the data (the income in the survey week might not match income in the benefits assessment time period).\n", "\n", "## Adding imputations\n", "\n", "Now, let's add in the imputations for wealth and consumption. For this, we train *quantile regression forests* (essentially, random forest models that capture the conditional distribution of the data) on other surveys to predict wealth and consumption variables from the variables those surveys share with the FRS.\n", "\n", "The datasets we use are:\n", "* The Wealth and Assets Survey (WAS) for wealth imputations.\n", "* The Living Costs and Food Survey (LCFS) for most consumption imputations.\n", "* The Effects of Taxes and Benefits on Household Income (ETB) for the amount (in £) of consumption that is subject to the full rate of VAT. This matters because households differ in the share of their consumption that falls on VATable items.\n", " \n", "Below is a table showing how just adding these imputations changes our objective statistics (filtered to just the rows that changed). Not bad for pre-calibration performance! And we've picked up an extra £200bn in taxes.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "
name | estimate_simulated | target_simulated | error_simulated | abs_error_simulated | rel_error_simulated | abs_rel_error_simulated | type_simulated | estimate_imputed | target_imputed | error_imputed | abs_error_imputed | rel_error_imputed | abs_rel_error_imputed | type_imputed | change_in_abs_rel_error
\n", "\n", "
\n", "\n" ], "text/plain": [ " name estimate_simulated \\\n", "0 obr/attendance_allowance 3.620932e+09 \n", "1 obr/carers_allowance 2.809091e+09 \n", "2 obr/dla 3.889957e+09 \n", "3 obr/esa 7.209292e+09 \n", "4 obr/esa_contrib 1.777680e+09 \n", ".. ... ... \n", "329 hmrc/property_income_count_income_band_13_12_5... 1.636159e+06 \n", "330 hmrc/savings_interest_income_income_band_13_12... 4.519792e+09 \n", "331 hmrc/savings_interest_income_count_income_band... 1.421871e+07 \n", "332 hmrc/dividend_income_income_band_13_12_570.0_t... 7.634344e+09 \n", "333 hmrc/dividend_income_count_income_band_13_12_5... 2.525382e+06 \n", "\n", " target_simulated error_simulated abs_error_simulated \\\n", "0 5.700000e+09 -2.079068e+09 2.079068e+09 \n", "1 3.300000e+09 -4.909088e+08 4.909088e+08 \n", "2 6.000000e+09 -2.110043e+09 2.110043e+09 \n", "3 1.210000e+10 -4.890708e+09 4.890708e+09 \n", "4 4.500000e+09 -2.722320e+09 2.722320e+09 \n", ".. ... ... ... \n", "329 2.320606e+06 -6.844470e+05 6.844470e+05 \n", "330 2.968650e+09 1.551142e+09 1.551142e+09 \n", "331 1.154405e+07 2.674657e+06 2.674657e+06 \n", "332 8.579240e+10 -7.815806e+10 7.815806e+10 \n", "333 3.936170e+06 -1.410788e+06 1.410788e+06 \n", "\n", " rel_error_simulated abs_rel_error_simulated type_simulated \\\n", "0 -0.364749 0.364749 Tax-benefit \n", "1 -0.148760 0.148760 Tax-benefit \n", "2 -0.351674 0.351674 Tax-benefit \n", "3 -0.404191 0.404191 Tax-benefit \n", "4 -0.604960 0.604960 Tax-benefit \n", ".. ... ... ... 
\n", "329 -0.294943 0.294943 Income \n", "330 0.522508 0.522508 Income \n", "331 0.231691 0.231691 Income \n", "332 -0.911014 0.911014 Income \n", "333 -0.358416 0.358416 Income \n", "\n", " estimate_imputed target_imputed error_imputed abs_error_imputed \\\n", "0 3.620932e+09 5.700000e+09 -2.079068e+09 2.079068e+09 \n", "1 2.809091e+09 3.300000e+09 -4.909088e+08 4.909088e+08 \n", "2 3.889957e+09 6.000000e+09 -2.110043e+09 2.110043e+09 \n", "3 7.209292e+09 1.210000e+10 -4.890708e+09 4.890708e+09 \n", "4 1.777680e+09 4.500000e+09 -2.722320e+09 2.722320e+09 \n", ".. ... ... ... ... \n", "329 1.636159e+06 2.320606e+06 -6.844470e+05 6.844470e+05 \n", "330 4.519792e+09 2.968650e+09 1.551142e+09 1.551142e+09 \n", "331 1.421871e+07 1.154405e+07 2.674657e+06 2.674657e+06 \n", "332 7.634344e+09 8.579240e+10 -7.815806e+10 7.815806e+10 \n", "333 2.525382e+06 3.936170e+06 -1.410788e+06 1.410788e+06 \n", "\n", " rel_error_imputed abs_rel_error_imputed type_imputed \\\n", "0 -0.364749 0.364749 Tax-benefit \n", "1 -0.148760 0.148760 Tax-benefit \n", "2 -0.351674 0.351674 Tax-benefit \n", "3 -0.404191 0.404191 Tax-benefit \n", "4 -0.604960 0.604960 Tax-benefit \n", ".. ... ... ... \n", "329 -0.294943 0.294943 Income \n", "330 0.522508 0.522508 Income \n", "331 0.231691 0.231691 Income \n", "332 -0.911014 0.911014 Income \n", "333 -0.358416 0.358416 Income \n", "\n", " change_in_abs_rel_error \n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 0.0 \n", ".. ... 
\n", "329 0.0 \n", "330 0.0 \n", "331 0.0 \n", "332 0.0 \n", "333 0.0 \n", "\n", "[334 rows x 16 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_loss = get_loss(ExtendedFRS_2022_23, None, 2022).copy()\n", "new_loss_against_old = pd.merge(\n", " on=\"name\",\n", " left=frs_loss,\n", " right=new_loss,\n", " suffixes=(\"_simulated\", \"_imputed\"),\n", ")\n", "new_loss_against_old[\"change_in_abs_rel_error\"] = (\n", " new_loss_against_old[\"abs_rel_error_imputed\"]\n", " - new_loss_against_old[\"abs_rel_error_simulated\"]\n", ")\n", "\n", "new_loss_against_old" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calibration\n", "\n", "Now, we've got a dataset that performs pretty well without explicitly targeting the official statistics we care about. So it's time to add the final touch: calibrating the weights to explicitly minimise error against the target set." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "calibrated_loss = get_loss(ReweightedFRS_2022_23, None, 2022).copy()\n", "calibrated_loss_against_imputed = pd.merge(\n", " on=\"name\",\n", " left=new_loss,\n", " right=calibrated_loss,\n", " suffixes=(\"_imputed\", \"_calibrated\"),\n", ")\n", "\n", "calibrated_loss_against_imputed[\"change_in_abs_rel_error\"] = (\n", " calibrated_loss_against_imputed[\"abs_rel_error_calibrated\"]\n", " - calibrated_loss_against_imputed[\"abs_rel_error_imputed\"]\n", ")\n", "calibrated_loss_against_imputed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also look at incomes."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select the income targets using the frame's own `type` column so the mask always aligns.\n", "incomes = calibrated_loss[calibrated_loss.type == \"Income\"].copy()\n", "incomes[\"band\"] = incomes.name.apply(\n", " lambda x: \"_\".join(x.split(\"band_\")[1].split(\"_\")[1:])\n", ")\n", "incomes[\"count\"] = incomes.name.apply(lambda x: \"count\" in x)\n", "incomes[\"variable\"] = incomes.name.apply(\n", " lambda x: x.split(\"_income_band\")[0].split(\"_count\")[0].split(\"hmrc/\")[-1]\n", ")\n", "\n", "variable = \"employment_income\"\n", "count = True\n", "variable_df = incomes[\n", " (incomes.variable == variable) & (incomes[\"count\"] == count)\n", "]\n", "\n", "fig = px.bar(\n", " variable_df,\n", " x=\"band\",\n", " y=[\n", " \"target\",\n", " \"estimate\",\n", " \"error\",\n", " \"rel_error\",\n", " \"abs_error\",\n", " \"abs_rel_error\",\n", " ],\n", " barmode=\"group\",\n", " color_discrete_sequence=px.colors.qualitative.T10,\n", ")\n", "format_fig(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What seems to be happening here is that the FRS just doesn't have enough high-income records for calibration to work straight away. The optimiser can't simply assign very high weights to the few rich people we do have, because that would hurt performance on the demographic statistics.\n", " \n", "So, we need a way to add more high-income records. What we'll do is:\n", " \n", "* Train a QRF model on the Survey of Personal Incomes to predict the distributions of income variables, using demographic variables that are also in the FRS.\n", "* For each FRS person, add an 'imputed income' clone with zero weight.\n", "* Run the calibration again.\n", "\n", "## The Enhanced FRS\n", "\n", "Let's see how this new dataset performs."
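, "\n", "\n", "
Before the results, here is a toy sketch in plain Python of why the zero-weight clones from the previous section help. It is an illustration only, not PolicyEngine's calibration code: the incomes, weights and targets are made up, the helper names (`estimates`, `loss`, `calibrate`, `build_indicators`) are hypothetical, the clone incomes are hard-coded stand-ins for QRF draws, and simple projected gradient descent is assumed in place of whatever optimiser the real pipeline uses.

```python
def estimates(weights, indicators):
    # indicators[j][i] is 1.0 if record i counts towards target j, else 0.0.
    return [sum(w * x for w, x in zip(weights, row)) for row in indicators]


def loss(weights, indicators, targets):
    # Sum of squared relative errors across all targets.
    return sum(
        ((e - t) / t) ** 2
        for e, t in zip(estimates(weights, indicators), targets)
    )


def calibrate(weights, indicators, targets, lr=20.0, steps=5000):
    # Projected gradient descent: take a gradient step, then clip weights at zero.
    w = list(weights)
    for _ in range(steps):
        est = estimates(w, indicators)
        for i in range(len(w)):
            grad = sum(
                2 * (e - t) / t**2 * row[i]
                for e, t, row in zip(est, targets, indicators)
            )
            w[i] = max(w[i] - lr * grad, 0.0)
    return w


def build_indicators(incomes):
    # Target 0: total number of people. Target 1: people with income above 100,000.
    return [
        [1.0 for _ in incomes],
        [1.0 if income > 100_000 else 0.0 for income in incomes],
    ]


# Hypothetical targets: 300 people in total, 10 of them above 100,000.
targets = [300.0, 10.0]

# Survey records only: nobody is above 100,000, so the second target is unreachable.
frs_incomes = [20_000, 35_000, 60_000]
frs_weights = [100.0, 100.0, 100.0]
frs_indicators = build_indicators(frs_incomes)
loss_frs_only = loss(
    calibrate(frs_weights, frs_indicators, targets), frs_indicators, targets
)

# Add one zero-weight clone per record, with a made-up imputed income.
clone_incomes = [150_000, 40_000, 250_000]
all_incomes = frs_incomes + clone_incomes
all_weights = frs_weights + [0.0] * len(clone_incomes)
all_indicators = build_indicators(all_incomes)
calibrated = calibrate(all_weights, all_indicators, targets)
loss_with_clones = loss(calibrated, all_indicators, targets)

print(f'loss, survey records only: {loss_frs_only:.4f}')
print(f'loss, with zero-weight clones: {loss_with_clones:.6f}')
```

With only the three survey records, the loss cannot fall below 1.0, because no choice of weights produces anyone above £100,000. Once the clones are available, the optimiser shifts a little weight onto the two high-income clones and the loss falls to nearly zero, while the clone with an ordinary income keeps its zero weight.
"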
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "
name | estimate_calibrated | target_calibrated | error_calibrated | abs_error_calibrated | rel_error_calibrated | abs_rel_error_calibrated | type_calibrated | estimate_enhanced | target_enhanced | error_enhanced | abs_error_enhanced | rel_error_enhanced | abs_rel_error_enhanced | type_enhanced | change_in_abs_rel_error
\n", "\n", "
\n", "\n" ], "text/plain": [ " name estimate_calibrated \\\n", "0 obr/attendance_allowance 6.107293e+09 \n", "1 obr/carers_allowance 3.873754e+09 \n", "2 obr/dla 6.378101e+09 \n", "3 obr/esa 1.263180e+10 \n", "4 obr/esa_contrib 4.737324e+09 \n", ".. ... ... \n", "330 hmrc/property_income_count_income_band_13_12_5... 2.403394e+06 \n", "331 hmrc/savings_interest_income_income_band_13_12... 2.708711e+09 \n", "332 hmrc/savings_interest_income_count_income_band... 1.186037e+07 \n", "333 hmrc/dividend_income_income_band_13_12_570.0_t... 6.534749e+10 \n", "334 hmrc/dividend_income_count_income_band_13_12_5... 4.076704e+06 \n", "\n", " target_calibrated error_calibrated abs_error_calibrated \\\n", "0 5.700000e+09 4.072930e+08 4.072930e+08 \n", "1 3.300000e+09 5.737544e+08 5.737544e+08 \n", "2 6.000000e+09 3.781010e+08 3.781010e+08 \n", "3 1.210000e+10 5.318008e+08 5.318008e+08 \n", "4 4.500000e+09 2.373239e+08 2.373239e+08 \n", ".. ... ... ... \n", "330 2.320606e+06 8.278751e+04 8.278751e+04 \n", "331 2.968650e+09 -2.599383e+08 2.599383e+08 \n", "332 1.154405e+07 3.163202e+05 3.163202e+05 \n", "333 8.579240e+10 -2.044491e+10 2.044491e+10 \n", "334 3.936170e+06 1.405341e+05 1.405341e+05 \n", "\n", " rel_error_calibrated abs_rel_error_calibrated type_calibrated \\\n", "0 0.071455 0.071455 Tax-benefit \n", "1 0.173865 0.173865 Tax-benefit \n", "2 0.063017 0.063017 Tax-benefit \n", "3 0.043950 0.043950 Tax-benefit \n", "4 0.052739 0.052739 Tax-benefit \n", ".. ... ... ... 
\n", "330 0.035675 0.035675 Income \n", "331 -0.087561 0.087561 Income \n", "332 0.027401 0.027401 Income \n", "333 -0.238307 0.238307 Income \n", "334 0.035703 0.035703 Income \n", "\n", " estimate_enhanced target_enhanced error_enhanced abs_error_enhanced \\\n", "0 5.811105e+09 5.700000e+09 1.111048e+08 1.111048e+08 \n", "1 3.696651e+09 3.300000e+09 3.966507e+08 3.966507e+08 \n", "2 6.123486e+09 6.000000e+09 1.234859e+08 1.234859e+08 \n", "3 1.192624e+10 1.210000e+10 -1.737601e+08 1.737601e+08 \n", "4 4.363998e+09 4.500000e+09 -1.360016e+08 1.360016e+08 \n", ".. ... ... ... ... \n", "330 2.418048e+06 2.320606e+06 9.744232e+04 9.744232e+04 \n", "331 3.047609e+09 2.968650e+09 7.895900e+07 7.895900e+07 \n", "332 1.203166e+07 1.154405e+07 4.876036e+05 4.876036e+05 \n", "333 8.355091e+10 8.579240e+10 -2.241495e+09 2.241495e+09 \n", "334 4.067469e+06 3.936170e+06 1.312990e+05 1.312990e+05 \n", "\n", " rel_error_enhanced abs_rel_error_enhanced type_enhanced \\\n", "0 0.019492 0.019492 Tax-benefit \n", "1 0.120197 0.120197 Tax-benefit \n", "2 0.020581 0.020581 Tax-benefit \n", "3 -0.014360 0.014360 Tax-benefit \n", "4 -0.030223 0.030223 Tax-benefit \n", ".. ... ... ... \n", "330 0.041990 0.041990 Income \n", "331 0.026598 0.026598 Income \n", "332 0.042239 0.042239 Income \n", "333 -0.026127 0.026127 Income \n", "334 0.033357 0.033357 Income \n", "\n", " change_in_abs_rel_error \n", "0 -0.051963 \n", "1 -0.053668 \n", "2 -0.042436 \n", "3 -0.029590 \n", "4 -0.022516 \n", ".. ... 
\n", "330 0.006315 \n", "331 -0.060964 \n", "332 0.014837 \n", "333 -0.212180 \n", "334 -0.002346 \n", "\n", "[335 rows x 16 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "efrs_loss = get_loss(EnhancedFRS_2022_23, None, 2022).copy()\n", "efrs_loss_against_calibrated = pd.merge(\n", " on=\"name\",\n", " left=calibrated_loss,\n", " right=efrs_loss,\n", " suffixes=(\"_calibrated\", \"_enhanced\"),\n", ")\n", "efrs_loss_against_calibrated[\"change_in_abs_rel_error\"] = (\n", " efrs_loss_against_calibrated[\"abs_rel_error_enhanced\"]\n", " - efrs_loss_against_calibrated[\"abs_rel_error_calibrated\"]\n", ")\n", "efrs_loss_against_calibrated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally, let's look at those incomes again." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "alignmentgroup": "True", "hovertemplate": "_variable=target
band=%{x}
value=%{y}", "legendgroup": "target", "marker": { "color": "#4C78A8", "pattern": { "shape": "" } }, "name": "target", "offsetgroup": "target", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 1617313, 4014377, 7638005, 4812367, 3006948, 2630421, 1271798, 597630, 206654, 141418, 73471, 40063, 20945, 26071355 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=estimate
band=%{x}
value=%{y}", "legendgroup": "estimate", "marker": { "color": "#F58518", "pattern": { "shape": "" } }, "name": "estimate", "offsetgroup": "estimate", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 1642834.9168719798, 4202589.150144756, 7492887.128450707, 4769640.7426556945, 3029480.062438652, 2652012.799513504, 1244074.597821325, 594366.0333105773, 218670.74457973242, 154077.49604947865, 80522.84698942304, 41371.39799979329, 23998.115089908242, 26146526.03191553 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=error
band=%{x}
value=%{y}", "legendgroup": "error", "marker": { "color": "#E45756", "pattern": { "shape": "" } }, "name": "error", "offsetgroup": "error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 25521.916871979833, 188212.15014475584, -145117.8715492934, -42726.257344305515, 22532.06243865192, 21591.79951350391, -27723.402178674936, -3263.9666894227266, 12016.744579732418, 12659.49604947865, 7051.846989423037, 1308.397999793291, 3053.115089908242, 75171.03191553056 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=rel_error
band=%{x}
value=%{y}", "legendgroup": "rel_error", "marker": { "color": "#72B7B2", "pattern": { "shape": "" } }, "name": "rel_error", "offsetgroup": "rel_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 0.015780443780504968, 0.046884522839971396, -0.01899944704792592, -0.008878428711755674, 0.0074933329205067465, 0.008208495717417063, -0.021798589224605588, -0.005461517476403003, 0.058149102266263505, 0.08951827949397283, 0.09598136665382309, 0.03265851283711382, 0.14576820672753604, 0.0028832805934149016 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=abs_error
band=%{x}
value=%{y}", "legendgroup": "abs_error", "marker": { "color": "#54A24B", "pattern": { "shape": "" } }, "name": "abs_error", "offsetgroup": "abs_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 25521.916871979833, 188212.15014475584, 145117.8715492934, 42726.257344305515, 22532.06243865192, 21591.79951350391, 27723.402178674936, 3263.9666894227266, 12016.744579732418, 12659.49604947865, 7051.846989423037, 1308.397999793291, 3053.115089908242, 75171.03191553056 ], "yaxis": "y" }, { "alignmentgroup": "True", "hovertemplate": "_variable=abs_rel_error
band=%{x}
value=%{y}", "legendgroup": "abs_rel_error", "marker": { "color": "#EECA3B", "pattern": { "shape": "" } }, "name": "abs_rel_error", "offsetgroup": "abs_rel_error", "orientation": "v", "showlegend": true, "textposition": "auto", "type": "bar", "x": [ "12_570.0_to_15_000.0", "15_000.0_to_20_000.0", "20_000.0_to_30_000.0", "30_000.0_to_40_000.0", "40_000.0_to_50_000.0", "50_000.0_to_70_000.0", "70_000.0_to_100_000.0", "100_000.0_to_150_000.0", "150_000.0_to_200_000.0", "200_000.0_to_300_000.0", "300_000.0_to_500_000.0", "500_000.0_to_1_000_000.0", "1_000_000.0_to_inf", "12_570.0_to_inf" ], "xaxis": "x", "y": [ 0.015780443780504968, 0.046884522839971396, 0.01899944704792592, 0.008878428711755674, 0.0074933329205067465, 0.008208495717417063, 0.021798589224605588, 0.005461517476403003, 0.058149102266263505, 0.08951827949397283, 0.09598136665382309, 0.03265851283711382, 0.14576820672753604, 0.0028832805934149016 ], "yaxis": "y" } ], "layout": { "annotations": [ { "showarrow": false, "text": "Source: PolicyEngine UK tax-benefit microsimulation model (version 2.16.0)", "x": 0, "xanchor": "left", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "barmode": "group", "font": { "color": "black", "family": "Roboto Serif" }, "height": 600, "images": [ { "sizex": 0.15, "sizey": 0.15, "source": "https://raw.githubusercontent.com/PolicyEngine/policyengine-app/master/src/images/logos/policyengine/blue.png", "x": 1.1, "xanchor": "right", "xref": "paper", "y": -0.2, "yanchor": "bottom", "yref": "paper" } ], "legend": { "title": { "text": "_variable" }, "tracegroupgap": 0 }, "margin": { "b": 120, "l": 120, "r": 120, "t": 120 }, "modebar": { "activecolor": "#F4F4F4", "bgcolor": "#F4F4F4", "color": "#F4F4F4" }, "paper_bgcolor": "#F4F4F4", "plot_bgcolor": "#F4F4F4", "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 
10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "white", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "#C8D4E3", "linecolor": "#C8D4E3", "minorgridcolor": "#C8D4E3", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], 
"histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 
0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, 
"colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "white", "showlakes": true, "showland": true, "subunitcolor": "#C8D4E3" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "white", "polar": { "angularaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" }, "bgcolor": "white", "radialaxis": { "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "yaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" }, "zaxis": { "backgroundcolor": "white", "gridcolor": "#DFE8F3", "gridwidth": 2, "linecolor": "#EBF0F8", "showbackground": true, "ticks": "", "zerolinecolor": "#EBF0F8" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "baxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" }, "bgcolor": "white", "caxis": { "gridcolor": "#DFE8F3", "linecolor": "#A2B1C6", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "#EBF0F8", "linecolor": "#EBF0F8", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "#EBF0F8", "zerolinewidth": 2 } } }, "width": 800, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { "text": "band" }, "zerolinecolor": "#F4F4F4" }, "yaxis": { "anchor": 
"x", "domain": [ 0, 1 ], "gridcolor": "#F4F4F4", "title": { "text": "value" }, "zerolinecolor": "#616161" } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "incomes = efrs_loss[loss_results.type == \"Income\"]\n", "incomes[\"band\"] = incomes.name.apply(\n", " lambda x: \"_\".join(x.split(\"band_\")[1].split(\"_\")[1:])\n", ")\n", "incomes[\"count\"] = incomes.name.apply(lambda x: \"count\" in x)\n", "incomes[\"variable\"] = incomes.name.apply(\n", " lambda x: x.split(\"_income_band\")[0].split(\"_count\")[0].split(\"hmrc/\")[-1]\n", ")\n", "\n", "variable = \"employment_income\"\n", "count = True\n", "variable_df = incomes[\n", " (incomes.variable == variable) & (incomes[\"count\"] == count)\n", "]\n", "\n", "fig = px.bar(\n", " variable_df,\n", " x=\"band\",\n", " y=[\n", " \"target\",\n", " \"estimate\",\n", " \"error\",\n", " \"rel_error\",\n", " \"abs_error\",\n", " \"abs_rel_error\",\n", " ],\n", " barmode=\"group\",\n", " color_discrete_sequence=px.colors.qualitative.T10,\n", ")\n", "format_fig(fig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everything looks healthy here! We've got a dataset that's close to reality, and we can have confidence in our tax-benefit model." ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 2 }