Results
TL;DR: How do different microdata adjustment methods affect how accurately the microdata reproduce official statistics? The table below reports findings on the UK Family Resources Survey 2018/19, expressed as the change in loss relative to the unadjusted data (a negative change means lower loss).
import warnings

import pandas as pd

# Silence library warnings so they do not clutter the rendered output.
warnings.filterwarnings("ignore")

df = pd.DataFrame(
    {
        "Adjustment": [
            "None",
            "Percentile matching (all)",
            "Percentile matching (pensioner/non-pensioner split)",
            "Percentile matching (dividends only)",
            "Gradient descent-based reweighting",
            "SPI RF imputation + reweighting",
        ],
        "Loss": [
            1.0,
            1.039242148399353,
            1.0089750289916992,
            0.9986706972122192,
            0.40872600000000003,
            0.12000000000000001,
        ],
    }
)
# Express each loss as a percentage change from the unadjusted baseline (first row).
df["Loss change"] = [
    f"{x:+.2%}" if x != 0 else f"{x:.2%}"
    for x in (df["Loss"] / df["Loss"].iloc[0]) - 1
]
df = df.sort_values("Loss", ascending=False).drop(columns=["Loss"])
# Hide the index column and bold the unadjusted baseline row (index label 0).
# Styler.hide_index() was removed in pandas 2.0; hide(axis="index") needs pandas >= 1.4.
df.style.hide(axis="index").set_properties(
    **{"font-weight": "bold"}, subset=pd.IndexSlice[[0], :]
)
| Adjustment | Loss change |
|---|---|
| Percentile matching (all) | +3.92% |
| Percentile matching (pensioner/non-pensioner split) | +0.90% |
| None | 0.00% |
| Percentile matching (dividends only) | -0.13% |
| Gradient descent-based reweighting | -59.13% |
| SPI RF imputation + reweighting | -88.00% |
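
To give a feel for what gradient descent-based reweighting involves, here is a minimal sketch on toy data. This section does not define the loss or the optimiser, so everything here is an assumption made for illustration: the loss is taken to be the mean squared relative error between weighted survey totals and target totals, the household values and targets are invented, and `aggregates`, `loss` and `reweight` are hypothetical helper names rather than functions from any library.

```python
import math


def aggregates(weights, values):
    """Weighted totals: one total per target statistic."""
    n_targets = len(values[0])
    return [
        sum(w * row[j] for w, row in zip(weights, values))
        for j in range(n_targets)
    ]


def loss(weights, values, targets):
    """Mean squared relative error of weighted totals against targets (assumed form)."""
    totals = aggregates(weights, values)
    return sum((a / t - 1) ** 2 for a, t in zip(totals, targets)) / len(targets)


def reweight(weights, values, targets, lr=0.5, steps=2000):
    """Plain gradient descent on log-weights, which keeps every weight positive."""
    log_w = [math.log(w) for w in weights]
    m = len(targets)
    for _ in range(steps):
        w = [math.exp(v) for v in log_w]
        totals = aggregates(w, values)
        # d(loss)/d(total_j) = 2 * (total_j / t_j - 1) / (t_j * m)
        d_total = [2 * (a / t - 1) / (t * m) for a, t in zip(totals, targets)]
        for i, row in enumerate(values):
            # Chain rule: d(total_j)/d(log_w_i) = w_i * row[j]
            grad = w[i] * sum(d * x for d, x in zip(d_total, row))
            log_w[i] -= lr * grad
    return [math.exp(v) for v in log_w]


# Invented toy survey: four households, two statistics per household
# (say, employment income and dividend income). Not real FRS data.
values = [
    [30_000.0, 0.0],
    [50_000.0, 1_000.0],
    [20_000.0, 0.0],
    [100_000.0, 20_000.0],
]
weights = [1_000.0] * 4            # starting survey weights (invented)
targets = [220e6, 30e6]            # stand-ins for official totals (invented)

initial_loss = loss(weights, values, targets)
new_weights = reweight(weights, values, targets)
final_loss = loss(new_weights, values, targets)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.2e}")
```

Optimising log-weights rather than the weights themselves is one simple way to rule out negative weights. On this toy problem the targets are reachable, so the loss should fall close to zero; with many targets and real survey data a perfect fit is generally not available, which is consistent with the reweighting rows in the table reducing the loss without eliminating it.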