# Results
TL;DR: How do different microdata adjustment methods affect how accurately the microdata reproduce official statistics? The table below summarises the findings on the UK Family Resources Survey (FRS) 2018/19.
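```python
import warnings

import pandas as pd

# Silence library warnings so they do not clutter the rendered output
warnings.filterwarnings("ignore")

# Loss achieved by each adjustment method; the unadjusted data ("None")
# is the baseline, with a loss of 1.0
df = pd.DataFrame(
    {
        "Adjustment": [
            "None",
            "Percentile matching (all)",
            "Percentile matching (pensioner/non-pensioner split)",
            "Percentile matching (dividends only)",
            "Gradient descent-based reweighting",
            "SPI RF imputation + reweighting",
        ],
        "Loss": [
            1.0,
            1.039242148399353,
            1.0089750289916992,
            0.9986706972122192,
            0.408726,
            0.12,
        ],
    }
)

# Relative change in loss versus the baseline (first row); an explicit sign
# is shown only for non-zero changes, so the baseline reads "0.00%"
df["Loss change"] = [
    f"{x:+.2%}" if x != 0 else f"{x:.2%}"
    for x in (df["Loss"] / df["Loss"].iloc[0]) - 1
]

# Order from highest to lowest loss, then drop the raw loss column
df = df.sort_values("Loss", ascending=False).drop(columns=["Loss"])

# Hide the index and bold the baseline row (index label 0 survives the sort).
# Styler.hide(axis="index") replaces Styler.hide_index(), removed in pandas 2.0
df.style.hide(axis="index").set_properties(
    **{"font-weight": "bold"}, subset=pd.IndexSlice[[0], :]
)
```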
| Adjustment | Loss change |
|---|---|
| Percentile matching (all) | +3.92% |
| Percentile matching (pensioner/non-pensioner split) | +0.90% |
| **None** | **0.00%** |
| Percentile matching (dividends only) | -0.13% |
| Gradient descent-based reweighting | -59.13% |
| SPI RF imputation + reweighting | -88.00% |
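Each "Loss change" entry is the method's loss relative to the unadjusted baseline, following the computation in the code cell above. Assuming the loss measures how far the microdata fall from the official statistics, negative values indicate an improvement. Let $L_m$ denote the loss of method $m$ and $L_{\text{None}} = 1.0$ the baseline loss. The two largest improvements in the table then work out as:

$$
\begin{aligned}
\Delta_m &= \frac{L_m}{L_{\text{None}}} - 1,\\
\Delta_{\text{gradient descent-based reweighting}} &= \frac{0.408726}{1.0} - 1 = -0.591274 \approx -59.13\%,\\
\Delta_{\text{SPI RF imputation + reweighting}} &= \frac{0.12}{1.0} - 1 = -0.88 = -88.00\%.
\end{aligned}
$$

Because the baseline loss is exactly 1.0, each loss change here equals $L_m - 1$.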