Results

TL;DR: How do different microdata adjustment methods affect the accuracy with which the microdata reproduces official statistics? The table below shows the findings for the UK Family Resources Survey 2018/19.

import pandas as pd
import warnings

# Suppress library warnings so they do not clutter the rendered output.
warnings.filterwarnings("ignore")

df = pd.DataFrame(
    {
        "Adjustment": [
            "None",
            "Percentile matching (all)",
            "Percentile matching (pensioner/non-pensioner split)",
            "Percentile matching (dividends only)",
            "Gradient descent-based reweighting",
            "SPI RF imputation + reweighting",
        ],
        "Loss": [
            1.0,
            1.039242148399353,
            1.0089750289916992,
            0.9986706972122192,
            0.40872600000000003,
            0.12000000000000001,
        ],
    }
)

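# Express each loss as a signed percentage change relative to the
# unadjusted baseline (the first row, "None").
df["Loss change"] = [
    f"{x:+.2%}" if x != 0 else f"{x:.2%}"
    for x in (df["Loss"] / df["Loss"].iloc[0]) - 1
]
# Order from worst (highest loss) to best, then drop the raw loss column.
df = df.sort_values("Loss", ascending=False).drop(columns=["Loss"])

# Hide the index and bold the baseline row (index label 0).
# Styler.hide_index() was removed in pandas 2.0; hide(axis="index") replaces it.
df.style.hide(axis="index").set_properties(
    **{"font-weight": "bold"}, subset=pd.IndexSlice[[0], :]
)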
| Adjustment | Loss change |
| --- | --- |
| Percentile matching (all) | +3.92% |
| Percentile matching (pensioner/non-pensioner split) | +0.90% |
| None | 0.00% |
| Percentile matching (dividends only) | -0.13% |
| Gradient descent-based reweighting | -59.13% |
| SPI RF imputation + reweighting | -88.00% |
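
The "Loss change" figures are each method's loss divided by the loss of the unadjusted survey, minus one. Assuming a lower loss means the microdata reproduces official statistics more closely, a negative value is an improvement over no adjustment. The sketch below recomputes two rows from the loss values in the code above; `loss_change` is a hypothetical helper name used only for illustration.

```python
def loss_change(loss: float, baseline: float) -> float:
    """Relative change in loss against the unadjusted baseline."""
    return loss / baseline - 1


baseline = 1.0  # loss with no adjustment (the "None" row)

# Loss values copied from the code cell above.
reweighting = 0.408726
imputation_plus_reweighting = 0.12

print(f"{loss_change(reweighting, baseline):+.2%}")  # -59.13%
print(f"{loss_change(imputation_plus_reweighting, baseline):+.2%}")  # -88.00%
```

Because the baseline loss is 1.0 here, the change is simply the loss minus one; the division only matters if the baseline is not normalised.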