# py-statmatch Documentation

```{include} ../README.md
:start-after: # py-statmatch
:end-before: ## Installation
```


```{toctree}
:hidden:
:maxdepth: 2

getting-started
api-reference
examples/index
methodology
contributing
changelog
```

## What is Statistical Matching?

Statistical matching, also known as data fusion or synthetic data matching, is a technique for integrating information from different data sources that share some common variables but have few or no units in common. This is particularly useful when:

## Key Features

```{include} ../README.md
:start-after: ## Features
:end-before: ## Installation
```


## Quick Example

Here's a simple example of using py-statmatch for nearest neighbor matching:

```python
import pandas as pd
from statmatch import nnd_hotdeck

# Donor data has variables X and Y
donor_data = pd.DataFrame({
    'age': [25, 30, 35, 40],
    'income': [30000, 45000, 55000, 65000],
    'satisfaction': [7, 8, 6, 9]  # This will be donated
})

# Recipient data has only X variables
recipient_data = pd.DataFrame({
    'age': [28, 33, 42],
    'income': [35000, 50000, 70000]
})

# Perform matching
result = nnd_hotdeck(
    data_rec=recipient_data,
    data_don=donor_data,
    match_vars=['age', 'income']
)

# Create fused dataset
fused = recipient_data.copy()
fused['satisfaction'] = donor_data.iloc[result['noad.index']]['satisfaction'].values
```
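To build intuition for what nearest neighbor matching does, the following is a minimal sketch written with plain NumPy and pandas. It is not py-statmatch's implementation: the helper `nearest_donor_positions` is hypothetical, and the choice of Manhattan distance on raw, unscaled values is an assumption made purely for illustration. For each recipient row it finds the donor row with the smallest distance over the matching variables, then copies the donated variable across.

```python
import numpy as np
import pandas as pd


def nearest_donor_positions(recipients, donors, match_vars):
    """Return, for each recipient row, the position of the closest donor row.

    Hypothetical helper for illustration only: it uses Manhattan distance
    on the raw (unscaled) matching variables.
    """
    rec = recipients[match_vars].to_numpy(dtype=float)
    don = donors[match_vars].to_numpy(dtype=float)
    # Pairwise distances with shape (n_recipients, n_donors)
    dist = np.abs(rec[:, None, :] - don[None, :, :]).sum(axis=2)
    # Position of the nearest donor for every recipient
    return dist.argmin(axis=1)


donors = pd.DataFrame({
    'age': [25, 30, 35, 40],
    'income': [30000, 45000, 55000, 65000],
    'satisfaction': [7, 8, 6, 9],
})
recipients = pd.DataFrame({
    'age': [28, 33, 42],
    'income': [35000, 50000, 70000],
})

positions = nearest_donor_positions(recipients, donors, ['age', 'income'])

# Copy the donated variable from each recipient's nearest donor
fused_sketch = recipients.copy()
fused_sketch['satisfaction'] = donors['satisfaction'].to_numpy()[positions]
```

Because the distances here are computed on raw values, `income` dominates `age` by several orders of magnitude. In practice the matching variables are usually rescaled or a scale-aware distance is used, so results from a library function may differ from this sketch.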

## Next Steps