Quantile Random Forests

The QRF model uses an ensemble of decision trees to predict different quantiles of the target variable's distribution. This approach allows flexible modeling of complex relationships while providing robust estimates of uncertainty.

Variable type support

QRF seamlessly handles both numerical and categorical variables. For numerical targets, it uses quantile regression forests. For categorical targets (strings, booleans, numerically-encoded categorical variables), it automatically employs a Random Forest Classifier. The model detects variable types automatically and applies the appropriate method internally, requiring no manual specification from users.

How it works

Quantile Random Forests build upon the foundation of random forests by implementing a specialized algorithm from the quantile_forest package. The method begins by constructing an ensemble of decision trees, each trained on different bootstrapped samples of the original data. This process, known as bagging, introduces diversity among the individual trees and helps reduce overfitting.

During training, each tree in the forest considers only a random subset of the available features at each split point. This feature randomization further enhances diversity within the ensemble and improves its ability to capture various aspects of the underlying data relationships.
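The two ideas above, bootstrapped trees and per-split feature subsetting, plus the quantile-extraction step, can be sketched with scikit-learn alone. The `predict_quantiles` helper below is an invented illustration of the Meinshausen-style mechanism (pool the training targets that share a leaf with each query point, then take empirical quantiles of that pool); the actual `quantile_forest` package implements a more refined weighted version.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
# Heteroscedastic target: noise grows with the first feature
y = X[:, 0] + rng.normal(0, 0.1 + 0.2 * X[:, 0])

# Bagging (bootstrap=True by default) and per-split feature
# subsetting (max_features=1) come from the forest itself.
forest = RandomForestRegressor(n_estimators=50, max_features=1, random_state=0)
forest.fit(X, y)

def predict_quantiles(forest, X_train, y_train, X_new, quantiles):
    """Hypothetical helper: estimate quantiles from the empirical
    distribution of training targets that co-occupy leaves with X_new."""
    train_leaves = forest.apply(X_train)  # (n_train, n_trees) leaf ids
    new_leaves = forest.apply(X_new)      # (n_new, n_trees) leaf ids
    preds = np.empty((len(X_new), len(quantiles)))
    for i, leaves in enumerate(new_leaves):
        # Pool the targets sharing a leaf with point i, tree by tree,
        # so targets that co-occur in more trees count more often.
        pooled = np.concatenate([
            y_train[train_leaves[:, t] == leaves[t]]
            for t in range(train_leaves.shape[1])
        ])
        preds[i] = np.quantile(pooled, quantiles)
    return preds

q = predict_quantiles(forest, X, y, np.array([[5.0, 5.0]]), [0.1, 0.5, 0.9])
```

Note that a single fitted forest yields every quantile at once: the trees store the training targets' empirical distribution, and the requested quantiles are read off at prediction time.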

Key features

The Quantile Random Forest (QRF) imputer provides a robust non-parametric method particularly effective for datasets exhibiting complex, non-linear relationships and heteroscedasticity. Unlike linear models, which rely on strong distributional assumptions, QRF makes minimal assumptions about the underlying data structure, adapting its uncertainty measures to reflect varying levels of variability within different regions of the input data.

QRF’s primary strength lies in its predictive approach. While traditional random forests aggregate predictions into averages, QRF maintains the entire predictive distribution from each tree, directly estimating quantiles based on this empirical distribution. It also quantifies uncertainty through robust prediction intervals derived directly from its quantile estimates. These intervals dynamically adjust across the feature space, effectively signaling areas with varying levels of predictive certainty.

Although QRF typically involves higher computational demands compared to simpler linear models, its enhanced accuracy on datasets with complex, non-linear relationships frequently justifies this trade-off. For applications where accurate predictive performance and meaningful uncertainty quantification are critical, QRF emerges as an especially valuable approach.