Purged K-Fold Cross-Validation
A cross-validation scheme that removes overlapping training samples to prevent look-ahead leakage.
Definition
Standard K-Fold cross-validation is unsafe for financial time series because labels are often computed over a forward window, so a training sample's label window can overlap in time with the validation samples. The model then effectively 'sees the future' through the returns the two sets share. Purged K-Fold, introduced by Marcos López de Prado in 'Advances in Financial Machine Learning' (2018), purges any training sample whose label window overlaps the validation fold, then optionally embargoes a further stretch of samples immediately after the fold to limit leakage through serial correlation. The result is a less optimistically biased estimate of out-of-sample skill.
Formula
For each fold k:
- Validation set V_k = fold k
- Training set T_k = all data except:
  - V_k itself
  - samples whose label window overlaps V_k (purge)
  - optionally, samples within the embargo period after V_k (embargo)
Worked example
Predicting 20-day forward returns. Under standard 5-fold CV, a training sample dated 2024-06-01 has a label that runs 20 days forward, into a validation fold starting 2024-06-15, so the training label and the validation data share some of the same returns. Purged K-Fold removes the 2024-06-01 sample from training, so that no training label draws on the validation period.
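The purge and embargo rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name purged_kfold, its parameters, and the toy data (500 daily samples indexed by integer day, each with a 20-day label window) are all assumptions made for the example. It also assumes the embargo is counted from the end of the validation fold's last label window, which is one common choice; counting it from the last validation sample is another.

```python
import numpy as np

def purged_kfold(label_start, label_end, n_splits=5, embargo=0):
    """Yield (train_idx, val_idx) for contiguous folds over time-ordered samples.

    label_start[i] and label_end[i] are the first and last time points that
    sample i's label depends on. `embargo` is in the same time units.
    All names here are hypothetical, chosen for this sketch.
    """
    label_start = np.asarray(label_start)
    label_end = np.asarray(label_end)
    positions = np.arange(len(label_start))
    for val_idx in np.array_split(positions, n_splits):
        val_start = label_start[val_idx].min()
        val_end = label_end[val_idx].max()
        # Purge: any sample whose label window intersects [val_start, val_end].
        # This mask also covers the validation samples themselves.
        purged = (label_start <= val_end) & (label_end >= val_start)
        # Embargo: samples that start within `embargo` time units after val_end.
        embargoed = (label_start > val_end) & (label_start <= val_end + embargo)
        yield positions[~(purged | embargoed)], val_idx

# Toy data: one sample per day, label window = [day, day + 20].
n_samples, horizon = 500, 20
days = np.arange(n_samples)
splits = list(purged_kfold(days, days + horizon, n_splits=5, embargo=5))

# Second fold: validation covers days 100..199.
train_idx, val_idx = splits[1]
```

In this toy setup the second fold trains on days 0 to 79 and 225 to 499: days 80 to 99 are purged because their labels run into the fold, days 200 to 219 are purged because their windows overlap the fold's last labels, and days 220 to 224 fall in the embargo.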
How ARIA Analyst uses it
ARIA uses Purged K-Fold + embargo in all internal model validation and reports the gap between in-sample and out-of-sample metrics as the primary overfitting indicator.
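The gap between in-sample and out-of-sample metrics can be computed per fold and averaged. The sketch below is illustrative only: the source does not say which metric or model ARIA uses, so R-squared, an ordinary least-squares fit, the function names, and the hand-written purged splits are all assumptions.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination; can be negative out of sample."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def in_vs_out_gap(X, y, splits):
    """Mean (in-sample R^2 minus out-of-sample R^2) over the given splits.

    The model is a plain least-squares fit, chosen only to keep the
    sketch self-contained (hypothetical stand-in for a real model).
    """
    gaps = []
    for train_idx, val_idx in splits:
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        r2_in = r_squared(y[train_idx], X[train_idx] @ coef)
        r2_out = r_squared(y[val_idx], X[val_idx] @ coef)
        gaps.append(r2_in - r2_out)
    return float(np.mean(gaps))

# Pure-noise data: any apparent in-sample skill is overfitting.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = rng.normal(size=300)

# Hand-written splits with a 20-sample purge on each side of the fold.
splits = [
    (np.arange(120, 300), np.arange(0, 100)),
    (np.r_[0:80, 220:300], np.arange(100, 200)),
    (np.arange(0, 180), np.arange(200, 300)),
]
gap = in_vs_out_gap(X, y, splits)
```

Because the target here is noise, a clearly positive gap is expected; on real data, a gap that stays large under purged splits suggests the in-sample fit is not carrying over.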
Related terms
Walk-Forward Analysis
A backtesting procedure that retrains the model on a rolling window and tests on the next out-of-sample period.
LightGBM
A fast gradient-boosted decision tree framework from Microsoft, widely used on tabular financial data.
XGBoost
Extreme Gradient Boosting: an earlier, widely used gradient-boosted decision tree library, often somewhat slower to train than LightGBM but very robust.
Isotonic Calibration
A non-parametric monotonic transformation that maps raw model scores to well-calibrated probabilities.