Quantcast
Channel: cross-validation – Win-Vector Blog
Viewing all articles
Browse latest Browse all 13

Cross-Methods are a Leak/Variance Trade-Off

$
0
0

We have a new Win Vector data science article to share:

Cross-Methods are a Leak/Variance Trade-Off

John Mount (Win Vector LLC), Nina Zumel (Win Vector LLC)

March 10, 2020

We work some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not work.

Abstract

Cross-methods such as cross-validation, and cross-prediction are effective tools for many machine learning, statisitics, and data science related applications. They are useful for parameter selection, model selection, impact/target encoding of high cardinality variables, stacking models, and super learning. They are more statistically efficient than partitioning training data into calibration/training/holdout sets, but do not satisfy the full exchangeability conditions that full hold-out methods have. This introduces some additional statistical trade-offs when using cross-methods, beyond the obvious increases in computational cost.

Specifically, cross-methods can introduce an information leak into the modeling process. This information leak will be the subject of this post.

The entire article is a JupyterLab notebook, and can be found here. Please check it out, and share it with your favorite statisticians, machine learning researchers, and data scientists.


Viewing all articles
Browse latest Browse all 13

Trending Articles