Channel: cross-validation – Win-Vector Blog
Browsing latest articles

On Nested Models

We have recently been working on and presenting nested modeling issues. These are situations where the output of one trained machine learning model is part of the input of a later model or...

vtreat cross frames

John Mount, Nina Zumel, 2016-05-05. As a follow-on to “On Nested Models” we work R examples demonstrating “cross validated training frames” (or “cross frames”) in vtreat. Consider the...

Laplace noising versus simulated out of sample methods (cross frames)

Nina Zumel recently mentioned the use of Laplace noise in “count codes” by Misha Bilenko (see here and here) as a known method to break the overfit bias that comes from using the same data to design...

A Theory of Nested Cross Simulation

[Reader’s Note. Some of our articles are applied and some of our articles are more theoretical. The following article is more theoretical, and requires fairly formal notation to even work through....

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes...

When Cross-Validation is More Powerful than Regularization

Regularization is a way of avoiding overfit by restricting the magnitude of model coefficients (or in deep learning, node weights). A simple example of regularization is the use of ridge or lasso...

PyData Los Angeles 2019 talk: Preparing Messy Real World Data for Supervised...

Video of our PyData Los Angeles 2019 talk, “Preparing Messy Real World Data for Supervised Machine Learning,” is now available. In this talk we describe how to use vtreat, a package available in R and in...

Python Data Science Tip: Don’t use Default Cross Validation Settings

Here is a quick, simple, and important tip for doing machine learning, data science, or statistics in Python: don’t use the default cross validation settings. The default can default to a...

Cross-Methods are a Leak/Variance Trade-Off

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off,” by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work some exciting...

Use the Same Cross-Plan Between Steps

Students have asked me if it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way...
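
As a rough illustration of what "the same cross-validation plan in each step" can look like in practice, here is a minimal Python sketch. It is not taken from the article: the helper names `make_cross_plan` and `out_of_fold_means`, the toy data, and the seed are all hypothetical. The plan is built once and the same object is passed to every step, so rows held out together in one step are held out together in the next.

```python
import random


def make_cross_plan(n_rows, n_folds, seed):
    """Hypothetical helper: build one shuffled k-fold plan.

    Returns a list of (train_indices, test_indices) pairs.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded, so the plan is reproducible
    folds = [sorted(idx[i::n_folds]) for i in range(n_folds)]
    plan = []
    for k, test in enumerate(folds):
        train = sorted(j for m, fold in enumerate(folds) if m != k for j in fold)
        plan.append((train, test))
    return plan


def out_of_fold_means(values, plan):
    """Hypothetical analysis step: estimate each row using only rows
    outside its own fold (here, simply the mean of the training rows)."""
    estimates = [None] * len(values)
    for train, test in plan:
        train_mean = sum(values[j] for j in train) / len(train)
        for j in test:
            estimates[j] = train_mean
    return estimates


# Toy data (an assumption for illustration only).
y = [float(v) for v in range(12)]

# Build the plan once...
plan = make_cross_plan(len(y), n_folds=3, seed=2020)

# ...and hand the same plan to each step of the analysis.
step1 = out_of_fold_means(y, plan)
step2 = out_of_fold_means(step1, plan)
```

Passing the plan explicitly, rather than letting each step draw its own random split, is the design choice being illustrated; the steps themselves are placeholders.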
