Test data isn’t the best validation for your model
On a crisp autumn morning, Sarah, a meteorologist, proudly unveiled her new weather forecasting model. She had meticulously tested it using data from the past decade, and the results were impressive.
Then she looked outside and was surprised by a sudden, glowing white: snow, arriving early because of an unexpected shift in climate patterns. It was an environment she hadn’t foreseen, and of course neither had her model, which had been optimized for only one.
We love to build data science and machine learning models by diligently holding out a test set, training a model on the remaining data, and finally measuring its performance on the held-out test data.
But that procedure optimizes for just one situation: a single environment and a single time period.
Instead, I think great models should be:
- timeless
- universal
Timeless in the sense that they should work in much the same way across many different periods of time. Universal in that they should work in different kinds of contexts, industries, and environments. If I have a recommendation engine and it works on my book selection, it should also work on my DVD selection! If not, I should build a better model that works on both.
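One way to make "timeless" and "universal" checkable is to score the same model separately on each time period and each domain, then look at how far apart the scores are. The snippet below is a minimal sketch of that idea, not a definitive recipe: the row layout (`year`, `domain`, `x`, `y`), the toy `predict` rule, and the helper names `accuracy_by_group` and `worst_case_gap` are all assumptions made up for illustration.

```python
from collections import defaultdict


def accuracy_by_group(rows, predict, group_key):
    """Return {group: accuracy}, grouping rows by row[group_key]."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if predict(row["x"]) == row["y"]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def worst_case_gap(scores):
    """Difference between the best and the worst group score."""
    return max(scores.values()) - min(scores.values())


def predict(x):
    # Hypothetical stand-in for a trained model: a fixed threshold rule.
    return x > 0.5


# Toy, hand-made rows (not real data): two periods, two domains.
rows = [
    {"year": 2022, "domain": "books", "x": 0.9, "y": True},
    {"year": 2022, "domain": "books", "x": 0.1, "y": False},
    {"year": 2023, "domain": "dvds", "x": 0.9, "y": False},
    {"year": 2023, "domain": "dvds", "x": 0.2, "y": False},
]

by_year = accuracy_by_group(rows, predict, "year")
by_domain = accuracy_by_group(rows, predict, "domain")

print("accuracy by year:  ", by_year)
print("accuracy by domain:", by_domain)
print("gap across years:  ", worst_case_gap(by_year))
```

A single pooled accuracy would hide the difference between groups. Reporting the per-group scores and their gap shows directly whether the model behaves the same way across periods and contexts.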