How SherlockPicks Builds Its Prediction Models
A high-level look at how SherlockPicks trains machine learning models on historical game data, selects the best model per sport, and generates calibrated probability estimates.
Overview: what kind of system is this?
SherlockPicks is a data-driven prediction system built on gradient boosting machine learning models trained on years of historical sports data. It is not a handicapper making gut-feel picks, not a crowd-sourced consensus tool, and not a statistical regression on win totals. It is a probability estimation engine trained to identify when sportsbook prices diverge from true outcome probabilities.
Feature engineering
Raw game data becomes meaningful signals through feature engineering. Key feature groups include:
- Recent form: Rolling win rates, point differentials, offensive and defensive efficiency over [3, 7, 10, 20, 30]-game windows
- Schedule context: Rest days since last game, back-to-back flags, three-games-in-four-nights fatigue, consecutive days on the current road trip
- Travel and geography: Haversine distance between venues, time zone crossings, flight direction
- Head-to-head history: Rolling win rates for this specific matchup over the last 10 meetings
- Lineup quality: Starting pitcher ERA and WHIP (MLB), roster weighted averages for NBA/NHL
- Historical injury data: Weighted injury scores (OUT = 1.0, Questionable = 0.4) per team
- Odds-derived features: Opening and closing market implied probability, juice source
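The recent-form group above can be sketched with pandas rolling windows. This is a minimal illustration, not SherlockPicks' actual pipeline: the column names (`team`, `game_date`, `won`, `point_diff`) are hypothetical, and only two of the [3, 7, 10, 20, 30] windows are shown. The key detail is the `shift(1)`, which ensures each row only sees games played before it.

```python
import pandas as pd

# Toy game log; schema is illustrative, not SherlockPicks' actual one
games = pd.DataFrame({
    "team": ["BOS"] * 6,
    "game_date": pd.date_range("2024-01-01", periods=6, freq="2D"),
    "won": [1, 0, 1, 1, 0, 1],
    "point_diff": [5, -3, 8, 2, -7, 4],
})

games = games.sort_values(["team", "game_date"])
for window in (3, 5):  # subset of the [3, 7, 10, 20, 30] windows in the text
    grouped = games.groupby("team")
    # shift(1) so each row's feature uses only PRIOR games (no leakage)
    games[f"win_rate_{window}"] = grouped["won"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    games[f"point_diff_{window}"] = grouped["point_diff"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
```

The first game for each team has no history, so its rolling features are NaN; downstream models like LightGBM handle missing values natively.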
Model training: walk-forward backtesting
Models are trained using walk-forward cross-validation — the same method used in quantitative finance. Each "fold" trains on past games and validates on the next period's games, simulating real-world deployment. This prevents data leakage (the model never trains on outcomes it would not have seen at prediction time) and produces realistic performance estimates.
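The fold scheme can be sketched with scikit-learn's `TimeSeriesSplit`. SherlockPicks' exact fold sizes are not specified; this just demonstrates the defining property of walk-forward validation: every training game strictly precedes every validation game.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-ins for chronologically sorted features and labels
X = np.arange(20).reshape(-1, 1)
y = np.arange(20) % 2

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    # walk-forward guarantee: no training index overlaps the future
    assert train_idx.max() < val_idx.min()
    print(f"fold {fold}: train games 0-{train_idx.max()}, "
          f"validate games {val_idx.min()}-{val_idx.max()}")
```

Each successive fold trains on a longer history and validates on the next block of games, which is what "simulating real-world deployment" means in practice.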
We test four model types on every sport and target:
- LightGBM (gradient boosting)
- XGBoost (gradient boosting)
- CatBoost (gradient boosting)
- Logistic Regression (baseline)
The model with the best composite score across log-loss, Brier score, AUC, and variance is selected as the active model for that sport/target combination.
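The selection step might look like the sketch below. The article does not publish the metric weights, so equal weighting is an assumption here; each metric is oriented so that lower is better (AUC is flipped via `1 - auc`), and the numbers are toy values.

```python
# Toy backtest metrics per candidate model (illustrative values only)
candidates = {
    "lightgbm": {"log_loss": 0.62, "brier": 0.215, "auc": 0.58, "fold_var": 0.004},
    "xgboost":  {"log_loss": 0.63, "brier": 0.218, "auc": 0.57, "fold_var": 0.006},
    "catboost": {"log_loss": 0.61, "brier": 0.214, "auc": 0.59, "fold_var": 0.005},
    "logreg":   {"log_loss": 0.66, "brier": 0.226, "auc": 0.54, "fold_var": 0.002},
}

def composite(m):
    # log-loss, Brier, fold variance: lower is better; AUC: higher is better,
    # so it enters as (1 - auc). Equal weights are an assumption.
    return m["log_loss"] + m["brier"] + m["fold_var"] + (1.0 - m["auc"])

best = min(candidates, key=lambda name: composite(candidates[name]))
print(best)
```

With these toy numbers CatBoost wins, but as the text notes, the winner genuinely varies by sport and target.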
Probability calibration
A model that says "60% probability" needs to actually win 60% of the time to be useful — not 55% or 70%. This property is called calibration. After backtesting, we apply a calibration step (isotonic regression, Platt scaling, or temperature scaling depending on which performs best) to align model probabilities with observed frequencies.
Calibrated probabilities are what feed into EV and edge calculations. Without calibration, EV estimates would be unreliable even if the model correctly ranks outcomes.
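The calibration step can be sketched with isotonic regression (one of the three methods named above), using scikit-learn. The simulated data below models an overconfident classifier whose raw 0.60 outputs actually win closer to 50% of the time; the fitted mapping pulls those scores back toward observed frequency.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw_probs = rng.uniform(0.3, 0.8, size=2000)       # model's raw scores
true_probs = np.clip(raw_probs - 0.10, 0.0, 1.0)   # model overstates by ~10 pts
outcomes = rng.binomial(1, true_probs)             # simulated game results

# Fit a monotone mapping from raw score to observed win frequency
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw_probs, outcomes)

calibrated = iso.predict(np.array([0.60]))
print(calibrated)  # pulled down toward the observed ~0.50 win rate
```

In production, the calibrator would be fit on held-out backtest games (never the training folds), for the same leakage reasons as walk-forward validation.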
From probabilities to recommendations
Once a calibrated probability is produced, it is compared to the implied probability derived from current market odds. If the gap (edge) exceeds a minimum threshold and the resulting EV is positive, the bet is flagged as a recommendation. Stake sizing follows a fractional Kelly formula capped at 1.5 units.
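The decision rule above can be sketched as follows. The 1.5-unit cap comes from the text; the minimum edge (2%), Kelly fraction (quarter Kelly), and the "1 unit = 1% of bankroll" conversion are illustrative assumptions, not SherlockPicks' published parameters.

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the book's implied probability (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def decimal_odds(american_odds: int) -> float:
    if american_odds < 0:
        return 1 + 100 / -american_odds
    return 1 + american_odds / 100

def recommend(model_prob: float, american_odds: int,
              min_edge: float = 0.02,      # assumed threshold
              kelly_fraction: float = 0.25,  # assumed quarter Kelly
              max_units: float = 1.5):       # cap stated in the article
    edge = model_prob - implied_prob(american_odds)
    b = decimal_odds(american_odds) - 1        # net payout per unit staked
    ev = model_prob * b - (1 - model_prob)     # expected value per unit
    if edge < min_edge or ev <= 0:
        return None
    kelly = (model_prob * b - (1 - model_prob)) / b  # full-Kelly bankroll fraction
    # Assumption: 1 unit = 1% of bankroll, so scale the fraction to units
    stake_units = min(kelly * kelly_fraction * 100, max_units)
    return {"edge": edge, "ev": ev, "stake_units": stake_units}

print(recommend(0.55, -110))  # small edge over the 52.4% break-even at -110
```

A 55% calibrated probability against -110 odds clears the assumed 2% edge threshold and yields a positive-EV recommendation; a 50% probability against the same price would be rejected.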
The Performance page shows the real, tracked outcome of all recommendations — not in-sample backtest results — so you can verify the system is performing as described.
Frequently Asked Questions
Which machine learning models does SherlockPicks use?
SherlockPicks tests LightGBM, XGBoost, CatBoost, and Logistic Regression on each sport and market. The best-performing model (by composite score of log-loss, Brier score, AUC, and fold variance) is selected as the active model. The winning model changes per sport and target — there is no single "one size fits all" algorithm.
What is walk-forward backtesting and why does it matter?
Walk-forward backtesting trains a model on historical games up to a point in time, then evaluates it on the next period — simulating exactly what would happen in deployment. It prevents data leakage (the model never trains on future outcomes) and produces realistic performance estimates. Standard k-fold cross-validation does not do this and inflates results.
What data feeds the predictions?
The pipeline collects game results, box scores, schedules, confirmed lineups, current market odds, and situational context (rest days, travel, weather for outdoor venues) across all covered sports. All inputs are refreshed daily before predictions are generated.
What is a calibrated model?
A calibrated model is one where a stated 60% probability actually wins 60% of the time. Without calibration, EV calculations are unreliable — the model might correctly rank outcomes but systematically overstate or understate probabilities. SherlockPicks applies isotonic regression, Platt scaling, or temperature scaling (whichever performs best on each sport) to ensure probabilities are accurate before computing EV.
See EV and edge live
SherlockPicks calculates all of this automatically for every game, every day.