How to Evaluate a Sports Betting Service's Track Record
Win rate alone is a misleading metric. Learn how to properly evaluate any picks service using ROI, sample size, and out-of-sample backtesting — including how SherlockPicks measures its own performance.
Why win rate is a misleading metric
The most commonly quoted metric for sports betting services is win percentage — "We hit 64% of our picks!" But win rate in isolation tells you almost nothing. A 64% win rate on a slate of heavy favourites (all priced at −250+) can easily be a losing strategy because you need to win 71%+ of those bets just to break even. A 53% win rate on spread bets at standard −110 odds, where break-even is roughly 52.4%, is genuinely profitable.
The metric that actually matters is ROI: return on investment, calculated as total profit divided by total amount wagered. It accounts for both the win rate and the payout structure simultaneously.
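As a rough illustration (not SherlockPicks code), both numbers above follow directly from the odds: the break-even win probability comes from the American odds, and ROI is just profit over total staked.

```python
def breakeven_prob(american_odds: int) -> float:
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        # Favourite: risk |odds| to win 100
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win `odds`
    return 100 / (american_odds + 100)

def roi(total_profit: float, total_wagered: float) -> float:
    """Return on investment: total profit divided by total amount wagered."""
    return total_profit / total_wagered

print(round(breakeven_prob(-250), 3))  # 0.714 — why 64% on −250 favourites loses money
print(round(breakeven_prob(-110), 3))  # 0.524 — the standard spread-bet break-even
```

This is why a quoted win rate means nothing without the odds attached: 64% sounds impressive until you see it sits below the 71.4% break-even for −250 favourites.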
What counts as a good ROI?
Context matters enormously:
- −5% or worse: Normal for recreational bettors paying vig over time
- 0% to +5%: Breaking even or slight edge — better than most people
- +5% to +15%: Good. What professional sports bettors average on a full season
- +15% to +40%: Excellent. Achievable with disciplined +EV selectivity, but expect significant variance
- +40%+ over a large sample: Exceptional, verify the sample size carefully
Sample size: the most underrated factor
A 100-bet sample has a standard deviation of roughly ±10% ROI just from randomness. That means a strategy producing +8% true ROI could easily show −2% to +18% over 100 bets by chance alone. To reach statistical significance at 95% confidence, most analysts require 500+ bets at standard odds. Any service citing ROI over fewer than 200–300 bets should be viewed with skepticism.
In-sample vs out-of-sample results
This is the most important distinction when evaluating any prediction service:
- In-sample (backtest) results: Performance on the historical data used to build the model. Always looks better than reality because the model was optimised on that data. Backtests should be viewed as an upper bound, not an expected result.
- Out-of-sample (live) results: Performance on games the model had never seen. This is the only number that tells you if the edge is real.
SherlockPicks uses walk-forward backtesting to approximate out-of-sample conditions during development, and tracks live recommendations separately on the Performance page. The two numbers should be compared — a large gap between backtest and live performance signals overfitting.
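Walk-forward backtesting can be sketched in a few lines. The fold sizes below are arbitrary placeholders, not SherlockPicks' actual configuration; the key property is that every test window strictly follows its training window in time, so no future data leaks into the model.

```python
def walk_forward_splits(n_games: int, train_size: int, test_size: int):
    """Yield (train_indices, test_indices) pairs that always train on the past
    and evaluate on the immediately following window of unseen games."""
    start = 0
    while start + train_size + test_size <= n_games:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide forward by one test window

for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(f"train on games {train[0]}–{train[-1]}, test on {test[0]}–{test[-1]}")
```

A standard random train/test split would leak future information (e.g. a team's late-season form) into the training data; the sliding window avoids that.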
Red flags when evaluating any picks service
- Win percentage quoted without context of the odds or market type
- Results over fewer than 100 bets presented as proof of edge
- Cherry-picked date ranges ("We were 18–7 in March!") without showing other periods
- No distinction between in-sample backtests and live tracked picks
- Inconsistent unit sizes that make losing bets smaller and winning bets larger
- No tracking of the odds available at the time of the pick
How SherlockPicks tracks its own performance
Every recommendation is logged with the odds available at the time it was generated, the stake in units, and the outcome after the game resolves. The Performance page shows real, unfiltered ROI across every sport and market — with no cherry-picking. Walk-forward fold metrics and calibration data are available in the admin analytics hub for members who want to dig deeper into model performance by period.
If the model stops generating positive ROI on a sport, that is surfaced immediately — not hidden. The point is finding real edge, not selling a story.
Frequently Asked Questions
Why is win percentage a misleading metric?
Win percentage does not account for the odds. A 64% win rate on heavy favourites priced at −250 is a losing strategy because you need 71%+ just to break even. A 53% win rate on standard −110 spread bets, where break-even is roughly 52.4%, is profitable. ROI (total profit / total wagered) is the correct metric because it accounts for both win rate and payout structure.
How many bets do you need before ROI is meaningful?
At standard −110 odds, the standard deviation of ROI over 100 bets is roughly ±10% from randomness alone. To reach statistical significance at 95% confidence, you typically need 500+ bets. Any service citing strong ROI over fewer than 200–300 bets is showing you noise, not signal.
What is the difference between backtest results and live results?
Backtests are run on historical data the model was built on — they always look better than reality because the model was optimised on that data. Live results track picks the model had never seen before the bet was placed. The live ROI is the only number that matters. A large gap between backtest and live performance usually indicates overfitting.
How does SherlockPicks track its picks?
Every recommendation is logged with the odds available at generation time, the unit stake, and the settled outcome. The Performance page shows real, unfiltered ROI across all sports and markets — no cherry-picking, no selective date ranges. Walk-forward fold metrics are also visible in the analytics hub for deeper analysis.
See EV and edge live
SherlockPicks calculates all of this automatically for every game, every day.