Saturday, July 11, 2015
How DataBucket's Wimbledon Model Can Be Better
This year's Wimbledon tournament culminates tomorrow in a blockbuster final between Novak Djokovic and Roger Federer. Over the past two weeks, we have developed a probability model that estimates each player's chances of reaching each stage of the tournament (these estimates were updated after every round). Now, with two competitors remaining, our model gives Novak Djokovic a 63.8% chance of beating Roger Federer.
But should you trust our model's results?
Upon inspecting the model's assumptions, I see several areas where our probability model can be improved.
Match Scores Are Not Always an Accurate Indicator of Current Form
Here's one of the key concerns: while our model accounts for how close a match was (e.g. winners in straight sets are rewarded more than winners in 5 sets), match scores are not always a true indicator of form or current ability. Roger Federer was rewarded significantly for beating Andy Murray in straight sets, especially since he had only a 55% chance of winning, but in my opinion he should be rewarded even more. Federer faced only one break point in the entire match (and that was in the first game). He hit 20 aces, had over 5 winners for every unforced error he made, and won over 80% of the time when serving at 30-30 or deuce. Legends of the game proclaimed it one of the best serving performances ever witnessed. Even Federer acknowledged this match as "definitely one of the best matches I've played in my career."
Likewise, Andy Murray should not be penalized as heavily for losing this match in straight sets. He hit over 2 winners for every unforced error he made (the average for the tournament was 1.5). He served a respectable 12 aces against only 1 double fault. And he managed to stay with Federer to the end of each set, only for his opponent to step up a gear. This was not a demoralizing defeat for Murray, but rather a performance many would call a valiant effort. As Sports Illustrated put it in its live blog, "So good. Too good. Too, too good from Roger Federer." Our model could be improved by incorporating some of these detailed match statistics, but how much they should influence the probabilities is very much up for debate.
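To make the idea concrete, here is a minimal sketch of how match statistics could be folded into a rating update. This is not DataBucket's actual model. It assumes an Elo-style rating system, and the function names, the K-factor of 32, the set-margin multipliers, and the 0.1 weight on the winner-to-unforced-error ratio are all hypothetical choices made for illustration. Only the tournament-average ratio of 1.5 comes from the figures above.

```python
import math

def expected_score(rating_a, rating_b):
    """Elo-style probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def margin_multiplier(sets_won, sets_lost):
    """Hypothetical: reward a straight-sets win more than a five-set win.
    In best-of-five: 3-0 gives 1.5, 3-1 gives 1.25, 3-2 gives 1.0."""
    return 1.0 + 0.25 * (sets_won - sets_lost - 1)

def quality_multiplier(ratio, tournament_avg=1.5, weight=0.1):
    """Hypothetical: scale by a player's winner-to-unforced-error ratio
    relative to the tournament average, clamped to [0.5, 1.5]."""
    raw = 1.0 + weight * math.log(ratio / tournament_avg)
    return min(1.5, max(0.5, raw))

def update_ratings(winner, loser, sets_won, sets_lost,
                   winner_ratio=None, loser_ratio=None, k=32.0):
    """Return (new_winner_rating, new_loser_rating).
    Without ratios this is a plain margin-adjusted Elo update.
    With ratios, a sharp winner gains more and a sharp loser loses less,
    so the update is no longer zero-sum (an assumption of this sketch)."""
    surprise = 1.0 - expected_score(winner, loser)
    base = k * margin_multiplier(sets_won, sets_lost) * surprise
    gain = base * (quality_multiplier(winner_ratio) if winner_ratio else 1.0)
    loss = base / (quality_multiplier(loser_ratio) if loser_ratio else 1.0)
    return winner + gain, loser - loss

# Score-only update versus one that also uses the match statistics.
# The ratings are made up; the ratios are roughly those quoted above.
score_only = update_ratings(2000.0, 1965.0, 3, 0)
with_stats = update_ratings(2000.0, 1965.0, 3, 0,
                            winner_ratio=5.0, loser_ratio=2.0)
print(score_only, with_stats)
```

In this sketch the straight-sets winner gains more when his ratio is far above the tournament average, and the loser gives up fewer points when his own ratio is above average. How large the weight should be is exactly the open question raised above.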
A Career-Defining Win Can Go Many Ways
We would probably all agree that this match is one of the highlights of Roger Federer's already illustrious career. We can classify it as one of his career-defining matches. But such a strong performance from Federer can go either way. He may gain plenty of momentum from this performance and play superbly against Djokovic in the final. Or he may have expended too much energy, suffer from mental or physical fatigue, and fall to the steady Serb. This is something else our model lacks: the ability to capture a player's reaction to a career-defining win. Will a player succumb to the pressure in the next match after playing the match of his life, like Lukas Rosol after beating Rafael Nadal at Wimbledon in 2012 or Kei Nishikori after beating Djokovic at the 2014 US Open? Or will a player rise to the occasion, gain confidence and play at a much higher level after a career-changing win, like Robin Soderling at the 2009 French Open or Stanislas Wawrinka at the 2014 Australian Open?
For these reasons, while listing Djokovic as a 63.8% favorite seems reasonable to most, many factors in tennis remain difficult to quantify. DataBucket will continue to try to incorporate as many of these factors as possible, especially as the US Open is just around the corner.