Sunday, September 20, 2015

Be Suspicious of NYC Restaurant Health Ratings

Whether or not you pay serious attention to them, every restaurant in New York City has a letter grade posted outside showing its health inspection result. Among the 24,639 restaurants in the recently published NYC Restaurant Inspection Result Dataset, nearly 80% have been awarded an "A" safety rating, 15% have been awarded a "B", and 5% have received a "C" or worse.

However, should New Yorkers really trust these health ratings? Ben Wellington of I Quant NY made a compelling case that health inspection scores suffer from the "bumping up" syndrome. These scores determine a restaurant's grade: a score of 0-13 is an "A", 14-27 is a "B", and 28 or more is a "C". Under bumping up, restaurants whose scores sit just on the cusp of a better grade tend to be pushed into that better grade.
Furthermore, health inspections are conducted at random, once a year, so a health grade reflects only a restaurant's sanitary conditions over the past year.
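
The score-to-grade cutoffs quoted above fit in a small function. The second helper is a hypothetical way to look for bumping up: it compares the number of scores sitting exactly on a cutoff with the number sitting one point above it. It assumes the score distribution would otherwise be smooth. This is a sketch for illustration, not Wellington's actual method.

```python
from collections import Counter

def grade_from_score(score: int) -> str:
    """Map an NYC inspection score to a letter grade (lower scores are better)."""
    if score < 0:
        raise ValueError("inspection scores are non-negative")
    if score <= 13:
        return "A"
    if score <= 27:
        return "B"
    return "C"

def cusp_ratio(scores, cutoff=13):
    """Count of scores exactly at `cutoff` divided by the count one point above it.

    Assumes (hypothetically) that, without bumping up, the score distribution is
    smooth, so the two counts would be similar. A ratio far above 1 hints that
    scores are being nudged onto the better side of the cutoff."""
    counts = Counter(scores)
    above = counts[cutoff + 1]
    if above == 0:
        return float("inf")
    return counts[cutoff] / above
```

Calling `cusp_ratio(scores, cutoff=27)` runs the same check at the B/C boundary.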

Sadly, an examination of the history of New York restaurants' health inspection grades suggests an inconvenient truth - many restaurants considered "safe" today have not always been "safe" in the past. This may mean either that restaurants significantly improve their sanitary conditions after a health inspection, or that inspectors tend to "bump up" restaurant grades from one year to the next.

As seen in the graph above, most restaurants that were rated "B" or "C" in their infancy end up becoming grade "A" restaurants, at an astonishing 72% and 65%, respectively. Looked at from the opposite direction, 25% and 15% of restaurants that are now grade "A" actually started off at grade "B" and "C", respectively. Most NYC restaurants do not last very long (80% of NYC restaurants close within 5 years), and this grading system was formally established only 5 years ago. Seeing so many grades improve in such a short time leads us to question whether the letters at the front of every NYC eatery are truly reliable.
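
The two kinds of percentages above answer different conditional questions. The first is "of restaurants that started at B, what share end at A?" The second is "of restaurants now at A, what share started at B?" Below is a minimal sketch of both, assuming each restaurant's grade history is available as a chronological list. The function name and input shape are hypothetical.

```python
from collections import Counter

def transition_shares(histories):
    """histories: list of per-restaurant grade sequences in chronological order.

    Returns two dicts keyed by (first_grade, current_grade):
      forward  - share of restaurants starting at first_grade that end at current_grade
      backward - share of restaurants ending at current_grade that started at first_grade
    """
    pairs = Counter((h[0], h[-1]) for h in histories if h)
    by_first = Counter()
    by_last = Counter()
    for (first, last), n in pairs.items():
        by_first[first] += n
        by_last[last] += n
    forward = {k: n / by_first[k[0]] for k, n in pairs.items()}
    backward = {k: n / by_last[k[1]] for k, n in pairs.items()}
    return forward, backward
```

Under this reading, the 72% figure would be `forward[("B", "A")]` and the 25% figure would be `backward[("B", "A")]`.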

Monday, September 14, 2015

Are Women's Tennis Rankings More Volatile than Men's Rankings?
Serena Williams' shocking loss in the 2015 US Open semifinals left world number 26 and number 43 to face off in the final. Meanwhile, the men's final featured number 1 and number 2, Novak Djokovic and Roger Federer, respectively. Many female players who are now far off the radar, such as Ivanovic and Jankovic, are former world number 1s. Together, these facts led us back to the question - how volatile are women's rankings compared to men's?

Using weekly ATP and WTA rankings data from Jeff Sackmann's GitHub, we analyze the variance of the rankings of players currently in the top 30. We also exclude weeks in which a player was ranked outside the top 100. This limits the variance contributed by the period when these players first turned professional, which is not indicative of their performance as established pros.
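
The filtering and statistics described above can be sketched as follows. The sketch assumes the rankings have already been grouped into a dict from player to that player's list of weekly ranks. Population variance is an assumption; sample variance would rank the players almost identically. The returned week count is presumably what the circle sizes in the graphs reflect.

```python
from statistics import mean, pvariance

def rank_stats(weekly_ranks, cutoff=100):
    """weekly_ranks: dict of player -> list of weekly ranks (hypothetical input shape).

    Weeks ranked outside the top `cutoff` are dropped before computing stats.
    Returns player -> (weeks_inside_cutoff, mean_rank, rank_variance)."""
    stats = {}
    for player, ranks in weekly_ranks.items():
        kept = [r for r in ranks if r <= cutoff]
        if not kept:
            continue  # never inside the cutoff, so there is nothing to summarize
        stats[player] = (len(kept), mean(kept), pvariance(kept))
    return stats
```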

Looking at the WTA rank variance of the current top 30 players, we see that, as expected, strong players like Serena Williams and Maria Sharapova, who consistently rank at the top (excluding injuries), have a low mean rank and low rank variance. For mid-tier players, such as Sam Stosur and Roberta Vinci, variance is generally much higher.

Smaller circles indicate newer players with fewer weeks in the top 100 under their belts, such as Sloane Stephens and Petra Kvitova. These players have a markedly higher mean rank and variance than the "power cluster" of consistent top players. However, there are also many mid-tier players with many weeks in the top 100 who still have a high variance and average rank. For newcomers, ranking behavior is yet to be determined - they could join either the consistent top players or the more variable mid-tier players.

The graph of the top 30 ATP players shows that mean ranks are similar for men and women, ranging from 5 to 55. However, variance is lower for men on the whole. As in the WTA results, small circles indicating newcomers generally sit toward the right and the top of the graph, meaning higher variance and mean rank. This is because these players undergo a lot of ranking movement when they first go pro, which is not indicative of their long-run ranking behavior.

Again, as in the women's results, men's ranking behavior splits into two camps. The first is consistent top players like Roger Federer and Rafael Nadal. The second is mid-tier players who vary more, such as David Ferrer and Philipp Kohlschreiber. One surprise is that Novak Djokovic has such a low average rank but such a high rank variance. Djokovic's rises in the rankings came in sharp jumps. Because variance squares each deviation from the mean, a few large jumps are penalized more heavily than many small, incremental moves.
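
A toy example makes the point about sharp jumps concrete. The two rank histories below are made up, not Djokovic's data. They share the same starting rank, ending rank and mean rank. The only difference is that one climbs gradually while the other stalls and then jumps.

```python
from statistics import mean, pvariance

# Two hypothetical rank histories with the same start, end and mean rank (5).
gradual = [8, 6, 4, 2]  # climbs two places every week
sudden = [8, 8, 2, 2]   # stalls, then jumps six places at once

for name, ranks in [("gradual", gradual), ("sudden", sudden)]:
    # gradual: variance 5; sudden: variance 9
    print(name, mean(ranks), pvariance(ranks))
```

Squared deviations of 3 and 1 for the gradual path average to 5. Squared deviations of 3 on every week for the sudden path average to 9.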

Finally, looking at the graph of WTA versus ATP rank variance over the years for the current top 30 players, we see that WTA variance is significantly higher than ATP variance. This is largely attributable to periods of extreme variance from certain players, such as Maria Kirilenko and Jamie Hampton. On the whole, however, the individual variances of the top 30 players also show that women have higher rank variance than men.

Saturday, September 12, 2015

The Odds of an All-Italian US Open Final Were Less than 1%

The women's US Open semifinals produced two monumental upsets. In last year's men's final four, two significantly lower-ranked players upset the top two seeds, and this year's women's semifinals followed the same pattern. Flavia Pennetta dismissed Simona Halep in straight sets. Roberta Vinci came from a set down to deny Serena Williams the first calendar-year Grand Slam since Steffi Graf in 1988.

FiveThirtyEight called Serena Williams's loss the greatest upset of all time, based on the Elo ratings of Williams and Vinci at the time of their semifinal matchup. That said, we wanted to measure this upset in probabilistic terms. How likely was it that both Italian players would upset the top seeds?

To answer this question, we turn to our tennis prediction model, which also uses an Elo-style rating to measure each player's ability. However, our system incorporates only matches played in the past year and head-to-head matches between players in the past 5 years. Our method also places more emphasis on detailed tennis metrics. These include the sets and games won in each match, the court surface, and the stage and quality of the tournament. This allows our model to make accurate predictions for any tournament at any given time.
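
The rating update at the heart of any Elo-style system can be sketched as below. This is the textbook Elo formula, not our model. The one-year window, the head-to-head term, and the weights for sets, games, surface and tournament stage are not shown, and K = 32 is an arbitrary assumption. `score_a` can be 1 or 0 for a plain win or loss. Passing a fraction, such as the share of games won, is one hypothetical way to reward margin of victory.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo win expectancy of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one match.

    score_a is A's result in [0, 1]. Plain Elo uses 1 for a win and 0 for a loss.
    k (assumed 32 here) controls how far a single match moves the ratings."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

A win over a much stronger opponent moves the ratings more than a win over a weaker one. This is the mechanism by which upsets raise a player's rating.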

The table above shows each semifinalist's chance of reaching each stage of the tournament (Finalist or Winner) before the Friday matches. Notice that the Italians had only a 21% and a 3.7% chance of winning their respective matches. Further analysis of past tennis data (from 1968 to 2015) suggested that the outcomes of the two semifinals are essentially independent. That is, the chance that the second match is an upset is the same as the chance that it is an upset given that the first match was an upset. Thus, the probability of an all-Italian US Open final was 21% × 3.7% ≈ 0.8%.
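
Write $U_1$ and $U_2$ for the events that each Italian wins her semifinal. The independence step in the calculation then reads:

$$
P(U_1 \cap U_2) = P(U_1)\,P(U_2 \mid U_1) = P(U_1)\,P(U_2) = 0.21 \times 0.037 \approx 0.0078
$$

This rounds to the 0.8% figure.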

As for who will win tomorrow's final, betting odds make Flavia Pennetta a 4/9 favorite, an implied winning probability of about 69.2%. Our model suggests otherwise, giving Pennetta only a 54.7% chance. Our model places more weight on later-stage matches and on opponent strength. Williams has a significantly higher rating than the rest of the field, so beating her improved Vinci's rating much more than Pennetta's win improved hers. Thus, despite being ranked over 15 places higher than Vinci, Pennetta is only a slight favorite in this final. You can essentially treat the final as a toss-up.
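
Converting fractional odds to an implied probability takes only a line or two, as sketched below. Bookmakers' prices include a margin (the overround), so the probabilities implied by both players' odds typically sum to more than 1. This sketch does not remove that margin.

```python
from fractions import Fraction

def implied_probability(numerator: int, denominator: int) -> Fraction:
    """Break-even win probability for fractional odds numerator/denominator.

    A stake of `denominator` wins a profit of `numerator`, so the bet breaks
    even when p * numerator == (1 - p) * denominator, i.e. p = d / (n + d)."""
    return Fraction(denominator, numerator + denominator)

p = implied_probability(4, 9)  # 4/9 odds on Pennetta -> 9/13
print(f"{float(p):.1%}")       # 69.2%
```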

Stay tuned for our preview of the men's final on Sunday.

Tuesday, September 8, 2015

Study of the VIX

There is plenty of analysis out there about the stock market. Much of it is on an intraday basis, explaining how individual stocks move in response to oil prices or geopolitical turmoil. Sometimes these explanations clearly match what the markets did; other times, they are nothing more than educated guesses.

Instead, we're interested in long-term trends. This week, we study the relationship between the VIX index and the S&P 500.

VIX Index 
The VIX index is primarily used as a representation of the market's expectations of the 30-day volatility of the stock market, expressed in percentage points. Specifically, the VIX is 100 times the square root of the expected 30-day variance of the S&P 500 rate of return.
$$\text{VIX} = 100 \sqrt{\text{var}}$$
Here, $\text{var}$ is the annualized expected 30-day variance. It is estimated from the forward prices of S&P 500 options with 30 days to expiration, $e^{rt}S$, where $S$ is the option's spot (current) price, $r$ is the risk-free rate and $t$ is the time to expiration. Together, these forward option prices encode the market's risk-neutral expectation of the variance of the underlying index.
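
For reference, the CBOE's published VIX methodology computes each expiry's variance from a strip of out-of-the-money S&P 500 options. Our summary of it, written with the same $r$ and $t$ notation, is below. Strike-selection and zero-bid rules are omitted.

$$
\sigma^2 = \frac{2}{t}\sum_i \frac{\Delta K_i}{K_i^2}\, e^{rt}\, Q(K_i) - \frac{1}{t}\left(\frac{F}{K_0} - 1\right)^2
$$

The symbols are defined as follows:
- $F$ is the forward index level implied by put-call parity.
- $K_0$ is the first strike at or below $F$.
- $K_i$ is the strike of the $i$-th out-of-the-money option.
- $\Delta K_i$ is half the distance between the strikes on either side of $K_i$.
- $Q(K_i)$ is the midpoint of that option's bid-ask spread.

The factor $e^{rt} Q(K_i)$ is each option's price carried forward to expiration. For an expiry of exactly 30 days, $\sigma^2$ plays the role of $\text{var}$ in the VIX formula.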

No-arbitrage pricing says that the forward price of variance must equal the forward price of its replicating portfolio of options. Forward positions have zero value when they are entered, so the forward price of variance must equal the forward price of the options. If 30-day options are not available, the VIX is calculated using a weighted average of the forward prices of options with expirations close to 30 days.
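
The weighted average for expirations around 30 days can be sketched as follows. This is our reading of the CBOE-style approach. It interpolates linearly in total variance between a near-term and a next-term expiry measured in minutes, then annualizes. The function and parameter names are ours, and the per-term annualized variances are taken as given inputs.

```python
import math

MINUTES_PER_DAY = 24 * 60
N30 = 30 * MINUTES_PER_DAY    # minutes in 30 days
N365 = 365 * MINUTES_PER_DAY  # minutes in 365 days

def vix_from_two_terms(var_near, var_next, minutes_near, minutes_next):
    """Blend near- and next-term annualized variances into a 30-day VIX level.

    Interpolates linearly in total variance (variance x time) so the result
    corresponds to exactly 30 days, then annualizes and scales by 100.
    Assumes minutes_near <= N30 <= minutes_next."""
    if not minutes_near < minutes_next:
        raise ValueError("near-term expiry must come before next-term expiry")
    t1 = minutes_near / N365
    t2 = minutes_next / N365
    w_near = (minutes_next - N30) / (minutes_next - minutes_near)
    w_next = (N30 - minutes_near) / (minutes_next - minutes_near)
    total_var_30 = t1 * var_near * w_near + t2 * var_next * w_next
    return 100.0 * math.sqrt(total_var_30 * N365 / N30)
```

If both expiries imply the same annualized variance $v$, the weights cancel out and the result is simply $100\sqrt{v}$.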

We can see that the VIX follows the general shape of the S&P 500's forward 30-day volatility, but with a lag of a few days. This suggests that the VIX is good at gauging the level of volatility over the next 30 days, but not at predicting large changes in volatility. Moreover, the VIX seems to underestimate volatility over the next 30 days when S&P 500 forward volatility is high. One example is early October 2008, following Lehman's bankruptcy and preceding several sharp DJIA rises and falls. Generally, however, the VIX stays above realized S&P 500 volatility.
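
This comparison depends on how "forward 30-day volatility" is measured. Below is a minimal sketch under stated assumptions: 21 trading days stand in for 30 calendar days, daily log returns are annualized with a factor of 252, and the sample standard deviation is used. These conventions are assumptions, not necessarily the ones behind the graph.

```python
import math
from statistics import stdev

TRADING_DAYS_PER_YEAR = 252  # common annualization convention (assumption)
WINDOW = 21                  # ~trading days in 30 calendar days (assumption)

def forward_realized_vol(closes, window=WINDOW):
    """For each day t, annualized volatility (in percentage points, like the VIX)
    of daily log returns over the next `window` trading days.

    Returns a list aligned with `closes`, with None where the window runs past the data."""
    # log_returns[t] is the return from close t to close t + 1
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    out = []
    for t in range(len(closes)):
        future = log_returns[t:t + window]
        if len(future) < window:
            out.append(None)
        else:
            out.append(100.0 * stdev(future) * math.sqrt(TRADING_DAYS_PER_YEAR))
    return out
```

Each day's VIX would then be compared with the value at the same index, which looks forward rather than backward.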

The difference between the VIX and realized S&P 500 volatility reveals periods when the VIX is significantly lower. These include late September and early October 2008, which, as mentioned above, was the worst of the financial crisis. They also include late April 2010, just before the May 6 "Flash Crash", a trillion-dollar stock market crash that lasted only minutes. Another dip in the VIX relative to S&P 500 volatility came at the end of July 2011, before the August 2011 stock market crash that followed the US credit rating downgrade. The last VIX dip in the graph is due to the recent China crisis. These are all points of high S&P volatility in the first graph that the VIX severely underestimates.
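
Spotting these dips amounts to subtracting the two series and flagging large negative gaps. The sketch below assumes date-aligned lists, with None wherever no forward window is available yet. The 5-point threshold is a hypothetical choice, not one taken from the graph.

```python
def underestimation_dates(dates, vix, realized, threshold=5.0):
    """Return (date, gap) pairs where the VIX sits more than `threshold`
    volatility points below realized forward volatility (gap = vix - realized)."""
    flagged = []
    for d, v, r in zip(dates, vix, realized):
        if r is None:
            continue  # no forward window available for this date
        gap = v - r
        if gap < -threshold:
            flagged.append((d, gap))
    return flagged
```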

Sunday, September 6, 2015

Stanimal's Title Chances are Worse than You Think

The first week of this year's US Open has been tumultuous - top 10 players Kei Nishikori, David Ferrer, Rafael Nadal and Milos Raonic have all crashed out, and the tournament has had a record 16 retirements. In particular, Jack Sock and David Goffin were leading their matches, only to succumb to the extreme heat and humidity.

Despite all the unpredictability, the two Swiss contenders, Roger Federer and Stan Wawrinka, have reached the second week of the tournament in contrasting fashion. Federer seems to be enjoying himself, toying with his opponents through flashy shot-making and half-volley returns. Wawrinka, by contrast, has somehow escaped several close tiebreaks, including during a seemingly lackluster effort against Ruben Bemelmans.

With that in mind, we looked at what our tennis prediction model says about the chances of Federer and Wawrinka ending their tournament at each round, and how those chances compare with betting odds. Not surprisingly, our odds are fairly similar to those offered on betting websites. However, we believe Wawrinka's chance of ending his run in the quarterfinals is higher (59%) than the betting websites suggest (52%). Our model places emphasis on the closeness of each match. Wawrinka has played more tiebreaks, even though he has not lost a set in this tournament, and this lowers his prospects of reaching the later stages of the US Open. As a result, our odds of him reaching the semifinals and the final, and of winning the title, are significantly lower.
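
Chances of "ending the run" at each round follow directly from chances of reaching each round. The probability of exiting at a round is the probability of reaching it minus the probability of reaching the next one. Below is a minimal sketch with a hypothetical list of rounds starting at the round of 16. Any numbers fed to it here are made up, not Wawrinka's actual odds.

```python
ROUNDS = ["R16", "QF", "SF", "F", "W"]  # "W" = winning the title (hypothetical labels)

def exit_probabilities(reach):
    """reach[round] = probability of reaching that round (reach["W"] = title odds).

    Returns round -> probability that the run ends at that round. "W" means
    the run ends with the title."""
    out = {}
    for this_round, next_round in zip(ROUNDS, ROUNDS[1:]):
        out[this_round] = reach[this_round] - reach[next_round]
    out["W"] = reach["W"]
    return out
```

Because the differences telescope, the exit probabilities sum to the probability of reaching the first listed round.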

On the other hand, our odds for Roger Federer are in line with the betting companies', thanks to his masterclass displays in each of his three matches. In fact, our estimated chance of him losing before the final is significantly lower than the betting probabilities.

To see the prospects of the other remaining players reaching each stage of the tournament, check out our results below. Stay tuned for more updates in the middle of the week.

Tuesday, September 1, 2015

Nishikori's Early Exit does not Improve Djokovic's Title Chance

With the US Open's first-round matches concluded, many would expect Kei Nishikori's early exit to open up the draw for Novak Djokovic and improve his title prospects. However, our tennis prediction model suggests that Djokovic's chance of winning remains level at around 55%. Similarly, Federer's and Murray's chances stay the same, at around 25-26% and 8-9%, respectively.

Ultimately, Djokovic's prospects haven't changed because he is still likely to face Nadal in the quarterfinals and Federer or Murray in the final. Furthermore, the top three players in the world (Djokovic, Federer and Murray) have a combined 90% chance of winning the tournament, while Nishikori's chance before Monday was a mere 3.8%. Had Federer or Murray suffered an upset in the first round, Djokovic's title chances would have skyrocketed.
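
Title odds like these come from combining pairwise win probabilities along each player's possible path through the draw. Below is a minimal sketch, assuming match outcomes are independent and the draw is a full bracket whose size is a power of two. `p_beat` is a hypothetical callable that could, for example, be built from Elo ratings. This is not our model's actual code.

```python
def bracket_win_probs(players, p_beat):
    """players: list in draw order; the length must be a power of two.
    p_beat(a, b): probability that a beats b (assumes p_beat(a, b) + p_beat(b, a) == 1).

    Returns dict player -> probability of winning the whole bracket."""
    n = len(players)
    if n == 0 or n & (n - 1):
        raise ValueError("draw size must be a power of two")
    # Each block maps its players to their chance of emerging from that block.
    blocks = [{p: 1.0} for p in players]
    while len(blocks) > 1:
        merged = []
        for left, right in zip(blocks[0::2], blocks[1::2]):
            block = {}
            for a, pa in left.items():
                # a must win its half, then beat whoever emerges from the other half
                block[a] = pa * sum(pb * p_beat(a, b) for b, pb in right.items())
            for b, pb in right.items():
                block[b] = pb * sum(pa * p_beat(b, a) for a, pa in left.items())
            merged.append(block)
        blocks = merged
    return blocks[0]
```

An upset changes a player's title odds only through the opponents he might face along his own path through the draw.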

Nishikori's early exit also raises an interesting question - which player will emerge from that quarter? Our model suggests that Marin Cilic, the reigning US Open champion, has the highest odds, rather than David Ferrer, the highest seed left in that quarter. This may be because Ferrer has not won a match since Roland Garros, while Cilic recently reached the quarterfinals of Wimbledon.

Despite being ranked outside the world's top 40, Benoit Paire has a decent chance (6.4%) of reaching the semifinals. Because our model rewards players who pull off upsets, Paire's rating increased greatly after his first-round win over Nishikori, making him the fifth favorite to come out of this quarter of the draw. Also look out for dark horse Jo-Wilfried Tsonga - although he has dropped as low as 18th in the world, his semifinal odds are only a tad lower than Ferrer's, thanks to his strong showing in Montreal.