We all know that not all tennis players are treated equally by the press. Roger Federer is beloved by the media for his professionalism, class and grace. Tennis writers have long appreciated Novak Djokovic's humor and candor, even though he projected a somewhat arrogant persona in his early years. Serena Williams has lately been adored by tennis journalists as she continues to shatter Open Era records, but she has been demonized for her controversial behavior in the past.
Here at DataBucket, we like to quantify and visualize things as much as possible. However true these media perceptions may be, we thought it would be interesting to quantify the tennis press's sentiment toward top players over time and see whether our results track the ups and downs of each legendary player's career.
To quantify this sentiment, we gathered tennis interview transcripts that are generously made available to the public at asapsports.com. Using 1000 interviews from the site, we trained a natural language processing model (specifically, a Maximum Entropy classifier) to label each interview question as "positive," "neutral," or "negative." Our training data was a subset of questions that we read through and labeled by hand. We then used this classifier to score every interview a top player has given: each positively toned question added 1 to the interview's score, each negatively toned question subtracted 1, and neutral questions left it unchanged.
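The post doesn't describe the features, training procedure, or library behind its Maximum Entropy classifier, so the sketch below fills those in with assumptions: lowercase bag-of-words token counts as features, and batch gradient ascent on an L2-regularized multinomial logistic regression objective (the standard formulation of a maximum entropy classifier). `featurize`, `MaxEntClassifier`, and `score_interview` are hypothetical names, and whitespace tokenization is a deliberate simplification.

```python
import math
from collections import Counter

LABELS = ("positive", "neutral", "negative")
# Score contribution of each label: +1 positive, 0 neutral, -1 negative.
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def featurize(question):
    """Bag-of-words features: lowercase token counts (an assumed feature set)."""
    return Counter(question.lower().split())


class MaxEntClassifier:
    """Multinomial logistic regression (a maximum entropy classifier),
    trained by batch gradient ascent on the L2-regularized mean log-likelihood."""

    def __init__(self, learning_rate=0.5, epochs=200, l2=0.01):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.l2 = l2
        # weights[label][feature] -> float; missing entries count as 0.
        self.weights = {label: {} for label in LABELS}

    def _probabilities(self, features):
        scores = {
            label: sum(self.weights[label].get(f, 0.0) * v for f, v in features.items())
            for label in LABELS
        }
        # Softmax, subtracting the max score for numerical stability.
        top = max(scores.values())
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}

    def fit(self, questions, labels):
        data = [(featurize(q), y) for q, y in zip(questions, labels)]
        n = len(data)
        for _ in range(self.epochs):
            # Gradient: observed feature counts minus model-expected counts.
            grad = {label: Counter() for label in LABELS}
            for features, y in data:
                probs = self._probabilities(features)
                for label in LABELS:
                    coeff = (1.0 if label == y else 0.0) - probs[label]
                    for f, v in features.items():
                        grad[label][f] += coeff * v
            # Ascent step on the mean log-likelihood minus the L2 penalty.
            for label in LABELS:
                w = self.weights[label]
                for f in set(w) | set(grad[label]):
                    w[f] = w.get(f, 0.0) + self.learning_rate * (
                        grad[label][f] / n - self.l2 * w.get(f, 0.0)
                    )
        return self

    def predict(self, question):
        probs = self._probabilities(featurize(question))
        return max(probs, key=probs.get)


def score_interview(questions, classify):
    """Net sentiment of one interview: +1 per positive question, -1 per negative."""
    return sum(LABEL_SCORE[classify(q)] for q in questions)
```

Usage would look like `clf = MaxEntClassifier().fit(labeled_questions, labels)` followed by `score_interview(interview_questions, clf.predict)`. Passing the classifier in as a plain callable keeps the scoring rule separate from whichever model actually produces the labels.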
The results for the top male and female players of the past decade are shown in our interactive graphic. One immediately notices that scores near the present are generally much higher than in the past, which suggests that:
- More questions are being asked in press conferences, and
- Tennis journalists are less critical of these players as they reach the twilight of their illustrious careers.
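Because each interview's score is a sum over its questions, these two explanations can be partly teased apart by also looking at the average score per question, which does not grow just because more questions are asked. This is a minimal sketch assuming the per-question labels from the classifier are available; `summed_score` and `mean_score` are hypothetical helper names, and this comparison is not part of the analysis described above.

```python
from statistics import mean

# Same scoring rule as the post: +1 positive, 0 neutral, -1 negative.
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def summed_score(labels):
    """Net count of positive minus negative questions (grows with question count)."""
    return sum(LABEL_SCORE[label] for label in labels)


def mean_score(labels):
    """Average score per question; unaffected by how many questions were asked."""
    return mean(LABEL_SCORE[label] for label in labels) if labels else 0.0
```

For example, an interview labeled positive, neutral, negative, positive sums to 1, while one with twice as many questions in the same proportions sums to 2; both average 0.25 per question. A rising summed score with a flat per-question average would point to the first explanation, while a rising average would point to the second.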
We were also able to tie some of the most extreme scores to specific events. Take September 12th, 2009: that was the day Serena Williams threatened a lineswoman for calling a foot fault on her, and it has one of her lowest sentiment scores to date. Another example is September 10th, 2011, the day Federer lost to Djokovic in five sets after watching the Serb smack a forehand return winner to save match point. The Swiss maestro publicly voiced his disapproval of Djokovic's "careless" play, which would explain his subpar interview rating that day.
Some positive moments were captured too: Caroline Wozniacki had some of her highest ratings during her run to the 2014 US Open final (September 2014), and Serena Williams had some of her best ratings this season during her pursuit of the calendar-year Grand Slam (although her press conference after her semifinal loss to Roberta Vinci was much less positive).
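The post doesn't say how these extreme interviews were singled out. One simple approach, offered here purely as an assumption, is to standardize each player's interview scores and flag those lying far from that player's own mean, so that a player who usually gets warm questions is judged against their own baseline. `extreme_interviews` and its `threshold` parameter are hypothetical.

```python
from statistics import mean, pstdev


def extreme_interviews(scores_by_date, threshold=2.0):
    """Return (date, score, z) for one player's interviews whose score lies more
    than `threshold` population standard deviations from that player's mean."""
    values = list(scores_by_date.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: nothing stands out.
        return []
    flagged = []
    for date, score in sorted(scores_by_date.items()):
        z = (score - mu) / sigma
        if abs(z) > threshold:
            flagged.append((date, score, z))
    return flagged
```

Candidates flagged this way still need to be matched to real events by hand, as with the foot-fault and match-point examples above.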
We also noticed some interesting trends. For instance, journalists at the Indian Wells Masters seem to love Maria Sharapova: her sentiment rating tends to peak at the beginning of March in many seasons (e.g., 2006, 2008, 2011, 2013).
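A seasonal pattern like this can be checked by pooling a player's interview scores by calendar month across seasons and comparing the monthly averages. This sketch assumes the scores are keyed by `datetime.date`; `mean_score_by_month` and `peak_month` are hypothetical helpers, and pooling by month ignores the fact that tournament dates shift slightly from year to year.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_score_by_month(scores_by_date):
    """Average interview score per calendar month (1-12), pooled across seasons."""
    buckets = defaultdict(list)
    for day, score in scores_by_date.items():
        buckets[day.month].append(score)
    return {month: mean(vals) for month, vals in sorted(buckets.items())}


def peak_month(scores_by_date):
    """Calendar month with the highest average interview score."""
    by_month = mean_score_by_month(scores_by_date)
    return max(by_month, key=by_month.get)


# Example input shape (values are placeholders, not real ratings):
example = {date(2006, 3, 10): 6, date(2006, 6, 1): 1}
```

A consistently high March average across several seasons would be one way to confirm the pattern, though it cannot by itself show that the tournament is the cause.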
We certainly didn't cover every trend, so feel free to play around with our interactive graphic and comment on any interesting findings!