Sunday, September 24, 2017

Ranking the Clutchest NBA Players in the 2014-15 Regular Season

This NBA season has seen emerging superstars rise to the pinnacle of the sport. Steph Curry, James Harden, Russell Westbrook and Anthony Davis have all had career-best seasons and are front-runners for the league’s MVP award.

With the playoffs beginning this week, however, the game’s elite need to translate their stellar play into wins. They need to raise their level when the game is on the line. Their legacies will be defined not by how good they are, but by how clutch they are.

How can we determine if the superstars of the NBA are clutch? How do we know if a player can make that buzzer-beater to clinch the series (think Damian Lillard) or shoot the must-make three-pointer to stay alive (think Ray Allen)? We came up with two metrics that can help us quantify “clutchness” in the NBA. The first is Win Shares and the second is Win Probability Added (WPA).

Method 1: Predicting “Win Share” in Clutch Situations

“Win Shares” estimates the number of wins a player contributes to his team. Its standard calculation relies on advanced formulas, but these formulas cannot be applied to close-game situations. To simplify the calculation and extend it to moments when the game is on the line, we regress win shares on available player statistics (e.g., field goals, rebounds, and assists) to find the model that best predicts each player's win shares. We define clutch situations as possessions that occur with less than 3 minutes left in the 4th quarter or overtime and with the score within 5 points.
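The post doesn't include its regression code, so here is a minimal sketch of the two pieces described above, assuming ordinary least squares with an intercept, fitted with NumPy. `is_clutch`, `fit_win_share_model`, and `predict_win_shares` are hypothetical names, and the feature columns are whatever box-score stats you choose. The post also doesn't say how clutch stats were scaled before prediction; this sketch assumes the clutch rows are on the same scale as the season rows (for example, extrapolated to a full season's minutes).

```python
import numpy as np


def is_clutch(period: int, seconds_remaining: float, score_margin: int) -> bool:
    """Clutch possession as defined above: under 3 minutes left in the 4th
    quarter or any overtime period, with the score within 5 points."""
    return period >= 4 and seconds_remaining < 180 and abs(score_margin) <= 5


def fit_win_share_model(stats: np.ndarray, win_shares: np.ndarray) -> np.ndarray:
    """Ordinary least squares fit of season win shares on box-score stats.

    stats has shape (n_players, n_features); the returned coefficient
    vector has the intercept first, then one weight per feature.
    """
    X = np.column_stack([np.ones(len(stats)), stats])
    coef, *_ = np.linalg.lstsq(X, win_shares, rcond=None)
    return coef


def predict_win_shares(coef: np.ndarray, stats: np.ndarray) -> np.ndarray:
    """Apply fitted coefficients to new rows, e.g. clutch-only stat lines."""
    X = np.column_stack([np.ones(len(stats)), stats])
    return X @ coef
```

In this sketch you fit on full-season stats and win shares, then call `predict_win_shares` on the clutch-only stat lines to get each player's predicted clutch win share.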

With our regression model, we were able to predict players’ “clutch win shares.” The results, seen below, show some very interesting findings:

  1. James Harden is the best and the clutchest, with the highest regular and clutch win shares. Players above the 45-degree line perform better in tight situations than they do overall, and Harden is well above it.
  2. The clutchest player who is not an All-Star is Eric Bledsoe. His win share in close situations is double his regular win share, good enough to rank 6th in the league.
  3. The Golden State Warriors are not that clutch. Their superstar, Steph Curry, actually performs slightly worse in late-game situations. That said, one can argue that they have been so good this year that they have not been in many clutch situations.
  4. Andrew Wiggins has the makings of a superstar. While he has only a 2.10 regular win share, his clutch win share is over 9, putting him in the top 20 in the league.
  5. Some “clutch” players aren't really that “clutch.” Damian Lillard, Blake Griffin, and Kyle Lowry have been known to carry their teams in tight situations, but our predictions suggest otherwise, as they all fall below the 45-degree line. On the other hand, the heavily criticized John Wall is the fourth-best player in close games.
  6. Kyrie Irving, not LeBron James, should be the go-to guy for the Cleveland Cavaliers when the game is on the line. His 50-plus-point efforts against the Spurs and Blazers probably helped his case.
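Reading the chart amounts to comparing each player's clutch win share with his regular win share: a point lies above the 45-degree line exactly when the clutch value is larger. A small sketch of that comparison in plain Python (the function names and inputs are hypothetical):

```python
def above_diagonal(regular: dict[str, float], clutch: dict[str, float]) -> list[str]:
    """Players above the 45-degree line y = x, i.e. whose clutch win share
    exceeds their regular win share, sorted by the size of the gap."""
    gaps = {p: clutch[p] - regular[p] for p in regular if p in clutch}
    above = [p for p, gap in gaps.items() if gap > 0]
    return sorted(above, key=lambda p: gaps[p], reverse=True)


def clutch_ratio(regular: float, clutch: float) -> float:
    """Clutch win share as a multiple of regular win share (2.0 means double)."""
    return clutch / regular
```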

Method 2: Calculating Total Win Probability Added in Clutch Situations

“Win probability added” (WPA) is a metric commonly used to analyze player performance in baseball and American football. It measures how much a play changes the team’s probability of winning. For instance, if the game is tied with 10 seconds remaining in the fourth quarter, draining a three-pointer raises the team’s probability of winning by an astonishing 30.7 percentage points, so the WPA for that play is 30.7. Missing that three-pointer, however, lowers this probability by 23.4 percentage points, so the WPA for that play is -23.4. We used a win probability calculator to determine these changes in winning probability for each possession.
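In code, the WPA of a single play is just the difference between the win probabilities after and before it, expressed in percentage points. The probabilities themselves come from a win probability model; the values below are hypothetical inputs chosen only to reproduce the 30.7 and -23.4 figures above.

```python
def wpa(p_before: float, p_after: float) -> float:
    """Win probability added by one play, in percentage points.

    p_before and p_after are the team's win probabilities (between 0 and 1)
    immediately before and after the play.
    """
    return 100.0 * (p_after - p_before)


# Hypothetical: a team at 50% to win makes a three (to 80.7%) or misses (to 26.6%).
made = wpa(0.500, 0.807)    # about 30.7
missed = wpa(0.500, 0.266)  # about -23.4
```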

Our method calculates the WPA of every shot attempt each player takes in a clutch situation. We then sum these values to get each player's total WPA. We also calculated WPA per shot attempt to measure each player's efficiency in tight game situations. The results, shown in the graphics below, reveal some interesting findings:
  1. Anthony Davis is the clutchest player by this metric, with the highest win probability added, at 211.1. James Harden contributed less than half of that despite attempting more shots (80 to Davis's 66).
  2. Many superstars (Chris Paul, Russell Westbrook, Kyle Lowry) have very poor WPA. Many of them are forced to take difficult shots in tight situations and, as a result, are heavily penalized.
  3. Vince Carter, at the age of 38, contributes the most win probability per attempted shot. While he may not have the explosiveness he once had, he can still make the big shots when it matters.
  4. Marcus Smart is not only one of the best rookies, but one of the clutchest NBA players, period.
  5. Kyle Korver is just as effective from the three-point line as he is in clutch situations, topping the charts in win probability added per attempted shot among All-Stars.
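The aggregation step can be sketched in plain Python, assuming each clutch shot attempt has already been reduced to a `(player, wpa)` pair; the function name and input format are hypothetical.

```python
from collections import defaultdict


def aggregate_wpa(shots):
    """Sum per-shot WPA into player totals.

    shots is an iterable of (player, wpa) pairs, one per clutch shot attempt.
    Returns {player: (total_wpa, wpa_per_attempt, attempts)}.
    """
    totals = defaultdict(float)
    attempts = defaultdict(int)
    for player, value in shots:
        totals[player] += value
        attempts[player] += 1
    return {p: (totals[p], totals[p] / attempts[p], attempts[p]) for p in totals}
```

Sorting by the first element ranks players by total WPA; sorting by the second ranks them by WPA per attempt, where a player with few attempts can rank high on a handful of makes.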

So who are the clutchest players in the NBA?

Both of these methods add valuable insight into how clutch players are. That said, the Win Share method emphasizes overall player performance over the final minutes of the game, whereas the WPA method emphasizes a player’s ability to make an important shot, especially in the dying seconds when it really matters. Scoring a buzzer-beater will significantly alter a player’s WPA score, but not his win share score. On the other hand, assisting another player would increase a player’s win share score, but not his WPA score.

Looking at the two methods, two players stand out for us: Anthony Davis and James Harden. Both rank in the top 10 in clutch win share and total WPA. We would bet on these two players to step it up when the game is on the line.

These metrics, however, should be considered in context. Some players with a lower clutch win share or total WPA than others may be more heavily defended because of the positions they play: a shooting guard will be in the paint more than a point guard and thus have a lower WPA per shot. Another factor is how many clutch situations a team is in. A good team may rarely have a scoring margin between -5 and 5 with 3 minutes remaining, so its players will have a lower total WPA.

The code for this analysis is included on GitHub:

Sunday, August 28, 2016

Mapping Congressional Cosponsorship Over Time

A recently developed interest of mine has been visualizing and studying social networks. At first, I looked to revisit the Instagram API I used in the past to study follower/following relationships amongst verified accounts (who doesn't want to know if Taylor Swift is really the most central celebrity ever?). Unfortunately, Instagram has now closed off its API, so I found a more timely dataset to analyze: sponsorships and cosponsorships in legislation.
This article mainly introduces the dataset, visualizes the networks over time, and does some preliminary analysis. Much deeper analysis will follow in the future.

The Data
The data is all available through the GovTrack API, which tracks all legislation that goes through the House and Senate. All code for this article is on GitHub, and interactive visualizations can be found on Plotly, a visualization tool that can layer interactivity onto regular matplotlib graphs.

I tracked sponsorships and cosponsorships of House and Senate bills and joint resolutions from the 100th Congress (1987-89) through the 114th Congress so far (2015-17). I disregarded approval status because I simply want to study the social dynamics of which legislators support one another, regardless of the outcome of the bill. We do not include simple resolutions, which do not have the force of law.
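The selection rules above amount to a simple filter. A sketch, assuming each bill is a dict with hypothetical `congress` and `bill_type` fields and hypothetical type codes (`hr`, `s`, `hjres`, `sjres`) for House and Senate bills and joint resolutions; the real API's field names may differ.

```python
# Hypothetical type codes: House/Senate bills and House/Senate joint resolutions.
# Simple (and concurrent) resolutions, which lack the force of law, are excluded.
KEPT_TYPES = {"hr", "s", "hjres", "sjres"}


def select_bills(records, first_congress=100, last_congress=114):
    """Keep bills and joint resolutions from the given range of Congresses.

    Approval status is deliberately ignored: a bill counts whether or not it passed.
    """
    return [
        r for r in records
        if first_congress <= r["congress"] <= last_congress and r["bill_type"] in KEPT_TYPES
    ]
```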

Visualizing Networks

All networks are constructed as directed graphs. Nodes represent congresspeople who have sponsored legislation, and edges point into these nodes from the members who cosponsor, or support, their legislation. Edge weights are the number of times a cosponsor has cosponsored a particular congressperson's legislation, so more cosponsorships from Person B to Person A result in a greater weight on the edge from B to A. Node weights are the total number of cosponsorships a person receives, i.e. the weighted in-degree of that node, so more cosponsorships from anyone to Person A result in a greater node weight for Person A. In the Senate, a bill may have multiple sponsors, whereas in the House that is not the case. Thus, most of the analysis that follows compares cosponsorships, which offer a more direct basis of comparison.
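A minimal sketch of these weighting rules in plain Python, assuming each bill has been reduced to a `(sponsor, cosponsors)` pair with a single sponsor; the function name and input format are hypothetical.

```python
from collections import Counter


def build_cosponsorship_graph(bills):
    """Build the weighted, directed cosponsorship network.

    bills is an iterable of (sponsor, [cosponsors]) pairs. Returns:
      edge_weights[(cosponsor, sponsor)]: how many of the sponsor's bills
          that cosponsor signed (edges point from cosponsor to sponsor);
      node_weights[sponsor]: weighted in-degree, i.e. total cosponsorships
          the sponsor received from everyone.
    """
    edge_weights = Counter()
    for sponsor, cosponsors in bills:
        for cosponsor in cosponsors:
            if cosponsor != sponsor:  # ignore self-loops
                edge_weights[(cosponsor, sponsor)] += 1
    node_weights = Counter()
    for (_, sponsor), weight in edge_weights.items():
        node_weights[sponsor] += weight
    return edge_weights, node_weights
```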

All node positions are computed with force-directed drawing algorithms, which place pairs of nodes with greater edge weights closer together and pairs with lesser edge weights farther apart, so nodes that sit close together share more cosponsorships. Most of these graphs show a dense cluster of nodes in the center, meaning those members cosponsor extensively with one another. The most recent Congress (#114) is shown directly below; links to historical Congresses follow.
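One way to produce such a layout is networkx's `spring_layout`, an implementation of the Fruchterman-Reingold force-directed algorithm in which heavier edges pull their endpoints together more strongly. The post doesn't say which layout implementation it used, so this is only a sketch; as a design choice, it also sums the two directions of a pair's cosponsorship into a single undirected edge before layout.

```python
import networkx as nx


def layout_positions(edge_weights, seed=0):
    """Force-directed node positions for a cosponsorship network.

    edge_weights maps (cosponsor, sponsor) to a count. Both directions of a
    pair are summed into one undirected edge, so the total cosponsorship
    between two members determines how strongly they attract.
    """
    G = nx.Graph()
    for (u, v), w in edge_weights.items():
        if G.has_edge(u, v):
            G[u][v]["weight"] += w
        else:
            G.add_edge(u, v, weight=w)
    # Larger "weight" means a stronger attractive force in spring_layout.
    return nx.spring_layout(G, weight="weight", seed=seed)
```

The returned dictionary maps each member to an (x, y) array that can serve as node coordinates in matplotlib or Plotly.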

We also look at average eigenvector and in-degree centralities over time for the top 50 senators and representatives. Each average divides a member's summed centrality by the number of terms they served.
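A sketch of this averaging, assuming one centrality score per member for each Congress in which they served and treating each Congress as a term (the input format and names are hypothetical):

```python
from collections import defaultdict


def average_over_terms(per_congress):
    """Average each member's centrality over the terms they served.

    per_congress maps a Congress number to {member: centrality}. A member's
    average is their summed centrality divided by the number of Congresses
    in which they appear.
    """
    sums = defaultdict(float)
    terms = defaultdict(int)
    for scores in per_congress.values():
        for member, value in scores.items():
            sums[member] += value
            terms[member] += 1
    return {m: sums[m] / terms[m] for m in sums}


def top_members(averages, n=50):
    """The n members with the highest average centrality, best first."""
    return sorted(averages, key=averages.get, reverse=True)[:n]
```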

In-degree centrality of a node is proportional to its in-degree, so it measures how effectively a congressperson attracts cosponsors. Eigenvector centrality also depends on a node's connections, but additionally weights each connection by the centrality of the node it comes from: $C_{E}(v_{i}) = \frac{1}{\lambda}\sum_{j \neq i} A_{j,i}\,C_{E}(v_{j})$, where $A$ is the square adjacency matrix of the network ($A_{j,i}$ is the weight of the edge from $v_{j}$ to $v_{i}$) and $\lambda$ is a constant. If we write $C_{E}(v)$ as the vector of eigenvector centralities of all nodes, then $\lambda C_{E}(v) = A^{T}C_{E}(v)$, so $\lambda$ is an eigenvalue of $A^{T}$ (conventionally the largest, which guarantees nonnegative centralities) and $C_{E}(v)$ is the corresponding eigenvector. In this context, the most central senators and representatives are those whose legislation is cosponsored by other central senators and representatives.
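The eigenvector equation can be solved by power iteration. The NumPy sketch below assumes a dense weighted adjacency matrix in which `A[j, i]` is the number of times member j cosponsored member i, with no self-loops. It iterates with $A^{T} + I$ rather than $A^{T}$: the shift leaves the eigenvectors unchanged and adds 1 to every eigenvalue, which, for a strongly connected network, makes the iteration converge instead of oscillating. The in-degree function uses one common normalization (distinct cosponsors divided by $n - 1$); whether the tables below were computed this way is an assumption.

```python
import numpy as np


def eigenvector_centrality(A, tol=1e-12, max_iter=10_000):
    """Solve lambda * c = A.T @ c for the leading eigenvector by power iteration.

    A[j, i] is the weight of the edge from node j to node i, so
    (A.T @ c)[i] sums the centralities of the nodes pointing into i.
    """
    n = A.shape[0]
    M = A.T + np.eye(n)  # shifted matrix: same eigenvectors, eigenvalues + 1
    c = np.ones(n) / np.sqrt(n)
    for _ in range(max_iter):
        nxt = M @ c
        nxt /= np.linalg.norm(nxt)
        if np.abs(nxt - c).sum() < tol:
            return nxt
        c = nxt
    raise RuntimeError("power iteration did not converge")


def in_degree_centrality(A):
    """Share of the other n - 1 nodes with at least one edge into each node."""
    n = A.shape[0]
    return (A > 0).sum(axis=0) / (n - 1)
```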


Senate #100-114

Senator | Eigenvector centrality | Senator | In-degree centrality
Orrin Hatch | 0.147 | Orrin Hatch | 0.781
Charles Grassley | 0.140 | Patrick Leahy | 0.753
Patrick Leahy | 0.139 | Max Baucus | 0.695
Thomas Harkin | 0.134 | John McCain | 0.692
Dianne Feinstein | 0.125 | John Rockefeller | 0.662
Christopher Dodd | 0.116 | Thomas Harkin | 0.641
Richard Durbin | 0.116 | Dianne Feinstein | 0.617
Harry Reid | 0.105 | Harry Reid | 0.606
Jeff Bingaman | 0.105 | Pete Domenici | 0.591
Frank Lautenberg | 0.102 | Barbara Mikulski | 0.585
John Rockefeller | 0.098 | Jeff Bingaman | 0.571
Thomas Daschle | 0.097 | Edward Kennedy | 0.568
Max Baucus | 0.097 | Mitch McConnell | 0.557
John Kerry | 0.096 | Christopher Dodd | 0.547
John McCain | 0.096 | John Kerry | 0.538
Charles Schumer | 0.095 | Kay Hutchison | 0.532
Olympia Snowe | 0.082 | Susan Collins | 0.530
Daniel Moynihan | 0.077 | Joseph Biden | 0.498
Barbara Boxer | 0.076 | Frank Lautenberg | 0.493
Robert Dole | 0.076 | Christopher Bond | 0.491
Pete Domenici | 0.074 | Thad Cochran | 0.486
Susan Collins | 0.072 | Charles Schumer | 0.477
Joseph Biden | 0.068 | Daniel Inouye | 0.476
Barbara Mikulski | 0.068 | Richard Durbin | 0.473
Robert Menéndez | 0.066 | Olympia Snowe | 0.472
Kay Hutchison | 0.062 | Kent Conrad | 0.470
John Chafee | 0.062 | Barbara Boxer | 0.469
Daniel Inouye | 0.061 | Daniel Akaka | 0.460
Joseph Lieberman | 0.060 | Joseph Lieberman | 0.458
John Reed | 0.060 | John Warner | 0.444
David Pryor | 0.059 | Carl Levin | 0.443
Daniel Akaka | 0.058 | Richard Lugar | 0.442
Byron Dorgan | 0.058 | Arlen Specter | 0.435
Patty Murray | 0.057 | Byron Dorgan | 0.418
George Mitchell | 0.054 | Trent Lott | 0.414
Bob Graham | 0.054 | James Inhofe | 0.413
Christopher Bond | 0.054 | Thomas Daschle | 0.406
Alfonse D'Amato | 0.053 | Bob Graham | 0.391
Alan Cranston | 0.053 | John Breaux | 0.390
Arlen Specter | 0.052 | Ted Stevens | 0.386
Sherrod Brown | 0.052 | Ron Wyden | 0.381
James Jeffords | 0.051 | John Chafee | 0.381
Howard Metzenbaum | 0.051 | Michael Enzi | 0.378
Hillary Clinton | 0.050 | Ernest Hollings | 0.375
Michael DeWine | 0.050 | Patty Murray | 0.369
Ernest Hollings | 0.049 | Larry Craig | 0.367
Trent Lott | 0.048 | Daniel Moynihan | 0.364
Richard Lugar | 0.048 | Tim Johnson | 0.357
Kent Conrad | 0.047 | Samuel Brownback | 0.355

House #100-114

Representative | Eigenvector centrality | Representative | In-degree centrality
Carolyn Maloney | 0.126 | Charles Rangel | 0.547
Rosa DeLauro | 0.113 | Michael Bilirakis | 0.503
Nita Lowey | 0.104 | Don Young | 0.479
Charles Rangel | 0.104 | Nita Lowey | 0.470
John Conyers | 0.103 | George Miller | 0.467
Fortney Stark | 0.091 | John Conyers | 0.455
Louise Slaughter | 0.087 | Elton Gallegly | 0.454
Henry Waxman | 0.085 | Louise Slaughter | 0.439
Nancy Johnson | 0.083 | Fred Upton | 0.433
Michael Bilirakis | 0.083 | Carolyn Maloney | 0.423
Christopher Smith | 0.081 | Nancy Johnson | 0.423
Edward Markey | 0.081 | Peter King | 0.417
John Lewis | 0.066 | Henry Waxman | 0.412
Jerrold Nadler | 0.066 | John Lewis | 0.408
Barney Frank | 0.066 | Rosa DeLauro | 0.408
Barbara Lee | 0.065 | Ileana Ros-Lehtinen | 0.404
Jim McDermott | 0.060 | Barney Frank | 0.404
Lois Capps | 0.060 | Edward Markey | 0.402
Lloyd Doggett | 0.059 | Eliot Engel | 0.400
Peter King | 0.059 | Fortney Stark | 0.397
Lane Evans | 0.057 | Sander Levin | 0.393
Constance Morella | 0.055 | Dale Kildee | 0.387
Eliot Engel | 0.053 | Bob Goodlatte | 0.379
William Clay | 0.053 | E. Shaw | 0.373
John Dingell | 0.053 | Sam Johnson | 0.368
E. Shaw | 0.053 | Clifford Stearns | 0.367
Don Young | 0.051 | F. Sensenbrenner | 0.366
Peter DeFazio | 0.051 | Peter DeFazio | 0.356
James McGovern | 0.050 | H. Saxton | 0.351
Christopher Shays | 0.050 | Dave Camp | 0.347
Bob Filner | 0.050 | Lois Capps | 0.342
Maxine Waters | 0.049 | Lane Evans | 0.340
Elton Gallegly | 0.049 | Bob Filner | 0.338
Philip English | 0.047 | Gary Ackerman | 0.336
Frank Pallone | 0.047 | Walter Jones | 0.332
Ileana Ros-Lehtinen | 0.046 | Bill Pascrell | 0.331
Gary Ackerman | 0.046 | Joe Barton | 0.327
Sander Levin | 0.045 | Thomas Davis | 0.324
Lynn Woolsey | 0.045 | Steny Hoyer | 0.323
Rush Holt | 0.045 | James Moran | 0.322
Patricia Schroeder | 0.044 | Mike Thompson | 0.317
Charles Schumer | 0.044 | John Dingell | 0.316
Benjamin Gilman | 0.043 | Kevin Brady | 0.314
Barbara Kennelly | 0.043 | Anna Eshoo | 0.312
Dale Kildee | 0.043 | Lamar Smith | 0.312
Bernard Sanders | 0.042 | C. Cox | 0.312
Fred Upton | 0.042 | Philip English | 0.311
Mike Thompson | 0.041 | Jim McDermott | 0.308
Amory Houghton | 0.041 | Constance Morella | 0.306


Next, we look at the divisiveness of the House and Senate by mapping modularity over time. Modularity essentially sums, over all pairs of nodes assigned to the same group, the difference between the actual number of edges between them and the number expected if edges were placed at random while preserving each node's degree. High modularity indicates that edge connections are not random: there are dense connections within groups and sparse connections between them. First, we look at the modularity given a partition along political party lines.
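For a directed, weighted network, one standard form of this metric is $Q = \frac{1}{m}\sum_{i,j}\left[A_{i,j} - \frac{k_{i}^{out}k_{j}^{in}}{m}\right]\delta(c_{i}, c_{j})$, where $A_{i,j}$ is the weight of the edge from node $i$ to node $j$, $m$ is the total edge weight, $k^{out}$ and $k^{in}$ are weighted out- and in-degrees, and $\delta(c_{i}, c_{j})$ is 1 when nodes $i$ and $j$ are in the same group. The post doesn't state which variant it used, so the NumPy sketch below is an assumption; networkx's `modularity` function (in `networkx.algorithms.community`) should give the same value for directed graphs.

```python
import numpy as np


def directed_modularity(A, labels):
    """Modularity of a partition of a directed, weighted network.

    A[i, j] is the weight of the edge from node i to node j; labels[i] is
    node i's group (e.g. its party). Computes
    Q = (1/m) * sum_ij [A_ij - k_out_i * k_in_j / m] * delta(c_i, c_j).
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    m = A.sum()                                # total edge weight
    k_out = A.sum(axis=1)                      # weighted out-degree of each node
    k_in = A.sum(axis=0)                       # weighted in-degree of each node
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    expected = np.outer(k_out, k_in) / m       # expected weight under randomness
    return float(((A - expected) * same).sum() / m)
```

Passing each member's party as `labels` gives the party-line modularity for one Congress.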

We can see that modularity in the House of Representatives is typically higher than in the Senate, a conclusion supported by Y. Zhang, 2009. One intuition is that House elections are smaller and more local than statewide Senate elections, and thus more likely to elect partisan members. Modularity rises dramatically during the 112th Congress for both the Senate and the House, a period that has been called "the least productive since the Civil War" due to extreme polarization. This Congress followed the 2010 midterm elections, in which more partisan congresspeople were elected and Republicans took control of the House. The polarization was compounded by the fact that Obamacare was signed in March 2010, before these midterm elections.

Next, we look at the clusters that the Louvain method identifies in these Congresses over time and compare them to the actual split along party lines. This community detection method iteratively maximizes modularity (our measure of divisiveness).

High-error periods generally correspond to periods when party modularity is low, as expected.
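A sketch of this comparison, assuming networkx 2.8 or later for `louvain_communities` and an undirected copy of each network with reciprocal weights summed. The post doesn't define its error measure; here it is assumed to be the share of members whose Louvain cluster's majority party differs from their own, and the helper names are hypothetical.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import louvain_communities


def louvain_clusters(edge_weights, seed=0):
    """Louvain communities of a cosponsorship network.

    edge_weights maps (cosponsor, sponsor) to a count; both directions of a
    pair are summed into one undirected edge before clustering.
    """
    G = nx.Graph()
    for (u, v), w in edge_weights.items():
        if G.has_edge(u, v):
            G[u][v]["weight"] += w
        else:
            G.add_edge(u, v, weight=w)
    return louvain_communities(G, weight="weight", seed=seed)


def party_mismatch_rate(communities, party):
    """Share of members outside their cluster's majority party (assumed error measure)."""
    wrong = total = 0
    for community in communities:
        counts = Counter(party[m] for m in community)
        _, majority_size = counts.most_common(1)[0]
        wrong += len(community) - majority_size
        total += len(community)
    return wrong / total
```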

What's Next?
Now that we've introduced our data and done some basic visualizations and analyses, there is a lot left to explore. Possible next steps, as time allows, include:

1) Panel-regressing centrality measures on senator or representative characteristics over time. Are women less central? Are older people more or less central? Does being from a certain state automatically make you more or less central?
2) Reciprocity. Do some pairs always cosponsor each other's legislation?
3) Predicting cosponsorship edges based on the characteristics of the sponsor, or even the content of the bill.