## Sunday, September 24, 2017

### Ranking the Clutchest NBA Players in the 2014-15 Regular Season

This NBA season has seen emerging superstars rise to the pinnacle of the sport. Steph Curry, James Harden, Russell Westbrook and Anthony Davis have all had career-best seasons and are front-runners for the league’s MVP award.

With the playoffs beginning this week, however, the game’s elite need to translate their stellar play into wins. They need to elevate their game when the game is on the line. Their legacies will be defined not by how good they are, but how clutch they are.

How can we determine if the superstars of the NBA are clutch? How do we know if a player can make that buzzer-beater to clinch the series (think Damian Lillard) or shoot the must-make three-pointer to stay alive (think Ray Allen)? We came up with two metrics that can help us quantify “clutchness” in the NBA. The first is Win Shares and the second is Win Probability Added (WPA).

Method 1: Predicting “Win Share” in Clutch Situations

“Win Shares” estimates the number of wins a player contributes to a team. Basketball-reference.com uses advanced formulas to compute this statistic, but these formulas cannot be applied directly to close-game situations. To simplify the win share calculation and extend it to moments when the game is on the line, we regress win shares on available player statistics (e.g. Field Goals, Rebounds, Assists, etc.) to find the model that best predicts win shares for each player. We define clutch situations as possessions occurring with less than 3 minutes left in the 4th quarter or overtime and with the score within a 5-point margin.
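The two steps above (filtering clutch possessions, then regressing win shares on box-score statistics) can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the post: the function names, the two-feature toy data, and the choice of plain ordinary least squares are all hypothetical, and the post does not say how the clutch statistics were scaled before being fed to the model.

```python
def is_clutch(period, seconds_left, margin):
    """Clutch as defined in the text: 4th quarter or overtime,
    under 3 minutes left, score within 5 points."""
    return period >= 4 and seconds_left < 180 and abs(margin) <= 5


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations.

    rows    : list of feature tuples (e.g. field goals, rebounds)
    targets : list of regular-season win shares
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply the fitted model to one row of statistics."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Toy data (made up): win shares generated as 1 + 2*stat_a + 3*stat_b
season_stats = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5)]
season_win_shares = [3, 4, 6, 8, 22]
coefs = fit_ols(season_stats, season_win_shares)

# Feed a player's clutch-only statistics into the same model
clutch_win_share = predict(coefs, (2, 2))
```

The model is fit on full-season numbers and then evaluated on statistics accumulated only during possessions where `is_clutch` is true, which is what lets the prediction be read as a "clutch win share".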

With our regression model, we were able to predict players’ “clutch win shares.” The results, seen below, show some very interesting findings:

1. James Harden is the best and the clutchest, with the highest regular and clutch win share. Players perform better in tight situations if they are situated above the 45-degree line, and Harden is way above it.
2. The clutchest player who is not an All-Star is Eric Bledsoe. His win share in close situations is double his regular win share, good enough to be ranked 6th in the league.
3. The Golden State Warriors are not that clutch. Their superstar, Steph Curry, actually performs slightly worse in late-game situations. That said, one can argue that they have been so good this year that they have not been in many clutch situations.
4. Andrew Wiggins has the makings of a superstar. While he only has a 2.10 regular win share, his clutch win share is over 9, putting him in the top 20 in the league.
5. Some “clutch” players aren't really that “clutch.” Damian Lillard, Blake Griffin, and Kyle Lowry have been known to carry their teams in tight situations. Our predictions suggest otherwise, as they all fall below the 45-degree line. On the other hand, the heavily criticized John Wall is the fourth-best player in close games.
6. Kyrie Irving, not LeBron James, should be the go-to guy for the Cleveland Cavaliers when the game is on the line. His 50+ point efforts against the Spurs and Blazers probably helped his case.

Method 2: Calculating Total Win Probability Added in Clutch Situations

“Win probability added” (WPA) is a metric commonly used to analyze player performance in baseball and American football. It measures how much a play affects the team’s probability of winning. For instance, if the game is tied with 10 seconds remaining in the fourth quarter, draining a three-pointer bumps up the team’s probability of winning by an astonishing 30.7%, so the WPA for that play is 30.7. However, missing that three-pointer bumps down this probability by 23.4%, so the WPA for that play is -23.4. We used Inpredictable.com’s win probability calculator to determine these changes in winning probability for each possession.

Our method calculates the WPA for every shot attempt in a clutch situation for each player. We then aggregate these possessions to get the total WPA for each player. We also calculated WPA per shot attempt to determine player efficiency in tight game situations. The results, shown in the graphics below, reveal some interesting findings:
1. Anthony Davis is the clutchest player by this metric, with the highest total win probability added, at 211.1. James Harden contributed less than half of that despite attempting more shots (80 vs. Davis's 66).
2. Many superstars (Chris Paul, Russell Westbrook, Kyle Lowry) have very bad WPA. A lot of them are forced to take difficult shots in tight situations, and as a result, are heavily penalized.
3. Vince Carter, at the age of 38, contributes the most win probability per attempted shot. While he may not have the explosiveness he once had, he can still make the big shots when it matters.
4. Marcus Smart is not only one of the best rookies, but one of the clutchest NBA players, period.
5. Kyle Korver is just as effective in clutch situations as he is from the three-point line, topping the charts in win probability added per attempted shot among All-Stars.
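The aggregation behind these rankings (total WPA and WPA per shot attempt) can be sketched as below. This is a minimal illustration, not the original analysis code: the function name and the player labels are hypothetical, and the only real numbers are the +30.7 and -23.4 swings quoted above for a made and a missed three-pointer in a tied game with 10 seconds left.

```python
from collections import defaultdict


def summarize_wpa(shots):
    """Aggregate per-shot WPA values into per-player totals.

    shots : iterable of (player, wpa) pairs, one per clutch shot attempt,
            where wpa is the change in win probability in percentage points.
    """
    totals = defaultdict(float)
    attempts = defaultdict(int)
    for player, wpa in shots:
        totals[player] += wpa
        attempts[player] += 1
    return {
        player: {
            "total_wpa": totals[player],
            "attempts": attempts[player],
            "wpa_per_shot": totals[player] / attempts[player],
        }
        for player in totals
    }


# Hypothetical shot log: Player A makes one tied-game three and misses one,
# Player B makes one.
shots = [("Player A", 30.7), ("Player A", -23.4), ("Player B", 30.7)]
summary = summarize_wpa(shots)
```

Because misses carry negative values, a high-volume shooter can end up with a lower total than a player who takes fewer, better shots, which is why both the total and the per-shot figure are reported.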

So who are the clutchest players in the NBA?

Both of these methods add valuable insight into how clutch players are. That said, the Win Share method emphasizes overall player performance over the final minutes of the game, whereas the WPA method emphasizes the player’s ability to make an important shot, especially in the dying seconds when it really matters. Scoring a buzzer-beater will significantly alter a player’s WPA score, but not his win share score. On the other hand, assisting another player would increase a player’s win share score, but not his WPA score.

Looking at the two methods, two players stand out for us - Anthony Davis and James Harden. Both rank in the top 10 in Clutch Win Share and Total WPA. We would bet on these two players to step it up when the game is on the line.

These metrics, however, should be considered with context. Some players with a lower clutch win share or total WPA than others may be more heavily defended because of the positions they play - a shooting guard will be in the paint more than a point guard and thus have a lower WPA per shot. Another thing to consider is how many clutch situations a team is in. A good team may not often have a scoring margin between -5 and 5 with 3 minutes remaining, so its players will have a lower total WPA.

The code for this analysis is available on GitHub.

## Sunday, August 28, 2016

### Mapping Congressional Cosponsorship Over Time

A recently developed interest of mine has been visualizing and studying social networks. At first, I looked to revisit the Instagram API I used in the past to study follower/following relationships amongst verified accounts (who doesn't want to know if Taylor Swift is really the most central celebrity ever?). Unfortunately, Instagram has now closed off their API, so we found a more timely dataset to analyze - sponsorships and cosponsorships in legislation.

This article mainly introduces the dataset, visualizes the networks over time, and does some preliminary analysis. Much deeper analysis will follow in the future.

The Data
The data is all available through the GovTrack API, which tracks all legislation that goes through the House and Senate. All code for this article is found on GitHub and interactive visualizations can be found on Plotly, a visualization tool that can layer interactivity onto regular matplotlib graphs.

I tracked sponsorships and cosponsorships in House and Senate bills from Congress #100 (1987-89) to Congress #114 so far (2015-17) for bills and joint resolutions. I disregarded approval status because we simply want to study the social dynamics of which legislators support one another, regardless of the outcome of the bill. We do not include simple resolutions, which do not have the force of law.

Visualizing Networks

All networks are constructed as directed graphs. Nodes represent congresspeople who have sponsored legislation, and edges lead into these nodes from other nodes who cosponsor, or support, their legislation. Edge weights are assigned by the number of times a cosponsor has cosponsored a particular congressperson's legislation, so more cosponsorships from Person B to Person A will result in a greater edge weight from B to A. Node weights are assigned by the total number of cosponsorships that person has received, or the weighted in-degree of that node, so more cosponsorships from any person to Person A will result in a greater node weight for Person A. In the Senate, a bill may have multiple sponsors whereas in the House, that is not the case. Thus, most of the analysis that follows compares cosponsorships, which give a more direct basis of comparison.
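The construction described above can be sketched in a few lines. This is a simplified illustration rather than the code from the repository: the `(sponsor, cosponsors)` input shape and the function name are assumptions, and the legislators are placeholder letters.

```python
from collections import Counter


def build_network(bills):
    """Build a weighted, directed cosponsorship network.

    bills : iterable of (sponsor, [cosponsors]) pairs, one per bill.
    Returns (edge_weights, node_weights) where
      edge_weights[(cosponsor, sponsor)] counts how often the cosponsor
      backed that sponsor's legislation, and
      node_weights[sponsor] is the sponsor's weighted in-degree.
    """
    edge_weights = Counter()
    for sponsor, cosponsors in bills:
        for cosponsor in cosponsors:
            # The edge points from the supporter into the sponsor
            edge_weights[(cosponsor, sponsor)] += 1

    node_weights = Counter()
    for (_, sponsor), weight in edge_weights.items():
        node_weights[sponsor] += weight
    return edge_weights, node_weights


# Hypothetical bills: A sponsors two bills, B sponsors one
bills = [("A", ["B", "C"]), ("A", ["B"]), ("B", ["A"])]
edge_weights, node_weights = build_network(bills)
```

Here B cosponsors A twice, so the edge from B to A has weight 2, and A's node weight is 3 (two from B plus one from C).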

All node positions are visualized using force-directed drawing algorithms, which display pairs of nodes with greater edge weights closer together and those with lesser edge weights further apart. Nodes closer together represent more weight, or cosponsorships, between these nodes. Most of these graphs show a cluster of nodes in the center that are close together, meaning they have many cosponsorships with many of the other nodes in the center. The most recent Congress (#114) is shown directly below; historical congresses are in links following.

We also look at average eigenvector and in-degree centralities over time for the top 50 senators and representatives. These averages are based on the number of terms they served.

In-degree centrality of a node is proportional to the number of edges pointing into that node, which measures how effective a congressperson is at attracting cosponsors. Eigenvector centrality also depends on the number of connections, but additionally weights each connection by the centrality of the node it comes from: $C_{E}(v_{i}) = \frac{1}{\lambda}\sum_{j \neq i}A_{j,i}C_{E}(v_{j})$, where $A$ is the square adjacency matrix of the network and $\lambda$ is some constant. If we write $C_{E}(v)$ as the vector of eigenvector centralities of all nodes, then $\lambda C_{E}(v) = A^{T}C_{E}(v)$, so $\lambda$ is an eigenvalue of $A^{T}$ and $C_{E}(v)$ is the corresponding eigenvector. In this context, the most central senators and representatives are those connected to influential senators and representatives.
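One common way to solve the eigenvector equation above is power iteration: start from equal scores and repeatedly replace each node's score with the weighted sum of the scores of the nodes pointing into it. The sketch below is an illustration of that idea, not necessarily how the figures in this post were computed; the function name, the L2 normalization, and the toy graph are assumptions, and plain power iteration is only guaranteed to converge on well-behaved graphs (for example, strongly connected and aperiodic ones).

```python
import math


def eigenvector_centrality(edges, iterations=200, tol=1e-12):
    """Power iteration for C = (1/lambda) * A^T C.

    edges : dict mapping (source, target) -> weight, where an edge from a
            cosponsor to a sponsor counts toward the sponsor's centrality.
    Returns a dict of centralities normalized to unit Euclidean length.
    """
    nodes = sorted({node for edge in edges for node in edge})
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        for (source, target), weight in edges.items():
            # Each node inherits centrality from the nodes pointing into it
            updated[target] += weight * scores[source]
        norm = math.sqrt(sum(value * value for value in updated.values()))
        if norm == 0:
            raise ValueError("graph has no edges")
        updated = {node: value / norm for node, value in updated.items()}
        if max(abs(updated[n] - scores[n]) for n in nodes) < tol:
            return updated
        scores = updated
    return scores


# Toy network: "a" attracts heavy cosponsorship from "b" and "c"
edges = {
    ("b", "a"): 3, ("c", "a"): 3,
    ("a", "b"): 1, ("a", "c"): 1,
    ("b", "c"): 1, ("c", "b"): 1,
}
centrality = eigenvector_centrality(edges)
```

In this toy network the dominant eigenvalue is 3 and "a" ends up exactly twice as central as "b" or "c", which can be checked by substituting into the equation by hand.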

Senate

Senate #100-114

| Senator | Eigenvector Centrality | Senator | In-Degree Centrality |
|---|---|---|---|
| Orrin Hatch | 0.147 | Orrin Hatch | 0.781 |
| Charles Grassley | 0.140 | Patrick Leahy | 0.753 |
| Patrick Leahy | 0.139 | Max Baucus | 0.695 |
| Thomas Harkin | 0.134 | John McCain | 0.692 |
| Dianne Feinstein | 0.125 | John Rockefeller | 0.662 |
| Christopher Dodd | 0.116 | Thomas Harkin | 0.641 |
| Richard Durbin | 0.116 | Dianne Feinstein | 0.617 |
| Harry Reid | 0.105 | Harry Reid | 0.606 |
| Jeff Bingaman | 0.105 | Pete Domenici | 0.591 |
| Frank Lautenberg | 0.102 | Barbara Mikulski | 0.585 |
| John Rockefeller | 0.098 | Jeff Bingaman | 0.571 |
| Thomas Daschle | 0.097 | Edward Kennedy | 0.568 |
| Max Baucus | 0.097 | Mitch McConnell | 0.557 |
| John Kerry | 0.096 | Christopher Dodd | 0.547 |
| John McCain | 0.096 | John Kerry | 0.538 |
| Charles Schumer | 0.095 | Kay Hutchison | 0.532 |
| Olympia Snowe | 0.082 | Susan Collins | 0.530 |
| Daniel Moynihan | 0.077 | Joseph Biden | 0.498 |
| Barbara Boxer | 0.076 | Frank Lautenberg | 0.493 |
| Robert Dole | 0.076 | Christopher Bond | 0.491 |
| Pete Domenici | 0.074 | Thad Cochran | 0.486 |
| Susan Collins | 0.072 | Charles Schumer | 0.477 |
| Joseph Biden | 0.068 | Daniel Inouye | 0.476 |
| Barbara Mikulski | 0.068 | Richard Durbin | 0.473 |
| Robert Menéndez | 0.066 | Olympia Snowe | 0.472 |
| Kay Hutchison | 0.062 | Kent Conrad | 0.470 |
| John Chafee | 0.062 | Barbara Boxer | 0.469 |
| Daniel Inouye | 0.061 | Daniel Akaka | 0.460 |
| Joseph Lieberman | 0.060 | Joseph Lieberman | 0.458 |
| John Reed | 0.060 | John Warner | 0.444 |
| David Pryor | 0.059 | Carl Levin | 0.443 |
| Daniel Akaka | 0.058 | Richard Lugar | 0.442 |
| Byron Dorgan | 0.058 | Arlen Specter | 0.435 |
| Patty Murray | 0.057 | Byron Dorgan | 0.418 |
| George Mitchell | 0.054 | Trent Lott | 0.414 |
| Bob Graham | 0.054 | James Inhofe | 0.413 |
| Christopher Bond | 0.054 | Thomas Daschle | 0.406 |
| Alfonse D'Amato | 0.053 | Bob Graham | 0.391 |
| Alan Cranston | 0.053 | John Breaux | 0.390 |
| Arlen Specter | 0.052 | Ted Stevens | 0.386 |
| Sherrod Brown | 0.052 | Ron Wyden | 0.381 |
| James Jeffords | 0.051 | John Chafee | 0.381 |
| Howard Metzenbaum | 0.051 | Michael Enzi | 0.378 |
| Hillary Clinton | 0.050 | Ernest Hollings | 0.375 |
| Michael DeWine | 0.050 | Patty Murray | 0.369 |
| Ernest Hollings | 0.049 | Larry Craig | 0.367 |
| Trent Lott | 0.048 | Daniel Moynihan | 0.364 |
| Richard Lugar | 0.048 | Tim Johnson | 0.357 |
| Kent Conrad | 0.047 | Samuel Brownback | 0.355 |

House
House #100-114

| Representative | Eigenvector Centrality | Representative | In-Degree Centrality |
|---|---|---|---|
| Carolyn Maloney | 0.126 | Charles Rangel | 0.547 |
| Rosa DeLauro | 0.113 | Michael Bilirakis | 0.503 |
| Nita Lowey | 0.104 | Don Young | 0.479 |
| Charles Rangel | 0.104 | Nita Lowey | 0.470 |
| John Conyers | 0.103 | George Miller | 0.467 |
| Fortney Stark | 0.091 | John Conyers | 0.455 |
| Louise Slaughter | 0.087 | Elton Gallegly | 0.454 |
| Henry Waxman | 0.085 | Louise Slaughter | 0.439 |
| Nancy Johnson | 0.083 | Fred Upton | 0.433 |
| Michael Bilirakis | 0.083 | Carolyn Maloney | 0.423 |
| Christopher Smith | 0.081 | Nancy Johnson | 0.423 |
| Edward Markey | 0.081 | Peter King | 0.417 |
| John Lewis | 0.066 | Henry Waxman | 0.412 |
| Jerrold Nadler | 0.066 | John Lewis | 0.408 |
| Barney Frank | 0.066 | Rosa DeLauro | 0.408 |
| Barbara Lee | 0.065 | Ileana Ros-Lehtinen | 0.404 |
| Jim McDermott | 0.060 | Barney Frank | 0.404 |
| Lois Capps | 0.060 | Edward Markey | 0.402 |
| Lloyd Doggett | 0.059 | Eliot Engel | 0.400 |
| Peter King | 0.059 | Fortney Stark | 0.397 |
| Lane Evans | 0.057 | Sander Levin | 0.393 |
| Constance Morella | 0.055 | Dale Kildee | 0.387 |
| Eliot Engel | 0.053 | Bob Goodlatte | 0.379 |
| William Clay | 0.053 | E. Shaw | 0.373 |
| John Dingell | 0.053 | Sam Johnson | 0.368 |
| E. Shaw | 0.053 | Clifford Stearns | 0.367 |
| Don Young | 0.051 | F. Sensenbrenner | 0.366 |
| Peter DeFazio | 0.051 | Peter DeFazio | 0.356 |
| James McGovern | 0.050 | H. Saxton | 0.351 |
| Christopher Shays | 0.050 | Dave Camp | 0.347 |
| Bob Filner | 0.050 | Lois Capps | 0.342 |
| Maxine Waters | 0.049 | Lane Evans | 0.340 |
| Elton Gallegly | 0.049 | Bob Filner | 0.338 |
| Philip English | 0.047 | Gary Ackerman | 0.336 |
| Frank Pallone | 0.047 | Walter Jones | 0.332 |
| Ileana Ros-Lehtinen | 0.046 | Bill Pascrell | 0.331 |
| Gary Ackerman | 0.046 | Joe Barton | 0.327 |
| Sander Levin | 0.045 | Thomas Davis | 0.324 |
| Lynn Woolsey | 0.045 | Steny Hoyer | 0.323 |
| Rush Holt | 0.045 | James Moran | 0.322 |
| Patricia Schroeder | 0.044 | Mike Thompson | 0.317 |
| Charles Schumer | 0.044 | John Dingell | 0.316 |
| Benjamin Gilman | 0.043 | Kevin Brady | 0.314 |
| Barbara Kennelly | 0.043 | Anna Eshoo | 0.312 |
| Dale Kildee | 0.043 | Lamar Smith | 0.312 |
| Bernard Sanders | 0.042 | C. Cox | 0.312 |
| Fred Upton | 0.042 | Philip English | 0.311 |
| Mike Thompson | 0.041 | Jim McDermott | 0.308 |
| Amory Houghton | 0.041 | Constance Morella | 0.306 |

Modularity

Next, we look at the divisiveness of the House and Senate by mapping modularity over time. Modularity is a metric that, for a given partition of the network into groups, sums over all pairs of nodes in the same group the difference between the actual and expected number of edges between them. High modularity indicates that edge connections are not random. Instead, there are dense connections within groups of nodes and sparse connections between those groups. First, we look at the modularity given a partition along political party lines:
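To make the definition concrete, here is a minimal sketch of modularity for a weighted, directed network under a fixed partition such as party membership. It assumes the standard directed form $Q = \frac{1}{m}\sum_{i,j}\left[A_{i,j} - \frac{k_i^{out}k_j^{in}}{m}\right]\delta(c_i, c_j)$, where $m$ is the total edge weight; the post does not state which variant was actually used, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def directed_modularity(edges, community_of):
    """Modularity of a weighted directed graph for a given partition.

    edges        : dict mapping (source, target) -> weight
    community_of : dict mapping node -> community label (e.g. party)
    Uses the per-community form: sum over c of
    (intra-community weight / m) - (out-weight_c * in-weight_c / m^2).
    """
    total = sum(edges.values())
    intra = defaultdict(float)
    out_weight = defaultdict(float)
    in_weight = defaultdict(float)
    for (source, target), weight in edges.items():
        out_weight[community_of[source]] += weight
        in_weight[community_of[target]] += weight
        if community_of[source] == community_of[target]:
            intra[community_of[source]] += weight
    return sum(
        intra[c] / total - out_weight[c] * in_weight[c] / total ** 2
        for c in set(community_of.values())
    )


# Toy chamber: two pairs of legislators who only cosponsor within their party
edges = {("a", "b"): 1, ("b", "a"): 1, ("c", "d"): 1, ("d", "c"): 1}
by_party = {"a": "D", "b": "D", "c": "R", "d": "R"}
one_group = {node: "all" for node in by_party}

q_party = directed_modularity(edges, by_party)
q_single = directed_modularity(edges, one_group)
```

A perfectly party-line toy chamber like this one scores 0.5 under the party partition, while lumping everyone into a single group scores 0, since there is then no structure beyond what random wiring would give.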

We can see that modularity in the House of Representatives is typically higher than in the Senate, a conclusion supported by Y. Zhang, 2009. This may be explained by the intuition that House elections are smaller and more local than statewide Senate elections, and thus have a higher likelihood of electing more partisan people. We can see modularity rise dramatically during the 112th Congress for both the Senate and the House, a period that has been deemed "the least productive since the Civil War" due to extreme polarization. This period followed the 2010 midterm elections, in which more partisan congresspeople were elected and Republicans took control of the House. This polarization was compounded by the fact that Obamacare was signed in March 2010, before these midterm elections.

Next, we look at the clusters that the Louvain method identifies in these Congresses over time and compare them to the actual split along party lines. This network cluster detection method iteratively maximizes the modularity measure (measure of divisiveness):

High error periods generally correspond to periods when party-modularity is not high, as expected.
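The post does not spell out how the error between detected clusters and party membership is measured. One simple possibility, sketched here purely as an assumption, is to label each detected cluster with its majority party and count the share of legislators who fall outside their cluster's majority. The function name and the toy clusters are hypothetical, and the clusters themselves would come from a Louvain implementation that is not shown.

```python
from collections import Counter


def party_mismatch_rate(clusters, party_of):
    """Share of legislators whose party differs from the majority party
    of the cluster they were assigned to.

    clusters : iterable of collections of legislators (detected communities)
    party_of : dict mapping legislator -> party label
    """
    total = 0
    mismatched = 0
    for members in clusters:
        if not members:
            continue  # ignore empty clusters
        counts = Counter(party_of[member] for member in members)
        majority_size = counts.most_common(1)[0][1]
        mismatched += len(members) - majority_size
        total += len(members)
    return mismatched / total


# Toy example: one Republican lands in the mostly Democratic cluster
clusters = [{"a", "b", "c"}, {"d", "e"}]
party_of = {"a": "D", "b": "D", "c": "R", "d": "R", "e": "R"}
error = party_mismatch_rate(clusters, party_of)
```

Under this definition the error is 0 when the detected clusters reproduce the party split exactly, and it grows as cosponsorship communities cut across party lines.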

What's Next?
Now that we've introduced our data and done some basic visualizations and analyses, there is a lot left to explore. Possible next steps, as time allows, include:

1) Panel-regressing centrality measures on senator or representative characteristics over time. Are women less central? Are older people more or less central? Does being from a certain state automatically make you more or less central?
2) Reciprocity. Are some pairs always cosponsoring each other's bills?
3) Predicting cosponsorship edges based on the characteristics of the sponsor, or even the content of the bill.