The table below shows trailing 250 and 500 day (excluding offseason) regular season RAPM. Updated nightly.
(This is still a really early version right now. Would love any suggestions.)
Why 250 and 500 day samples? Why trailing? There is a ton of noise in RAPM for small samples. The single-season RAPM numbers that are floating out there (like ESPN’s single-season RPM) suffer from this. They might do an okay job describing how much players have contributed to winning in the given season, but those numbers are definitely not predictive. In light of that, I chose larger samples – 250 game days is about 1.5 seasons, and 500 game days is about 3 seasons. Using a trailing sample removes the arbitrary lines between seasons.
A few technical details:
- I used ridge regression (use cross-validation to find “best” regularization parameter)
- Possessions column is the sum of offensive and defensive possessions for that player
- There is some possessions played cutoff that regresses about 1/5th of the players toward a ~replacement level~ player
- There is no age adjustment, but since the timeframes aren’t super long, it shouldn’t make too big of a difference. I’d actually like any suggestions on how to do the age adjustment. I’ve read this discussion, but not 100% sure I grok it.
- Play-by-play parsing is fraught with edge cases and there may be some bugs in my code, so I can’t guarantee this is 100% accurate. It’s close though.
- Please tweet / dm me @canzhiye any suggestions or questions