Behind the Numbers

David Annis, Ph.D.
Dave@SportsQuant.com


SportsQuant Rankings

Computer ranking algorithms generally fall into two classes: (1) win-loss models, which ignore margin of victory (and therefore cannot tell how dominant the winning team was), and (2) point-scoring models, which estimate offensive and defensive abilities but do not consider whether those points helped the team win. Our rankings are the first to model both wins and points scored, which allows us to distinguish between a 21-20 win and a 52-3 win.
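The source does not say how SportsQuant combines wins and points, so the sketch below illustrates only the distinction drawn in this paragraph. It uses two standard stand-ins rather than the SportsQuant model: plain winning percentage (a win-loss view) and Massey-style least-squares ratings (a point-margin view). The teams and scores are hypothetical; the 21-20 and 52-3 results echo the example in the text. The least-squares system is only solvable when the schedule connects every team.

```python
from collections import defaultdict

# A game is (winner, winner_points, loser, loser_points).
Game = tuple[str, int, str, int]


def win_pct(games: list[Game]) -> dict[str, float]:
    """Win-loss view: fraction of games won, margin ignored."""
    wins: dict[str, int] = defaultdict(int)
    played: dict[str, int] = defaultdict(int)
    for winner, _, loser, _ in games:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {team: wins[team] / played[team] for team in played}


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def massey_ratings(games: list[Game]) -> dict[str, float]:
    """Margin view: least-squares ratings so that r_winner - r_loser ~ margin."""
    teams = sorted({g[0] for g in games} | {g[2] for g in games})
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    m = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for winner, wp, loser, lp in games:
        i, j = idx[winner], idx[loser]
        m[i][i] += 1
        m[j][j] += 1
        m[i][j] -= 1
        m[j][i] -= 1
        p[i] += wp - lp
        p[j] -= wp - lp
    # Ratings are only defined up to an additive constant, so replace the
    # last equation with the constraint that all ratings sum to zero.
    m[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve(m, p)))


games = [("A", 21, "B", 20), ("C", 52, "B", 3)]
print(win_pct(games))         # A and C are both 1-0
print(massey_ratings(games))  # C rates 48 points higher than A
```

Both A and C are undefeated against the same opponent, so the win-loss view cannot separate them, while the margin view can.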

The methods developed for ranking teams are useful in other sports modeling tasks because many of the obstacles are the same. For example, although certain quantities can be measured absolutely (Who won? What was the final score?), they only make sense in a relative context. If Purdue wins a game, that win means different things depending on whether the opponent is Akron or Ohio State. If Florida scores 49 points, were they scored against Alabama or Vanderbilt? All sports performance is relative, and analysis must accommodate this reality.
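A minimal sketch of the idea behind the Florida example, assuming a simple difference adjustment (an illustration, not the SportsQuant method): compare the points scored with what the opponent's defense usually allows. The defensive averages below are hypothetical. A one-step adjustment like this is only approximate, because an opponent's average is itself shaped by its own schedule; ranking models address this by estimating all teams jointly.

```python
def points_over_expected(points: float, opp_allowed_per_game: float) -> float:
    """Points scored minus the points the opponent's defense allows on average."""
    return points - opp_allowed_per_game


# Hypothetical defensive averages (points allowed per game), for illustration only.
allowed = {"Alabama": 14.0, "Vanderbilt": 35.0}
for opponent, avg in allowed.items():
    print(opponent, points_over_expected(49, avg))
# Alabama 35.0
# Vanderbilt 14.0
```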

Ranking methods must also estimate how good each team is relative to its peers. Goodness is an abstract notion: it can't be measured directly, but each of us recognizes the characteristics of good performance. Many sports research applications must likewise identify reasonable proxies for abstract concepts (team strength, player ability, schedule difficulty) based on observable quantities.

Performance Evaluation and Strategy

How well do conventional statistics quantify players' contributions to victory? What team qualities are most conducive to winning? This section proposes improved game metrics.

When is it smart to bunt? (Not often.) Should you go for it on 4th down? (More often than you think.) The strategy section answers these and other questions using our own methodology.
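The source does not spell out its strategy methodology, so the sketch below shows only a standard way to frame the 4th-down question: compare the expected win probability of going for it with the win probability after kicking. All inputs are hypothetical; in practice they would come from a model of situation values like those discussed under Play Valuation.

```python
def expected_wp_go(p_convert: float, wp_convert: float, wp_fail: float) -> float:
    """Expected win probability if the offense goes for it on 4th down."""
    return p_convert * wp_convert + (1 - p_convert) * wp_fail


def break_even_conversion(wp_convert: float, wp_fail: float, wp_kick: float) -> float:
    """Conversion probability at which going for it and kicking are equally good."""
    if wp_convert <= wp_fail:
        raise ValueError("converting should leave the team better off than failing")
    return (wp_kick - wp_fail) / (wp_convert - wp_fail)


# Hypothetical inputs, for illustration only.
p_convert, wp_convert, wp_fail, wp_punt = 0.60, 0.55, 0.40, 0.47
go = expected_wp_go(p_convert, wp_convert, wp_fail)
print(f"go: {go:.3f}  punt: {wp_punt:.3f}")  # go: 0.490  punt: 0.470
print(f"break-even: {break_even_conversion(wp_convert, wp_fail, wp_punt):.3f}")  # break-even: 0.467
```

With these made-up numbers, going for it is the better choice whenever the chance of converting exceeds about 47%.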

Stochastic Modeling

Game situations, like other complex systems, are generally too intricate to describe in closed form and therefore require specialized quantitative methods. Because these systems can't be optimized analytically, one alternative is to model the relevant information mathematically (event probabilities, game decisions, etc.) and simulate the events of interest.

The football game simulator generates the rest of an NFL game from a situation that you choose (for example, trailing by 3 points with the ball at your own 35-yard line). You can use this applet to estimate the probability of winning a game under these circumstances.
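As a rough illustration of simulation-based win probability, the sketch below plays out the rest of a game many times and averages the results. It is a toy model under stated assumptions, not the SportsQuant simulator: each remaining possession (the count is an input) ends in a touchdown (7 points), a field goal (3), or nothing, with the same hypothetical per-drive probabilities for both teams; possession alternates; and a final tie counts as a coin-flip overtime. It ignores field position and the clock, so it cannot represent the starting yard line in the text's example.

```python
import random


def simulate_rest_of_game(margin: int, possessions_left: int, we_have_ball: bool,
                          p_td: float, p_fg: float, rng: random.Random) -> float:
    """Play out the remaining possessions; 1.0 = win, 0.0 = loss, 0.5 = tie."""
    ours = we_have_ball
    for _ in range(possessions_left):
        u = rng.random()
        points = 7 if u < p_td else 3 if u < p_td + p_fg else 0
        margin += points if ours else -points
        ours = not ours  # possession alternates after every drive
    if margin > 0:
        return 1.0
    if margin < 0:
        return 0.0
    return 0.5  # a tie is treated as a coin-flip overtime


def win_probability(margin: int, possessions_left: int, we_have_ball: bool,
                    p_td: float = 0.20, p_fg: float = 0.15,
                    n_sims: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate: average outcome over many simulated endings."""
    rng = random.Random(seed)
    total = sum(simulate_rest_of_game(margin, possessions_left, we_have_ball,
                                      p_td, p_fg, rng)
                for _ in range(n_sims))
    return total / n_sims


# Trailing by 3 with the ball and (hypothetically) 4 possessions left.
print(round(win_probability(-3, 4, True), 3))
```

The estimate carries Monte Carlo error of roughly 1/sqrt(n_sims), so comparisons between close situations need enough simulated endings to be meaningful.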

The baseball lineup optimizer lets you explore the effects of batting order on run production. Rearrange the batters to your liking, then run the simulation to estimate average runs per 9 innings and an approximate distribution of runs scored. The discussion page gives a few heuristic observations based on simulated data and empirical evidence.
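The lineup idea can be sketched with a similar Monte Carlo approach. This toy version is not the SportsQuant optimizer. It assumes each batter is described by hypothetical probabilities of an out, walk, single, double, triple, or home run. On a hit every runner advances exactly as many bases as the batter, walks move only forced runners, and there are no double plays, steals, or sacrifices. Because the batting order carries over from inning to inning, reordering the same nine batters can change the simulated run average.

```python
import random
from collections import Counter

# Per-batter probabilities, in this order, summing to 1.
OUTCOMES = ("out", "walk", "single", "double", "triple", "hr")
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "hr": 4}


def plate_appearance(probs, rng):
    """Draw one plate-appearance outcome from a batter's probabilities."""
    u, cum = rng.random(), 0.0
    for outcome, p in zip(OUTCOMES, probs):
        cum += p
        if u < cum:
            return outcome
    return "out"  # guards against probabilities summing to slightly under 1


def advance(bases, outcome):
    """Return (new_bases, runs) where bases = [first, second, third] occupancy."""
    first, second, third = bases
    if outcome == "walk":  # only forced runners move
        runs = 1 if first and second and third else 0
        return [True, second or first, third or (first and second)], runs
    k = HIT_BASES[outcome]  # simplification: every runner advances k bases
    new_bases, runs = [False, False, False], 0
    for start in [b + 1 for b, occ in enumerate(bases) if occ] + [0]:
        if start + k >= 4:
            runs += 1
        else:
            new_bases[start + k - 1] = True
    return new_bases, runs


def simulate_game(lineup, rng, innings=9):
    """Runs scored in one game; the batting order carries over between innings."""
    batter, runs = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            outcome = plate_appearance(lineup[batter], rng)
            batter = (batter + 1) % len(lineup)
            if outcome == "out":
                outs += 1
            else:
                bases, scored = advance(bases, outcome)
                runs += scored
    return runs


def evaluate_lineup(lineup, n_games=2000, seed=0):
    """Average runs per 9 innings and the empirical distribution of runs."""
    rng = random.Random(seed)
    results = [simulate_game(lineup, rng) for _ in range(n_games)]
    return sum(results) / n_games, Counter(results)


# Hypothetical batters: (out, walk, single, double, triple, hr) probabilities.
average = (0.68, 0.08, 0.15, 0.05, 0.005, 0.035)
slugger = (0.62, 0.12, 0.12, 0.06, 0.00, 0.08)
lineup_a = [slugger] + [average] * 8  # slugger leads off
lineup_b = [average] * 3 + [slugger] + [average] * 5  # slugger bats cleanup
for name, lineup in (("leadoff", lineup_a), ("cleanup", lineup_b)):
    mean_runs, _ = evaluate_lineup(lineup)
    print(name, round(mean_runs, 2))
```

With 2,000 simulated games per lineup, small differences between orderings may be Monte Carlo noise rather than a real effect.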

Play Valuation

Due in large part to the proliferation of fantasy sports, countless statistics are available for players in all sports. While they are a useful start, these statistics do not measure how much each player contributed to his team's success (or failure). The first step in gauging these contributions is determining how valuable each game situation is. For instance, if you were the head coach of a football team and were given the choice between a 3-point lead while playing defense at your own 40-yard line or a 3-point deficit while playing offense at your opponent's 40-yard line, which would you choose? Before answering, you would want to know how often teams in similar circumstances eventually won.
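One direct way to answer the coach's question is to tabulate how often teams in a given situation went on to win. The sketch below is a hypothetical illustration, not the SportsQuant method: the record layout, the features (score difference, yards to the goal line for whichever team has the ball, possession, minutes left), and the bucket widths are all assumptions.

```python
from collections import defaultdict


def situation_key(score_diff, yards_to_goal, has_ball, minutes_left,
                  yard_bucket=10, minute_bucket=5):
    """Coarsen a game situation into a bin so that similar situations pool."""
    return (score_diff, yards_to_goal // yard_bucket, has_ball,
            minutes_left // minute_bucket)


def empirical_win_rates(records):
    """records: (score_diff, yards_to_goal, has_ball, minutes_left, won) tuples,
    all from one team's point of view. Returns bin -> (win rate, sample size)."""
    tallies = defaultdict(lambda: [0, 0])  # bin -> [wins, situations]
    for score_diff, yards_to_goal, has_ball, minutes_left, won in records:
        tally = tallies[situation_key(score_diff, yards_to_goal, has_ball, minutes_left)]
        tally[0] += int(won)
        tally[1] += 1
    return {key: (wins / n, n) for key, (wins, n) in tallies.items()}
```

Under this framing, the coach's two options are the same situation seen from opposite sides: the offense is 40 yards from the end zone in both, so the leading team's chance of winning is one minus the trailing team's chance. If the teams are evenly matched and the clock is the same, the deficit is the better choice exactly when teams trailing by 3 with the ball at the opponent's 40 win more than half the time. The sample size is returned with each rate because exact-match bins thin out quickly, which is one reason smoothed models like those described in this section are useful.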

The plot below shows the progression of an entire game (Jacksonville at Pittsburgh, October 16, 2005). The horizontal axis represents elapsed time in minutes (because the game went into overtime, elapsed time exceeds 60 minutes); the vertical axis represents the probability that the road team (Jacksonville) wins the game. When elapsed time is zero, this probability is 0.5 (a 50% chance of a Jacksonville win). As the game progresses, the probability of a Jaguars win is reassessed. All three methods described below end at 1 (a 100% chance of a Jacksonville win following Rashean Mathis's interception return for a touchdown in overtime).

Three methods, all exclusive to SportsQuant, are presented: (1) a semi-parametric isotonic/logistic regression (black), (2) a non-parametric kernel-smoothed estimate (red), and (3) a simulation-based method (blue) in which hundreds of game conclusions were generated from each play. By valuing each game situation, these analyses can assign a value to each play based on how much it contributed toward a victory. Because game progression is extremely complex, there may be several different yet valid ways to build a mathematical model describing it. Since it is difficult to know how accurately any one model captures the features of the real process, employing three completely unrelated methods provides a measure of validation: if the disparate models agree (as they do in this graphic), then it is likely that they have all captured the essential features of the game.
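The source does not detail its three estimators, so the sketch below shows only a generic version of the second idea: a Nadaraya-Watson kernel estimate of win probability, followed by valuing a play as the change in estimated win probability it produces. The features (score margin and minutes left), the Gaussian kernel, the bandwidths, and the data are all hypothetical; a practical model would also need field position, down, distance, and possession.

```python
import math


def kernel_win_prob(x, data, bandwidths):
    """Nadaraya-Watson estimate of P(win | situation x).

    data: list of (features, won) pairs with won in {0, 1};
    bandwidths: one smoothing width per feature."""
    num = den = 0.0
    for xi, won in data:
        # Gaussian product kernel: nearby situations get more weight.
        z = sum(((a - b) / h) ** 2 for a, b, h in zip(x, xi, bandwidths))
        weight = math.exp(-0.5 * z)
        num += weight * won
        den += weight
    return num / den if den > 0 else float("nan")


def play_value(before, after, data, bandwidths):
    """Value of a play: change in estimated win probability it produced."""
    return (kernel_win_prob(after, data, bandwidths)
            - kernel_win_prob(before, data, bandwidths))


# Hypothetical history: (score margin, minutes left) -> did the team win?
history = [((7, 30), 1), ((3, 30), 1), ((0, 30), 1), ((0, 30), 0),
           ((-3, 30), 0), ((-7, 30), 0)]
bw = (3.0, 5.0)
print(round(kernel_win_prob((0, 30), history, bw), 3))
print(round(play_value((0, 30), (7, 29), history, bw), 3))
```

Larger bandwidths pool more distant situations and give smoother but more biased curves; smaller ones track the data closely but become noisy where few games were observed.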

Future work will address assigning credit to individual players based on their contribution to the team's chance of winning.