Optimization of Batting Order
Frank R. Zheng
A Quick Introduction to Baseball
Two teams alternate batting and fielding
The batting team tries to score runs
Runners must advance through first, second, and third base, in that order, to reach home
Runners are advanced by players getting hits, drawing walks, stealing bases, or errors by the opposing team’s defense
The team with the most runs at the end of the game wins
Batting Order
Before each game, the team’s coach must submit the team’s batting order
The batting order dictates the order in which players step up to the plate
Substitutions such as pinch hitters or pinch runners are allowed, but are relatively rare
The optimal batting order maximizes the expected run production
Batting Order Optimization as a Scheduling Problem
Finding the optimal batting order for a team can be thought of as a single-machine scheduling problem
Each batter is modeled as a job, and the batting order is a sequence of 9 such jobs
The objective is to maximize the run production of the lineup
This is a complicated function that requires simulation to evaluate
Approach to Optimize Batting Order
Each baseball team has a roster of ~15 batters, of which only 9 compose the batting order
Brute-forcing all possible lineups is somewhat impractical: it would require evaluating 15!/6! permutations (over 1.8 billion unique lineups)
The solution is to combine a qualitative “conventional wisdom” approach with a data-driven quantitative methodology
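The lineup count above can be checked with a few lines of Python. This is only an arithmetic check; the roster size of 15 is the approximate figure from the slide, and the variable names are illustrative.

```python
import math

ROSTER_SIZE = 15  # approximate roster size quoted on the slide
LINEUP_SIZE = 9   # batters in a batting order

# Ordered selections of 9 batters out of 15, i.e. 15! / 6!
ordered_lineups = math.perm(ROSTER_SIZE, LINEUP_SIZE)

# Unordered choices of which 9 batters play, shown for comparison
unordered_groups = math.comb(ROSTER_SIZE, LINEUP_SIZE)

print(ordered_lineups)   # 1816214400
print(unordered_groups)  # 5005
```

Because order matters, the search space is counted by permutations rather than combinations, which is why the total is in the billions instead of the thousands.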
Batting Order Conventional Wisdom
Over the many decades baseball has been played, coaches have dedicated much thought to finding the best lineup
Traditional lineups follow this general order
  1-2: batters who get on base a lot
  3-5: batters who get a lot of extra-base hits
  6-8: weak batters
  9: pitcher/weak batter/batter who gets on base a lot
The key is to have players with a high realization value (lots of runs batted in) follow those with a high potential value (getting on base a lot)
i.e., get runners on base so your power hitters can drive them home
Underlying Causes of Run Production
There is a limited set of events that have the potential to score runs
We refer to these as “Run-Producing Events” or RPEs
RPEs include
  Singles (1B)
  Doubles (2B)
  Triples (3B)
  Home Runs (HR)
  Bases on Balls/Hit by Pitch (BB+HBP)
  Errors (ERR)
Batting Performance
Does the model fully capture differences among player batting characteristics?
How to distinguish between ‘table setters’ and ‘sluggers/cleanup hitters’?
Realization Value vs. Potential Value
Realization Value is the expected number of runs each RPE actually scores
Potential Value is the effect each RPE has on the team’s chances to score additional runs in the same inning
Differentiating between these two metrics allows us to quantitatively determine which players create the potential for scoring runs and which ones are good at bringing those players to home plate
Differentiating Players
By comparing each individual’s realization value and potential value to the team’s overall averages, we can group players into one of four categories
  (R+, P+) Strong Hitters: players who bat in a lot of runs but also create the potential for more runs
  (R+, P-) Run Producers: players who bat in a lot of runs
  (R-, P+) Table Setters: players who create a lot of potential for more runs
  (R-, P-) Weak Hitters: the team’s worst players
This gives us the quantitative data we need to apply the conventional wisdom discussed earlier
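The grouping above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the `Batter` class, the `classify` function, the group keys, and all numeric values are hypothetical, the team average is assumed to be the plain mean over the batters supplied, and ties with the average are assumed to fall on the "-" side.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Batter:
    name: str
    r: float  # realization value (hypothetical units)
    p: float  # potential value (hypothetical units)


def classify(batters):
    """Split batters into the four (R, P) groups relative to the team means."""
    r_avg = mean(b.r for b in batters)
    p_avg = mean(b.p for b in batters)
    groups = {"R+P+": [], "R+P-": [], "R-P+": [], "R-P-": []}
    for b in batters:
        # Assumption: a value exactly equal to the average counts as "-".
        key = ("R+" if b.r > r_avg else "R-") + ("P+" if b.p > p_avg else "P-")
        groups[key].append(b)
    return groups


# Made-up values, chosen only so that each group receives one player.
team = [
    Batter("A", r=0.30, p=0.10),
    Batter("B", r=0.10, p=0.30),
    Batter("C", r=0.30, p=0.30),
    Batter("D", r=0.10, p=0.10),
]
groups = classify(team)
for label, members in groups.items():
    print(label, [b.name for b in members])
```

In this toy roster C lands in the Strong Hitters group, A in Run Producers, B in Table Setters, and D in Weak Hitters.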
Overview of Heuristic
Now we have the tools we need to combine the holistic conventional wisdom with quantitative data
We adapted this heuristic from the work of Sokol
After determining which players fall into which set, we attempt to follow the conventional wisdom of placing batters with high realization values after a group of batters with high potential values
We want to build up potential value and then release it with realization value
The optimal order of the four sets is
  (R-, P+)
  (R+, P+)
  (R+, P-)
  (R-, P-)
Heuristic Steps
Select the two batters with the highest P in the (R-, P+) set and assign them to the top two slots in the batting order, by order of increasing P
Place all batters in the (R+, P+) group in the next slots, ordered by decreasing P
Fill as many remaining slots as possible with batters from the (R+, P-) group, ordered by decreasing P
If there are any remaining slots, fill them with batters in the (R-, P-) group, ordered by increasing P
For each player left in the (R-, P+) group, replace a (R-, P-) player if possible, ordering the new (R-, P+) players by increasing P
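The steps above can be sketched in Python. This is one possible reading under stated assumptions, not the author's code: the function name `build_lineup`, the group keys, and the player data are hypothetical. The slides do not say which weak hitter is replaced in the last step or where the replacements bat, so the sketch assumes the lowest-P weak hitters are dropped and the new table setters bat after the remaining weak hitters. It also assumes the groups hold enough players to fill nine slots.

```python
def build_lineup(groups, slots=9):
    """groups maps 'R-P+', 'R+P+', 'R+P-', 'R-P-' to lists of (name, P) pairs."""
    def by_p(batter):
        return batter[1]

    # Step 1: the two highest-P table setters lead off, in increasing P.
    setters = sorted(groups["R-P+"], key=by_p, reverse=True)
    lineup = sorted(setters[:2], key=by_p)
    leftover = setters[2:]  # still sorted by decreasing P

    # Step 2: strong hitters next, by decreasing P.
    lineup += sorted(groups["R+P+"], key=by_p, reverse=True)[: slots - len(lineup)]

    # Step 3: run producers fill as many remaining slots as possible, by decreasing P.
    lineup += sorted(groups["R+P-"], key=by_p, reverse=True)[: slots - len(lineup)]

    # Step 4: weak hitters take any open slots, in increasing P.
    # Assumption: if there are too many, the highest-P weak hitters are kept.
    open_slots = slots - len(lineup)
    weak = sorted(groups["R-P-"], key=by_p, reverse=True)[:open_slots]
    weak.sort(key=by_p)

    # Step 5: leftover table setters replace weak hitters where possible.
    # Assumption: the lowest-P weak hitters are replaced, and the new
    # table setters bat last, ordered by increasing P.
    k = min(len(leftover), len(weak))
    new_setters = sorted(leftover[:k], key=by_p)
    return lineup + weak[k:] + new_setters


# Hypothetical roster; the P values are invented for illustration.
example = {
    "R-P+": [("TS1", 0.90), ("TS2", 0.80), ("TS3", 0.70)],
    "R+P+": [("SH1", 0.95), ("SH2", 0.85)],
    "R+P-": [("RP1", 0.40), ("RP2", 0.30), ("RP3", 0.20)],
    "R-P-": [("WH1", 0.10), ("WH2", 0.15), ("WH3", 0.05)],
}
order = [name for name, _ in build_lineup(example)]
print(order)
```

With this made-up roster the result is TS2, TS1, SH1, SH2, RP1, RP2, RP3, WH2, TS3: the third table setter takes the place of the lower-P weak hitter at the bottom of the order.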
Application to 2011 New York Yankees
In order to see the effects of our heuristic, we applied it to the 2011 New York Yankees
First, we placed each player into the appropriate category
(R+, P-) Run Producers
(R-, P-) Weak Hitters
(R-, P+) Table Setters
(R+, P+) Strong Hitters
Simulation
In order to determine the value of our objective function (the expected number of runs scored per game), we need to simulate a game of baseball using the designated lineup
Our simulation follows the structure of a normal game of baseball
At each point in time, the next batter steps up to the plate and either generates an RPE or makes an out, depending on that player’s distribution
RPEs advance runners according to the rules of baseball or by probabilistic outcomes determined using data from the 2011 season
The number of outs and runs is recorded for each of 16,200 games
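A heavily simplified version of such a simulation might look like the sketch below. It is not the author's simulator. The function names and the event probabilities are invented, errors and stolen bases are left out, and runner advancement is deterministic (every runner advances as many bases as the batter, and walks only force runners) instead of using probabilistic outcomes fitted to 2011 data.

```python
import random

HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def advance(bases, event):
    """Apply one RPE to bases = [first, second, third]; return (new_bases, runs)."""
    bases = list(bases)
    runs = 0
    if event == "BB":
        # A walk (or hit by pitch) only moves runners who are forced.
        if bases[0]:
            if bases[1]:
                if bases[2]:
                    runs += 1
                bases[2] = True
            bases[1] = True
        bases[0] = True
        return bases, runs
    n = HIT_BASES[event]
    new_bases = [False, False, False]
    for i, occupied in enumerate(bases):
        if occupied:
            if i + n >= 3:
                runs += 1  # runner reaches home
            else:
                new_bases[i + n] = True
    if n == 4:
        runs += 1  # the batter scores on a home run
    else:
        new_bases[n - 1] = True
    return new_bases, runs


def draw_event(probs, rng):
    """Sample an RPE or 'OUT'; the out probability is whatever mass remains."""
    events = list(probs) + ["OUT"]
    weights = list(probs.values()) + [max(0.0, 1.0 - sum(probs.values()))]
    return rng.choices(events, weights=weights)[0]


def simulate_game(lineup, rng, innings=9):
    """Return runs scored by one lineup over a fixed number of innings."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            event = draw_event(lineup[batter % len(lineup)], rng)
            batter += 1  # the order carries over between innings
            if event == "OUT":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs


def expected_runs(lineup, games, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games


# Hypothetical batter profile reused for all nine slots.
profile = {"1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.03, "BB": 0.09}
print(expected_runs([profile] * 9, games=2000))
```

Comparing two batting orders then amounts to calling `expected_runs` on each with the same number of games and comparing the averages.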
Results of Analysis
Standard Lineup
This lineup generated an average of 5.68 runs per game, and is expected to have a 61.3% chance of winning a 5-game series against the Detroit Tigers
Heuristic Lineup
This lineup generated an average of 5.84 runs per game, with a 64.7% chance of winning a 5-game series against the Detroit Tigers
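The slides do not say how the series probabilities were obtained. One simple way to turn a single-game win probability into a best-of-five probability, assuming independent games and a constant per-game probability `p` (both assumptions, and `series_win_probability` is a hypothetical helper), is sketched below.

```python
from math import comb


def series_win_probability(p, wins_needed=3):
    """Probability of winning a best-of-(2 * wins_needed - 1) series.

    Assumes each game is independent and won with the same probability p.
    """
    q = 1.0 - p
    # Win the deciding game after losing k of the earlier games, k = 0..wins_needed-1.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


for p in (0.50, 0.55, 0.60):  # illustrative single-game probabilities
    print(p, round(series_win_probability(p), 3))
```

A series amplifies a small per-game edge, so a modest gain in expected runs can show up as a larger change in series win probability.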
Conclusions and Other Applications
The heuristic was only able to generate a lineup with a roughly 3% increase in expected runs (5.68 to 5.84 per game)
Since statistical analysis is well established in baseball, the Yankees may already have studied this problem in great detail
Although the gains in expected run production are small, there are other applications for our methodology
Potential trades or acquisitions of new players can be evaluated by the effect they would have on the team’s expected run production
A game-theoretic approach could be applied to maximize the expected win rate by adjusting the distribution of the team’s run production to maximize the chance of winning a game against a specific opponent