Let the crapshoot begin

So those of you who have nothing better to do than remember everything I’ve written here will of course recall my periodic forays into Bradley-Terry Ratings for baseball.  It’s time to use math to predict the playoffs.  Why not?  It can’t be worse than anyone else’s predictions, can it?  It may not be better, but no one will ever be able to prove that it’s worse.

A brief recap: to make Bradley-Terry ratings, you need a dataset in which every team has played every other team a number of times. You then assign each team a single number representing its strength, chosen so that the records those numbers predict, in expectation over the season, come as close as possible to the teams’ actual records, where every game between two teams is modeled as:

Pr(Road Team Win) = RoadRating/(HomeRating x HomeFieldAdvantageFactor + RoadRating)
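
As a quick illustration of the formula, here is a small Python function that evaluates it. The name road_win_prob is mine, not from any library. The default factor of 1.1 is the home field multiplier used later in this post, and the example ratings are the Padres’ and Mets’ full-season numbers from the first table.

```python
def road_win_prob(road_rating, home_rating, hfa=1.1):
    """Pr(Road Team Win) = RoadRating / (HomeRating * HFA + RoadRating)."""
    return road_rating / (home_rating * hfa + road_rating)


# Padres (3.8383) visiting the Mets (5.1058), full-season ratings
print(f"{road_win_prob(3.8383, 5.1058):.3f}")           # 0.406 with the 1.1 home factor
print(f"{road_win_prob(3.8383, 5.1058, hfa=1.0):.3f}")  # 0.429 on a neutral field
```

Setting hfa=1.0 recovers the plain Bradley-Terry probability, Road / (Home + Road).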

Let’s start with the rankings for the playoff teams.

There are a number of choices you have to make to build Bradley-Terry ratings. The first is the period over which they are calculated. The standard method is to use that season’s games. Under that criterion, here are the ratings:

Team   BT Rating   Wins
LAD    6.4403      111
HOU    5.6087      106
NYM    5.1058      101
NYY    5.0960      99
ATL    4.9954      101
TOR    4.3180      92
STL    3.8828      93
CLE    3.8558      92
SEA    3.8540      90
SDP    3.8383      89
TBR    3.7111      86
PHI    3.6413      87
BAL    3.4556      83
MIL    3.3215      86
SFG    3.1574      81
BOS    3.1562      78
CHW    3.0088      81
MIN    2.8538      78
ARI    2.7196      74
LAA    2.6556      73
CHC    2.5391      74
MIA    2.3998      69
COL    2.3511      68
TEX    2.3124      68
DET    2.1302      66
KCR    2.1273      65
OAK    1.9354      60
CIN    1.9227      62
PIT    1.8997      62
WSN    1.7061      55
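
The post doesn’t show how ratings like these are fitted, so here is a minimal sketch, assuming the games come as (winner, loser) pairs. It uses the classic Zermelo / minorization-maximization (MM) fixed-point iteration for the plain Bradley-Terry model and, to stay short, ignores home field when fitting (the model above includes a home field factor, so this is a simplification). At convergence each team’s expected wins equal its actual wins. It assumes every team has at least one win and one loss within a connected schedule, and it normalizes ratings to average 1, an arbitrary choice, so the absolute values won’t match the table’s scale.

```python
from collections import defaultdict


def fit_bradley_terry(games, max_iter=1000, tol=1e-10):
    """Fit plain Bradley-Terry ratings from (winner, loser) pairs.

    MM / Zermelo update:  r_i <- W_i / sum_j n_ij / (r_i + r_j)
    where W_i is team i's win total and n_ij the games between i and j.
    Home field is ignored: every game is treated as neutral-site.
    """
    wins = defaultdict(int)      # W_i
    meetings = defaultdict(int)  # n_ij, keyed by unordered pair of teams
    for winner, loser in games:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1
    teams = {t for pair in meetings for t in pair}

    rating = {t: 1.0 for t in teams}
    for _ in range(max_iter):
        denom = defaultdict(float)
        for pair, n in meetings.items():
            i, j = tuple(pair)
            share = n / (rating[i] + rating[j])
            denom[i] += share
            denom[j] += share
        new = {t: wins[t] / denom[t] for t in teams}
        # Ratings are only defined up to a common scale; pin the mean at 1.
        scale = len(new) / sum(new.values())
        new = {t: r * scale for t, r in new.items()}
        converged = max(abs(new[t] - rating[t]) for t in teams) < tol
        rating = new
        if converged:
            break
    return rating


# Toy three-team schedule (hypothetical data)
games = ([("A", "B")] * 2 + [("B", "A")] + [("B", "C")] * 2 + [("C", "B")]
         + [("A", "C")] * 3 + [("C", "A")])
print(fit_bradley_terry(games))
```

Restricting the input to games inside a date window is all it takes to produce the partial-season rankings discussed next.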

But of course using the whole season’s games can be misleading.  Teams can change dramatically over the course of the year, as a certain team which retooled at the deadline last year might remind you.  What if we just ranked teams based on July-October?  That would yield the following rankings:

Team   BT Rating   Wins (Jul-Oct)
LAD    8.109151    64
ATL    6.313573    57
HOU    5.738709    58
NYM    5.061273    54
SEA    4.812477    53
STL    4.407899    50
CLE    4.178128    53
TOR    3.919818    49
BAL    3.759168    48
PHI    3.743537    47
SDP    3.488523    43
TBR    3.349359    46
CHC    3.120467    44
CHW    3.014441    46
NYY    2.985006    43
SFG    2.947354    41
MIL    2.839095    42
ARI    2.811896    40
COL    2.434495    35
LAA    2.399064    36
OAK    2.328476    35
KCR    2.311223    38
DET    2.213757    37
BOS    2.140949    35
MIA    2.140332    35
CIN    2.115366    36
MIN    2.109602    35
TEX    1.781262    32
PIT    1.73515     31
WSN    1.690449    26

You have to drop all the way down to 15th place (the Yankees) to get all 12 playoff teams. And the Braves rise to 2nd. (This proves that the whole season matters. Without those first three months, the Yankees would be fishing now.)
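
Counting down a 30-team list by eye invites off-by-one slips, so here is a tiny sketch that does it mechanically. The order is copied from the July-October table above and the playoff set is the twelve teams in this year’s bracket; the function name is mine.

```python
# July-October ranking, best to worst, copied from the table above
JULY_OCT_ORDER = [
    "LAD", "ATL", "HOU", "NYM", "SEA", "STL", "CLE", "TOR", "BAL", "PHI",
    "SDP", "TBR", "CHC", "CHW", "NYY", "SFG", "MIL", "ARI", "COL", "LAA",
    "OAK", "KCR", "DET", "BOS", "MIA", "CIN", "MIN", "TEX", "PIT", "WSN",
]

# The twelve playoff teams
PLAYOFF_TEAMS = {"LAD", "HOU", "NYM", "NYY", "ATL", "TOR",
                 "STL", "CLE", "SEA", "SDP", "TBR", "PHI"}


def depth_needed(order, teams):
    """1-based rank of the lowest-ranked member of `teams` in `order`."""
    return max(order.index(t) for t in teams) + 1


depth = depth_needed(JULY_OCT_ORDER, PLAYOFF_TEAMS)
print(depth, JULY_OCT_ORDER[depth - 1])
```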

One more entirely biased method is to go from June 1 to the end of the year.  That yields this:

Team   BT Rating   Wins (since Jun 1)
ATL    6.801269    77
LAD    6.646768    78
HOU    5.796639    73
NYM    4.736578    66
SEA    4.621307    69
CLE    4.457383    70
NYY    4.271542    65
PHI    4.134258    65
TOR    3.98235     63
BAL    3.965034    61
STL    3.684807    64
TBR    3.442599    57
SDP    3.404046    59
BOS    3.140247    54
CHW    3.052509    58
SFG    2.783469    54
MIL    2.691396    54
CHC    2.653473    53
MIA    2.446907    50
MIN    2.443233    48
KCR    2.409654    49
LAA    2.376313    46
ARI    2.34026     49
COL    2.24425     46
DET    2.135094    46
TEX    2.036846    44
OAK    1.981915    40
CIN    1.908514    45
WSN    1.758296    37
PIT    1.653045    40

Ahh… that’s more like it.  Now the right team is the best team.

There’s an important lesson here even before we begin playoff simulations. Anyone who claims to be able to rank teams must have in mind some notion of what the team is, and over what timeframe its strength is being measured. None of that is a guarantee of performance in the playoffs, but you won’t get predictions out that are any better than the team strengths you assumed going in.

So we can start with the method most pessimistic about the Braves: the full-year estimate, which is the standard one. I’ll go through the playoffs in some detail, and then circle back to show how the probabilities change when only post-May data are used.

The next step is to assign a home field advantage. I am really skeptical of trying to measure a playoff home field advantage, so I’m just going to go with the sort of number I’ve seen in the literature: a 1.1 multiplier. This really only has a big effect in the Wild Card round, where the higher seed hosts all of the (up to three) games. The following table gives the Wild Card round estimates:

Wild Card Round
Visitor  Game   Series   Home   Game   Series
SDP      0.429  0.361    NYM    0.571  0.639
PHI      0.484  0.440    STL    0.516  0.560
TBR      0.490  0.450    CLE    0.510  0.550
SEA      0.472  0.422    TOR    0.528  0.578

For each team, the table gives the probability of winning a single game between the two clubs on a neutral field, and the chance of winning two out of three with every game at the higher seed’s park. So the Padres, for example, have a 42.9% chance of beating the Mets in a single neutral-site game; once the 1.1 home field factor is applied to all three games at Citi Field, that becomes a 36.1% chance of winning two out of three. The last two columns are just the complements of columns (2) and (3). Note, by the way, how little a two-out-of-three format changes the probabilities. That’s the basis for the familiar insight that if you want to pull an upset, your best chance is a short series.
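
Here is a minimal Python sketch of the game-to-series step, assuming independent games and the 1.1 home multiplier applied to the higher seed in all three games. The function names are hypothetical. The series function enumerates every win/loss sequence of the scheduled games; pretending all three are played never changes who wins two of them, so the result is exact.

```python
from itertools import product


def road_win_prob(road_rating, home_rating, hfa=1.1):
    """Pr(road team wins one game) = Road / (Home * HFA + Road)."""
    return road_rating / (home_rating * hfa + road_rating)


def series_win_prob(game_probs, wins_needed):
    """Chance of reaching `wins_needed` wins, given one win probability per scheduled game."""
    total = 0.0
    for outcome in product((True, False), repeat=len(game_probs)):
        prob = 1.0
        for won, p in zip(outcome, game_probs):
            prob *= p if won else 1.0 - p
        if sum(outcome) >= wins_needed:
            total += prob
    return total


# Full-season ratings: Padres (3.8383) at the Mets (5.1058), all three games at Citi Field
sdp, nym = 3.8383, 5.1058
neutral_game = road_win_prob(sdp, nym, hfa=1.0)
series = series_win_prob([road_win_prob(sdp, nym)] * 3, wins_needed=2)
print(f"{neutral_game:.3f} {series:.3f}")  # 0.429 0.361
```

For a single fixed win probability p, the same enumeration reduces to the closed form p²(3 − 2p).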

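Now we get to the LDS round. The lack of reseeding under the new format makes this pretty easy: each Division Series matchup happens exactly when the wild card team in that slot wins its series, so the matchup probabilities come straight from the Wild Card table. When we look at all the possibilities, we get this: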

Game 1 Visitor   Game 1 Home   Matchup Probability   Pr(Vis)   Pr(Home)
NYM              LAD           0.639                 0.384     0.616
TOR              HOU           0.578                 0.371     0.629
STL              ATL           0.560                 0.375     0.625
CLE              NYY           0.550                 0.363     0.637
TBR              NYY           0.450                 0.347     0.653
PHI              ATL           0.440                 0.347     0.653
SEA              HOU           0.422                 0.322     0.678
SDP              LAD           0.361                 0.266     0.734

I have sorted these by the probability that the matchup occurs. Since the Mets have the highest probability of winning their Wild Card Series, they get the first row. For each matchup I list the Game 1 visitor, the Game 1 home team, the probability that the matchup occurs, the probability that the Game 1 visitor survives, and finally its complement, the probability that the Game 1 home team survives. All the home teams are substantial favorites, which you’d expect.
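
The Division Series survival numbers can be reproduced the same way. Here is a minimal sketch, assuming the 2-2-1 format in which the Game 1 home team also hosts Games 2 and 5, independent games, and the same 1.1 multiplier for whichever team is at home. Function names are hypothetical.

```python
from itertools import product


def game_prob(team, opp, team_is_home, hfa=1.1):
    """One-game win probability for `team`; the home side's rating is multiplied by hfa."""
    t = team * hfa if team_is_home else team
    o = opp if team_is_home else opp * hfa
    return t / (t + o)


def series_win_prob(game_probs, wins_needed):
    """Chance of reaching `wins_needed` wins, given one win probability per scheduled game.

    Playing out every scheduled game never changes the series winner,
    so enumerating all win/loss sequences gives the exact answer.
    """
    total = 0.0
    for outcome in product((True, False), repeat=len(game_probs)):
        prob = 1.0
        for won, p in zip(outcome, game_probs):
            prob *= p if won else 1.0 - p
        if sum(outcome) >= wins_needed:
            total += prob
    return total


def lds_visitor_prob(visitor, home, hfa=1.1):
    """Best of five, 2-2-1: the Game 1 visitor is at home only in Games 3 and 4."""
    visitor_at_home = [False, False, True, True, False]
    probs = [game_prob(visitor, home, h, hfa) for h in visitor_at_home]
    return series_win_prob(probs, wins_needed=3)


# Mets (5.1058) opening on the road against the Dodgers (6.4403), full-season ratings
print(f"{lds_visitor_prob(5.1058, 6.4403):.3f}")  # 0.384
```

Swapping in a seven-entry home/away schedule and wins_needed=4 would give the LCS and World Series versions.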

Next comes the LCS. Following the same structure as the previous table (though now we are playing seven-game series), we get the following:

Game 1 Visitor   Game 1 Home   Matchup Probability   Pr(Vis)   Pr(Home)
ATL              LAD           0.420                 0.357     0.643
NYY              HOU           0.418                 0.440     0.560
NYM              ATL           0.156                 0.504     0.496
STL              LAD           0.138                 0.240     0.760
TOR              NYY           0.138                 0.403     0.597
CLE              HOU           0.130                 0.298     0.702
PHI              LAD           0.101                 0.214     0.786
TBR              HOU           0.101                 0.280     0.720
SEA              NYY           0.088                 0.344     0.656
SDP              ATL           0.061                 0.352     0.648
NYM              STL           0.052                 0.639     0.361
TOR              CLE           0.043                 0.554     0.446
PHI              NYM           0.038                 0.315     0.685
TBR              TOR           0.033                 0.411     0.589
SEA              CLE           0.027                 0.492     0.508
TBR              SEA           0.021                 0.472     0.528
SDP              STL           0.020                 0.486     0.514
PHI              SDP           0.015                 0.464     0.536

The matchup probabilities here require both teams to survive to this round, so each one is just the product of the two teams’ chances of surviving the second round.
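
To make the multiplication concrete, here is a small sketch using the rounded Division Series numbers from the table above. A team’s chance of reaching the LCS is summed over its possible Division Series matchups; because the two teams in an LCS come from opposite halves of the bracket, their chances simply multiply. The dictionary layout and function names are my own, for illustration.

```python
# (Pr(matchup occurs), Pr(team wins that matchup)), copied from the Division Series table
DS_PATHS = {
    "ATL": [(0.560, 0.625), (0.440, 0.653)],  # vs STL or PHI
    "LAD": [(0.639, 0.616), (0.361, 0.734)],  # vs NYM or SDP
    "NYM": [(0.639, 0.384)],                  # only if NYM beat SDP, then at LAD
}


def prob_reaching_lcs(team):
    return sum(p_match * p_win for p_match, p_win in DS_PATHS[team])


def lcs_matchup_prob(a, b):
    """Opposite halves of the bracket, so the two paths are independent."""
    return prob_reaching_lcs(a) * prob_reaching_lcs(b)


print(f"{lcs_matchup_prob('ATL', 'LAD'):.3f}")  # 0.420
print(f"{lcs_matchup_prob('NYM', 'ATL'):.3f}")  # 0.156
```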

Finally, we come to the 36 possible World Series matchups, sorted by their probability of occurrence, and in the same format as the previous two tables:

Game 1 Visitor   Game 1 Home   Matchup Probability   Pr(Vis)   Pr(Home)
HOU              LAD           0.181                 0.418     0.582
NYY              LAD           0.147                 0.367     0.633
ATL              HOU           0.106                 0.430     0.570
NYY              ATL           0.086                 0.503     0.497
NYM              HOU           0.055                 0.441     0.559
NYY              NYM           0.045                 0.492     0.508
TOR              LAD           0.045                 0.286     0.714
CLE              LAD           0.033                 0.237     0.763
TOR              ATL           0.026                 0.414     0.586
STL              HOU           0.025                 0.301     0.699
SEA              LAD           0.025                 0.236     0.764
TBR              LAD           0.024                 0.221     0.779
STL              NYY           0.020                 0.348     0.652
CLE              ATL           0.019                 0.355     0.645
PHI              HOU           0.016                 0.272     0.728
SDP              HOU           0.016                 0.296     0.704
SEA              ATL           0.015                 0.354     0.646
TBR              ATL           0.014                 0.335     0.665
TOR              NYM           0.014                 0.402     0.598
PHI              NYY           0.013                 0.316     0.684
SDP              NYY           0.013                 0.342     0.658
CLE              NYM           0.010                 0.344     0.656
SEA              NYM           0.008                 0.343     0.657
TBR              NYM           0.007                 0.325     0.675
TOR              STL           0.006                 0.550     0.450
CLE              STL           0.004                 0.489     0.511
PHI              TOR           0.004                 0.401     0.599
SDP              TOR           0.004                 0.429     0.571
PHI              CLE           0.003                 0.461     0.539
SDP              CLE           0.003                 0.490     0.510
TBR              STL           0.003                 0.468     0.532
SEA              STL           0.003                 0.488     0.512
TBR              PHI           0.002                 0.503     0.497
TBR              SDP           0.002                 0.474     0.526
PHI              SEA           0.002                 0.462     0.538
SDP              SEA           0.002                 0.490     0.510

One thing you can do with this table is calculate fair odds. If I wanted to bet that Seattle beats the Padres in the World Series, I ought to get around 1,000:1 odds.
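
Fair odds against an event with probability p are (1 − p)/p to 1. Here is that arithmetic as a short Python sketch, using the SEA-SDP row of the table. Because the 0.002 matchup probability is itself rounded to three decimals, the answer is only good to within a few hundred either way.

```python
def fair_odds_against(p):
    """Fair 'X to 1' odds against an event with probability p."""
    return (1.0 - p) / p


# Seattle beats San Diego: Pr(SEA-SDP matchup) * Pr(SEA wins it), both rounded
p_sea = 0.002 * 0.510
print(f"{fair_odds_against(p_sea):.0f} to 1")
```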

We can then cumulate by team across this table and get the full crapshoot probabilities.  The chances of winning the World Series are:

Team   BT Rating   Championship Probability
LAD    6.440       0.293
HOU    5.609       0.207
NYY    5.096       0.150
ATL    4.995       0.135
NYM    5.106       0.071
TOR    4.318       0.037
CLE    3.856       0.023
STL    3.883       0.023
SEA    3.854       0.018
TBR    3.711       0.016
SDP    3.838       0.014
PHI    3.641       0.013
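
The cumulation is just a weighted sum: for each World Series matchup a team appears in, multiply the probability of the matchup by the team’s chance of winning it, then add. A minimal sketch follows, with only the six Dodger rows of the World Series table copied in; any other team would need its own rows. The names are mine.

```python
# (Game 1 visitor, Game 1 home, Pr(matchup), Pr(visitor wins), Pr(home wins))
WS_ROWS = [
    ("HOU", "LAD", 0.181, 0.418, 0.582),
    ("NYY", "LAD", 0.147, 0.367, 0.633),
    ("TOR", "LAD", 0.045, 0.286, 0.714),
    ("CLE", "LAD", 0.033, 0.237, 0.763),
    ("SEA", "LAD", 0.025, 0.236, 0.764),
    ("TBR", "LAD", 0.024, 0.221, 0.779),
]


def title_prob(team, rows):
    """Sum of Pr(matchup) * Pr(team wins it) over every row the team appears in."""
    total = 0.0
    for visitor, home, p_match, p_vis, p_home in rows:
        if team == visitor:
            total += p_match * p_vis
        elif team == home:
            total += p_match * p_home
    return total


print(f"{title_prob('LAD', WS_ROWS):.3f}")  # 0.293
```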

How robust are these probabilities?  Well, suppose we use team rankings since June 1st.  The revised probabilities are:

Team   BT Rating   Championship Probability
ATL    6.801       0.286
LAD    6.647       0.265
HOU    5.797       0.202
NYY    4.272       0.063
CLE    4.457       0.043
NYM    4.737       0.041
SEA    4.621       0.037
TOR    3.982       0.020
PHI    4.134       0.018
STL    3.685       0.011
TBR    3.443       0.009
SDP    3.404       0.005

That’s more like it, although the Braves are barely better than a one-in-four shot. But it’s hard to refute these probabilities. They don’t say the Padres won’t win, only that it’s about 200-1 against. Current Vegas odds are around 30-1 on the Padres. That’s a terrible bet.

On the other hand, the current Dodger odds of about 3.5:1, while not exactly fair, are not strikingly unfair either. But the Braves’ odds of 6:1 are really good if you think the last four months are who the Braves are.
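
Whether a quoted price is a good bet comes down to expected value. The sketch below assumes quotes like “30-1” are fractional odds (a $1 stake returns $30 profit on a win), ignores the bookmaker’s margin, and plugs in the June 1 model probabilities from the table above. The function names are hypothetical.

```python
def implied_prob(odds_to_one):
    """Break-even probability for 'X to 1' odds (no bookmaker margin)."""
    return 1.0 / (odds_to_one + 1.0)


def expected_profit(p, odds_to_one):
    """Expected profit per $1 staked: win `odds_to_one` with prob p, lose the $1 otherwise."""
    return p * odds_to_one - (1.0 - p)


# June 1 model probabilities; odds as quoted in the post
for team, p, odds in [("SDP", 0.005, 30), ("LAD", 0.265, 3.5), ("ATL", 0.286, 6)]:
    print(team, f"break-even {implied_prob(odds):.3f}",
          f"model {p:.3f}", f"EV per $1 {expected_profit(p, odds):+.2f}")
```

A positive expected profit means the quoted price is longer than the model’s fair odds; a negative one, as for the Padres, means the bet loses money on average.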