So you want to be an amateur election forecaster? Here’s step zero.

October 26, 2016

[reprinted and slightly edited from 2012]

Scientifically forecasting the presidential election has, in the span of just a few elections, gone from a hobby of some obscure political scientists to a full-blown mainstream media and public obsession.  There are lots of forecasting sites out there for the presidential election: FiveThirtyEight, The Upshot, Votamatic. And these sites are really popular; people love to read them!

And so I figure there are a lot of people out there who would like to do some amateur election modeling in the next couple of weeks. And that’s one of the dirty little secrets of all this: the basic strategy is not wizardry. I’m far, far, far from anything approaching a good modeler and/or quantitative forecaster. But that doesn’t mean I can’t try my hand at it. And the same thing applies to you.

There’s only one file you need to get started. Right-click and save this Excel sheet (it’s about 2 MB): electoral-college-2016-monte-carlo-simulation-matt-glassman.

Open up the Excel file. You will see it contains a list of all states, the number of electoral votes they get, and an associated probability of Hillary Clinton winning the state. Each time you alter one of the state win probabilities, the file re-runs a simulation of 50,000 elections, based on the individual state probabilities, and reports the following relevant electoral college results: the percentage of the time Clinton wins, Trump wins, or it’s a tie; the average number of electoral votes for each candidate; and a graphical probability distribution of Clinton’s electoral college votes. The re-run is nearly instant (it takes about 2 seconds on my crappy computer). The graphic you see here is the distribution of outcomes when FiveThirtyEight’s current (as of 10/26) “polls-only” win probabilities for each state are plugged into my simulation. Not surprisingly, my sim and FiveThirtyEight’s both find the mean number of Clinton EC votes to be about 333.6. My sim’s estimate of the percent of the time Clinton wins is higher, 99% to 84%, most likely because my sim is simple, non-dynamic, and doesn’t account for anything but the state win probabilities. But you can plug in whatever you want. It can be fun to get lost in the various scenarios. Trust me.
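If you’d rather see the idea in code than in a spreadsheet, here is a minimal Python sketch of the same kind of simulation. To be clear, this is not a transcription of the Excel file; it is one plausible way to do it. The function name `simulate`, the `TOY_STATES` map, and every number in it are made up for illustration, and the winner is whoever takes more than half of the toy map’s electoral votes (the real-world threshold is 270 of 538).

```python
import random

# Hypothetical toy map: state -> (electoral votes, P(Clinton wins the state)).
# These are placeholder numbers, not real 2016 estimates.
TOY_STATES = {
    "A": (20, 0.85),
    "B": (18, 0.45),
    "C": (29, 0.50),
    "D": (16, 0.20),
    "E": (10, 0.65),
}


def simulate(states, n_sims=50_000, seed=0):
    """Simulate n_sims elections, treating every state as an independent coin flip."""
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _ in states.values())
    clinton = trump = tie = 0
    ev_running_total = 0

    for _ in range(n_sims):
        # Clinton takes a state's votes when a uniform draw falls below her win probability.
        clinton_ev = sum(ev for ev, p in states.values() if rng.random() < p)
        trump_ev = total_ev - clinton_ev
        ev_running_total += clinton_ev

        if clinton_ev > trump_ev:
            clinton += 1
        elif trump_ev > clinton_ev:
            trump += 1
        else:
            tie += 1

    return {
        "clinton_win_share": clinton / n_sims,
        "trump_win_share": trump / n_sims,
        "tie_share": tie / n_sims,
        "mean_clinton_ev": ev_running_total / n_sims,
    }


if __name__ == "__main__":
    for key, value in simulate(TOY_STATES).items():
        print(f"{key}: {value:.3f}")
```

Swap in all 51 contests with your own probabilities and you have the same basic setup: change a number, re-run, and watch the win shares move.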

Now, let me be crystal clear about one thing: this is not a model of the election. The Monte Carlo simulation is the final step in translating your model into a forecast. Your actual model is the data and analytical process that generates the individual state win probabilities. When you waste a fun hour plugging in various different state probabilities, your implicit model is “my best guess.” In effect, you are just doing somewhat-systematic punditry. If you want to actually model the state probabilities, you need at a minimum some sort of data (if you just want to forecast) or some data and a theory (if you want to forecast and explain the Way Things Work).

Here’s an example of a super-simple forecast model: take the most recent major-firm poll in each state, get Clinton’s percentage in the poll, build a table that translates each polling number to a win probability (e.g. polling 52% = 70% win chance; polling 58% = 99.99% win chance), and then just plug in the probabilities that your poll data suggests. Re-run the simulation each time you get a new state-level poll.
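Here is a rough sketch of what such a translation table might look like in Python. Only two rows come from the example above (52% maps to 70%, 58% maps to 99.99%); the other rows are placeholders I invented to make the table usable, and filling the gaps with straight-line interpolation is my assumption, not an established method. The names `POLL_TO_WIN_TABLE` and `win_probability` are hypothetical.

```python
from bisect import bisect_right

# (Clinton poll share in percent, win probability).
# Only the 52 -> 0.70 and 58 -> 0.9999 rows come from the text above;
# the remaining rows are invented placeholders.
POLL_TO_WIN_TABLE = [
    (42.0, 0.0001),
    (48.0, 0.30),
    (50.0, 0.50),
    (52.0, 0.70),
    (58.0, 0.9999),
]


def win_probability(poll_share, table=POLL_TO_WIN_TABLE):
    """Look up a win probability, interpolating linearly between table rows."""
    shares = [share for share, _ in table]
    # Outside the table, clamp to the first or last row.
    if poll_share <= shares[0]:
        return table[0][1]
    if poll_share >= shares[-1]:
        return table[-1][1]
    # Find the two rows that bracket the poll number.
    i = bisect_right(shares, poll_share)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (poll_share - x0) / (x1 - x0)


if __name__ == "__main__":
    for share in (47.0, 51.0, 52.0, 55.0):
        print(f"polling {share}% -> win probability {win_probability(share):.3f}")
```

The output of `win_probability` is exactly what you would type into the simulator for that state.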

Of course, that’s an absurdly simple and naive model. All sorts of advancements can be made on it. You can study the historical translation between poll numbers and win probabilities to improve your table; you can average multiple polls; you can weight those averages by the age of the poll and the past track-record of the pollster; you can correct for “house effects” of state-level polls; you can incorporate national polls and weight their contribution to the model; you can incorporate fundamental demographic data about the state; you can use all this to build a baseline and make the whole thing a continuous Bayesian update; you can build in uncertainty.
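To make one of those improvements concrete, here is a small Python sketch of averaging several polls while discounting older ones and lower-rated pollsters. Everything specific in it is an assumption for illustration: the exponential decay, the 14-day half-life, the 0-to-1 pollster quality score, and the function name `weighted_poll_average`. Real models choose and tune these weights far more carefully.

```python
def weighted_poll_average(polls, half_life_days=14.0):
    """Average poll shares, weighting by recency and a pollster quality score.

    polls: list of (clinton_share_percent, age_in_days, pollster_quality),
    where pollster_quality is a hypothetical 0-to-1 track-record score.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for share, age_days, quality in polls:
        # A poll loses half its weight every half_life_days.
        weight = quality * 0.5 ** (age_days / half_life_days)
        weighted_sum += weight * share
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no polls with positive weight")
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Made-up polls: (Clinton share, days old, pollster quality).
    polls = [(50.0, 0, 1.0), (56.0, 14, 1.0)]
    # The fresh poll counts twice as much as the two-week-old one.
    print(weighted_poll_average(polls))
```

The averaged number can then be translated into a win probability in place of a single poll.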

This sort of complicated and detailed modeling is exactly how the popular models work. Now, FiveThirtyEight’s model is proprietary, but other very good models, such as Votamatic, are fully transparent and available for inspection. Go check it out. It will demystify much of what is going on under the hood of these things. But the bottom line is that simple and complicated models all work the same way: use an algorithm to translate some data into win probabilities for each state, then simulate the electoral college with those state probabilities.

Also note that the simulation I gave you here isn’t dynamic in any way. It’s the most basic simulation possible. In real life, if Trump wins Pennsylvania — even if he only had a 13% estimated chance of doing so — he would almost certainly win Ohio and Iowa. The simulator doesn’t account for this; the states are completely independent. Which isn’t how the world works. Making them partially dependent — forcing the simulator to have Trump win OH if he happens to win PA, or having Clinton win GA if she happens to win TX — is an important complexity that most serious forecast models employ. This one does not.
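For the curious, here is one simple way to make states partially dependent, sketched in Python. It is not the hard “if PA then OH” rule described above, and I am not claiming it is what any particular forecaster does. Instead, each simulated election draws one shared national shock plus a separate local shock per state, blended by a correlation parameter `rho` (0 means fully independent, 1 means every state moves in lockstep). Each state’s own win probability is preserved. The function name, `rho`, and the toy map are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_correlated(states, rho=0.5, n_sims=20_000, seed=0):
    """Return Clinton's win share when state outcomes share a national shock.

    states: state -> (electoral votes, P(Clinton wins the state)).
    rho: correlation between any two states' shocks, between 0 and 1.
    """
    rng = random.Random(seed)
    standard_normal = NormalDist()
    total_ev = sum(ev for ev, _ in states.values())

    # Convert each win probability to a cutoff on a standard normal scale.
    # Probabilities are clamped slightly because inv_cdf rejects exactly 0 or 1.
    cutoffs = [
        (ev, standard_normal.inv_cdf(min(max(p, 1e-9), 1 - 1e-9)))
        for ev, p in states.values()
    ]

    clinton_wins = 0
    for _ in range(n_sims):
        national = rng.gauss(0, 1)  # one draw shared by every state
        clinton_ev = 0
        for ev, cutoff in cutoffs:
            local = rng.gauss(0, 1)  # state-specific noise
            shock = math.sqrt(rho) * national + math.sqrt(1 - rho) * local
            if shock < cutoff:  # happens with probability p for this state
                clinton_ev += ev
        if 2 * clinton_ev > total_ev:
            clinton_wins += 1
    return clinton_wins / n_sims


if __name__ == "__main__":
    # Made-up map: three equal states, Clinton favored 80% in each.
    toy = {"A": (10, 0.8), "B": (10, 0.8), "C": (10, 0.8)}
    for rho in (0.0, 0.5, 1.0):
        print(f"rho={rho}: Clinton wins {simulate_correlated(toy, rho=rho):.3f}")
```

In this toy map the favorite should win close to 90% of the time when the states are independent and close to 80% when they move in lockstep, which is the same direction as the gap between my 99% and FiveThirtyEight’s 84%: ignoring correlation makes the favorite look safer than she is.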

And that’s the rub. Simple modeling and forecasting is very easy. Get some data, generate some win probabilities in the states, and then simulate the election. Good modeling and accurate forecasting is much, much tougher. It requires careful theoretical and empirical specification, and a subtle understanding of how politics and public opinion work. Those aren’t chops I have, and they probably aren’t chops you have. But they certainly aren’t magical tools or skills that people are born with. And there is virtually zero barrier to entry to getting started. So why not?

At any rate, enjoy the Monte Carlo simulator — even seat-of-your-pants modeling of the win probabilities is a hell of a lot of fun! And if you want to get more into it, good luck. The sky’s the limit.

