As I've hinted a couple of times, we now have a new version of the model running, which attempts to account for the interrelationships between polling movement in different states. Before I can work up the energy to fully describe that, let me first tell you about the new Tipping-Point States metric that I've developed to accompany it.
A Tipping-Point State is now defined as a state that would be most likely to alter the outcome of a close election if it were decided differently. More specifically, a Tipping-Point State is among the closest states -- taken alone or in combination -- that would give the losing candidate at least 270 electoral votes if transferred to him from the winner’s column, with no wasted electoral votes.
Let me give you a couple of examples. First, 2004, which is an easy one. The closest states won by Bush were New Mexico and Iowa. However, these would not have given John Kerry enough electoral votes even if he had won them. So the third-closest state, Ohio, was the lone Tipping-Point State for that election, since it would have gotten Kerry to 270 all on its own.
Now a somewhat more complicated example: 1960. Richard Nixon was 51 electoral votes shy of winning that election. The closest Kennedy states were as follows:
Hawaii (3) -0.06%
Illinois (27) -0.19%
Missouri (13) -0.52%
New Mexico (4) -0.74%
New Jersey (16) -0.80%
So, we start with a 51-EV gap and begin whittling those numbers down for Nixon. Giving him Hawaii makes it 48, Illinois makes it 21, Missouri cuts Kennedy's lead to 8, and New Mexico to 4. And then we hit New Jersey, which gives Nixon the election. But it also gives him 12 extra electoral votes.
So we go back through the list in reverse order and see if there is any wastage. In this case, there is. If we place New Mexico back in the Kennedy column, Nixon still has 8 electoral votes to spare. In fact, while we must keep Missouri and Illinois, we can also eliminate Hawaii. So the Tipping-Point states for Nixon in 1960 were Illinois, Missouri, and New Jersey. This was the most efficient possible combination of states that would give him a winning electoral margin.
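For the code-minded, here is a minimal sketch of the two passes just described, using the 1960 numbers from the list above. The function name and the input format are my own choices for illustration, not necessarily how the model itself is written.

```python
def tipping_point_states(closest_states, gap):
    """Return the Tipping-Point States for one election.

    closest_states: list of (state, electoral_votes), closest state first.
    gap: electoral votes the losing candidate still needs to win.
    """
    # Forward pass: hand over the closest states until the gap is covered.
    chosen = []
    total = 0
    for state, ev in closest_states:
        chosen.append((state, ev))
        total += ev
        if total >= gap:
            break
    else:
        return []  # the listed states never cover the gap

    # Reverse pass: give back any state whose votes are pure wastage.
    spare = total - gap
    kept = []
    for state, ev in reversed(chosen):
        if ev <= spare:
            spare -= ev  # this state can go back to the winner's column
        else:
            kept.append(state)
    kept.reverse()
    return kept


kennedy_1960 = [
    ("Hawaii", 3),
    ("Illinois", 27),
    ("Missouri", 13),
    ("New Mexico", 4),
    ("New Jersey", 16),
]
print(tipping_point_states(kennedy_1960, gap=51))
# ['Illinois', 'Missouri', 'New Jersey']
```

The state that puts the candidate over the top can never be handed back, since by construction he was short of the gap before it was added.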
But I tried to slip one by you there. What do I mean by a "close election"? I mean one in which the electoral math matters. That was the purpose of the graph that I posted this morning:
The Tipping-Point calculation is weighted across simulations based on the popular vote result in each simulated election. Specifically, each simulation is weighted according to the probability that a candidate who wins the popular vote by that margin will lose the Electoral College. So a simulation in which the popular vote is divided exactly evenly will be weighted at .5 -- the highest possible weighting. A simulation in which the popular vote margin is 3 points -- which gives the popular vote leader about a 97 percent chance of winning the election -- will be weighted at .03. Basically, most of the calculation is derived from elections that are decided by a point or two.
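A rough sketch of that weighting, for illustration only. It assumes a simple one-parameter logistic curve, with the slope pinned to the one figure quoted above (a 3-point margin means about a 97 percent chance of winning). The actual curve in the model comes from the simulations themselves, so treat the function names and the slope here as stand-ins.

```python
import math

# Assumed slope: chosen so that a 3-point popular vote margin gives the
# leader a 97 percent chance of winning the Electoral College.
SLOPE = math.log(0.97 / 0.03) / 3.0


def leader_win_probability(margin_points):
    """Chance that the popular vote leader also wins the Electoral College."""
    return 1.0 / (1.0 + math.exp(-SLOPE * abs(margin_points)))


def simulation_weight(margin_points):
    """Weight for one simulation: chance the popular vote leader loses."""
    return 1.0 - leader_win_probability(margin_points)


for margin in (0, 1, 2, 3, 4):
    print(margin, round(simulation_weight(margin), 3))
```

A tied popular vote gets the maximum weight of .5, and the weight falls off quickly from there, which is why simulations decided by a point or two dominate the calculation.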
So this definition is rather complicated mathematically -- but at the same time, I think it's more intuitive than the previous version. It's definitely a lot more robust.
The hot-off-the-presses Tipping Point numbers are as follows:
Michigan and Ohio will each prove to be decisive in a close election about 30 percent of the time. After that are Colorado and Virginia, which serve as gateways to their respective regions.
The really interesting thing is to compare the Tipping Point states with and without the intrastate (or should that be interstate?) correlations. A state like North Carolina is punished, for instance, for reasons that most of you can probably figure out. But we'll save that for later.
Thursday, July 10, 2008
Tipping Point v2.0
-- Nate at 11:53 AM 48 Comments...
Labels: electoral math, meta, methodology, site
Popular Vote v Electoral Vote, Part II
Here's another take on this: the probability, as determined by logistic regression of our latest simulation run, of the candidate winning the electoral vote based on his share of the popular vote.
So a 1-point popular vote win translates to about a 75 percent chance of winning the Electoral College, a 2-point win about a 90 percent chance, a 3-point win a 97 percent chance, and a 4-point win a 99 percent chance.
This is, of course, assuming that my simulation model is getting this approximately right. Note that I've lumped together McCain and Obama's numbers here, making the distribution exactly symmetrical. I don't know to what extent these numbers would hold with two different candidates and with a different set of states in play, but I'd bet it's pretty close to the mark.
-- Nate at 7:31 AM 38 Comments...
Labels: electoral math, meta, popular vote
Wednesday, July 9, 2008
Quibble
Very interesting read from Justin Wolfers at the New York Times' Freakonomics blog, who cites research from friend-of-538 Robert Erikson which suggests that polls tend to understate the performance of the incumbent.
Political scientists Robert Erikson (of Columbia) and Christopher Wlezien (of Temple) have recently mined daily polling reports from the last half-century of elections, mapping the relationship between early polling numbers and final election returns. At this point in the race, they find that around half of any lead should be discounted, as early advantages tend to dissipate. (You can read the full paper here, or an ungated version here.)
I just looked at this too. Although I doubt that my methodology is anywhere near as robust, I found the same thing that Erikson did: you would need to discount a polling lead more heavily for an incumbent than for a challenger.
Profs. Erikson and Wlezien point to another reason to be wary of Sen. Obama’s early polling lead: On average, the voting public tends to be more strongly anti-incumbent three-and-a-half years into an administration than they are on Election Day. Based on patterns in previous cycles, the professors suggest that this exaggerated anti-incumbent feeling is boosting Sen. Obama’s lead by around three percentage points.
So if you first halve Obama’s six point lead, then subtract a three point anti-incumbency bias, you are left with a dead heat.
The problem is that while McCain comes from the incumbent party, McCain himself is not an incumbent. He is not even a pseudo-incumbent (a.k.a. the sitting Vice President); this is the first time this situation has occurred since 1952.
Things start to get fairly complicated if you look at the interaction effects between a candidate's lead in the polling, the number of days until the election, and the incumbency, especially since there are different degrees of incumbency, and also interaction effects and various other sorts of non-linearity between all these different variables.
But as best I can tell, the incumbent effect that Wolfers and Erikson have identified is smaller when we're dealing with the incumbent party rather than an actual incumbent President -- probably more like one point rather than three at this stage of the cycle. And in '52, when you had neither an incumbent President nor a sitting Vice President running, the large lead that Dwight Eisenhower had in the polling held up quite well.
My general philosophy behind my modeling is to make everything "candidate-neutral". The model knows that there are two candidates and that they have certain polling numbers, but it doesn't know who is a Republican or who is a Democrat, or who is an incumbent and who isn't. So I won't say "the polling numbers will move toward Candidate X", although I do say "the polling numbers are likely to move toward whichever candidate happens to be trailing".
If I did look at that stuff, the numbers might be moved a couple points in McCain's direction, because of this incumbent issue I just described and also because there has been some tendency for the polling to overstate the performance of the Democrat. On the other hand, if one starts to consider candidate-related variables, there is a whole other set of 'meta' variables that one might want to evaluate too, such as the condition of the economy, the presence of a war, incumbent approval ratings, and party registration figures. Most of those factors would tend to favor Mr. Obama.
Making things candidate neutral is partially a marketing decision -- I can't imagine how much yelling and screaming there would be if I said "let's give McCain 2 bonus points because he's a Republican" or "let's give Obama 2 bonus points because the economy stinks". But also, the set of past polling data is not that robust -- about 14 elections that were polled scientifically, in only the last several of which did you have multiple agencies releasing polling data at regular intervals. If you want to look at something like "Democrat challenging Republican quasi-incumbent in wartime with crappy economy", your dataset gets down to zero pretty quickly.
-- Nate at 11:42 AM 47 Comments...
Labels: meta, methodology
Monday, July 7, 2008
State Similarity Scores
For those of you who haven't followed my baseball work, I am best known for inventing a forecasting system called PECOTA, which generates predictions by comparing baseball players with a large database of historical peers and identifying the most similar ones. This same technology -- which is really just a variant of nearest neighbor analysis -- can be applied to virtually anything, including identifying the similarity of any two states along a number of dimensions of political salience. In fact, that's exactly what I've done in the chart below, with each state listed along with its three most similar states.
What factors go into the similarity score? There are quite a few, which are weighted in rough proportion to their importance in determining the Kerry-Bush result in 2004 and the McCain-Obama polling this year according to an analysis of variance.
Specifically, those variables are: (1) Partisan ID index; (2) Likert liberal-conservative score; (3) Average years of completed schooling per adult; (4) Per Capita Income; (5) 18-29 year old population; (6) senior population; (7) African-American population; (8) Hispanic population; (9) percentage of white evangelicals; (10) Catholic population; (11) Mormon/LDS population; (12) percentage of military veterans; (13) percentage of same-sex partner households; (14) gun ownership rate; (15) percentage of adults identifying ancestry as 'American'; (16) percentage suburban; (17) percentage of state jobs in manufacturing sector; (18) current unemployment rate; and (19) latitude and longitude (i.e. geographic distance).
The highest score theoretically achievable is 100, for two states that are exactly identical along each of these 19 dimensions. The highest score in practice is 71 between North and South Carolina. A score of 0 represents states that are as dissimilar as similar, and negative scores are both possible and quite common (though I list them as zeroes in the table above).
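To make the scale concrete, here is one way a score with those properties could be computed. To be clear, this is a hypothetical formula for illustration, not the one behind the chart: it assumes each variable has already been standardized, takes a weighted Euclidean distance between two states, and rescales so that identical states score 100 and a pair at the average distance scores 0. The state names and numbers are made up.

```python
import math
from itertools import combinations


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two standardized profiles."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def similarity_scores(profiles, weights):
    """Hypothetical score: 100 = identical, 0 = average distance apart.

    profiles: state -> list of standardized variable values.
    weights: one importance weight per variable.
    """
    dist = {
        (s, t): weighted_distance(profiles[s], profiles[t], weights)
        for s, t in combinations(sorted(profiles), 2)
    }
    mean_dist = sum(dist.values()) / len(dist)
    if mean_dist == 0:
        return {pair: 100.0 for pair in dist}  # every state is identical
    return {pair: 100.0 * (1.0 - d / mean_dist) for pair, d in dist.items()}


# Made-up two-variable profiles for three imaginary states.
toy_profiles = {"A": [0.0, 0.0], "B": [0.0, 0.0], "C": [3.0, 4.0]}
toy_scores = similarity_scores(toy_profiles, weights=[1.0, 1.0])
print(toy_scores)
```

Under this scheme, pairs that are farther apart than average come out negative, which is consistent with negative scores being common.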
Note that some states really aren't like any other states at all, including big ones like Florida and Texas and small ones like Alaska, Utah, and New Mexico. Then there are other states that are sort of within the main sequence but need to pull from different regions -- like Indiana, whose three most similar states span the Midwest (Ohio), South (North Carolina) and the Prairies (Kansas).
And yes, this does have implications for our model, which will become clear at some point soon.
-- Nate at 7:18 AM 79 Comments...
Labels: indiana, meta, portfolio theory
Thursday, July 3, 2008
Return on Investment
Our continuously-updated list of Tipping Point States -- recently represented with a snazzy new map -- tells you which states are most likely to determine the outcome of this year's election. As described in the FAQ:
'Tipping Point States' are those states that tip the outcome of the election from one candidate to the other. In each simulation run, the states are lined up from best to worst for each candidate. The states are marked off sequentially until the candidate reaches 270 electoral votes. The state responsible for putting the candidate over the top to 270 electoral votes is the tipping point state for that simulation run.
Naturally, Tipping Point States are usually going to be those associated with large electoral vote counts. It's much more likely that a state like Pennsylvania, which has 21 electoral votes, will make the difference between winning and losing the election than something like Montana, which has 3. The goal of the Tipping Point States metric is to balance which states are closest to the median of the electorate with the value of each state in the Electoral College -- and it generally comes up with pretty intuitive results.
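The FAQ definition boils down to a few lines of code. This is a sketch for a single simulation run, with invented state names and a small `needed` threshold so the example stays readable; the function and argument names are mine.

```python
def tipping_point_state(margins, electoral_votes, needed=270):
    """Find the tipping point state for one simulation run.

    margins: state -> simulated margin for the candidate (positive = ahead).
    electoral_votes: state -> electoral votes.
    needed: electoral votes required to win.
    """
    running_total = 0
    # Line the states up from the candidate's best to worst.
    for state in sorted(margins, key=margins.get, reverse=True):
        running_total += electoral_votes[state]
        if running_total >= needed:
            return state  # this state puts the candidate over the top
    return None  # not enough electoral votes listed


# Invented three-state example in which 6 electoral votes are needed.
toy_margins = {"Alpha": 10.0, "Beta": 2.0, "Gamma": -5.0}
toy_votes = {"Alpha": 4, "Beta": 3, "Gamma": 5}
print(tipping_point_state(toy_margins, toy_votes, needed=6))
# Beta
```

Tallying how often each state is returned across all simulation runs gives its Tipping Point percentage.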
However, it is not necessarily the case that the states with the highest Tipping Point number will represent the best return on investment for the candidate. While Pennsylvania is more likely to swing the election than Montana, it is also many times more expensive. Of course, a campaign will still want to invest more in Pennsylvania than it does in Montana in the aggregate. But which state is better on a dollar-for-dollar basis?
To get at this, what we can do is divide a state's Tipping Point percentage by its population (more specifically, its eligible voter population). What this implicitly assumes is that the expense of competing in a state is proportional to its eligible voter population. Strictly speaking, this is not true, especially when it comes to television buys, where there are a lot of idiosyncrasies related to the geography of different TV markets (something I'll be writing more about in the near future). But it's a reasonably safe and neutral assumption for our purposes here.
This calculation produces a ratio, whose value is meaningless in the abstract, but which can be compared to the ratio in the country as a whole (in other words, we're taking the ratio of the ratios). The ratio figures, for instance, that a dollar spent in Pennsylvania is about 3.5 times more likely to influence the outcome of the election than one spent in the nation as a whole. This is what we call the "Return on Investment Index". Which states have the highest ROIs?

The top state is New Mexico, which produces an ROI almost 6 times higher than the nation at large. Why New Mexico? We project it to be very close to the median of the electorate. Right now, we are predicting a 2.7-point victory for Barack Obama in New Mexico, versus a 3.7-point victory in the national popular vote. Strictly speaking, the states that deserve the most attention are not those that are closest at any given moment, but rather those that are closest to the national average. If, say, Barack Obama has built a 12-point lead in August, you will probably start to see some weird things like Mississippi being a toss-up. But that doesn't mean the Obama campaign should at that point begin to invest heavily in Mississippi, because the only time the decision to invest in an individual state matters is when the election is close. If that hypothetical 12-point lead in August reverted back to a 1-point lead in October -- the only contingency that matters -- it is very likely that Mississippi would no longer be one of the closer states. Likewise, even though Obama has a "safe" lead in Pennsylvania now, he cannot stop campaigning there (nor can McCain), because if the election tightens, Pennsylvania is liable to be within a couple of points.
The other small advantage in an investment in New Mexico is that small states have more electoral votes per eligible voter: New Mexico offers one electoral vote for every 274,000 eligible voters, whereas Pennsylvania offers one per 449,000.
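The Return on Investment Index described above is just a ratio of ratios, which can be sketched as follows. The two "states" and their numbers are invented to keep the arithmetic visible; only the formula follows the description in this post, and the function name is mine.

```python
def roi_index(tipping_point_pct, eligible_voters):
    """Return on Investment Index for each state.

    tipping_point_pct: state -> Tipping Point percentage.
    eligible_voters: state -> eligible voter population.
    Assumes, as the post does, that the cost of competing in a state
    is proportional to its eligible voter population.
    """
    national_ratio = sum(tipping_point_pct.values()) / sum(eligible_voters.values())
    return {
        state: (tipping_point_pct[state] / eligible_voters[state]) / national_ratio
        for state in tipping_point_pct
    }


# Invented numbers: a small state and a large one that together decide everything.
toy_tipping = {"Small": 10.0, "Large": 90.0}
toy_voters = {"Small": 1_000_000, "Large": 19_000_000}
toy_roi = roi_index(toy_tipping, toy_voters)
print(toy_roi)
```

In this toy case the small state has the lower Tipping Point percentage but about twice the national return per eligible voter, which is the same effect that pushes New Mexico to the top of the list.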
Overall, the map suggests a slightly more defensive-minded resource allocation strategy than the one that the Obama campaign is employing currently. It doesn't look like states like Oregon and Iowa are going to be all that close now, for instance, but it also doesn't look like the election is going to be all that close; if the polls tighten, they may be vulnerable. At the same time, the calculation validates the Obama campaign's decision to put resources into states like North Dakota (which ranks 10th by this metric), Montana (14th) and Alaska (17th). By contrast, Florida ranks just 25th. It's running about 6 points behind Obama's national averages, and it's extremely expensive to compete in.
-- Nate at 6:27 PM 77 Comments...
Labels: advertising, meta, swing states
Tuesday, July 1, 2008
And Here I Thought John Kerry Lost...
...it turns out that the 2004 election was a "statistical dead heat".
From CNN:
(CNN) -- With the dust having finally settled after the prolonged Democratic presidential primary, a new CNN/Opinion Research Corporation poll shows Sens. John McCain and Barack Obama locked in a statistical dead heat in the race for the White House.
With just over four months remaining until voters weigh in at the polls, the new survey out Tuesday indicates Obama holds a narrow 5-point advantage among registered voters nationwide over the Arizona senator, 50 percent to 45 percent. That represents little change from a similar poll one month ago, when the presumptive Democratic presidential nominee held a 46-43 percent edge over McCain.
CNN Polling Director Keating Holland notes Tuesday's survey confirms what a string of national polls released this month have shown: Obama holds a slight advantage over McCain, though not a big enough one to constitute a statistical lead. "Every standard telephone poll taken in June has shown Obama ahead of McCain, with nearly all of them showing Obama's margin somewhere between three and six points," Holland said. "In most of them, that margin is not enough to give him a lead in a statistical sense, but it appears that June has been a good month for Obama."
The poll, conducted June 26-29, surveyed 906 registered voters and carries a margin of error of plus or minus 3.5 percentage points.
From the National Council On Public Polls:
Certainly, if the gap between the two candidates is less than the sampling error margin, you should not say that one candidate is ahead of the other. You can say the race is "close," the race is "roughly even," or there is "little difference between the candidates." But it should not be called a "dead heat" unless the candidates are tied with the same percentages. And it certainly is not a “statistical tie” unless both candidates have the same exact percentages.
And just as certainly, when the gap between the two candidates is equal to or more than twice the error margin -- 6 percentage points in our example -- and if there are only two candidates and no undecided voters, you can say with confidence that the poll says Candidate A is clearly leading Candidate B.
When the gap between the two candidates is more than the error margin but less than twice the error margin, you should say that Candidate A "is ahead," "has an advantage" or "holds an edge." The story should mention that there is a small possibility that Candidate B is ahead of Candidate A.
p.s. Yes, I was being slightly facetious with the headline.
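The NCPP guidance quoted in this post is mechanical enough to write down as a little function. This is my own paraphrase in code, not anything published by NCPP; in particular, how to treat a gap exactly equal to the error margin is my assumption, since the guidance only covers "less than" and "more than".

```python
def describe_lead(gap, margin_of_error):
    """Suggested wording for a poll lead, following the NCPP guidance.

    gap: leader's percentage minus the trailer's percentage.
    margin_of_error: the poll's sampling error margin, in points.
    """
    if gap == 0:
        return "tied"
    if gap < margin_of_error:
        return "close"
    if gap < 2 * margin_of_error:
        # Assumption: a gap exactly equal to the error margin lands here.
        return "ahead"
    # NCPP adds the caveat of two candidates and no undecided voters.
    return "clearly leading"


# The CNN poll: Obama 50, McCain 45, margin of error 3.5 points.
print(describe_lead(50 - 45, 3.5))
# ahead
```

By this rule, a 5-point gap with a 3.5-point error margin is an edge for the leader, not a dead heat.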
-- Nate at 6:56 PM 88 Comments...
Monday, June 30, 2008
Internal Polls
This site has had a ban on listing internal polls for some time now. The logic behind this is that when a candidate for office commissions a poll, he is only liable to leak its results to the public if it contains good news for him, thereby encouraging donors, press persons, etc. This does not mean per se that the poll is "biased" -- many pollsters do very good and thorough work on behalf of campaigns and affiliated interest groups. But it does mean that there may be a bias in which information becomes part of the public record: we learn about a poll that has a candidate ahead by 10 points in a state, but not one where he is down by 2. For this reason, such polls have been excluded.
There have been an increasing number of surveys, however, particularly on the Senate side of things, that somewhat test our definition of an "internal poll". Where would you draw the line on the following spectrum?
1. Polls commissioned by the candidate himself.
2. Polls commissioned by another candidate for office in that state.
3. Polls conducted by a national campaign committee (e.g. RNC, DSCC)
4. Polls conducted by an interest group (Emily's List, US Chamber of Commerce), but formally unassociated with the candidate.
5. Polls that are private, but conducted on behalf of someone with no direct interest in the campaign, such as an outside lobbying group.
Presently, I have been drawing the line between #3 and #4. But I'm not sure that there's a major philosophical difference between, for instance, Emily's List commissioning a poll, and the DNC doing so. I'm also not so sure that I necessarily have things in the right order.
Anyway, I've come to very much trust in the wisdom of the 538 crowd -- so opinions are solicited and appreciated.
-- Nate at 2:11 PM 70 Comments...
Labels: internal polls, meta, methodology, site
Thursday, June 26, 2008
Should we be discounting Obama's lead? The next and hopefully last Big Change.
Thus far, the trendline adjustment that we implemented last week has been quite successful. It has correctly anticipated bounces for Obama in states ranging from Florida to Ohio to Tennessee. It has allowed the model to fall more intuitively into line with changes in the momentum of the race, and to correct some of the timing bias associated with different states being polled at different times.
The model believes that if the election were held today, Obama would win by approximately 6 points. That's very close to his current lead in the national polling. Intuitively, it feels just about right to me.
However, our goal is not to predict what would happen if the election were held today. Our goal is to predict what will happen in November. In an earlier article on this subject, I framed the question thusly: Suppose we are correct that Obama would win an election held today by 6 points. Is a 6-point Obama win therefore the best prediction of the outcome in November? Up until now, our model has always assumed that it was.
However, this assumption is not correct. Rather, there is a fairly strong tendency for national polling to tighten as one approaches election day. National polls are not equally likely to move upward or downward at any given time. Rather, they are more likely to move in the direction of the candidate who is trailing in the race.
This tendency is actually fairly easy to eyeball if you look at some historical polling data. Below is a table containing the largest lead held by each candidate in any public poll in my database released within 200 days of that year's election. For 1952-1984 and 1996, the database consists of Gallup polling only; for the other years, it consists of a variety of national polls.

Largest leads for each candidate in public polls released within 200 days of the general election.

Year  Biggest GOP Lead   Biggest DEM Lead   Result
---------------------------------------------------------
1952  Eisenhower +28     None*              Eisenhower +11
1956  Eisenhower +27     None*              Eisenhower +15
1960  Nixon +6           Kennedy +4         Kennedy +0.2
1964  None*              Johnson +59        Johnson +23
1968  Nixon +16          Humphrey +6        Nixon +0.7
1972  Nixon +34          None*              Nixon +23
1976  Ford +1            Carter +33         Carter +2
1980  Reagan +16         Carter +8          Reagan +10
1984  Reagan +21         None*              Reagan +18
1988  Bush +17           Dukakis +18        Bush +8
1992  Bush +16           Clinton +30        Clinton +6
1996  None*              Clinton +23        Clinton +9
2000  Bush +14           Gore +17           Gore +0.5
2004  Bush +13           Kerry +11          Bush +2
* In 1952, 1956, 19
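If you'd rather not eyeball it, the comparison can be done in a few lines. The figures below are copied from the table above: for each year, the eventual popular vote winner's biggest polling lead and his final margin. Keep in mind that a candidate's biggest lead is an extreme by construction, so this is a rough sanity check rather than a proper test of the tightening effect.

```python
# (year, popular vote winner's biggest polling lead, final margin), from the table.
winner_leads = [
    (1952, 28, 11), (1956, 27, 15), (1960, 4, 0.2), (1964, 59, 23),
    (1968, 16, 0.7), (1972, 34, 23), (1976, 33, 2), (1980, 16, 10),
    (1984, 21, 18), (1988, 17, 8), (1992, 30, 6), (1996, 23, 9),
    (2000, 17, 0.5), (2004, 13, 2),
]

# Years in which the final margin came in below the winner's biggest lead.
tightened = [year for year, biggest, final in winner_leads if final < biggest]

# Average number of points by which the biggest lead overstated the result.
average_drop = sum(biggest - final for _, biggest, final in winner_leads) / len(winner_leads)

print(len(tightened), "of", len(winner_leads), "years")
print(round(average_drop, 1), "points on average")
```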