One of the fun things we can do with our model is to change the election date to anything we like and see how it might impact the results. So what would the model project if the election were scheduled for tomorrow?
Notice how the red states turn redder and the blue states turn bluer. This is something you'll see happen very gradually over the course of the next four months. As we near the election, smaller leads will become more meaningful, and more states will be taken out of play.
Overall, our model thinks that Obama would have an 88.6 percent chance of winning an election held tomorrow. The most competitive states would be as follows:
1. Virginia. Obama 58% to win, McCain 42%
2. Florida. McCain 62% to win, Obama 38%
3. Indiana. McCain 63% to win, Obama 37%
4. Nevada. McCain 63% to win, Obama 37%
5. Missouri. McCain 65% to win, Obama 35%
6. North Dakota. McCain 79% to win, Obama 21%
7. North Carolina. McCain 81% to win, Obama 19%
8. Colorado. Obama 83% to win, McCain 17%
9. Louisiana. McCain 85% to win, Obama 15%
10. Ohio. Obama 88% to win, McCain 12%
11. New Mexico. Obama 88% to win, McCain 12%
12. South Dakota. McCain 89% to win, Obama 11%
13. Michigan. Obama 90% to win, McCain 10%
14. West Virginia. McCain 90% to win, Obama 10%
15. Montana. McCain 91% to win, Obama 9%
16. Alaska. McCain 92% to win, Obama 8%
17. South Carolina. McCain 94% to win, Obama 6%
All other states are at 95% or better for the leading candidate.
McCain's electoral prospects, unsurprisingly, would be very much tied to his chances of pulling out an upset in Ohio or Michigan. One thing that's going on here, by the way, is that the amount of polling data in a particular state is a factor. For example, our model thinks that McCain would win by 4.4 points in North Dakota in an election held tomorrow, but still assigns Obama a 21 percent chance of winning there because the polling data is so sparse that we can't quite be certain what's going on. In Ohio, meanwhile, the model projects an Obama win of 4.2 points -- less than McCain's projected margin in North Dakota -- but gives him an 88 percent chance of winning because that small lead is backed up by much more polling volume.
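To make the North Dakota versus Ohio comparison concrete, here is a rough sketch of how much uncertainty each win probability implies. It assumes the forecast error in a state is normally distributed around the projected margin, which is my simplification for illustration and not necessarily how the model computes its probabilities; `implied_sd` and `win_prob` are hypothetical helper names.

```python
from statistics import NormalDist

def implied_sd(margin, leader_win_prob):
    """Standard deviation at which a normal forecast centered on `margin`
    gives the leader a win probability of `leader_win_prob`.
    The normal error model is an assumption made for illustration."""
    z = NormalDist().inv_cdf(leader_win_prob)
    return margin / z

def win_prob(margin, sd):
    """Probability that the leader's final margin is above zero."""
    return 1.0 - NormalDist(mu=margin, sigma=sd).cdf(0.0)

nd_sd = implied_sd(4.4, 0.79)  # North Dakota: McCain +4.4, 79% to win
oh_sd = implied_sd(4.2, 0.88)  # Ohio: Obama +4.2, 88% to win

print(f"North Dakota implied uncertainty: {nd_sd:.1f} points")
print(f"Ohio implied uncertainty: {oh_sd:.1f} points")
```

Under that assumption, the sparse polling in North Dakota translates into a noticeably wider error band than in Ohio, which is why a slightly larger lead produces a smaller win probability.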
And there is still an outside chance that something totally wacky could happen, with the polling data off in the same direction in many states. As recently as 2000, there was about a 3-point miss in the final national polling averages ... Bush was expected to win that election by about 2.5 points but Gore actually won the popular vote by a half-point. In 1996, Bill Clinton was 12 points ahead in the final polling averages but won by "only" 8.5 points ... in 1980, Ronald Reagan led by just 3 points in Gallup's last poll but actually won by almost 10 points. Sometimes, there is movement that comes too late for the polls to detect, or everyone is off in the same direction with their turnout assumptions.
Saturday, June 28, 2008
If the Election WERE Held Today...
-- Nate at 8:01 PM 30 Comments...
Labels: history, scenario testing
Thursday, June 26, 2008
Should we be discounting Obama's lead? The next and hopefully last Big Change.
Thus far, the trendline adjustment that we implemented last week has been quite successful. It has correctly anticipated bounces for Obama in states ranging from Florida to Ohio to Tennessee. It has allowed the model to fall more intuitively into line with changes in the momentum of the race, and to correct some of the timing bias associated with different states being polled at different times.
The model believes that if the election were held today, Obama would win by approximately 6 points. That's very close to his current lead in the national polling. Intuitively, it feels just about right to me.
However, our goal is not to predict what would happen if the election were held today. Our goal is to predict what will happen in November. In an earlier article on this subject, I framed the question thusly: Suppose we are correct that Obama would win an election held today by 6 points. Is a 6-point Obama win therefore the best prediction of the outcome in November? Up until now, our model has always assumed that it was.
However, this assumption is not correct. Rather, there is a fairly strong tendency for national polling to tighten as one approaches election day. National polls are not equally likely to move upward or downward at any given time. Rather, they are more likely to move in the direction of the candidate who is trailing in the race.
This tendency is actually fairly easy to eyeball if you look at some historical polling data. Below is a table containing the largest lead held by each candidate in any public poll in my database released within 200 days of that year's election. For 1952-1984 and 1996, the database consists of Gallup polling only; for the other years, it consists of a variety of national polls.
Largest leads for each candidate in public polls
released within 200 days of general election.
        Biggest           Biggest
Year    GOP Lead          DEM Lead        Result
------------------------------------------------------------
1952    Eisenhower +28    None*           Eisenhower +11
1956    Eisenhower +27    None*           Eisenhower +15
1960    Nixon +6          Kennedy +4      Kennedy +0.2
1964    None*             Johnson +59     Johnson +23
1968    Nixon +16         Humphrey +6     Nixon +0.7
1972    Nixon +34         None*           Nixon +23
1976    Ford +1           Carter +33      Carter +2
1980    Reagan +16        Carter +8       Reagan +10
1984    Reagan +21        None*           Reagan +18
1988    Bush +17          Dukakis +18     Bush +8
1992    Bush +16          Clinton +30     Clinton +6
1996    None*             Clinton +23     Clinton +9
2000    Bush +14          Gore +17        Gore +0.5
2004    Bush +13          Kerry +11       Bush +2
* In 1952, 1956, 1964, 1972, 1984 and 1996, one candidate
led in all public polls in my database taken within 200
days of the election. The *closest* that the trailing
candidate came in those years was as follows: Stevenson
(1952), 2 points; Stevenson (1956), 10 points; Goldwater
(1964), 28 points; McGovern (1972), 16 points; Mondale
(1984), 1 point; Dole (1996), 11 points.
Look at some of those numbers! LBJ at one point had a 59-point lead over Barry Goldwater. Bill Clinton once polled 30 points ahead of George Bush (and Bush once polled 16 points ahead of Clinton). Jimmy Carter once held a 33-point lead on Gerald Ford.
Of course, if you go about looking for the largest leads you can find, you are naturally going to expect to see some regression to the mean. But even if we look at this data more systematically, we still find a fairly robust tendency for a lead in the national polling to diminish by election day. The extent to which it diminishes is a function of two things: the magnitude of the lead -- the larger the lead, the more it needs to be discounted -- and the number of days until the election. We can specify a regression equation to project the November outcome based on a candidate's present polling lead as follows:
PROJECTION = MARGIN*.909
           + MARGIN*ROOTDAYS*-.0475
           + SQRT(MARGIN)*ROOTDAYS*.0604
ROOTDAYS = Square root of the number of days until election.
MARGIN = Size of lead for leading candidate.
Visually, that looks about like this:
This chart is perhaps a little confusing, but it's exhibiting the two essential features that I talked about before: the larger the lead, the more it needs to be discounted (both proportionately and absolutely), and the closer we get to election day, the less it needs to be discounted. Particularly, a lead starts to become significantly more meaningful once we get within about 30 days of the election, although it's also the case that presidential elections have tended to tighten within the last 30 days.
So, for instance, a 20-point lead in a poll 300 days before the election projects to only a 6-7 point victory in November. A 15-point lead in a poll taken 100 days before the election projects to a 9-point victory. And so forth. These are very significant corrections; big leads held a long ways before the election must be discounted quite heavily.
As for Barack Obama's lead right now, the correction required is not quite as dramatic. The regression equation specifies that a 5.9-point lead held 130 days before the election should be discounted by about one-third -- to 3.8 points to be exact. That is our new projection for Obama's margin of victory.
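For anyone who wants to check those numbers, here is the regression equation transcribed into a few lines of Python. The coefficients come straight from the formula above; the function name `project_margin` is just a label I picked, and the sketch assumes the lead is expressed as a non-negative number for whichever candidate is ahead.

```python
import math

def project_margin(margin, days_until_election):
    """Discount a current polling lead into a projected November margin.

    margin: size of the lead for the leading candidate (non-negative).
    days_until_election: days remaining until election day.
    """
    rootdays = math.sqrt(days_until_election)
    return (margin * 0.909
            + margin * rootdays * -0.0475
            + math.sqrt(margin) * rootdays * 0.0604)

# The three cases discussed in the text.
for lead, days in [(20, 300), (15, 100), (5.9, 130)]:
    projected = project_margin(lead, days)
    print(f"{lead}-point lead, {days} days out -> {projected:.1f} points")
```

Note that because of the square-root term, the discount is not a fixed percentage: bigger leads give back a larger share of themselves than smaller ones do.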
Specifically, the model now calibrates the trend adjustment to a candidate's discounted lead in the polls. We run the numbers through once without the discount (just as we had run them before), figure out the difference between the candidate's current lead and his projected winning margin based on our discount formula, and then subtract that number of points from the candidate's margin in each state. Put less fancily, we are subtracting 2.1 points from Obama's present trend-adjusted estimate in every state, because all else being equal, we expect McCain to gain 2.1 points between now and November. This lowers Obama's win percentage from 76 percent to 69 percent, a figure that squares a lot better with my intuition about this election.
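The state-level step amounts to a uniform shift, which can be sketched as follows. The national numbers (5.9 and 3.8) are the ones given above; the two state margins are only illustrative inputs, and `apply_national_discount` is a hypothetical name rather than anything in the actual model.

```python
def apply_national_discount(state_margins, current_lead, projected_margin):
    """Shift every state by the gap between the national leader's current
    lead and his discounted projection.

    state_margins: dict of state -> margin, positive when the national
    leader is ahead in that state, negative when he trails.
    """
    shift = current_lead - projected_margin
    return {state: round(margin - shift, 1)
            for state, margin in state_margins.items()}

# Illustrative snapshot margins, positive = Obama ahead.
snapshot = {"Ohio": 4.2, "North Dakota": -4.4}
projection = apply_national_discount(snapshot, current_lead=5.9,
                                     projected_margin=3.8)
print(projection)
```

A state where Obama leads gets closer, and a state where he trails moves further out of reach, by the same 2.1 points in both cases.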
*-*
I'm sure that people are sick and tired of all these changes, but this really ought to be the last missing piece of the puzzle, and it's something that we absolutely must do if our goal is to predict the November outcome rather than merely give a snapshot of the current polling. This is something, frankly, that I should have looked at before, although since the election had been so close until recently, it would not have mattered very much.
You'll also notice one other, less important change. Our projection now allocates the undecideds in each state 50:50 to the two major candidates, after making an allocation for third-party votes. The third-party allocation differs slightly from state to state depending on the other + undecided vote in that state's polling. The model had implicitly been allocating the undecideds this way before, but now I'm doing it explicitly, as I want to make it absolutely clear that our projection in each state is in fact a projection of the final outcome rather than some kind of supercharged polling average.
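As a rough illustration of the undecided allocation, the sketch below carves a third-party share out of the "other + undecided" remainder and splits what is left evenly. The 25 percent third-party share is a placeholder I made up; as noted above, the real allocation varies by state, and the rule for setting it is not spelled out here.

```python
def allocate_undecideds(dem, rep, third_party_share=0.25):
    """Turn raw poll shares into projected final shares.

    dem, rep: poll percentages for the two major candidates.
    third_party_share: hypothetical fraction of the other + undecided
    vote that ends up with third parties (an assumed placeholder).
    """
    remainder = 100.0 - dem - rep            # other + undecided
    third_party = remainder * third_party_share
    undecided = remainder - third_party
    return dem + undecided / 2, rep + undecided / 2, third_party

dem, rep, third = allocate_undecideds(46.0, 42.0)
print(f"Dem {dem:.1f}, Rep {rep:.1f}, third party {third:.1f}")
```

One thing the sketch makes obvious: a 50:50 split leaves the margin between the two major candidates unchanged, which is why the model had implicitly been doing this all along.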
Acknowledgments: I again want to thank Robert Erikson of Columbia University, who has performed similar calculations in the past and gave me the idea for this one, and Andrew Gelman, also of Columbia, who lent me use of his historical polling database.
EDIT: Per some early feedback in the comments, I have changed the way I present the polling detail chart. What we formerly called our projection is now presented as before and described as the "Snapshot". The Snapshot is our best estimate of what the election would look like if it were held today.
In contrast to the Snapshot is the Projection, which discounts current national polling leads through the process described herein, and also allocates out the undecided vote. This is our best guess at what the election will look like in November.
-- Nate at 6:55 PM 133 Comments...
Labels: history, meta, methodology, site, timing bias
Monday, June 23, 2008
Today's Polls, 6/23
We have three Rasmussen Reports polls to kick off the week, the most intriguing of which is in Pennsylvania, where Barack Obama leads John McCain by 4 points. That's a slight improvement for Obama from Rasmussen's May poll, when he had led by 2. Nevertheless, Rasmussen shows Pennsylvania tighter than some other polling of the state.
The operative question about Pennsylvania is whether John McCain should make a serious effort to compete there. The way to evaluate this is not by looking at the Pennsylvania numbers in the abstract, but by comparing them to the national averages. Presently, we project Barack Obama to win Pennsylvania by 8.3 points, but we also show him winning the entire election by 5.2 points. That means that if the race tightens to a draw -- and it's only when the popular vote is very close that electoral math matters -- we'd expect Pennsylvania to be in the range of Obama +3. That's close enough that it's too early for McCain to write Pennsylvania off entirely. A lot of things will have to go right for McCain to win Pennsylvania, but then again, a lot of things will have to go right for him to win this election.
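The arithmetic behind that "Obama +3" figure is just the state margin minus the national margin. A minimal sketch, assuming that any national tightening is spread evenly across the states (a uniform swing, which is a simplification):

```python
def margin_in_tied_race(state_margin, national_margin):
    """Expected state margin if the national race tightened to a tie,
    assuming every state moves by the same amount as the nation."""
    return state_margin - national_margin

pa_if_tied = margin_in_tied_race(state_margin=8.3, national_margin=5.2)
print(f"Pennsylvania in a tied national race: Obama +{pa_if_tied:.1f}")
```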
The New Mexico result is Obama +8; he had been ahead by 9 in Rasmussen's May poll. One thing to keep in mind is that, while we focus a lot on trendlines within any one given agency's polling, the comparison to other pollsters does matter too. Thus, while Obama lost a point in New Mexico relative to Rasmussen's previous poll, our win percentage for him went up there, since an 8-point lead is still much comfier than other pollsters have seen the state. Conversely, even though Obama gained in Pennsylvania relative to Rasmussen's previous poll, his win percentage went down there, since Rasmussen continues to see the state much tighter than agencies like Quinnipiac.
Finally, in Utah, Rasmussen has John McCain ahead by 19 points. Utah is probably interesting only insofar as trivia questions go. When Bill Clinton won the election in 1992, Nebraska was his worst state, which he lost by "only" 17.2 points. That was the best worst state for a winning candidate since FDR, assuming that you count the District of Columbia as a state. The modern record, however, appears to be held by Woodrow Wilson, who lost no state by more than 11.6 points in the three-way election of 1912. If Obama has an exceptionally good election night, it is conceivable that he could threaten some of those records, but Utah will almost assuredly be his limiting factor.
-- Nate at 4:59 PM 31 Comments...
Labels: history, new mexico, pennsylvania, today's polls, utah
Sunday, June 22, 2008
Why Obama isn't like Dukakis
As several observers have noted recently, including yours truly, June polling has not been a particularly good predictor of November results. In four out of the last five elections, the candidate leading in the polls in June went on to lose the popular vote. The largest discrepancy was in 1988, when Michael Dukakis, 8.2 points ahead in June, would eventually lose the election by 7.8 points -- a catastrophic 16-point swing against the Massachusetts governor.
This election too could move in any number of different directions. While Obama can presently be regarded as the healthy favorite, think of what a 16-point swing would mean in this year's election. If that swing were in Obama's direction (giving him a 21-point victory when added to his current lead of about 5 points) we would project Obama to win all states except Alabama, Idaho, Oklahoma and Utah. If it were in John McCain's direction instead, giving him an 11-point win nationwide, we would have him winning 42 out of 50 states.
The way that the Republicans achieved that big swing in 1988, assisted by a couple of significant gaffes from the Dukakis campaign, was to portray Dukakis as too liberal for the American mainstream. The same basic strategic template was employed against John Kerry in 2004. However, this strategy is unlikely to work in 2008. How come? Barack Obama is already perceived as being very liberal.
In a Rasmussen Reports poll conducted last week, 67 percent of likely voters described Obama as liberal, including 36 percent who described him as very liberal. By contrast, only 45 percent of voters described John Kerry as liberal in May of 2004, and 53 percent by November, 2004.
This shouldn't be terribly surprising. Obama is best known not so much as a candidate for the Presidency, but as one for the Democratic nomination. In contrast to Dukakis, who had both Jesse Jackson and Gary Hart flanking him to the left, and Kerry, who was perceived as the more centrist, electable alternative to Howard Dean, Obama had emerged by the end of the primary campaign as running to Hillary Clinton's left (Clinton being no conservative herself). Indeed, Obama is already perceived as substantially more liberal than Kerry was even after the Swift Boat ads, months' worth of framing the narrative, and tens of millions of dollars in attack advertising had gotten done with him.
But Obama is winning.
It may be that the primary fault line in this election is not liberal versus conservative, but change versus experience. Voters might think that Barack Obama is slightly further from them ideologically than is John McCain -- but they might also think that the country has been governed for eight years by a conservative, and that this governance has failed.
It may also be that voters are more conservative in theory than in practice. According to Rasmussen, 36 percent of voters describe themselves as conservative as opposed to 25 percent who say that they are liberal. This figure is not all that different from 2004, when 34 percent of voters said they were conservative and 21 percent liberal in exit polling. But if you look at the specific issues that loom largest in this campaign, the liberal positions on things like pulling out from Iraq, implementing some kind of national health care policy, and increasing environmental regulation each poll at roughly 70/30 majorities.
There is also a school of thought that voters in Presidential elections tend to base their decisions less on the ideological attributes of a candidate and more on the personal ones. Obama's favorability rating presently stands at a +25. By contrast, John Kerry rarely did much better than even on this metric, depending on the specific wording of the question.
Either way, this is a significant problem for the Republicans. If their strategy is to say "Hey! Hey! Barack Obama is a liberal!", the American public's reaction is likely to be "Well, no shit! We're voting for him anyway."
This is not to say that McCain can gain no traction at all by trying to seize the political center. In fact, in an election in which the Democrats have something like a 4:3 edge in party identification, McCain absolutely has to find some way to win a majority of independent voters, and perhaps a fairly substantial one. Moreover, while the voters appear to be ready to elect a President they perceive as liberal, they surely won't be ready to elect one they perceive as radical, and so we can expect the Republicans to continue to play up Obama's associations with figures like Jeremiah Wright and William Ayers. This remains relatively dangerous territory for Obama.
However, if the Republicans attempt to recycle the 1988 or 2004 playbooks, they will probably not find the results to their liking. And if McCain at any point refers to Obama as a "Card-carrying member of the ACLU", you can be pretty sure that this election is over.
-- Nate at 3:21 PM 62 Comments...
Labels: history, ideology, obama, political spectrum
Monday, June 16, 2008
A Refinement to the Adjustment, Part I
In consideration of everyone's feedback, I am making two refinements to the timeline adjustment that I introduced yesterday.
The first refinement is to slightly dampen the effect of the timeline adjustment at the endpoints of the curve. The second is to use a state-specific timeline adjustment, rather than a one-size-fits-all model. I will describe the first adjustment in this post.
Before I continue, I want to make clear what the goal of this project is. I want to provide you, at any given moment in time, with the best possible projection of what's going to happen in the November election. This is inherently a forward-looking exercise. If what you're interested in instead is simply a summation of what the polls are telling you now, there are plenty of other websites that can provide that for you. I do require that the projections be based on objective and quantifiable evidence. For example, I'm not going to say: "McCain is awful on the campaign trail, and people don't realize it yet. Let's take 5 points off his averages". Nor am I going to say "I heard from a well-connected source that the Republicans have put together a devastating attack ad on Barack Obama. We'd better cut his win percentage by 10 points". But that doesn't mean I'm going to limit myself to simply averaging the current polls.
* * *
In the long methodological discussion that we have had over the past couple days, there is one important point that hasn't been raised. Suppose you grant me that my timeline adjustment does an essentially optimal job of telling you what would happen if the election were held today. Does it necessarily follow that the best projection of what would happen if the election were held today is also the best projection available to us of what would happen if the election were held tomorrow?
In other words, suppose that we are holding an election for the President of Hell. The candidates are Gary Condit and Mark Foley. In June, Foley leads by 2 points. In July, Foley leads by 5 points. What is our best possible projection in July of what the outcome will be in November? There are three possible answers to that question.
1. The random walk hypothesis. There is no way to guess whether the polls will move upward or downward in any given future period. Therefore, if a candidate's current lead in the polling is 5 points, our best guess at the eventual election outcome is 5 points.
2. The bounce hypothesis. Polls have some tendency to regress back to the mean established in previous periods. Therefore, if a candidate leads by 2 points in June, and by 5 points in July, our best guess is that he will probably finish somewhere between 2 points and 5 points ahead.
3. The trend hypothesis. This is sort of the opposite of the bounce hypothesis. Polling from previous periods does tell us something, but those polls are inversely related with the eventual outcome. So if Foley leads by 2 points in June and 5 points in July, that is evidence that he is trending upward, and is likely to eventually win by some number greater than 5 points.
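One way to see how the three hypotheses relate is to put them on a single dial. In the sketch below, which is my own framing rather than anything taken from the model, the projection is the current lead plus some multiple `k` of the most recent change: `k = 0` is the random walk, a `k` between -1 and 0 is a bounce, and a positive `k` is a trend.

```python
def project(previous, current, k):
    """Project the final margin from two successive polling leads.

    k = 0       random walk: the current lead is the best guess
    -1 < k < 0  bounce: give back part of the recent movement
    k > 0       trend: extrapolate the recent movement
    """
    return current + k * (current - previous)

june, july = 2.0, 5.0  # Foley's leads in the example above
for label, k in [("random walk", 0.0), ("bounce", -0.5), ("trend", 0.5)]:
    print(f"{label}: Foley +{project(june, july, k):.1f}")
```

The values of `k` used here are arbitrary; the empirical question is which sign, if any, the historical data supports.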
I've tried to produce an answer to this question in several different ways, revisiting it this weekend by using Andrew Gelman's dataset. In some cases, like in 1988 or the summer of 1992, when the movement in the polls was fairly unidirectional for long periods of time, the more recent your poll was, the better off you'd be. In other cases, like in 2000 and 2004, the polls tended to oscillate, as though regressing back toward the mean; a bounce was usually just a bounce.
We can model this more formally by using different LOESS curves. The smoothness of a LOESS curve is determined by something called the smoothing parameter. A smoothing parameter of .7 or .8 will give you a very conservative curve that reacts slowly to new information (put differently, it still places some value in old information). A smoothing parameter of .3, on the other hand, will give you an extremely volatile curve that gives a strong presumption to the most current information.
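To show what the smoothing parameter does in practice, here is a bare-bones LOESS written from scratch: a local linear fit with tricube weights, where the smoothing parameter is the fraction of the data used in each local fit. It is a teaching sketch and not the code behind the site (a library routine such as the `lowess` function in statsmodels, with its `frac` argument, would be the sensible choice for real work), and the weekly leads are made-up numbers.

```python
import math

def loess(xs, ys, frac):
    """Local linear regression with tricube weights.

    frac is the smoothing parameter: the share of all points that
    takes part in the fit at each x value.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        # Bandwidth is the distance to the k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical weekly leads: flat for a while, then a late surge.
weeks = list(range(12))
leads = [2, 2, 3, 2, 1, 2, 3, 2, 2, 4, 6, 7]

conservative = loess(weeks, leads, frac=0.8)
volatile = loess(weeks, leads, frac=0.3)
print(f"latest poll {leads[-1]}, "
      f"frac=.8 says {conservative[-1]:.1f}, frac=.3 says {volatile[-1]:.1f}")
```

With the late surge in this toy data, the .3 curve ends up hugging the most recent number while the .8 curve hangs back, which is exactly the trade-off at issue.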
I went back and tried to evaluate whether there was an optimal smoothing parameter based on the weekly national polling averages from 1988, 1992, 2000 and 2004 (skipping 1996 because my dataset is scattershot for that year). I was looking for an answer in the following form: with X weeks to go until the general election, you will minimize your error by using smoothing parameter Y. If Y is a smaller number, like .3, that would be evidence for the random walk hypothesis or perhaps even the trend hypothesis. If Y is closer to .8, that would be evidence for the bounce hypothesis.
Unfortunately, there is no clear answer to this question. Different parameters performed better or worse in different elections, and at different points in those elections. All smoothing parameters from about .3 to .8 produced roughly the same average error when applied to the weekly polling data, with a possible exception of the two weeks immediately prior to the election, when a smaller parameter (i.e. a more sensitive curve) may be more desirable.
What this tells us is that it's frankly a judgment call as to how much emphasis we want to give to the most recent polling results. Neither the random walk hypothesis nor the bounce hypothesis can really be ruled out (we can probably rule out the trend hypothesis, however, as that would require low smoothing parameters to be demonstrably better than higher ones).
What I wound up doing was using a hybrid smoothing parameter, which is conservative toward the endpoints of the curve, but more aggressive in the middle of the curve.
There is a good, logical reason to do this, namely that we have less information available to us at the endpoints of the curve than we do in the middle. We can fairly clearly isolate the impact of something like Jeremiah Wright's first appearance on the scene, because we can look at polling both before and afterward: we see Obama's polls tumbling and then recovering. However, in trying to evaluate the polls right now, we only know what the polls were in the past; we do not know in which direction they'll move in the future. The hybrid curve allows us both to be fairly aggressive in isolating events that might have impacted the polls in the past and to err on the side of caution about the present direction of the polls.
The net effect of all of this is a somewhat more conservative estimate of Barack Obama's current strength in the polling; we know he's bouncing, but we don't know how long that bounce is going to last. If his polling remains strong into next week, that will be three weeks in a row where his numbers have shown a marked improvement, and even the most conservative estimator will start to give him credit for more or less the entirety of his bounce. If he and McCain regress back to a tie, on the other hand, we may even start to take a point or two away from polls that were conducted over the past couple of weeks. This is one thing, by the way, that I think some of the McCain supporters around here are missing. If Obama's post-nomination bounce does prove to be a temporary thing, we will be able to adjust for this more quickly, and recognize that states that were polled frequently during this period may not be as strong for him as they appear.
-- Nate at 12:35 AM 31 Comments...
Labels: bounces, history, meta, methodology, site
Saturday, June 14, 2008
We don't know as much as we think (Big Change #1)
There are two major changes to my methodology, which you already see reflected in the new charts and graphics that are presently on the site. This is the easier of the two to explain, so let's handle it first.
Andrew Gelman of Columbia University was kind enough to share some of his old national polling data with me. His dataset runs from 1952 through 1992. I took his data from 1988 and 1992 (before 1988, there are only a limited number of polls available), then combined it with the data I already had for 2000 and 2004, and tracked down some 1996 data in a magical place on the internet.
If you had looked back at the polls in June in the five previous election cycles, what would you have found?
In 1988, Michael Dukakis was ahead by an average of 8.2 points in 5 June polls. In November, George Bush won by 7.8 points.
In 1992, George Bush was ahead by an average of 4.9 points in 14 June polls. In November, Bill Clinton won by 5.6 points.
I don't actually have any June polls for 1996 (if anybody's sitting on a big stash of Clinton-Dole data, you know where to find me). But in Gallup's July poll, Bill Clinton led by 17 points. In November, Clinton won by 8.5 points.
In 2000, George W. Bush was ahead by an average of 4.7 points in 14 June polls. In November, Al Gore won the popular vote by 0.5 points.
In 2004, John Kerry was ahead by an average of 0.9 points in 16 June polls (this was pretty much his high-water mark all year). In November, George W. Bush won by 2.4 points.
So in four out of the last five elections, an average of June polls would have incorrectly picked the winner of the popular vote. That's kind of a problem for anybody who is overly confident about how this election is going to turn out.
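The tally is easy to reproduce. The sketch below uses only the figures quoted above, with margins signed so that a positive number means the Democrat is ahead; the sign convention and variable names are mine.

```python
# (year, early-summer polling margin, November result), positive = Democrat.
# 1996 uses Gallup's July poll, since no June polls were available.
CYCLES = [
    (1988, 8.2, -7.8),   # Dukakis led; Bush won
    (1992, -4.9, 5.6),   # Bush led; Clinton won
    (1996, 17.0, 8.5),   # Clinton led; Clinton won
    (2000, -4.7, 0.5),   # Bush led; Gore won the popular vote
    (2004, 0.9, -2.4),   # Kerry led; Bush won
]

def wrong_winner(june, november):
    """True when the June leader lost the popular vote."""
    return (june > 0) != (november > 0)

misses = [year for year, june, nov in CYCLES if wrong_winner(june, nov)]
mean_abs_error = sum(abs(nov - june) for _, june, nov in CYCLES) / len(CYCLES)

print(f"June leader lost in: {misses}")
print(f"Average miss: {mean_abs_error:.1f} points")
```

On these five cycles, the average gap between the early-summer polling margin and the final result works out to 8.7 points.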
Previously, I had modeled the error in our polling averages based on 2004 data (simply because that's the data I had access to). The issue with that is that the polls were unusually stable in 2004. From April onward, John Kerry never held a lead of more than about 2 points in the Real Clear Politics national average, and George W. Bush never held a lead of more than 6 or 7 points. Those numbers pretty well framed the actual result of Bush +