Thursday, June 26, 2008

Should we be discounting Obama's lead? The next and hopefully last Big Change.

Thus far, the trendline adjustment that we implemented last week has been quite successful. It has correctly anticipated bounces for Obama in states ranging from Florida to Ohio to Tennessee. It has allowed the model to fall more intuitively into line with changes in the momentum of the race, and to correct some of the timing bias associated with different states being polled at different times.

The model believes that if the election were held today, Obama would win by approximately 6 points. That's very close to his current lead in the national polling. Intuitively, it feels just about right to me.

However, our goal is not to predict what would happen if the election were held today. Our goal is to predict what will happen in November. In an earlier article on this subject, I framed the question thusly: Suppose we are correct that Obama would win an election held today by 6 points. Is a 6-point Obama win therefore the best prediction of the outcome in November? Up until now, our model has always assumed that it was.

However, this assumption is not correct. There is a fairly strong tendency for national polling to tighten as election day approaches. National polls are not equally likely to move upward or downward at any given time; they are more likely to move in the direction of the candidate who is trailing in the race.

This tendency is actually fairly easy to eyeball if you look at some historical polling data. Below is a table containing the largest lead held by each candidate in any public poll in my database released within 200 days of that year's election. For 1952-1984 and 1996, the database consists of Gallup polling only; for the other years, it consists of a variety of national polls.

Largest leads for each candidate in public poll
released within 200 days of general election.

      Biggest          Biggest
Year  GOP Lead         DEM Lead      Result
---------------------------------------------------
1952  Eisenhower +28   None*         Eisenhower +11
1956  Eisenhower +27   None*         Eisenhower +15
1960  Nixon +6         Kennedy +4    Kennedy +0.2
1964  None*            Johnson +59   Johnson +23
1968  Nixon +16        Humphrey +6   Nixon +0.7
1972  Nixon +34        None*         Nixon +23
1976  Ford +1          Carter +33    Carter +2
1980  Reagan +16       Carter +8     Reagan +10
1984  Reagan +21       None*         Reagan +18
1988  Bush +17         Dukakis +18   Bush +8
1992  Bush +16         Clinton +30   Clinton +6
1996  None*            Clinton +23   Clinton +9
2000  Bush +14         Gore +17      Gore +0.5
2004  Bush +13         Kerry +11     Bush +2

* In 1952, 1956, 1964, 1972, 1984 and 1996, one candidate
led in all public polls in my database taken within 200
days of the election. The *closest* that the trailing
candidate came in those years was as follows: Stevenson
(1952), 2 points; Stevenson (1956), 10 points; Goldwater
(1964), 28 points; McGovern (1972), 16 points; Mondale
(1984), 1 point; Dole (1996), 11 points.

Look at some of those numbers! LBJ at one point had a 59-point lead over Barry Goldwater. Bill Clinton once polled 30 points ahead of George Bush (and Bush once polled 16 points ahead of Clinton). Jimmy Carter once held a 33-point lead on Gerald Ford.

Of course, if you go about looking for the largest leads you can find, you are naturally going to expect to see some regression to the mean. But even if we look at this data more systematically, we still find a fairly robust tendency for a lead in the national polling to diminish by election day. The extent to which it diminishes is a function of two things: the magnitude of the lead -- the larger the lead, the more it needs to be discounted -- and the number of days until the election. We can specify a regression equation to project the November outcome based on a candidate's present polling lead as follows:

PROJECTION
  = MARGIN * 0.909
  + MARGIN * ROOTDAYS * -0.0475
  + SQRT(MARGIN) * ROOTDAYS * 0.0604

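ROOTDAYS = Square root of the number of days until election.
MARGIN = Size of lead for leading candidate.

Visually, that looks about like this:

[Chart: projected November margin as a function of the current lead and the number of days until the election]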

This chart is perhaps a little confusing, but it's exhibiting the two essential features that I talked about before: the larger the lead, the more it needs to be discounted (both proportionately and absolutely), and the closer we get to election day, the less it needs to be discounted. In particular, a lead starts to become significantly more meaningful once we get within about 30 days of the election, although it's also the case that presidential elections have tended to tighten within the last 30 days.

So, for instance, a 20-point lead in a poll 300 days before the election projects to only a 6-7 point victory in November. A 15-point lead in a poll taken 100 days before the election projects to a 9-point victory. And so forth. These are very significant corrections; big leads held a long ways before the election must be discounted quite heavily.
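For readers who want to check these numbers themselves, here is a minimal Python sketch of the regression equation given earlier. The function name project_margin and its argument names are illustrative labels, not the model's actual code; the three coefficients are the ones from the equation.

```python
import math


def project_margin(margin, days_until_election):
    """Discount a current polling lead to a projected election-day margin.

    margin: the leading candidate's current lead, in points (non-negative).
    days_until_election: number of days remaining until the election.
    """
    if margin < 0:
        raise ValueError("margin is the leader's lead and must be non-negative")
    root_days = math.sqrt(days_until_election)
    return (margin * 0.909
            + margin * root_days * -0.0475
            + math.sqrt(margin) * root_days * 0.0604)


# The two worked examples from the text.
for lead, days in [(20, 300), (15, 100)]:
    projected = project_margin(lead, days)
    print(f"{lead}-point lead, {days} days out -> {projected:.1f}-point projection")
```

Run as written, this gives roughly 6.4 points for the 20-point lead at 300 days and roughly 8.8 points for the 15-point lead at 100 days, in line with the figures quoted above.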

As for Barack Obama's lead right now, the correction required is not quite as dramatic. The regression equation specifies that a 5.9-point lead held 130 days before the election should be discounted by about one-third -- to 3.8 points to be exact. That is our new projection for Obama's margin of victory.

Specifically, the model now calibrates the trend adjustment to a candidate's discounted lead in the polls. The process is to run the numbers once without the discount (just as we ran them before), compute the difference between the candidate's current lead and his projected winning margin under our discount formula, and then subtract that number of points from the candidate's margin in each state. Put less fancily, we are subtracting 2.1 points from Obama's present trend-adjusted estimate in every state, because all else being equal, we expect McCain to gain 2.1 points between now and November. This lowers Obama's win percentage from 76 percent to 69 percent, a figure that squares a lot better with my intuition about this election.
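The discount step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state margins below are made-up placeholders rather than the model's real estimates, and the names project_margin and apply_discount are hypothetical. The coefficients, the 5.9-point lead, and the 130 days are the figures used in this post.

```python
import math


def project_margin(margin, days_until_election):
    """Regression equation from this post: discounted election-day margin."""
    root_days = math.sqrt(days_until_election)
    return (margin * 0.909
            + margin * root_days * -0.0475
            + math.sqrt(margin) * root_days * 0.0604)


def apply_discount(state_margins, national_lead, days_until_election):
    """Subtract the national discount from the leader's margin in every state.

    state_margins: dict of state -> leader's trend-adjusted margin in points
                   (negative where the leader trails in that state).
    Returns (discounted_margins, discount_in_points).
    """
    discount = national_lead - project_margin(national_lead, days_until_election)
    discounted = {state: m - discount for state, m in state_margins.items()}
    return discounted, discount


# Hypothetical trend-adjusted Obama margins, for illustration only.
snapshot = {"Ohio": 4.0, "Florida": 1.0, "Tennessee": -12.0}
projection, discount = apply_discount(snapshot, national_lead=5.9,
                                      days_until_election=130)
print(f"Discount applied in every state: {discount:.1f} points")
for state, margin in projection.items():
    print(f"{state}: {snapshot[state]:+.1f} -> {margin:+.1f}")
```

The discount comes out at about 2.1 points, and because it is applied uniformly, the gaps between states are unchanged; only the overall level shifts toward McCain.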

*-*

I'm sure that people are sick and tired of all these changes, but this really ought to be the last missing piece of the puzzle, and it's something that we absolutely must do if our goal is to predict the November outcome rather than merely give a snapshot of the current polling. This is something, frankly, that I should have looked at before, although since the election had been so close until recently, it would not have mattered very much.

You'll also notice one other, less important change. Our projection now allocates the undecideds in each state 50:50 to the two major candidates, after making an allocation for third-party votes. The third-party allocation differs slightly from state to state depending on the other + undecided vote in that state's polling. The model had implicitly been allocating the undecideds this way before, but now I'm doing it explicitly, as I want to make it absolutely clear that our projection in each state is in fact a projection of the final outcome rather than some kind of supercharged polling average.
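As a rough illustration of the allocation, here is a short Python sketch. The exact rule for the third-party share is not spelled out in this post, so the sketch simply takes it as an input; the function name allocate_undecided and the sample poll numbers are hypothetical.

```python
def allocate_undecided(dem, gop, third_party_share):
    """Split the undecided vote 50:50 after setting aside a third-party share.

    dem, gop: the two major candidates' poll percentages.
    third_party_share: assumed final third-party vote in points (an input
                       here; the model's own rule is not given in the post).
    Returns (dem_final, gop_final, third_party_final), which sum to 100.
    """
    remainder = 100.0 - dem - gop  # the "other + undecided" vote
    if remainder < 0:
        raise ValueError("dem and gop percentages exceed 100")
    third = min(third_party_share, remainder)
    undecided = remainder - third
    return dem + undecided / 2, gop + undecided / 2, third


# Hypothetical state poll: 46-44 with 10 points other + undecided.
dem_final, gop_final, third_final = allocate_undecided(46, 44, third_party_share=2)
print(dem_final, gop_final, third_final)
```

Note that an even split leaves the margin between the two major candidates where it was (2 points in this example), which is why making the allocation explicit does not move the projected margins.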

Acknowledgments: I again want to thank Robert Erikson of Columbia University, who has performed similar calculations in the past and gave me the idea for this one, and Andrew Gelman, also of Columbia, who lent me use of his historical polling database.

EDIT: Per some early feedback in the comments, I have changed the way I present the polling detail chart. What we formerly called our projection is now presented as before and described as the "Snapshot". The Snapshot is our best estimate of what the election would look like if it were held today.

In contrast to the Snapshot is the Projection, which discounts current national polling leads through the process described herein, and also allocates out the undecided vote. This is our best guess at what the election will look like in November.

135 comments

Mark said...

Looks brilliant as usual, Nate.

I notice that with the new calculations, McCain is about four and a half points below the magic 50% mark in his home state of Arizona. That's a substantial number of people going for third parties or Nader or something, around 17% of total Arizona voters, and that's just one example. How much do you figure the underdog candidates are likely to wrap up nationally in the popular vote? I mean, that's certainly much more than they picked up in the last two elections.

Alex said...

Well, I won't say you're wrong, but color me unconvinced. When did those various closing leads occur during the race? (I assume that their proximity to the election was how you derived your projection equation, but I'd be interested to see those projection equation lines superimposed on actual polling data throughout the election.) What circumstances surrounded losing those leads? How do we know that similar circumstances apply today?

I ask because I feel like you're going off a symptom of elections (closing polls) without really explaining the cause of that symptom or why it is present today. It may be enough that we can be psychohistorical about it all and just assert that what happened then should and will happen now, but I'm...just not sure.

Anyways, it will be very interesting to see how it plays out. Thanks for the update and I hope we get more data out of you soon!

Mark said...

One more thing - I'd suggest separate counters and graphs for the current projection vs. the November projection. Might make it easier to keep track of shifts and changes, because if everything's calibrated for the November projection and there's a sea change - October surprise, assassination (God forbid), ill-advised photo op with Hugo Chávez - it might be hard to gauge the immediate effects.

Anonymous said...

Can I just tell you you're a genius and I have no idea what half the numbers or graphs mean but the charts are so pretty anyway.

I hope you're right. GOBAMA

Milos said...

You're a goddamn machine Nate. I love you and I hope you never stop. Great post.

Anonymous said...

I guess this confirms that you are trying to be accurate and not a shill for Obama. Of course, we will still get people that will come on here and say otherwise.

Thanks and keep up the interesting work.

By the way, when you are done with the election, can you do anything to fix my beloved Seattle Mariners?

Daniel M.

Another Mike said...

Please include a separate row in each state projection for this new adjustment rather than combining it with the trend adjustment. These really are two VERY different adjustments. Remember your readers here like to know how the sausage is being made.

MC said...

Isn't this uncertainty built in to the change earlier where you adjust the error based on the days until the election?

Anonymous said...

Poblano:

I think you're shooting yourself in the foot again here like you were when you first factored in national polling data. There is no apparent reason to apply the discount of the national polling equally to the margins in all states. Rather, since you already corrected for the relative effect of trends on each state, you should instead discount the predicted change in national polling from the national polling trend as applied to the states before running the regression (I guess this would amount to running the trend regression once to determine the margin, and then a second time after the discount is applied).

You've said yourself that some states respond more to shifts in national polling than others, so you really should observe those differences. That's not to even mention problems such as if McCain suddenly had a 95% margin in Utah the same week Obama was up 15% nationally. Would it then make sense to give McCain a margin over 100%?

Anonymous said...

The anonymous from above here. That came off cruder than I meant it to. I think the changes you have been making are better to be made uniformly across the whole country than not at all, but it still seems that better can be achieved and I am confident that you are able to achieve that, as you did when first factoring in national polling. Not trying to lecture you on what the specific remedy should be or anything.

Anonymous said...

I see your overall point but I think you made a big mistake in using the largest lead as your starting point in the analysis. Obviously, somebody is going to win by significantly less than their largest lead. If the final score was bigger, the final score would be the largest lead. I see the same mistake in terms of studies of stocks, where they track the fall of xyz stock from its overall high or the rise from its overall low.

I'd like to see the same study redone with whatever lead the person had at this point in the contest.

There are a couple of caveats that would limit the effect of any previous election poll study for this election. I believe that all elections ultimately come down to "incumbent party vs. challenger party." A) The Dem contest was very well covered. Hence, Obama is better known than most challengers. B) McCain is not president or VP. Unlike almost every other incumbent party candidate, he has very few ties to the incumbent. I think these factors will change the overall case.