Thus far, the trendline adjustment that we implemented last week has been quite successful. It has correctly anticipated bounces for Obama in states ranging from Florida to Ohio to Tennessee. It has allowed the model to fall more intuitively into line with changes in the momentum of the race, and to correct some of the timing bias associated with different states being polled at different times.
The model believes that if the election were held today, Obama would win by approximately 6 points. That's very close to his current lead in the national polling. Intuitively, it feels just about right to me.
However, our goal is not to predict what would happen if the election were held today. Our goal is to predict what will happen in November. In an earlier article on this subject, I framed the question thusly: Suppose we are correct that Obama would win an election held today by 6 points. Is a 6-point Obama win therefore the best prediction of the outcome in November? Up until now, our model has always assumed that it was.
However, this assumption is not correct. Rather, there is a fairly strong tendency for national polling to tighten as one approaches election day. National polls are not equally likely to move upward or downward at any given time. Rather, they are more likely to move in the direction of the candidate who is trailing in the race.
This tendency is actually fairly easy to eyeball if you look at some historical polling data. Below is a table containing the largest lead held by each candidate in any public poll in my database released within 200 days of that year's election. For 1952-1984 and 1996, the database consists of Gallup polling only; for the other years, it consists of a variety of national polls.

Largest leads for each candidate in public poll released within 200 days of general election.

        Biggest           Biggest
Year    GOP Lead          DEM Lead        Result
----------------------------------------------------------
1952    Eisenhower +28    None*           Eisenhower +11
1956    Eisenhower +27    None*           Eisenhower +15
1960    Nixon +6          Kennedy +4      Kennedy +0.2
1964    None*             Johnson +59     Johnson +23
1968    Nixon +16         Humphrey +6     Nixon +0.7
1972    Nixon +34         None*           Nixon +23
1976    Ford +1           Carter +33      Carter +2
1980    Reagan +16        Carter +8       Reagan +10
1984    Reagan +21        None*           Reagan +18
1988    Bush +17          Dukakis +18     Bush +8
1992    Bush +16          Clinton +30     Clinton +6
1996    None*             Clinton +23     Clinton +9
2000    Bush +14          Gore +17        Gore +0.5
2004    Bush +13          Kerry +11       Bush +2

* In 1952, 1956, 1964, 1972, 1984 and 1996, one candidate led in all public polls in my database taken within 200 days of the election. The *closest* that the trailing candidate came in those years was as follows: Stevenson (1952), 2 points; Stevenson (1956), 10 points; Goldwater (1964), 28 points; McGovern (1972), 16 points; Mondale (1984), 1 point; Dole (1996), 11 points.

Look at some of those numbers! LBJ at one point had a 59-point lead over Barry Goldwater. Bill Clinton once polled 30 points ahead of George Bush (and Bush once polled 16 points ahead of Clinton). Jimmy Carter once held a 33-point lead on Gerald Ford.
Of course, if you go about looking for the largest leads you can find, you are naturally going to expect to see some regression to the mean. But even if we look at this data more systematically, we still find a fairly robust tendency for a lead in the national polling to diminish by election day. The extent to which it diminishes is a function of two things: the magnitude of the lead -- the larger the lead, the more it needs to be discounted -- and the number of days until the election. We can specify a regression equation to project the November outcome based on a candidate's present polling lead as follows:

PROJECTION
= MARGIN*.909
+ MARGIN*ROOTDAYS*-.0475
+ SQRT(MARGIN)*ROOTDAYS*.0604

ROOTDAYS = Square root of the number of days until election.
MARGIN = Size of lead for leading candidate.

Visually, that looks about like this:

[Chart: projected November margin as a function of current polling lead and days until the election]
This chart is perhaps a little confusing, but it's exhibiting the two essential features that I talked about before: the larger the lead, the more it needs to be discounted (both proportionately and absolutely), and the closer we get to election day, the less it needs to be discounted. Particularly, a lead starts to become significantly more meaningful once we get within about 30 days of the election, although it's also the case that presidential elections have tended to tighten within the last 30 days.
So, for instance, a 20-point lead in a poll 300 days before the election projects to only a 6-7 point victory in November. A 15-point lead in a poll taken 100 days before the election projects to a 9-point victory. And so forth. These are very significant corrections; big leads held a long ways before the election must be discounted quite heavily.
As for Barack Obama's lead right now, the correction required is not quite as dramatic. The regression equation specifies that a 5.9-point lead held 130 days before the election should be discounted by about one-third -- to 3.8 points to be exact. That is our new projection for Obama's margin of victory.
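To make the arithmetic concrete, here is a minimal Python sketch of the regression equation given earlier. The function name `project_margin` and its argument names are hypothetical labels chosen for illustration; only the three coefficients come from the equation itself, and this is not the model's actual code.

```python
import math

def project_margin(margin, days_until_election):
    """Discount a current national polling lead to a projected November margin.

    margin: size of the leading candidate's lead, in points (>= 0).
    days_until_election: number of days remaining until election day (>= 0).
    """
    if margin < 0 or days_until_election < 0:
        raise ValueError("margin and days_until_election must be non-negative")
    root_days = math.sqrt(days_until_election)  # ROOTDAYS in the equation
    return (margin * 0.909
            + margin * root_days * -0.0475
            + math.sqrt(margin) * root_days * 0.0604)

# The three cases discussed in the text: (lead in points, days until election).
for lead, days in [(20, 300), (15, 100), (5.9, 130)]:
    projected = project_margin(lead, days)
    print(f"{lead}-point lead, {days} days out -> {projected:.1f}-point projection")
```

Run as written, this gives projections of roughly 6.4, 8.8 and 3.8 points, in line with the figures quoted above.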
Specifically, the model now calibrates the trend adjustment to a candidate's discounted lead in the polls. It runs the numbers once without the discount (just as we had run them before), computes the difference between the candidate's current lead and his projected winning margin under our discount formula, and then subtracts that number of points from the candidate's margin in each state. Put less fancily, we are subtracting 2.1 points from Obama's present trend-adjusted estimate in every state, because all else being equal, we expect McCain to gain 2.1 points between now and November. This lowers Obama's win percentage from 76 percent to 69 percent, a figure that squares a lot better with my intuition about this election.
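The calibration step can be sketched roughly as follows. This is an illustration under stated assumptions, not the model's code: the state margins are made-up placeholder values, and the names `discount_points` and `apply_discount` are hypothetical. Only the discount formula and the 5.9-point, 130-day inputs come from the post.

```python
import math

def project_margin(margin, days):
    """Discount formula from the post: projected November margin for a given lead."""
    root_days = math.sqrt(days)
    return (margin * 0.909
            + margin * root_days * -0.0475
            + math.sqrt(margin) * root_days * 0.0604)

def discount_points(national_lead, days):
    """Points to take off the leader: current lead minus projected winning margin."""
    return national_lead - project_margin(national_lead, days)

def apply_discount(state_margins, national_lead, days):
    """Subtract the same number of points from the leader's margin in every state.

    state_margins maps state name -> national leader's trend-adjusted margin
    in that state (negative where the national leader trails).
    """
    shift = discount_points(national_lead, days)
    return {state: margin - shift for state, margin in state_margins.items()}

# Placeholder margins for illustration only (not actual model output).
snapshot = {"State A": 18.0, "State B": 1.0, "State C": -4.0}
projection = apply_discount(snapshot, national_lead=5.9, days=130)

print(f"shift: {discount_points(5.9, 130):.1f} points")
for state, margin in projection.items():
    print(f"{state}: {snapshot[state]:+.1f} -> {margin:+.1f}")
```

Because the shift is a constant, a state where the leader is narrowly ahead in the snapshot can flip in the projection, which is why the win percentage moves even though no individual poll has changed.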
*-*
I'm sure that people are sick and tired of all these changes, but this really ought to be the last missing piece of the puzzle, and it's something that we absolutely must do if our goal is to predict the November outcome rather than merely give a snapshot of the current polling. This is something, frankly, that I should have looked at before, although since the election had been so close until recently, it would not have mattered very much.
You'll also notice one other, less important change. Our projection now allocates the undecideds in each state 50:50 to the two major candidates, after making an allocation for third-party votes. The third-party allocation differs slightly from state to state depending on the other + undecided vote in that state's polling. The model had implicitly been allocating the undecideds this way before, but now I'm doing it explicitly, as I want to make it absolutely clear that our projection in each state is in fact a projection of the final outcome rather than some kind of supercharged polling average.
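A rough sketch of the allocation arithmetic, assuming the third-party share for a state is simply supplied as an input. The post does not say how that share is derived from the other + undecided vote, so `third_party_share` is a placeholder, and the function name and poll numbers are hypothetical.

```python
def allocate_undecideds(dem_poll, gop_poll, third_party_share):
    """Return projected (dem, gop, third_party) shares that sum to 100.

    dem_poll, gop_poll: current poll shares, in percent.
    third_party_share: assumed final third-party share, in percent
        (placeholder input; its derivation is not given in the post).
    """
    other_plus_undecided = 100.0 - dem_poll - gop_poll
    if other_plus_undecided < 0:
        raise ValueError("major-party poll shares exceed 100 percent")
    if not 0 <= third_party_share <= other_plus_undecided:
        raise ValueError("third-party share must fit within other + undecided")
    # Whatever is left after the third-party allocation is split 50:50.
    undecided = other_plus_undecided - third_party_share
    return dem_poll + undecided / 2, gop_poll + undecided / 2, third_party_share

# Illustrative numbers only: 46-41 with 13 percent other + undecided.
dem, gop, third = allocate_undecideds(46.0, 41.0, third_party_share=4.0)
print(dem, gop, third)
```

Since the undecideds are split evenly, the margin between the two major candidates is the same before and after the allocation, which is consistent with the model having allocated them this way implicitly all along.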
Acknowledgments: I again want to thank Robert Erikson of Columbia University, who has performed similar calculations in the past and gave me the idea for this one, and Andrew Gelman, also of Columbia, who lent me use of his historical polling database.
EDIT: Per some early feedback in the comments, I have changed the way I present the polling detail chart. What we formerly called our projection is now presented as before and described as the "Snapshot". The Snapshot is our best estimate of what the election would look like if it were held today.
In contrast to the Snapshot is the Projection, which discounts current national polling leads through the process described herein, and also allocates out the undecided vote. This is our best guess at what the election will look like in November.
6.26.2008
Should we be discounting Obama's lead? The next and hopefully last Big Change.
by Nate Silver @ 7:55 PM...see also history, meta, methodology, site, timing bias

140 comments
Looks brilliant as usual, Nate.
I notice that with the new calculations, McCain is about four and a half points below the magic 50% mark in his home state of Arizona. That's a substantial number of people going for third parties or Nader or something, around 17% of total Arizona voters, and that's just one example. How much do you figure the underdog candidates are likely to wrap up nationally in the popular vote? I mean, that's certainly much more than they picked up in the last two elections.
Well, I won't say you're wrong, but color me unconvinced. When did those various closing leads occur during the race? (I assume that their proximity to the election was how you derived your projection equation, but I'd be interested to see those projection equation lines superimposed on actual polling data throughout the election.) What circumstances surrounded losing those leads? How do we know that similar circumstances apply today?
I ask because I feel like you're going off a symptom of elections (closing polls) without really explaining the cause of that symptom or why it is present today. It may be enough that we can be psychohistorical about it all and just assert that what happened then should and will happen now, but I'm...just not sure.
Anyways, it will be very interesting to see how it plays out. Thanks for the update and I hope we get more data out of you soon!
One more thing - I'd suggest separate counters and graphs for the current projection vs. the November projection. Might make it easier to keep track of shifts and changes, because if everything's calibrated for the November projection and there's a sea change - October surprise, assassination (God forbid), ill-advised photo op with Hugo Chávez - it might be hard to gauge the immediate effects.
Can I just tell you you're a genius and I have no idea what half the numbers or graphs mean but the charts are so pretty anyway.
I hope you're right. GOBAMA
You're a goddamn machine Nate. I love you and I hope you never stop. Great post.
I guess this confirms that you are trying to be accurate and not a shill for Obama. Of course, we will still get people that will come on here and say otherwise.
Thanks and keep up the interesting work.
By the way, when you are done with the election, can you do anything to fix my beloved Seattle Mariners?
Daniel M.
Please include a separate row in each state projection for this new adjustment rather than combining it with the trend adjustment. These really are two VERY different adjustments. Remember your readers here like to know how the sausage is being made.
Isn't this uncertainty built in to the change earlier where you adjust the error based on the days until the election?
Poblano:
I think you're shooting yourself in the foot again here like you were when you first factored in national polling data. There is no apparent reason to apply the discount of the national polling equally to the margins in all states. Rather, since you already corrected for the relative effect of trends on each state, you should instead discount the predicted change in national polling from the national polling trend as applied to the states before running the regression (I guess this would amount to running the trend regression once to determine the margin, and then a second time after the discount is applied).
You've said yourself that some states respond more to shifts in national polling than others, so you really should observe those differences. That's not to even mention problems such as if McCain suddenly had a 95% margin in Utah the same week Obama was up 15% nationally. Would it then make sense to give McCain a margin over 100%?
The anonymous from above here. That came off cruder than I meant it to. I think the changes you have been making are better to be made uniformly across the whole country than not at all, but it still seems that better can be achieved and I am confident that you are able to achieve that, as you did when first factoring in national polling. Not trying to lecture you on what the specific remedy should be or anything.
I see your overall point but I think you made a big mistake in using the largest lead as your starting point in the analysis. Obviously, somebody is going to win by significantly less than their largest lead. If the final score was bigger, the final score would be the largest lead. I see the same mistake in terms of studies of stocks, where they track the fall of xyz stock from its overall high or the rise from its overall low.
I'd like to see the same study redone with whatever lead the person had at this point in the contest.
There are a couple of caveats that would limit the effect of any previous election poll study for this election. I believe that all elections ultimately come down to "incumbent party vs. challenger party." A) The Dem contest was very well covered. Hence, Obama is better known than most challengers. B) McCain is not president or VP. Unlike almost every other incumbent party candidate, he has very few ties to the incumbent. I think these factors will change the overall case.
I agree with Another Mike and Anonymous @ 7:17. This is a great addition, but anything that's in beta is of course prone to a few more errors than once it's polished.
Keep up the great work, though. This place has definitely far surpassed 270ToWin and electoral-vote.com as my preferred electoral projection place.
Why the constant offset, such that there's a "tightening" even with zero days until the election?
Nate said "Our projection now allocates the undecideds in each state 50:50 to the two major candidates, after making an allocation for third-party votes. The third-party allocation differs slightly from state to state depending on the other + undecided vote in that state's polling."
Can you explain how the model makes an allocation for third party votes?
Excellent. Now that the trend bounce gets a correction, the numbers are looking much more realistic.
hi nate,
first: reasonable adjustment, it makes sense that the candidate polling ahead is more likely to give up some of that advantage instead of increasing it. After all, the leader is probably closer to his ceiling/potential (thats the way i would describe it)
a suggestion: don't display the projected national lead as a line in the super tracker!!
That is misleading. People might think its a sort of "average lead", instead of an adjusted lead from the end point of the tracking line.
Just make it some kind of cross, crosshair, big X, whatever, that indicates it is a point kind of thing :)
Regards,
Icylemort
This corrected all the methodological qualms I can think of.
EXCEPT maybe the fact that it shouldn't happen equally across all states. But I don't know how to fix that.
Why don't you Anonymouses (Anonymi?) just give yourselves pseudonyms for effing sake!
Click the "Name/URL" radio button and enter whatever you like.
Jeez.
The problem with trying to predict the future is that it is always predicting the future with current and past information. You assume that nothing is going to happen during the debates or in the media or somewhere else that would throw the whole prediction down the drain.
That the race might tighten is somewhat predictable, though: big leads mean that the supporters of the candidate in the lead are less likely to be motivated to show up than the supporters who lean to the candidate who is behind, at least nationally. Turnout tends to be high only in battleground states, but since this year's battle is going to be contested in a lot more states, I think we might be in for a surprise.
When Poblano says "But even if we look at this data more systematically..." I think he means that he is deriving a model based off of all of the national polling data he has access to, not just the maximums. If I had to guess, he plotted the absolute difference between national polls and the final result vs. days until the election and then found the best-fit square-root curve (or something similar) for that data. That's where those seemingly arbitrary constants .909, -.0475, and .0604 come from. Though, I agree it would've been nice to see this sausage made as well, even if it were just the graph with the curve and an indecipherable cloud of points.
This addresses the one methodological question I had about the site--whether the "error" due to time before the election was in some part predictable, and worth accounting for. Apparently, it is.
Great job.
Dumb question, perhaps, because I think I know the answer. But, do all the previous predictions shown in the Super Tracker (tm) incorporate this correction?
Anonymous @7:41, it's one thing to say "the numbers came from fitting the data"; that's obvious enough.
The problem as I see it -- and this may just be my background in the physical sciences speaking -- is that there appears to be a free parameter in the model that shouldn't be there. The model as it stands says that even polls taken on election day will overpredict the lead in the actual election, and unless there's a good reason to expect that's true I'd feel more comfortable with a model that constrains the difference between polls and result to be zero on election day.
As Robert DeNiro said (as Lou Cipher) in Angel Heart, the future's not what it used to be.
Nate, your new changes make the November projections for states like NV and IN way more realistic seeming. Some Obamaphiles may be mad that the projection is not as excessive in Barack's favor, but most of us want a realistic approach.
You also pointed out correctly that no presidential winner has ever been at his polling peak on the day he wins the election (though sometimes the loser is at his peak).
people !
speak english for people like me
from what i understand Obama will win in november if these numbers hold ??
mark from ohio
Obama brown noser, how come you have not updated your Super Tracker since 6/21? Could it be that there are many polls that are tightening like Gallup & Rasmussen? In fact, Gallup now shows it even for 2 days in a row. If you run your 8th degree least squares poly through the points you will see a steep negative slope at the end. You would not want to do that would you OBN.
OMG !
Win percentage for Obama is now 69.3%
??
DOWN !!!!!!!!!!
mccain is winning !
WOOOO HOOO !
I can see the logic in what you are doing, though I think it's taking a sledgehammer to the problem. The real issue is that undecided voters eventually decide, usually in predictable ways. I think that applying a discount curve is an elegant solution, but one that is somewhat disconnected from the actual facts on the ground.
That said, I feel honor bound to tease you by noting that you've essentially added a "mercy rule" to your model. As a sports fan, surely you must take umbrage at that!
Excellent adjustment. The "Electoral Vote Distribution" functions as the margin of error...
The one thing I would add, at this point, is to make the "Electoral Vote Distributor" better tied with the "Tipping Point" stats. Show how states are polling compared to the national averages.
Love the new adjustment, though I share everyone's qualms about applying it to states. Why not apply it to the national trend adjustment? I'm sure there's a reason...
One thing: I think it would be much better if you could calculate the newly-adjusted projection retroactively starting 2/2 and plot it in purple with the supertracker. This would make much more sense than plotting what looks like a steady prediction 3.8% for Obama for the past four months. I realize this would require going back over the past four months of data again, but it would be very useful.
MC, estimation error comes in two forms: bias (accuracy) and variance (precision). Typically more data reduces variance but cannot correct bias from misspecification.
The recent adjustment isn't an uncertainty "variance" adjustment.
Nate is trying to correct an upward bias of summer leads (I may argue it is not a structural component of a political race but Nate generally takes a reduced form perspective in his prediction models).
Nate - it appears as if you are now including third party candidates. Could you please elaborate on how they are included in the 538 model, and in which (swing) states do they have the greatest impact?
Er, I do agree with the criticism that some have offered that it seems wrong to simply lop the adjustment off the top in each state. It would somewhat surprise me if that were the ideal way to do this (but I'm no expert).
Regarding the why of this--it seems totally intuitive to me. Early in the race, bandwagon effects and high public profile for the frontrunner are powerful forces. As the election approaches, the trailer gets more coverage and a chance to get out his message, leading him to recover somewhat in the polls.
To use a military metaphor, when Obama is up 6 in the polls, he is fighting in hostile terrain. Those last 6 percentage points of his votes come from people who voted for Bush and almost certainly disagree with Obama on one or more fundamental issues. As the candidates are seen more and their messages and positions permeate the electorate, some of these folks will gravitate to their prior voting patterns.
My next question for Nate would be this: Is there a pattern of parties out of power losing (or gaining) support as the election approaches? Is there some pattern of Democrats losing big leads at a higher rate than Republicans? In short, is there any reason to think that this adjustment shouldn't be symmetrical?
Anonymous asshat @ 20:57 - If you'd been paying attention when they taught you how to read a chart in school you'd realize that the x-axis on the Super Tracker is labeled every two weeks. The tracker is updated daily, but there is not enough room to put daily labels on the x-axis. The next labeled x-axis point will be 14 days from 6/21 or, since I doubt you can do the calculation yourself, 7/5.
quoting Richard:
[One thing: I think it would be much better if you could calculate the newly-adjusted projection retroactively starting 2/2 and plot it in purple with the supertracker. This would make much more sense than plotting what looks like a steady prediction 3.8% for Obama for the past four months. I realize this would require going back over the past four months of data again, but it would be very useful.]
I agree with him!
That would be even better than my crosshair idea...
Interesting analysis. A somewhat off-topic comment: Out of curiosity, I decided to look at the academic literature to see what types of electoral analysis it contained. Searching with Google Scholar revealed some pretty interesting articles. One of the more recent and interesting articles I found is Soumbatiants, Chappell, and Johnson, "Using state polls to forecast U.S. Presidential election outcomes," Public Choice, 127 (1-2), 2006, p. 207-223 (http://www.springerlink.com/content/4r77200825315574/). I haven't read the whole article yet, but it seems they may take an approach similar to that of 538.com; some of the article's references also appear interesting. I also noticed that some of the same issues being discussed in the comments of this blog (like the importance of considering state correlation and the relative roles of state and national polling information) were also discussed in some of the articles I came across.
Also, I figured I'd contribute my two cents to the ongoing discussion in the comments regarding the role of correlation in Nate's model. I've come to the conclusion that his model takes a significant amount of correlation between states into account. If you use the 538.com state win percents in an electoral college model assuming independent states, the electoral vote distribution appears much like a bell curve (Gaussian distribution) with relatively low variance. At the other extreme, you can consider a "fully correlated" model where Obama wins all states "bluer" than some randomly determined threshold and McCain wins all the other "redder" states. The electoral distribution presented on 538 appears intermediate between the electoral vote distributions with these two extreme models. Also, the Obama win % predicted by 538 falls between the values from these two models (an independent model gives a much higher Obama win%, over 90% based on data from a few days ago).
Fascinating site...keep up the good work!
Response to Troll, err, I mean Anonymous (2008-06-26 17:57): The labels are at two week intervals. The data points do not stop at 6/21, the labels do.
How hard is it to see that the data goes about 5 days past the 6/21 label?
The major source of uncertainty for me is that the intense negative campaign hasn't started yet. I wonder if you have (1) anything you can glean about the history of that from past elections or (2) any information from this year's primary or general polling about who is most persuadable, or whether that makes particular states more uncertain than they seem now.
If I had time, I'd love to compare this with some Intrade numbers and try to explain the differences.
(My other source of uncertainty: the veepstakes -- Crist could turn FL redder, Webb or Strickland could affect VA or OH.)
Richard who thinks he is smart, where are the two 0% lead dots for the last 2 days of the Gallup poll? THEY AREN'T THERE. Clean your glasses.
Nate knows his website has really arrived when it makes the "List o'websites to be trolled".
Anon at 8:21: The tracker shows daily polling averages, not individual polls. That's why Gallup isn't on there.
Colin Powell may support Obama soon!!!
Polls won't matter as much until McCain makes up ground and until VPs are picked.
Until then- wait and see. If Obama keeps big swing state leads it should be fine!!!
I'll expand on Richard's request and ask if the Super Tracker could have two curves, one plotting what the popular vote is expected to be if the election were held that day (like your current red line) and one plotting what your model projects will be the final outcome of the popular vote (I think this is what Richard is requesting). Further, could the Win Percentage Tracker have two curves, one plotting what the chances of Obama winning are if the election is held tomorrow (essentially your old model), and one plotting what the projected chances of Obama winning in the final outcome (essentially the new model). So laymen don't think this is Obama-biased, just split the axis like you've done for the Super Tracker (i.e. McCain 80%, McCain 70%, McCain 60%, even, Obama 60%, Obama 70%, Obama 80%). This fulfills a lot of what people are requesting as far as graphics displayed goes, gives an easy visual context for your projections so people see that you aren't giving more to Obama than what current trends dictate, and also eliminates the need for the silly plot of the same graph twice, once flipped vertically.
Where has the win percentage tracker gone?
It's probably being refit for this new model.
the win percentage tracker would have to be recalculated for a few weeks back
quite time demanding, but it would be well worth the effort
go nate!
New Super Tracker looks cool. Very nice!
I'm a bit confused by the new popular vote projection. I don't think that it's supposed to be Obama + McCain + Third Party Candidates because third party candidates certainly won't receive 4.8% of the vote. So then how is it calculated - does it leave out some undecideds?
I kinda liked how it was before - the projected two-party share of the popular vote.
I'm not a statistician, but why would you use the national lead to determine discounted numbers for individual states? As a result, you're subtracting 2 points from an 18 pt lead state like California, but also 2 points from a 1 point state lead like Indiana. Shouldn't you instead use the statewide leads to make this determination, so that California loses 2 points, but Indiana only, say, 0.2? If Obama polls 1 point ahead in Indiana for six straight weeks four months out, why should he be penalized 2 points based on a larger, more volatile national lead?
4.8% for third party candidates doesn't seem totally unreasonable.
especially considering that both parties have some kind of internal dissatisfaction problems
David,
It doesn't seem unreasonable to think that a race between two candidates who have both had trouble unifying their bases (Obama with PUMAs, hardcore feminists, and white racists; McCain with evangelicals, antiwar conservatives, and the Liberty Republicans) could yield around 5% to third-party candidates. Bob Barr is certainly going to benefit from the Ron Paul grass-roots movement, while Ralph Nader always manages to scrape up some cynical independents and disaffected liberals.
5% of the electorate is hardly Perot numbers. I think that's actually a pretty accurate projection for third-party results this year.
Thanks to the straight responses to some of the trolls' questions -- they've cleared up a lot of questions I had about the graphics.
Nate,
I am a big fan of the site, and I'm not at all sick of big changes. I think it is very interesting to watch as you refine the methodology step by step. I hope that you won't hesitate to make further changes if you think it will improve the predictions---and I think those that you've made do.
Nate, I have to agree with Anym@18:48. Taking the national numbers from previous elections and noting that the lead shrinks as the election approaches does not necessarily mean that the proper adjustment is to apply a simple linear adjustment equally across all states. Without having the state data broken out of the previous elections, it seems that a better model is that each state's margin regresses to +/- 0 with the same functional (exponential) form as opposed to each state changing by the same absolute margin.
A comparable example would be the '08 Florida Marlins. They are scoring a lot of runs, and to project their scoring for the second half of the season would not be to regress the total scoring of the team to a median number (or to the PECOTA numbers) but to regress each player to their projected numbers. That way if eight players are producing as expected and one player is way overachieving, the more correct prediction would have the eight "normal" players continuing as predicted and one player in for a huge drop. In some situations the predicted runs would be equivalent, but not in every case. I hope this makes sense.
It's generally true that elections tighten near the end. In 1976, Ford's only lead was the final Gallup poll.
You make electoral-vote.com so very.....2004.
I was thinking about a mean reversion possibility yesterday, but I didn't have the data to check it. I'm glad you were one step ahead of me. I think this is a vast improvement.
Perhaps it would be interesting to look at other long run patterns including cycle lengths and the importance of various observable events on trends. Perhaps something like the conventions act like a "knot point" where the predictive adjustment changes. Obviously any further inclusion of long run trends runs the risk of overwhelming polling with tons of structure.
Hi all,
With respect to the fact that the model thinks we should slightly discount polling leads even on election eve: there were three elections in which the last polling materially overestimated the winning candidate's margin: 1964, 1996, and 2000 (where it actually got the winner of the popular vote wrong). Also maybe 1956, but the data is sketchy there.
There was one election in which the final polling materially *under*estimated the winning candidate's margin ... that was in 1980, which was an election that broke both very late and very rapidly to Ronald Reagan. Also maybe 1952, but the data is sketchy there.
In every other instance, the polling did a pretty darn good job.
Nevertheless, the coefficient on the MARGIN term in the regression is <1 by a statistically significant margin.
Also, as we get closer to the election, we are going to switch to the more aggressive/sensitive version of the LOESS trend estimator, and so this will make for a little bit of a hedge.
Paul,
Yeah, I'm definitely going to investigate the impact of the conventions on these numbers. We could for example incorporate a two-week window following the convention where the mean reversion is stronger.
I may be misunderstanding the overall methodology of this site, but this seems like it might be double counting to me. Doesn't the model already estimate an error term based on the time remaining to the election? I'm assuming this term was calculated using raw polling gaps to final totals so would include this tightening error.
Won't the new projections explicitly include this error and then also implicitly assume further error in the scenarios? I'd think that this adjustment is right, but that the scenarios now include too much variation in results.
Does this make sense?
Dan, not only is there error but that error is biased. The model already accounted for the error, but assumed there was no bias. Now the model is acknowledging that earlier polls tend to overstate actual election results and large leads tend to close.
Before it looked at variance, now it considers mean bias.
There are two possible explanations for the average margin of an election to be less than the margin of national polls used to predict it:
1) The winning candidate's vote share is more likely to fall than rise
2) The winning candidate's vote share is more likely to change by a lot if it falls but not if it rises (i.e., there is more resistance to change as polls move away from 50/50)
There is of course a continuum between these two extremes, but it should be easy to test where the data falls:
If it's all option 1, the median outcome given a poll x days before the election should be the same as the mean outcome. If it's all option 2, the median outcome should be the same as the polled margin, but the mean should regress towards the mean.
Anyway,
I'd be curious to know what the data shows...
Thanks for the great site Nate.
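The median-versus-mean test proposed in the comment above could be sketched roughly as follows. This is only an illustration: the function name `diagnose_tightening` and the five margins are hypothetical and made up for the example, not taken from the historical polling database. The change is measured from the polling leader's point of view, so a negative number means the race tightened.

```python
import statistics

def diagnose_tightening(polled_margins, final_margins):
    """Return (mean, median) of the change in the polling leader's margin.

    Margins are signed (positive = candidate A ahead). The change is
    measured from the leader's side, so negative means the race tightened.
    """
    changes = []
    for polled, final in zip(polled_margins, final_margins):
        sign = 1.0 if polled >= 0 else -1.0
        changes.append(sign * (final - polled))
    return statistics.mean(changes), statistics.median(changes)

# Made-up margins purely for illustration (NOT historical data).
polled = [10.0, 6.0, -4.0, 8.0, -12.0]
final = [7.0, 5.0, -4.5, 1.0, -9.0]

mean_change, median_change = diagnose_tightening(polled, final)
print(f"mean change:   {mean_change:+.2f}")
print(f"median change: {median_change:+.2f}")
```

Under option 1, the mean and median change should both be negative and of similar size. Under option 2, the median should sit near zero while the mean is pulled negative by a few large collapses.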
Paul,
Thanks for the response. I understand your point and think correcting this bias makes sense. I'm just wondering if the error term is now overestimated. It seems that, depending on the lead, polling bias is being corrected by 1-10 points and then that bias is also being included when calculating the error term. I feel like this could have a fairly significant impact on the range of outcomes being modelled.
Again, my math isn't good enough to tell if this is really what's happening and I may be missing how the error terms are actually being calculated.
Also, Nate, I've been lurking for a long time and love the site. It's like crack.
I second the comment of someone above that I love the continual improvements you're making and trying to follow along.
Nate:
I think you should publish both the "present snapshot" and the "Election Day projection". I think your audience is interested in both of these charts; I know I am.
With some further thought. I'd like to add that more analysis of the mechanism behind this decay of large leads would really be valuable.
A lot about it seems counterintuitive.
National polls getting closer near an election seems normal, but state polls (eg dems in DC or repubs in UT) seem to maintain large and quite stable leads.
I'm sure you could dissect this to death, but it's certainly not dead yet.
I DEFINITELY don't believe that projecting results 130 days out based on historical trends is accurate because of the simple fact that today is not tomorrow.
With that said, the effect of this is to moderate the bumps in the polls, and as we all know, things will no doubt swing, especially around the conventions. So even though I strongly disagree with the reasoning behind this, the moderation from the existing method was appropriate and provides a more accurate and more stable result. I definitely felt that the previous projections were way too rosy for Obama, while what I am seeing now looks to be right on track with the exception of a few states that haven't updated since the Dem's contest was decided (Montana and Louisiana come to mind).
If I read correctly your description of how to apply this tightening correction, you are using the overall (national) lead to compute an adjustment (of 2.1 points), and then shifting the snapshot of each state by this overall amount. Would it not make equal (or, possibly, more) sense to compute a tightening correction for each state, and then apply that correction to that state? It's not clear to me that just because Obama is leading nationally by, say, 6 points, and this will tighten to, say, 4 points by November, that that would necessarily imply that every state Obama is currently winning by one point will switch to a one point McCain victory in November.
That is, to tighten the race nationally I would expect you to tighten (by scaling) each state rather than shift each state.
~Doc~
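To make the shift-versus-scale distinction in the comment above concrete, here is a minimal sketch. The state names and margins are hypothetical, the 2-point shift is a rounded stand-in for the 2.1-point adjustment, and the 4/6 scaling factor comes from the comment's own "6 points tightening to 4" example. Neither function is claimed to be what the model actually does.

```python
def shift_toward_trailer(state_margins, points):
    """Subtract the same number of points from the national leader everywhere."""
    return {state: margin - points for state, margin in state_margins.items()}

def scale_toward_zero(state_margins, factor):
    """Multiply every state's margin by the same factor (0 < factor < 1)."""
    return {state: margin * factor for state, margin in state_margins.items()}

# Hypothetical snapshot margins (positive = Obama ahead).
snapshot = {"State A": 20.0, "State B": 1.0, "State C": -15.0}

shifted = shift_toward_trailer(snapshot, 2.0)   # flat 2-point shift
scaled = scale_toward_zero(snapshot, 4.0 / 6.0)  # national 6 -> 4

for state in snapshot:
    print(f"{state}: snapshot {snapshot[state]:+.1f}, "
          f"shifted {shifted[state]:+.1f}, scaled {scaled[state]:+.1f}")
```

With a flat shift, the 1-point state flips to the trailing candidate and the safe McCain state gets redder. With scaling, no state changes hands and every state, red or blue, moves toward even.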
Shouldn't the third pie chart have a third color sliver now?
49.5% isn't more than a semicircle as it appears now.
Hi - I understand why this changes the margin of victory for a candidate, but I don't understand why it changes the probability for victory by that candidate.
I think you should roll the dice for your monte carlo, before you apply the convergence algorithm. If you apply the convergence algorithm first, then not only does the mean shift towards zero, but the sigmas for the sampling should also shrink.
My instincts say that the probability of victory for one candidate or the other shouldn't change, just the magnitude.
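One way to see what is at stake in the comment above is to assume, purely for illustration, that the final margin is normally distributed around the projected margin. The 6-point standard deviation below is an arbitrary assumption, not a number from the model, and `win_probability` is a hypothetical helper.

```python
import math

def win_probability(mean_margin, sigma):
    """P(final margin > 0) when the final margin ~ Normal(mean_margin, sigma)."""
    return 0.5 * (1.0 + math.erf(mean_margin / (sigma * math.sqrt(2.0))))

sigma = 6.0  # assumed uncertainty in points; illustrative only

before = win_probability(6.0, sigma)                        # snapshot lead
after_shift = win_probability(4.0, sigma)                   # lead tightened, same sigma
after_shift_and_shrink = win_probability(4.0, sigma * 4.0 / 6.0)  # sigma shrinks too

print(f"before adjustment:        {before:.3f}")
print(f"shift only:               {after_shift:.3f}")
print(f"shift and shrink sigma:   {after_shift_and_shrink:.3f}")
```

Under these assumptions, shifting the mean toward zero while leaving the spread alone lowers the win probability. If the spread shrinks in the same proportion as the mean, the probability is unchanged, which matches the commenter's instinct.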
I always thought the percentages (before this tweak) were the two-party percentages. So perhaps it should be consistent that way, or insert the Barr/Nader slice(s).
Nate, please explain why you think 6 pts feels right, other than the fact that it is close to the polling average to date (actually the true poll average is 7%, but anyway). Moreover, why not weight or otherwise distinguish between polls of likely voters and those of registered voters? The former tends to take out a lot of the flimflam caught up in polling people who just happen to be registered even if they say they will not vote this year. Finally, given the fact that half the time the person trailing does NOT make a run toward the end, why are you assuming it will happen this year? My feel is that this election has a lot of the hallmarks of the 1996 campaign (perhaps even the 1980 campaign). From reviewing your chart, those elections did not fall into the model you now seek to employ.
Nate, this is really a bizarre thing to do. The historical sample is nowhere near dense enough to glean any insight on what you're looking for. Just consider the standard deviation in the sample from the model that you're applying to the current data. I'm betting it's pretty damn large.
Have to agree with a handful of previous comments. Would it not make more sense to apply the formula in a way that California would close by, say, 4 points and Indiana by only .5 or whatever it would turn out to be? Taking the 2 points off the top of every state seems like it would provide an inaccurate result.
That would also close gaps in deep red states to make them a little lighter.
We'll see if you think this would be the more logical way to move.
But also have to agree with some of the others on here on another issue: Your updating. KEEP DOING IT. I don't care how many times I have to wrap my mind around new graphs or new concepts. If it makes sense and will improve accuracy, don't ever hesitate to take action.
Thanks SO much for all the work. My friends and I LIVE on this site. Long time lurker, here, but the absolute national vs. the relative state-by-state debate got me out of my shell!
This is Anonymous from 18:48. Thanks to IronThrone and EquationDoc for articulating my point far better than I could.
I'm curious what Nate thinks of the tightening option re: adjusting November projections compared to the current raw subtraction of the same amount (2 points) from all states.
"no presidential candidate has been at his peak on election day"
I think that is a mistake. Reagan in '80 on election day was near his peak. Interestingly, the McCain campaign is trying to use the same tactic against Obama as Carter tried against Reagan --- that Obama (Reagan) is (was) too risky. That tactic does not really work in a change election year and this year has all the hallmarks of a change election. The problem with Nate's analysis is its assumption that there is uniformity between trends in all the presidential elections, namely, the winner never wins on a peak and the loser usually closes. However, I think with each election the use of new technology has broken down information barriers to voters who until then would have not bothered (or had access to) info in making their decision. That is simply not true these days, and especially after a heated primary season on both sides. To believe that there is this big segment of low info voters out there whose movement will not occur until the end is an idea of the past. Look at all the national polls: they have consistently shown a small group of undecideds (in the realm of 6-8%).
Can you put the historical date on your regression plot?
For example can you plot the day of maximum lead for the candidate that won say the 1994 election, and then also the point by which that election was actually won? How about plotting the data for the candidate that lost, in those cases where that candidate actually had a lead at some point?
I am not sure whether this regression will hold this time. How does it factor in the partisan identification advantage, which is at a historic maximum for D this time? How also does it factor the GOTV effort? Do the regressions hold equally well for the 2000 and 2004 elections as the earlier ones? Presumably the Republican GOTV effort was superior to the earlier elections?
If I understand this right, all the probability distributions use some sort of square root of the number of days left for the standard deviation of the noise. This is exactly what you'd get from doing a random walk on the percentages day-to-day (because of course a sum of n independent normally distributed random variables with the same standard deviation is a normally distributed random variable with standard deviation equal to sqrt-n times the original standard deviation). This suggests that (if you have the computing power - I really don't have any idea how much each estimate takes) you could simulate by means of random walks rather than just generating the national and state total changes. If you simulate by random walks, then you can change the simulation slightly by making the random walk more likely to move in the direction of the one who is losing rather than the one who is winning. Of course, I would guess that each state has a different baseline that they're likely to regress towards - this model currently assumes that regression is towards a nationwide 50-50 split, and has no regression towards the mean in individual states.
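The random-walk idea in the comment above can be sketched as follows. Everything here is an assumption made for illustration: the 0.5-point daily noise, the 0.3%-per-day pull toward an even race, and the function name `simulate_final_margin`. None of it is taken from the actual model. The 130-day horizon and 6-point starting lead echo numbers mentioned elsewhere in this thread.

```python
import math
import random
import statistics

def simulate_final_margin(start, days, daily_sd, pull, rng):
    """Walk the margin forward one day at a time.

    Each day the margin is pulled toward zero by the fraction `pull`
    (pull=0 gives a plain random walk) and then hit with normal noise.
    """
    margin = start
    for _ in range(days):
        margin += -pull * margin + rng.gauss(0.0, daily_sd)
    return margin

rng = random.Random(538)  # fixed seed so the sketch is repeatable
days, daily_sd, runs = 130, 0.5, 3000

plain = [simulate_final_margin(6.0, days, daily_sd, 0.0, rng) for _ in range(runs)]
reverting = [simulate_final_margin(6.0, days, daily_sd, 0.003, rng) for _ in range(runs)]

print(f"plain walk:     mean {statistics.mean(plain):.2f}, "
      f"sd {statistics.stdev(plain):.2f} "
      f"(sqrt-n rule gives {daily_sd * math.sqrt(days):.2f})")
print(f"with reversion: mean {statistics.mean(reverting):.2f}, "
      f"sd {statistics.stdev(reverting):.2f}")
```

The plain walk should reproduce the square-root-of-days spread the commenter describes while leaving the expected margin where it started. Adding the pull toward zero lowers the expected final margin.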
I have some misgivings with looking at polls for the last 60 years and trying to figure out things today. That makes two critical assumptions which I don't think are correct.
The first is that polling methodology has remained consistent over the decades. I don't have as much ability to comment on this assumption, but I'm inclined to say that isn't the case now. The way pollsters push leaners, the way pollsters adjust the data based on party, age, race or whatever, all seems designed to make the poll more accurate and less liable to +30 margins that are obviously not going to hold up.
The other assumption is that society is the same. The theory appears to be that early leads are exaggerated and that as people pay attention, races tend to tighten. I don't think that's nearly the case today as it used to be. While people might pay more attention to the race closer to the election, I would bet that on the whole today's electorate is far more informed about the candidates and issues now than their counterparts were in June 1960.
Beyond how society has changed since then, there's also the matter of the salience of the presidential election to the electorate. If people care more about the election, then they're more likely to have their views on candidates cemented early. Primaries make it so that the presidential election cycle starts much earlier than it used to in the mind of the public.
In sum, while I think tightening is probably going to happen, I don't think it will be as pronounced as earlier elections.
Couple more comments.
1. Some of you are WAY underestimating how robust this finding is. It's extremely statistically significant, and reduces the error on the election day projection by roughly 30 percent versus taking the numbers as-is.
2. Scaling the results by the margin in a particular state doesn't really work. If anything, the states that are closer are more likely to contain swing voters and therefore liable to be more volatile.
3. Dan K. may be right that we're sort of double-counting the error term. On the other hand, all these adjustments we're making now are introducing more noise into the model. I haven't totally thought it through. If it's a mistake, it probably isn't one by much, and at least it errs on the conservative side.
BTW -- if we look only at elections since 1988, the degree of mean reversion would actually be quite a bit stronger. This is being driven primarily by the more recent elections. 1992 was exceptionally volatile; 2000 was pretty darn volatile; 1988 was volatile, although almost always moving in GHWB's direction.
I don't like explicitly assigning the undecided vote... it was nice to see how many people are undecided in a particular state in the chart averages.
Nate,
You note that 1992 was exceptionally volatile. How are you dealing with Perot in that election? Are you just ignoring him - so your comparison is just Bush-Clinton margins? The existence of Perot as a serious contender (heck, wasn't he leading in some polls around June) would seem to make the lessons of 1992 not particularly applicable to a traditional two-man battle like what we've got this year.
First, to add to those saying we absolutely love this tinkering - hope this isn't the last one!
I think I also like the change itself. The flat application of the adjustment across states appears to be right. It is not the same as the time trend adjustment, because there you threw away some information we actually had available when you made it flat. But I'm not sure we have information relevant for the question, which form the regression to the mean will take. It definitely need not be more pronounced in more one-sided States (doing it State-by-State in fact rather tends to diminish it, as you simply end up with a pinker red and a lighter blue, overall remaining pretty much the same, and I am sure history is not like that).
The actual regression to the mean will be associated with certain *demographics*, and what's absolutely sad about the future is that we really, truly have no information as to *which demographics will break away from the leader and which will stay*. We can make a guess, however, that *some* demographics will break away from the leader - and assigning them, based on this limited information, flat across the board, is I think the right thing to do.
I apologize, incidentally, for staying "anonymous". Can't think of a name.
Tom,
I also looked at a version where you include the number of "other" votes (third party + undecided) as a variable in the regression. It actually behaves pretty intuitively -- the fewer votes that the two major candidates have between them, the more volatility in the model and the higher the degree of mean-reversion. I don't know why I didn't run with that version -- thought it made things too complicated, I guess -- but I probably should be using it.
This may be a dumb question, but if we consider a state like Utah where McCain has a huge lead even when Obama leads overall, wouldn't it be possible that there was a tightening of the race in a way that favors Obama in Utah? Or is there a reason why the phenomenon would have Obama doing worse in every state, even the ones where he was always doing poorly?
I like the comment about 'today [not being] tomorrow' - someone doesn't quite understand the concept of a projection! Have to say - the numbers showing up all seem more likely. Having said that, you'd hope that Obama's candidacy is stronger than those listed above - his being the once-in-a-lifetime kind and so maybe that cancels out the 'need' to take points off.
Isn't the Projected Election Day Margin table in the post wrong? It gives the impression of an increasing margin rather than a tightening one.
Way to go, Nate! This is a great job, and I really appreciate every improvement you make to the model. Greetings from Spain!
In my opinion, using an average value for all states, and using election results from more than 20 years ago, makes the projection rather rough. There have been fundamental changes in campaigns and demographics since the times of Eisenhower. Even if the overall goal is to predict the November result, you should definitely make a distinction between the now and then in the diagrams.
Freely add anything you can think of to make the projection more accurate.
Whether it's coincidence or not, this small change has rectified the few slightly skewed projections that I had complained about previously. Nevada, Florida, Missouri and Indiana have now gone from very light blue to very light pink. I think that has to be right, particularly in NV and MO where by any objective account McCain is still clearly just ahead even after the Obama bounce. FL and IN appear to be toss-ups according to recent polls but the 2004 results suggest that Obama probably needs to be consistently ahead in those states before we can colour them blue.
Of course Nate has justified the change with reason and logic but I'm just more comfortable that the map looks realistic now.
On a sidenote, I see that McCain has been endorsed by his former captor at the Hanoi Hilton and Fox News has run with it as if this is in some way a positive story. Interestingly if you read the full interview with the Hanoi prison chief he also goes on to deny that they ever tortured prisoners and that McCain is telling lies to win the election. With friends like these...
I'm not into this. The fact that the lead often tightens is only the result; we don't know what actually causes it. Will the reverse actually work for Obama? We don't know. If you're making these changes simply as a guessing game on what might happen, then why should you not include factors such as how Obama would be able to spend much more money, or that he is definitely a better speaker? Or maybe factor in the risk that Obama makes a huge gaffe? No. 538 should stay, for the most part, with the polls. You might be a genius, Nate, but you can't predict everything.
Fascinating, but predicting November is a mug's game.
In a way I don't care what the projection is for November; as it is likely to change just as much as the 'if the election were held today' snapshot.
Though I suppose the two lines will converge as we get closer?
Anonymous at 4:06:
You don't need to predict the cause of future changes in the model to have the best model. Poblano's model does account for many measurable things, such as fund raising and campaigning (and therefore indirectly accounts for having much more money to spend). Things like the likelihood of a huge gaffe aren't measurable, so Poblano does the next best thing here and predicts Obama's likelihood of a huge gaffe by the likelihood of them in past presidential candidates. In the same way he also predicts the likely outcomes of such events and the likely outcomes of other events significant in an election.
You're objection to the methodology essentially boils down to "you can't predict what will happen to Obama because he is unlike any presidential candidate that preceded him!" The problem, though, is that if that were true then no model based on polling data would lend any sort of insight into the outcome of the race. By assuming Obama is an outlier in every way from the outset there is no way to make any projection that has any reliability or worth at all. If, instead, you assume he will behave (with respect to what's expected of him) roughly like presidential candidates of the past, we can get a good idea of what is likely to happen if Obama isn't some superhuman freak. If it turns out that Obama is a superhuman freak, well, no generally reliable model would've picked up on that before it becomes apparent anyway, and at least Poblano's model will adjust toward that reality over time.
whoops, I meant "...future changes in the election..." and "Your objection..."
It's late, okay :(
The big question is why do races tighten?
I think the truth is that come the time of decision voters tend to ignore the soundbites & the scandals and vote for their party.
In short the election result is more in line with party identification numbers than early polls would show.
Using this logic (and it might be worth no more than you have paid for it), Obama is not 6% ahead. He is rather badly behind the lead a Democrat should have.
I am not arguing for an adjustment in the opposite direction, but rather caution. The lead in Democratic identification is new. We have no idea how fixed it is. Puma may shrink as its members get a better look at McCain, or not. There are a lot of conservative Democrats that could be won over if Obama is seen as a liberal idealist.
Do you have evidence that races tighten towards even rather than party identification?
I suspect that the "tightening" is combining two factors: 1) an actual tendency of candidates to regress to the mean (underlying partisan split in country at that time), and 2) tendency of undecideds to break for challenger in incumbent races (prior to 2004). You might improve the model by accounting for the latter factor in incumbent races.
I agree with the observation that presidential races generally tighten as the campaign progresses. What will be interesting this time around is whether Obama's expected cash advantage -- which might exceed 2:1 -- enables him to buck this trend.
@Guy: Maybe the tightening is because the undecided voters who are leaning to the leading candidate are less compelled to vote than those who are leaning to the trailing candidate; the vote of the latter could make the difference, while the vote of the former may not.
I'm afraid I don't really understand the difference between "trend" and "projection". If there's an answer in the FAQ, I must have missed it.
The projection map looks more correct now. I really wasn't buying the possibility of an Obama win in Missouri, and I still have strong doubts about Indiana.
You should really adjust your popular vote pie chart to have a wedge for "other." Or just a space where that wedge would otherwise be . . . although that might imply undecided.
I agree with some comments that it is time to take a break from the model updates and adjust your FAQs . . .
TIME National poll:
Obama 43, McCain 38
Dates conducted: June 19-25. Error margin: 3.5 points.
The adjustment is a good one. One thing I noticed looking over the largest-lead table: the polls all tighten, agreed, but they seem to tighten and land in favor of the Republican in recent history (i.e., in 1976 Carter's largest lead was 33, Ford's was 1, and Ford lost by 2. So while Carter won, he lost virtually all of his largest lead, -31, while Ford lost only 3). As a matter of fact, if I did the off-the-cuff averages in my head right, a poll hasn't tightened in favor of a Democrat since 1968. The good news (for Dems) is that in the last couple of elections, the D's and the R's split the loss of largest lead fairly evenly. Maybe a trend.
Someone might have pointed this out in an earlier post. If so, my apologies for being redundant; I don't have time to read 100 posts on 30 sites. I'm a casual junkie!
Tain
Orlando: No, it's a reflection that voters tend to make a yes/no judgment about the incumbent. In other words, an undecided voter's decision not to support the incumbent has more predictive power than their decision to not (yet) support the challenger. That said, this did not appear to be true in 2004.
* *
Nate, I think the question of allocating the undecideds is something to think about. One of the biggest polling challenges this year will be trying to measure how much, if any, of the Undecided category is actually a hidden anti-Obama vote that will not speak its name (because of racial dynamics).
Missouri, Nevada, and Ohio—to name them alphabetically—are the three biggest bellwethers.
Missouri has voted for the winner in every election since 1904, with the exception of 1956.
Nevada has voted for the winner in every election since 1912, with the exception of 1976.
Ohio: With Republican-vs.-Democratic party matchups dating back to 1856, no Republican has ever won the White House without carrying this state. On the other hand, four Dems, in five elections, prevailed without it: James Buchanan in 1856; Grover Cleveland in 1884 and 1892; Franklin Roosevelt in 1944; John Kennedy in 1960.
Over the last century these three states have been in consistent agreement, and they have voted for the winner in at least 21 of the past 25 elections. On the rare occasion that Missouri or Nevada or Ohio disagrees, the other two have picked the winner. (Which seems to be, as of late, what this polling reflects. Only problem: Ohio. So I'm not buying the polls with Missouri and Nevada leaning GOP unless Ohio goes that way as well, and, at that point, you may as well predict victory for Republican Senator John McCain over Democratic Senator Barack Obama. Due to the sensitivity of the two top issues, the economy and the Iraq war, and the U.S. voting history in the similar economy and/or war years of 1932, 1952, 1968, 1980, and 1992, I believe it will turn out just the opposite, with Obama capturing all three bellwethers and winning in November.)
To repeat: In the long run, I believe all three (Missouri, Nevada, and Ohio) will once again agree [in 2008], and will vote for the winner this election. (And I do not believe the winner will be John McCain.)
The results at T=0 suggest late-breaking undecideds break to the underdog. Particularly for the larger leads, this result is not supported by the data as far as I am aware. Blowouts become larger on election day (a fact the GOP is publicly worried about at the moment).
At a minimum, the model should allocate undecideds 50/50 at election day (which is occurring according to your post) and the regressions should be forced through the polling average at T=0. That is, a 10 point lead at T=0 should be a 10 point victory, not 9.
8:48: And I do not believe the winner will be John McCain.
The Gravelanche will take the White House, you heard it here first. ;)
I'd second the call to factor third party candidates into the model. It's not just Perot in 1992. It's Wallace in 1968, Anderson in 1980, Perot in 1992 and 1996, and Nader in 2000. Third party candidates don't swing states, but they have a substantial effect on the final margin. Off the top of my head, I would say it is because they tend to draw votes from late deciders, protest voters, occasional voters, etc. You've got to account for Barr and Nader this year, or you risk being off by a point or two, which could (but hopefully won't) make all the difference.
Nate,
While I applaud the concept, I am concerned about its implementation. Consider that one of the distinct differences that should lend itself to regression analysis of this election is the fact that there are no incumbents. If you evaluate only the elections where there were no incumbents, the only election where the lead didn't change hands was Eisenhower's. In elections with an incumbent, the only elections in which the lead changed hands were Ford's, Carter's and Bush's, where the incumbent lost.
While I too believe the results are more reasonable, I think it is for an unreasonable reason. I believe a higher order is called for here.
WOW...enough statistical data to make one's head explode...but in a good way.
Thanks for doing this! Remarkable work indeed!
Ed in Memphis
Nate, thanks for answering my question last night. This is a really great site and, in general, I think this is a good improvement.
I was curious about one thing, though (because it's easy to sit here and suggest extra work for you to do). As others have suggested, I think what you're seeing/modeling here is essentially a "regression to the mean". Have you given any thought to trying to explicitly model what that "mean" is based on outside factors?
There was some press recently, for example, of a model that predicted winners based on GDP, Presidential approval ratings, and the length of time a particular party has been in office. Such a model would predict a big Obama victory this year.
I guess my question is: have races historically trended toward 50/50 or have they trended toward some objective mean, which happens to look like a trend toward 50/50? If the former, then, of course, what you've done here is exactly right (as right as possible, that is), but if the latter, then (as I think somebody else suggested upthread), we might expect to see this election "regress" toward an even bigger Obama victory.
Also, while I'm here, I have an unrelated question. Rasmussen reports their national poll numbers with and without leaners (e.g., today Obama's up 7 without leaners, up 4 with leaners). Which number do you use and why?
Thanks.
I don't understand why the projection and snapshot for Florida are calling different winners. Why is that?
So the model is now a random walk with a weak restoring force. A massless particle on a sublinear string, buffeted by Brownian motion. Interesting.
And for those who think each individual state should regress toward the mean: In a sense that's true, but it should regress towards its own mean, not the national mean (which is here presumed to be 50-50). So Indiana regresses towards R+8, Vermont towards D+20, or whatever. If Utah's mean is R+30 and Obama is only down 20, that means he's currently got the support of some people who don't normally vote D. It'll be easier for him to give those back than to pick up more.
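The difference between regressing a state toward its own mean and toward a national 50-50 can be shown with a one-line formula. The sketch below uses the comment's own Utah example (a baseline of R+30 with Obama currently down 20). The reversion strength of 0.25 and the helper name `revert_to_baseline` are assumptions made only for illustration.

```python
def revert_to_baseline(current, baseline, strength):
    """Move `current` a fraction `strength` of the way toward `baseline`."""
    return current + strength * (baseline - current)

# Margins are signed: positive = Democratic lead, negative = Republican lead.
utah_now = -20.0        # Obama down 20, per the comment's example
utah_baseline = -30.0   # the comment's hypothetical R+30 state mean
strength = 0.25         # assumed reversion strength, illustrative only

toward_own_mean = revert_to_baseline(utah_now, utah_baseline, strength)
toward_even = revert_to_baseline(utah_now, 0.0, strength)

print(f"toward state mean: {toward_own_mean:+.1f}")
print(f"toward 50-50:      {toward_even:+.1f}")
```

Regressing toward the state's own baseline moves Utah further from Obama, while regressing toward an even race would move it closer to him, so the choice of "mean" determines the direction of the adjustment in lopsided states.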
It seems to me that much of the tightening in the polls as the race goes on can be attributed to the lesser known candidate getting more press. We saw this exact thing happen in the race between Obama and Hillary. Obama's numbers soared as he campaigned because he was by far the lesser known candidate. However, his campaign had diminishing returns once everyone had gotten a chance to look at him. In fact, towards the end of the primary Nate had incredible success predicting races purely on demographics because the media had already saturated Obama and pretty much everyone had made up their minds.
I think this correction may be far less dramatic in the general election as Obama is very well known nationally as a result of his contested primary and McCain is already well known as a former candidate and long time Senator.
I do expect polls to tighten, though maybe not as dramatically as in past elections.
The projection map looks more correct now. I really wasn't buying the possibility of an Obama win in Missouri, and I still have strong doubts about Indiana.
Missouri, as Anonymous@9:48 said, is a pretty reliable bellwether - it almost always votes for the winner. Clinton won it twice. Indiana hasn't voted for a Democrat in a non-landslide since 1892, I think (it did vote for Wilson in 1912, for FDR in 1932 and 1936, and for Johnson in 1964). It is virtually inconceivable, historically, that a Democrat would win Indiana and lose Missouri.
Missouri, Nevada, and Ohio—to name them alphabetically—are the three biggest bellwethers.
New Mexico has a pretty solid record, as well. It has voted for the popular vote winner in every election since statehood, save 1976. (It also voted for Gore in 2000, but he won the popular vote, so I'm not sure that should be considered a failure)
This change will have only one effect: making every election look like it will end very tight if it's more than a few months away.
The reasoning you use seems to be "well, races don't end at the winning candidate's peak position". But this has nothing to do with races "naturally tightening toward the end." It has everything to do with the fact that Election Day is only 1 day out of say 200 in a serious election... the odds of it being the "peak" for any one candidate is therefore something like 1 in 200. This is a "today is not tomorrow" effect, not a "races tighten as time goes on" effect.
The odds of the race getting closer in the future are really no greater than the odds of the race widening in the future. Thus, even though the change makes this particular map more "acceptable" to us right now, the actual logic behind it seems very flawed, and it would have made Reagan-Mondale project as a "close election" in late June 1984 as well.
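One way to check the "today is not tomorrow" argument is a quick simulation. The sketch below assumes the lead follows a driftless random walk with Gaussian daily steps. That model, the function names, and the 0.5-point daily step are illustrative assumptions, not anything taken from the site's model. It reports how often the final-day lead is also the peak lead, and how the average peak compares with the average final lead when there is no built-in tendency to tighten.

```python
import random

def simulate_race(days, daily_sd, rng):
    """Return the daily lead (in points) for one simulated race.

    Assumption: the lead is a driftless random walk, so it is equally
    likely to widen or narrow on any given day.
    """
    lead = 0.0
    path = []
    for _ in range(days):
        lead += rng.gauss(0.0, daily_sd)
        path.append(lead)
    return path

def peak_vs_final(trials=2000, days=200, daily_sd=0.5, seed=0):
    """Compare the largest absolute lead with the election-day lead.

    Returns (share of races whose final day is the peak,
             mean peak lead, mean final lead).
    """
    rng = random.Random(seed)
    final_is_peak = 0
    sum_peak = 0.0
    sum_final = 0.0
    for _ in range(trials):
        path = simulate_race(days, daily_sd, rng)
        peak = max(abs(x) for x in path)
        final = abs(path[-1])
        sum_peak += peak
        sum_final += final
        if final == peak:
            final_is_peak += 1
    return final_is_peak / trials, sum_peak / trials, sum_final / trials

share, mean_peak, mean_final = peak_vs_final()
print(f"final day is the peak in {share:.1%} of simulated races")
print(f"mean peak lead {mean_peak:.1f}, mean final lead {mean_final:.1f}")
```

If historical races shrank from peak to final by much more than this kind of simulation produces, that would point to real tightening; if not, the peak-versus-final gap alone does not establish it.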
Nice improvement indeed. However, what I would like to see is whether the gap is closing uniformly across all 50 states (as you sort of assume) or whether it is more/less drastic in battleground states. If you have state-by-state data for previous elections it might lead to a better regression model.
1. I think this trend-adjustment stuff has too much voodoo involved with it. The way to do "trends" would require understanding the correlation among neighbor states and doing an epidemiological "viral" simulation (like how disease spreads). It's possible to do but is way too complex, I think.
2. Post the General Election Model regression models and coefficients on the FAQ. Who cares about Clinton model and primary race regression models.
3. Allocation of undecideds: Obama is change/reform/forward looking candidate. McCain is status quo/"experience" (whatever that actually means)/incumbent-light as 3rd term of Bush/Cheney.
There surely is some research (think Kahneman and Tversky) that shows some insights into why and under what circumstances undecideds will break (unless there are right conditions) for status quo or what seems most like status quo. The primary season gave us a preview: Clinton was incumbent-light. Obama was change/reform. Late-deciding undecideds tended to break for Clinton because of some of the underlying psychology behind "decision making" unless there are certain other circumstances which overcome risk-averse bias.
Consider the difference between incumbent and incumbent-light
Nixon - incumbent light/status quo
Kennedy - change/reform
Johnson incumbent/status quo
Goldwater - change
Humphrey - incumbent light/status quo except on civil rights
Nixon - change "back"
Wallace - change "back"
Ford - incumbent light/status quo
Carter - change
Carter - incumbent light because of hostages and Kennedy challenge
Reagan - change
Reagan - incumbent/status quo
Mondale - change (back)
Bush - incumbent light
Clinton - change
Perot - change
Clinton - incumbent/status quo
Dole - status quo but not incumbent
Perot - change
Bush - change from Clinton
Gore - incumbent light/ status quo
Bush - incumbent (but maybe incumbent light because of War)
Kerry - he was for change before he changed his mind on being not really change, etc. etc.
Obama - change
McCain - incumbent light
Of the two changes made, I think the second is based on a slightly doubtful assumption. On the other hand, the first change more or less corrects the error I think comes with the second (and also came with the previous model).
I would argue that the two problems addressed by the new changes are connected: undecideds making up their mind is a significant part of the reason why races tighten when approaching election day. With their numbers now close to 20 percent in several states they will obviously affect the race.
Some comments have raised the concern that undecideds might not break 50-50, and that they may break differently in different demographics and thus differently in different states.
Discussing the first change, Blame asks if races don't tighten towards party ID rather than 50-50, and Lilnev makes the case that a state should regress towards its own mean, not the national mean.
I think there's much merit to this line of reasoning, and I think the main reason for this is that the undecideds who eventually vote (some probably won't), tend to vote according to their habits from previous presidential elections. If this is factored into the model, the result will be that states regress towards their own mean, and that a national popular-vote advantage will shrink.
Let's look at Utah, where Bush beat Kerry 72-26, but where McCain's lead in the trend-adjusted average is currently only 54-32 with approximately 14 percent undecided. Ignoring the rather improbable scenario that McCain has made significant inroads in Kerry's support but lost even more Bush voters to Obama, this should mean that almost all the undecideds voted Republican in 2004. My guess is that in this group, the majority of those that bother to vote in 2008 will go Republican again (possibly holding their noses while doing it). If this is the case, Utah won't approach 50-50, but rather move some distance towards 72-26.
Since the last two elections have been close, the accumulated nationwide effect of these trends would be that undecideds break in a way that tightens the race. In creating a method to factor in that a national lead in polling tends to shrink, Nate has basically taken this into account.
Some other commentators have argued that it doesn't make sense to calculate a nationwide adjustment towards an even split and then apply it evenly across all states. I agree with this, but it's also fairly obvious from the Utah example above that it wouldn't make sense to assume that all individual states will gravitate from current polling towards 50-50. Sure, many states will do just that, but mainly because the voters in these states have a tendency to divide their votes fairly evenly in presidential elections.
If polling history seems to agree with what I'm saying, it may be worth adjusting the calculation of how undecideds break. Since this will to some extent work in the same direction as the national polling advantage discount, the algorithm used for calculating the discount probably needs readjusting in order to keep it in line with previous election trends.
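The Utah arithmetic above can be sketched in a few lines. The 54-32 split with 14 percent undecided comes from the comment; the 80 percent Republican break is a purely hypothetical value chosen for illustration, and `allocate_undecideds` is an invented helper name, not part of the site's model.

```python
def allocate_undecideds(rep, dem, undecided, rep_share):
    """Split the undecided vote between the two candidates.

    rep, dem, undecided are percentages of the electorate;
    rep_share is the fraction of undecideds assumed to vote Republican.
    """
    rep_final = rep + undecided * rep_share
    dem_final = dem + undecided * (1.0 - rep_share)
    return rep_final, dem_final

# Utah trend-adjusted average quoted above: McCain 54, Obama 32, 14 undecided.
even_split = allocate_undecideds(54, 32, 14, rep_share=0.5)
habit_split = allocate_undecideds(54, 32, 14, rep_share=0.8)  # hypothetical break

for label, (rep, dem) in [("50-50 break", even_split), ("80-20 break", habit_split)]:
    print(f"{label}: McCain {rep:.1f}, Obama {dem:.1f}, margin R+{rep - dem:.1f}")
```

An even break leaves the margin at 22 points, while the hypothetical 80-20 break widens it to about 30, which moves Utah toward its 2004 result rather than toward 50-50.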
If you are going to use long term trends from previous elections and want to project all the way to election day, why not also incorporate the long-term standard deviation (i.e. between polling averages 130 days before the election & final outcome)?
It would probably make the election at this stage a complete toss-up, but that would be more realistic.
In essence it would include all the noise of actual events.
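To make the standard-deviation idea concrete, here is a minimal sketch that turns a projected margin and an assumed uncertainty into a win probability. It assumes the final margin is normally distributed around the projection. The standard deviations tried below are hypothetical placeholders, since the actual historical figure is not given here.

```python
import math

def win_probability(projected_margin, sd):
    """Probability the final margin is above zero.

    Assumption: final margin ~ Normal(projected_margin, sd), where sd
    stands in for the historical spread between polls at this stage
    and the eventual result.
    """
    z = projected_margin / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Hypothetical standard deviations, in points, applied to a 3.9-point lead.
for sd in (3.0, 6.0, 12.0):
    print(f"sd = {sd:4.1f} points -> win probability {win_probability(3.9, sd):.2f}")
```

The larger the assumed standard deviation, the closer the probability drifts toward 50 percent, which is the "toss-up" effect described above.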
A couple of other thoughts:
When there is "no information" the hypothesis needs to be 50%.
I have previously posted my preference for Shewhart's/Deming's thoughts on changepoints (of which "trends" are one type, when to throw out data that has a special cause being another). The IndX/MR chart would be useful, I think. But I was just reading about Page, Shiryayev, and Roberts (Devlin and Lorden, "The Numbers Behind NUMB3RS," a birthday gift). I'm thinking of something like considering all your past and current simulation runs and making the likelihood of IN going for Obama the canary in the coal mine.
I like the modification that distinguishes between a snapshot and a projection. However, I would like to know what assumptions are being made regarding the corresponding uncertainty of the projected result.
My gut tells me that along with the narrowing of the margin of victory, there is a corresponding narrowing of the probability distribution, i.e. victory margins are shrunk, but the uncertainty of the outcome is diminished.
Perhaps there is usable information in the variability of the historical polling data that can improve the precision of the estimates as election day approaches. (And then again, maybe this has already been taken into account.)
Are the numbers already adjusted down for the Time, Rasmussen (6/27) and Gallup (6/26) polls (O +4, O +4, Tie, respectively)? Because by my math this thing is getting closer and closer to the margin of error on major polling.
The other really telling number in this posting is that in elections where both candidates have led in polling at any particular point, a person with at least a 6 point lead has lost 7 out of 9 times. 6 points does not seem nearly as secure as this website makes it sound. It sounds like Nate is saying at most a 6 point lead at this point can regress to 3.8 points. That is simply not true, numbers historically move much greater and faster than that.
Higglytown:
In looking at your comment about the margin of error, it sounds as though you are not taking into consideration that if you have a collection of polls, averaging them means that the margin of error for that combination is significantly smaller than the individual sampling errors of each poll.
If you have three polls each with an n of 400, the sampling error of each (all other things being equal) is 5%, but the sampling error of their average (all other things being equal) is 2.89%.
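The averaging arithmetic can be checked with a short sketch. It assumes each poll is an independent simple random sample, uses the usual 95 percent multiplier of 1.96, and takes the worst case of a 50 percent share; the function names are illustrative. Note that this is the margin of error on one candidate's share, not on the gap between the two candidates.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for one poll of size n."""
    return z * math.sqrt(p * (1.0 - p) / n) * 100.0

def margin_of_error_of_average(sample_sizes, p=0.5, z=1.96):
    """Margin of error for the simple average of several independent polls.

    Var(mean of k polls) = sum of the individual variances / k**2.
    """
    k = len(sample_sizes)
    variance = sum(p * (1.0 - p) / n for n in sample_sizes) / k ** 2
    return z * math.sqrt(variance) * 100.0

single = margin_of_error(400)                            # about 4.9
combined = margin_of_error_of_average([400, 400, 400])   # about 2.83
print(f"one poll of 400: +/- {single:.2f} points")
print(f"average of three: +/- {combined:.2f} points")
```

The exact figures are about 4.9 and 2.83; rounding the single-poll error up to 5 before dividing by the square root of 3 gives the 2.89 quoted above.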
I try not to snark on other commenters too much, but I find it amusing the number of comments on this post taking the position "it's impossible to predict the future."
Might I humbly suggest that if you believe that, a site whose whole purpose is to project the outcome of the election may not be the place for you? Critiquing the method is fine, but if you declare that the results are meaningless because the objective is impossible, you're essentially saying "stop doing this," which Nate is unlikely to do (nor, based on the traffic, do most readers want him to.)
And on a lesser note, for those who believe it is impossible to predict without directly taking into account all the factors that have produced election results in the past, keep in mind that even for past elections, most of the "reasons" for the outcome are speculative at best; only marginally better than the daily "reasons" cited by financial analysts for why the stock market moved the way it did. Statistical modeling does not require knowing the exact cause and effect for everything; indeed, it can be used to discover effects that contradict conventional wisdom.
I follow Nate's site not because he can in any way, shape, or form predict the future and the election. I follow because it is by far the best site at analyzing the current trend of likely voters in the populace, and it tells both parties what they are up against going into the closing months. It is extremely valuable for that lesson.
When I learn that Obama has an innate advantage in a certain state or area because of its demographics and party ID and learn how to balance that against the voter trend and sentiment, I can tell more exactly the swing needed to turn the election.
I just do not believe it is predictive, it is only taking basic trends as reflected in polls and balancing those numbers in numerous weighted ways to fully reflect the complexity of the current situation.
But then again I do not think polls are by their nature predictive of outcomes at this stage, nor should they be. Too much debate on issues needs to take place first.
My issue is that people may believe Nate's predictions are fact and it therefore could suppress voting, which is bad for everyone. Especially with the recent publicity, and the non-mathematical public masses viewing these predictions.
why would you discount each state by 2.1 percentage points even though their snapshot margins vary from very small to very large? the proportionality should apply at the state level too, no??
plus, your 2.1 rule actually swings some states from one candidate to another. that seems silly, because your whole point is that margins move toward zero over time...why would it start moving away from zero after it crosses to the other candidate?!?
i think you might need to rethink this.
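The difference between the two approaches is easy to see side by side. In this sketch the 2.1-point discount and the 6-point national lead come from the discussion; the three state margins are hypothetical, and the proportional rule is only one possible reading of "proportionality," scaled so that a 6-point lead still shrinks to 3.9.

```python
def flat_discount(margin, discount=2.1):
    """Subtract the same number of points from every state's margin."""
    return margin - discount

def proportional_discount(margin, national_margin=6.0, discount=2.1):
    """Shrink every margin toward zero by the same fraction.

    The fraction is chosen so the national margin loses `discount` points.
    """
    shrink = (national_margin - discount) / national_margin
    return margin * shrink

# Hypothetical snapshot margins (positive = Obama lead, negative = McCain lead).
snapshot = {"close state": 1.0, "national-like state": 6.0, "safe McCain state": -20.0}

for state, margin in snapshot.items():
    print(f"{state:20s} snapshot {margin:+6.1f}  "
          f"flat {flat_discount(margin):+6.1f}  "
          f"proportional {proportional_discount(margin):+6.1f}")
```

Under the flat rule the close state flips and the safe McCain state moves further from zero, while the proportional rule keeps every state on the same side and pulls all of them toward zero.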
2 points:
1. I think your model should take into account undecideds. As I have said before, I suspect that a state with higher undecideds is more uncertain, (and thus the last-minute-movement correction should be higher), but that you should correct for the effect of different polling outfits having different undecideds.
2. A much stronger effect than elections closing at the last minute is 3rd-party candidates collapsing at the last minute. I think that your 3rd-party gap should be about half as big. Luckily, this changes nothing about your results, so you can study the issue and do it later. Still, it is really hard to justify fixing one thing without the other.
That revised lead seem paper thin to me. Bradley effect? Obama wasting votes in the NE and CA piling up big majorities there, leaving him fewer votes to spread over the battleground states he needs to win.
It should be an interesting night on Nov. 5.
Watch Maine!
My issue is that people may believe Nate's predictions are fact and it therefore could suppress voting, which is bad for everyone. Especially with the recent publicity, and the non-mathematical public masses viewing these predictions.
Oh, please. Nobody reads this site (or TNR, for that matter) who isn't a political junkie who is going to vote. People need to stop thinking that websites with a few thousand readers are going to have some kind of macro impact on the election.
I have a question. Have you factored in the possible point shaving effect of race?
I have run simulations that show a solid 2 point shave and a probable 4 point shave and a possible 6 point shave from Obama's total.
That would make the race much closer (or a loser for Obama) than your forecast shows. I COULD BE WRONG.
I don't have expertise in this area or a broad enough data set to draw any conclusions I'd place money on. And some of the racial aspect of the data was inferred.
So I'd like to know your thoughts on this.
Lastly, later you might want to look into neural networks to augment/replace your regression. They can significantly improve your results if the data isn't linear and can sometimes infer effects like race.
Nate,
Have you looked into Bayesian regression? The general idea is that you explicitly include your prior expectations in the regression analysis. For example, if your prior expectation is that smaller margins of victory are more likely, the model would be biased towards predicting smaller margins of victory. There are a lot of benefits to this approach:
1) It does exactly what you are trying to do here by favoring more realistic margins of victory.
2) It is built on a solid, rigorous mathematical foundation.
3) It enables you to use a large number of variables in your model without having to worry about the noise inherent in variable selection.
4) It makes convergence to the "best" model a smoother process.
5) It is very easy to implement algebraically. It's really just a matter of adding a single term to your regression.
Some of the characteristics of your model appear to be attempts to compensate for the fact that you are inherently thinking about this problem in a Bayesian sense, but you have not modeled it as such. The Bayesian approach would be more elegant, informative, and arguably more rigorously defensible.
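As a rough illustration of the "single term" in point 5, here is a minimal sketch for the simplest case: one predictor, no intercept, Gaussian noise, and a Gaussian prior centered on zero for the slope. In that setting the posterior mean is the ordinary least-squares formula with one extra term in the denominator. The function names, the toy data, and the variance values are all assumptions made for the example, not the site's regression.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope for y = beta * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def bayes_slope(xs, ys, noise_var=1.0, prior_var=1.0):
    """Posterior mean of beta under the prior beta ~ Normal(0, prior_var).

    The only change from OLS is the added term noise_var / prior_var,
    which pulls the estimate toward zero (the prior expectation).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + noise_var / prior_var)

# Toy data for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print("OLS slope:      ", ols_slope(xs, ys))
print("Bayesian slope: ", bayes_slope(xs, ys))
print("Weak prior:     ", bayes_slope(xs, ys, prior_var=1000.0))
```

A tight prior shrinks the estimate noticeably, while a very wide prior leaves it close to the OLS answer, which is the sense in which prior expectations about smaller margins would pull projections toward zero.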
Anonymous said,
"why would you discount each state by 2.1 percentage points even though their snapshot margins vary from very small to very large?
i think you might need to rethink this."
The reason you have to re-think this 2.1% backswing is that you have used incredible precision in analyzing the snapshot of each state and you throw away this precision by throwing in an arbitrary number. It is akin to measuring the distance from the earth to the moon to the nearest meter. Then, when you realize that the distance changes in the course of the month, you add 100,000m because it makes you feel better.
Much can change between now and November -- a gaffe here, a dirty trick there, and someone promising to "deliver" a State by making it easy for the "right" people to vote and difficult for the "wrong" people to vote or to tamper with voting devices. Some indications suggest that this election will be close, and that John McCain has a genuine chance. If he can make a marginal nationwide gain, he might pick off Ohio and maybe Pennsylvania or Colorado as well as Virginia, Missouri, and Indiana which are statistical ties... and the election. It wouldn't take much, but nobody can predict how that is possible. Remember well; Kerry was ahead of Bush by about the same amount as Obama is ahead of McCain at the same time in 2004.
But if things go the other way, it won't be a close election in electoral votes. Obama solidifies his hold on Pennsylvania and Ohio, picks off a few States that rarely vote for Democratic candidates for President (including Virginia and Indiana), a bellwether state like Missouri, and some that seem to have been trending Republican (North Carolina, Georgia, Florida). A close election then becomes a landslide in the contest for electoral votes. Note well that North Carolina, Georgia, and Florida have together 57 electoral votes -- two more than California alone. That scenario gives Obama 400 or so electoral votes.
It's surprising at first sight that the distribution of electoral votes for winners has a gap between 303 (JFK, 1960) and 370 (Clinton, 1992). Presidential candidates seem to win by 40 or fewer electoral votes in really-close elections (1916, 1948, 1960, 1968, 1976, 2000, and 2004) or by 100 or more.
The pattern holds because a gain in national percentage of votes cast results in blocks of states flipping -- and at times one of those blocks is the difference between a close election and a not-so-close election. It will be a close election should Obama win the States that Kerry won and either Ohio or the combination of Colorado, Iowa, and New Mexico. That's still around 300 electoral votes at most. He could win with two of Virginia, Missouri, and Indiana -- but he's not going to win any of those unless he wins Ohio, and if he picks up one of those three states, it's still close (about 310 electoral votes). With those three, it's 330 electoral votes -- undiscovered country.
Strategies likely to win in Indiana, Missouri, and Virginia put North Carolina, Georgia, and Florida in reach; those States probably vote together, and who knows what goes with them. That pushes the count to about 400 electoral votes.
Obama's campaign seems to have learned something that neither Gore nor Kerry got -- don't target one particular State. The other side might prove capable of ensuring that that one State doesn't have a fair election. Would Bush have won Florida in 2000 or Ohio in 2004 without such deeds? Who knows? Will Obama or McCain resort to such techniques this year? That depends on their character.
It is possible this year that one of the candidates can have a better record in history by choosing an honest loss over a dishonest win. It is essential that those who have partisan axes to grind remember that winning isn't everything; the credibility of the game is even more important.
But one thing that I don't predict is an Obama victory with 320-360 electoral votes. He will lose by 10 or fewer, win by 40 or fewer, or win by 100.
Whoops!
The 1968 election wasn't close in electoral votes!