6.28.2008

Construction Season Over (Technical)

This afternoon, I completed a series of refinements to both the trendline adjustment that was implemented two weeks ago, and the mean-reversion adjustment that was implemented earlier this week. I am hopeful that these will be the last significant changes to our methodology. The refinements are described in more detail below.

Changes to Trendline Adjustment

The most noticeable change is that the trendline curve has been retooled to be considerably more sensitive to changes in the polling data. For example, compare the curve we're using now (this is the top graph) to the one we had in place a couple of days ago (the bottom graph):




The more sensitive curve does a much more intuitive job of pinpointing Obama's post-primary bounce. Rather than showing a leisurely jaunt upward for Obama in the polls over the course of the past month, it instead has his numbers improving much more steeply right as the primaries end, but then leveling off. In fact, the new curve thinks that Obama's numbers peaked shortly after Hillary Clinton's concession speech and that he's lost perhaps half a point in the polls since then.

The other problem with the curve we had been using before is that, by being so slow to respond to changes in the polling data, it was causing us to adjust some of the previous polling results incorrectly. For example, it might have been taking a poll conducted 14 days ago and actually giving Obama a bonus point or two from it, when the more sensitive version of the trendline reveals that Obama's numbers have been flat since then. In other words, the more slow-moving trendline, which was intended to be more conservative, was actually being too liberal about adjusting upward polls taken after most of Obama's post-primary bounce had been realized.

A second, more technical adjustment to the trendline is that it now weights the daily datapoints based on the number of polls that were conducted that day. Before, a day on which just one poll came out had just as much influence on the curve as a day like 2/27, when SurveyUSA released polling in all 50 states. This idiosyncrasy has now been resolved.
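As a rough illustration of what weighting by poll count means, here is a minimal Python sketch of a single local linear fit in which each day's weight is the usual LOESS distance kernel multiplied by the number of polls released that day. The function names, the tricube kernel, and the bandwidth parameter are assumptions made for the example; the actual STATA routine is not shown in this post and may differ.

```python
def tricube(u):
    """Tricube kernel: a common LOESS distance weight, zero once |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_linear(days, margins, n_polls, at_day, bandwidth):
    """Weighted straight-line fit evaluated at at_day.

    Each day's weight is kernel(distance to at_day) * polls released that day,
    so a 50-poll day pulls the curve harder than a 1-poll day.
    Assumes at least one day falls inside the window.
    """
    w = [tricube((d - at_day) / bandwidth) * n for d, n in zip(days, n_polls)]
    sw = sum(w)
    mx = sum(wi * d for wi, d in zip(w, days)) / sw
    my = sum(wi * y for wi, y in zip(w, margins)) / sw
    sxx = sum(wi * (d - mx) ** 2 for wi, d in zip(w, days))
    sxy = sum(wi * (d - mx) * (y - my) for wi, d, y in zip(w, days, margins))
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (at_day - mx)
```

Calling it once with every poll count set to 1 and once with a single heavily polled day shows how much that day moves the fitted value.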

The third adjustment is in the way that the trendline adjustment is attributed to particular states. The formula that we were using before was causing problems because the values of the dummy variables used to calculate our trendline adjustment are arbitrary except when taken relative to one another. The new procedure for calculating the state-by-state trendline adjustment is as follows:

1. For each state in which at least 5 polls have been conducted, we perform a regression of the polling results in that state relative to the LOESS trendline curve. Recent polls are weighted more heavily to place the emphasis on the current movement in the numbers. The coefficient produced by each state's regression tells us how sensitive that state is relative to changes in the national numbers. For example, in New Hampshire the polls have been about three times as sensitive to national trendline changes as has the nation as a whole, whereas in Iowa there has been essentially no relationship between the polling in that state and the overall national trend.

2. We then take the coefficients produced in each state and regress those against a series of demographic and political variables to determine what exactly is triggering the changes. For example, right now the changes are mostly related to (1) states in which Hillary Clinton had a lot of support in the primaries; (2) states that have a lot of independent voters; (3) states with a high number of voters who identify their ancestry as 'American', which means states in Appalachia and parts of the South.

The results of this regression give us our 'm' parameter that tells us how to scale the trendline adjustment in each state. As before, m is capped at values of 0.0 and 2.0.
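The two steps and the cap can be sketched in Python roughly as follows. This is a simplified illustration, not the model's code: the exponential recency weighting and its half-life are assumptions, step 2 is reduced to a single hypothetical predictor (a state's Clinton primary share) where the real regression uses several variables, and all function names are made up for the example.

```python
def weighted_fit(x, y, w):
    """Weighted least-squares line y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def state_sensitivity(trend, state_margin, age_days, half_life=14.0):
    """Step 1: slope of a state's polls against the national trendline value
    on each poll's date, with recent polls weighted more heavily.
    The exponential decay and 14-day half-life are assumptions."""
    if len(trend) < 5:
        return None  # fewer than 5 polls: no state-level regression
    w = [0.5 ** (a / half_life) for a in age_days]
    return weighted_fit(trend, state_margin, w)[1]

def predict_m(trait, sensitivities, new_state_trait):
    """Step 2, reduced to one predictor: regress the step-1 coefficients on a
    state trait, predict m for another state, and cap it to [0.0, 2.0]."""
    a, b = weighted_fit(trait, sensitivities, [1.0] * len(sensitivities))
    return min(2.0, max(0.0, a + b * new_state_trait))
```

A state whose margin moves three points for every point of national movement would come out of step 1 with a coefficient of about 3, which is the New Hampshire situation described above; the cap only bites in step 2.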

The spirit of the adjustment is exactly the same as it was before, but the results of the calculation appear to be more robust and intuitive than they were before. Obama's numbers are adjusted upward sharply in states like Connecticut and West Virginia, which have not been polled since the primaries ended, because he has seen big movement toward him in similar states like New Jersey and Kentucky, respectively. But he isn't assigned much of a bounce in, say, the Dakotas, because his polling in the Upper Midwest has been much flatter.

One implication of being able to do this calculation more precisely is that the model now sees Obama as having a slight excess of popular votes relative to electoral votes. He has gotten a big bounce in large, Clinton-leaning Democratic states like California and New York; perhaps he'll now win these states by 20 points rather than 15. While that will help with his nationwide popular vote total, it will do little for him in terms of the electoral math.

Change to Mean-Reversion Adjustment

The mean-reversion adjustment, which takes points away from whichever candidate is leading in the national polls because there is a strong historical tendency for the polls to tighten before Election Day, had previously been taking an equal number of points away in each state. If it had calculated that Obama is likely to lose 2 points between now and November, for instance, it was simply lopping 2 points off his margin in each state.

The mean reversion is now state-specific, based on a variant of the procedure used to assign the trendline adjustment to individual states. In other words, we see which types of states and demographics have been most sensitive to movement in the national polling thus far, and use that to infer which states might be most sensitive going forward. In fact, the procedure used to calculate the state-by-state mean-reversion adjustment is identical to the one used to calculate the state-by-state trendline adjustment, with the exceptions that (i) the mean-reversion model does not weight recent movement more heavily, instead looking at the overall sensitivity of each state's polling since February; (ii) because it does not necessarily follow that those states that have been most sensitive to national polling momentum in the past will continue to be so in the future, we hedge our bets by assigning only half of the mean-reversion adjustment on a state-by-state basis, with the other half being assigned equally to all 50 states.
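One way to read the half-and-half hedge is sketched below in Python. It assumes each state already has a sensitivity multiplier of the kind used for the trendline adjustment; the function name and the example multipliers are hypothetical, and the real calculation may scale things differently.

```python
def mean_reversion_by_state(national_shift, sensitivity):
    """Split a national mean-reversion shift (in points) across states.

    Half is applied equally everywhere; the other half is scaled by each
    state's sensitivity multiplier. A state with multiplier 1.0 therefore
    gets exactly the national shift.
    """
    return {
        state: 0.5 * national_shift + 0.5 * national_shift * m
        for state, m in sensitivity.items()
    }

# Hypothetical multipliers, for illustration only.
example = mean_reversion_by_state(2.0, {"A": 0.0, "B": 1.0, "C": 2.0})
```

With a 2-point national shift, the fully insensitive state loses 1 point, the average state loses 2, and the most sensitive state loses 3.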

Lastly, I have slightly tuned down the vote share assigned to third-party candidates by rerunning the regression used to determine this figure while excluding the 1992 and 1980 elections, the two years in which a third-party candidate was invited to participate in a nationally-televised debate. We are now assigning about 3.8 percent of the nationwide popular vote to third-party candidates rather than the almost 5 percent that we had assigned before.

47 comments

JGabriel said...

The new methodology looks great, Nate. This feels like a much more intuitive and solid projection than the last run.

Thanks for all the work you've put into this. We political and stats junkies appreciate it.

.

Juris said...

This is really looking sharper in numerous ways.

Small question: how much time does it take you to run an "update" with today's polls (assuming you're no longer tinkering with the parameters)?

Related question: are you doing the analysis with STATA and then using some automated way to translate that into a spreadsheet, with the spreadsheet also producing the graphics? Or what steps does this go through to produce what we see in the tables, charts, and maps?

Thanks. Just curious about the operational side of this.

Tom said...

Nate, this is really solid work. I'm very impressed. While it's obviously impossible to "check" your work until November, your win percentage does match up very well with In-Trade's "wisdom of the crowds" (65%, last I looked).

I still think that the underlying demographics - the economy, Bush's popularity - favor Obama more strongly than polls are showing. But I understand and respect your decision to focus entirely on projecting based purely on polls. And given that, I think you really do an excellent job here.

Nate said...

Juris,

The model was originally completely self-contained in EXCEL, but generating the trendline curve requires more sophistication so that procedure is now conducted in STATA.

The basic process is:

1. Plug new polling data into EXCEL (1-5 minutes).
2. Export polling data into STATA, run STATA routine to calculate trendlines, etc. (2 minutes).
3. Plug STATA output back into EXCEL and run simulations (5 minutes CPU time).
4. Export charts and graphs into flickr, upload onto site (5 minutes).

So it's about a 15 minute procedure, give or take.

obsessed said...

quick silly question: In the main map, how many discrete shades of red and blue are there? Obviously 50-50 is white, but how many percentage points away from that do you have to go for each change of hue?

Andrew said...

Fantastic work - the amount of work you put into improving your process and refining your projections is evident.

The current graphs look much more substantial (and as you have said, more sensitive to ongoing polling changes) than, say, a month ago.

Keep up the great work! I always look forward to checking your updates each day.

JGabriel said...

Nate, quick question:

Just as a reference point, is that big spike near the center at 293 or 306?

Or is it some other nearby number?

.

JGabriel said...

P.S. Any chance of getting a median EV as well as the average?

.

obsessed said...

293 or 306?


I think he said it was 293 the other day.

Kerry States
OH
IA
NM
CO

or, all the states 60% and above in the left hand column. Below 60% there's VA at 50% and then nothing til 41%, so the 306 is with VA and the 293 is without.

obsessed said...

continuing with that, the two spikes just to the left of the big one are losing CO and NM, but still winning the election.

IA appears to be the safest of the flipped Bush states. I hope that's true! As long as you have IA in the bag, there are many winning combinations, like NM+CO without OH.

Frank from Germany said...

I like your updates a lot - great work! However, I am still not sure what to make of the mean-reversion.

I had a look at what is available on the Internet on election polling history - 2004 is still well documented on RCP, and CNN has some archives on their 2000 polling. Graphs for both elections don't look like the trend you posted a few days ago, but rather like some cyclical up and down, except for October 2004, when Bush took a strong lead that diminished but did not disappear toward election day. In other words: judged by 2004 and 2000 (and not having available polling history on the previous elections), it looks like you may be adding rather than removing noise with your mean-reversion, since we don't know whether we are currently seeing a peak, which is bound to decrease over time, just the normal ups and downs a few months before the election, or the start of a positive trend toward Obama.

For that reason, but also to be able to compare your analysis (which I deem to be superior) to other projection sites on the web, I would be happy if you could, in addition to your "projection map", also publish a "snapshot map". You may then leave it to readers to select the one they want to trust more.

If you want to continue with mean-reversion, it might be justified to look in some more detail into how such effects affect individual states. As I understand, you are now treating mean-reversion effects as being 50% national and 50% state specific, which is a legitimate assumption in the absence of any other data. However, such data is available (e.g. on RCP for the 2004 race), and it might be worthwhile to study mean-reversion effects at the state level in more detail in order to come up with a more accurate allocation of such effects to individual vs. all states. I acknowledge that such analysis may be time-consuming, and you may have other plans for this summer as well :)

obsessed said...

a nice enhancement would be to calibrate the y-axis of the Electoral Vote Distribution chart.

For example, that ominous red spike where Obama loses everything but DC, how many trials out of 10,000 is that?

Tom said...

"IA appears to be the safest of the flipped Bush states."

I was just noticing that if you look on the right column, there's absolutely nothing showing McCain ahead of Obama in Iowa. 9 polls and Obama won ALL of them. Heck, McCain even won one NY poll, but not Iowa.

KH in Houston said...

Awesome update Nate! I love when we go a few days without any subjective updates!

Anonymous said...

I do think there's room for one further bit of construction - unless I misunderstand your work and you did this already. This is to bring time now not only to the projection but also to the pollster weights.

If I'm not wrong, your pollster weights are calculated from very near-election polls only (right? What's your cutoff point?); and also assume no temporal error at that point.

As your recent regression to the mean correction points out, there's always temporal error, even with last-minute polls; and it also suggests ways of incorporating long-range polls. (Obviously how to do this is something you'll be much cleverer about).

The assumption that even late polls have to face some temporal error means that we know less than your model believes right now about the accuracy of polling firms (because less error is in fact polling error).

The inclusion of long-range polls in the evaluation of polling firms would add information which we throw away.

Finally, many have suggested that different firms may exhibit not simply error but also D or R bias. This is certainly worth exploring.

So, no time to rest on your laurels! We do enjoy those improvements and their brilliant, rational discussion so much!

nieddu said...

I never thought I'd like statistics, until I discovered 538.com.

Great work, and yes indeed the best prediction and instructive site available, no one should accuse you guys of bias or partisanship.

Anonymous said...

IA appears to be the safest of the flipped Bush states.

Wonder how much of that reason is that Obama voted for the ethanol subsidies both times, while McCain ridiculed it as pork both times?

Anonymous said...

There's something arbitrary about your state-by-state fine-tuning of the regression to the mean. But you realize no doubt that very near election day this fine-tuning would most likely amount to a .2-.3 percent or so difference between states, i.e. would be well less than your likely error.

lilnev said...

I'm not sure that the LOESS approach is the correct one, for both theoretical and practical reasons. The overall output is extremely sensitive to the very tail-end point -- every poll is trend-adjusted according to that point, so as it swings, the entire popular vote projection swings (with a fractional discount for mean reversion).

Now, how is that endpoint calculated? As I understand it, it's determined by a (linear or quadratic?) regression on the last ~week of data. But that implies some continuity of motion, or "momentum". If Obama was doing better 6 days ago than he was 3 days ago, we're forced to assume that he's doing even worse today.

Is there a good theoretical justification for that assumption? Or is a better assumption that his movement in the last 3 days is independent of his movement between 6 days ago and 3 days ago? My hunch is that the latter, essentially a random walk with large sampling noise, is the better theoretical model.

And if that is the better model, how should we smooth it to obtain a trendline? I would suggest an arbitrary curve with two components to the error function, the sum of the squared residuals and the sum of the squared slopes (the idea being that large fast moves are unlikely in a random walk with assumed Gaussian distribution). Find the arbitrary function y that will minimize:

sum( (y(t)-x(t))^2 ) + b*sum( (y(t)-y(t-1))^2),

where x(t) is the actual daily data points, y(t) is the trendline, and b is a weighting parameter that controls smoothness. (An increased penalty for slope over the last few days should probably be added on the assumption that the last few polls are off the true trend one way or the other; thus upcoming polls will cause reversion to the mean, and any slope in the last few days will likely incur an additional penalty to "undo").
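For anyone who wants to play with this, here is a minimal Python sketch of the basic objective above, without the extra end-of-series penalty. Setting the derivative to zero gives a tridiagonal linear system, which the code solves directly. The function name and the choice of solver are assumptions for the example.

```python
def smooth(x, b):
    """Return y minimizing sum((y[t]-x[t])^2) + b*sum((y[t]-y[t-1])^2).

    The normal equations are tridiagonal: diagonal 1+2b (1+b at both ends),
    off-diagonals -b. Solved with the Thomas algorithm.
    """
    n = len(x)
    if n == 1:
        return [float(x[0])]
    diag = [1.0 + 2.0 * b] * n
    diag[0] = diag[-1] = 1.0 + b
    off = -float(b)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = x[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (x[i] - off * d[i - 1]) / denom
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        y[i] = d[i] - c[i] * y[i + 1]
    return y
```

With b = 0 the trendline reproduces the daily data exactly, and as b grows it flattens toward the overall average, so b plays the same role as the smoothing bandwidth in the LOESS curve.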

Hm, Nate, could I ask you for the input to the Super Tracker (with the weights), so I can play around with this? kneville at mit dot edu.

Thanks,
lilnev

obsessed said...


Wonder how much of that reason is that Obama voted for the ethanol subsidies both times, while McCain ridiculed it as pork both times?


Interesting thought. I don't understand the full ramifications of the ethanol debate but if IA comes through in the GE after coming through in the primary they'll have brought home my bacon big time.

Juris said...

Wonder how much of that reason is that Obama voted for the ethanol subsidies both times, while McCain ridiculed it as pork both times?

It is pork. McCain was right. But it's also politics. And Obama "out-thought" him on this one and will reap the benefit.

obsessed said...

sum( (y(t)-x(t))^2 ) + b*sum( (y(t)-y(t-1))^2)

God I love this site!

obsessed said...

It is pork.

Well, I'll chip in to send a truckload of carnitas to Iowa. Actually, after reading the Bob Barr thread, maybe we should make that honey baked hams.

Alexander said...

I think the changes are all for the better and most of the results seem very reasonable. However, I still think that the model has some trouble projecting believable results in states with very lopsided support. This will not flip any state, but if you're looking for accurate predictions across the board, you should look into it.

Pointing again to Utah, the model currently predicts that Obama will give Democrats their best result in the last 40 years. This might well be possible, but as a projection it doesn't really fit with the rather conservative estimates elsewhere in the model, such as the assumption that Obama's lead in national polling will shrink.

On the opposite side of the spectrum, another example that has been raised in comments is DC, where the projection gives Obama one of the worst results in recent history, significantly below Kerry's.

At the risk of repeating what I've said in a previous comment, I think one explanation could be how the model treats undecideds. As I understand it, the assumption is that they will split evenly between the two major party candidates (with some going to third party candidates).

But with McCain at only 55.0 in the Utah snapshot, do most undecideds in the state actually ponder whether they should vote for the Republican or the Democrat? Probably not. Most of the people in Utah now telling pollsters that they are undecided will either get behind McCain or stay at home on election day. Some might vote for a third party candidate, and a few will choose Obama. But likely not enough to take his numbers to record levels in the state, if his national numbers stay at the current level. With Obama at only 82.8 in the DC snapshot, the exact opposite probably applies there.

Giving some more thought about how this could be taken into account, I would suggest that you start with assigning undecideds according to the candidates' current shares of the two-party vote. This makes more sense than the 50-50 assumption. In essence, you're saying that currently undecided voters will eventually vote like their neighbors, or not at all (both ought to have the same effect on vote shares). A more sophisticated method might be to use some kind of regression to allocate the undecideds.
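To make the difference concrete, here is a small Python sketch comparing the two allocation rules. It ignores third-party candidates for simplicity, and the poll numbers in the example are hypothetical, not taken from the model.

```python
def allocate_undecided(dem, rep, proportional=True):
    """Return (dem, rep) shares after assigning undecided voters.

    Inputs are poll percentages. With proportional=True the undecideds are
    split according to the current two-party shares; otherwise 50-50.
    Third-party votes are left out of this sketch.
    """
    undecided = 100.0 - dem - rep
    dem_frac = dem / (dem + rep) if proportional else 0.5
    return dem + undecided * dem_frac, rep + undecided * (1.0 - dem_frac)

# Hypothetical lopsided state: Democrat 30, Republican 55, 15 undecided.
even_split = allocate_undecided(30.0, 55.0, proportional=False)
two_party = allocate_undecided(30.0, 55.0, proportional=True)
```

In this made-up state the 50-50 rule puts the Democrat at 37.5, while the proportional rule gives about 35.3, which is just the Democrat's share of the current two-party vote.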

Although the model is great as it is, and although you've now declared construction season over, I hope that you can see that it still produces some funny results at the edges, and that you're able to come up with some clever tweak to fix that.

Juris said...

Re ethanol/corn as pork, it's useful to keep in mind that Illinois is a huge corn-producing state, right up there with Iowa, so Obama wasn't just being neighborly toward Iowa when he voted for this.

JGabriel said...

lilnev: "... is a better assumption that his movement in the last 3 days is independent of his movement between 6 days ago and 3 days ago? My hunch is that the latter, essentially a random walk with large sampling noise, is the better theoretical model."

Lilnev, could you explain this a little further?

In particular, I'm having trouble understanding how a random walk, even after smoothing, has any more predictive value than just following a flat line until more data becomes available. It would appear, on the surface anyway, to just be randomly noisier.

A trendline, on the other hand, would seem to have at least marginally more predictive value than simply leaving the result flat or randomly noisy until the next data point is added.

I'm probably missing something in your explanation. I get the theory behind it, I think - I'm just puzzled over where the predictive value would come in. Please elaborate a little bit?

.

Another Mike said...

This is the best site on the internet for polling and election projection analysis. Thanks again Nate for all the hard work and time you've invested to make it so.

Now, get off your ass and give us those Senate projections!

Another Mike said...

obsessed said..."a nice enhancement would be to calibrate the y-axis of the Electoral Vote Distribution chart."

Each line represents exactly 10 simulations. I'm basing my answer on the fact that there are 538 possible outcomes on the x-axis (round down to 500 for simplicity's sake) and the average outcome seems to be around two lines up the y-axis. 10,000 (number of simulations) divided by 500 = 20.

obsessed said...

Is there a quick layman's explanation of "random walk"?

kubla000 said...

I'm losing trust in this site. Honestly as I go down the list of the state by state, I'm seeing that in state after state, your Polling Average is essentially the same as your projection...

PA: 5.5/6.2
OR: 7.1/6.6
OH: 3.5/3.3
NV: 2.8/2
NM: 4.3/4.1
NH: 6.2/6.9
NC: 4.7/4.1
MT: 7.3/7.4
MN: 10.9/10.2
IN: 1.8/2
IA: 6.2/6.3
FL: 1.4/1.9
CO: 3.8/3

Nearly 1% Difference:

MI: 3.2/4.1
MO: 3.5/2.2
GA: 8.6/7.4
CT: 8.9/13.6

Maybe it's because I grew to respect you during the primaries when you saw things polling did not based upon demographic regressions, and since a General Election is nationwide all at once, you can't do same/similar to neighbors as much.

Maybe it's because in many states, polling is frequent enough and weighted heavily enough to render the Regression models less important.

But, what I'm seeing is less interesting every time a change is made. What you were showing days ago seemed to be a prediction, and what I'm looking at today is about as valuable as Karl Rove's electoral maps which are simply based upon the current average of polling from RCP, or Kos/OpenLeft's daily updates based upon the Pollster leader in any given state.

I fail to see how this site is capturing anything unseen by polling anymore. It seems that prediction powers have been eliminated as enhancement after enhancement is made... only lending polling more power than demographics. That's what was so interesting before, that you could look at trends in populations and see how things would shake out.

Anyhow, not to be a sour puss, but this just isn't nearly as interesting anymore. It seems that you've done a whole terrible amount of work, and at the end of the day, simply mirror what Pollster/Kos/Openleft and Karl Rove all already do...

What seemed at one point to be a breakthrough in predictions beyond polling has now become simply a prediction based upon polling averages... and that's sad. I've noted the comments are going down, less and less. I think readers are being lost as the novelty wears off and the findings converge with already established information.

Frank from Germany said...

@ Kubla000: You are maybe overshooting a bit. I get your point, and I agree with it. I as well have been attracted by the demographic analysis, and would like to see some more analysis on how the race unfolds among women, "Americans", seniors etc. On the other hand, Nate's "trend adjustment" has really been excellent and spot on, and, in spite of his current technical preoccupation, he has recently provided interesting insights like, e.g., the fact that Appalachia seems to be coming to terms with Obama. Give the guy his time, and the 'benefit of the doubt' - I am sure he will continue to surprise us all!

lilnev said...

at JGabriel:

What you say, that a random walk model predicts nothing, is true. Given that today the value of our variable is 29, our best guess about tomorrow is that it will be the same. If tomorrow it turns out to be 30, our best guess about the next day is that it'll be the same as tomorrow, 30. There's no restoring force, and there's no momentum. The probability distribution of movement on each day is zero mean, and independent of each previous day.

Contrast this to a model with momentum. If yesterday our variable was 29, and today it's 30, our best guess for tomorrow is 31.
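In code, the two one-step-ahead guesses look like this. It is a deliberately tiny Python sketch with made-up function names, using the 29 and 30 from the example.

```python
def random_walk_forecast(series):
    """Best guess under a random walk: tomorrow equals today."""
    return series[-1]

def momentum_forecast(series):
    """Best guess under pure momentum: repeat the most recent daily change."""
    return series[-1] + (series[-1] - series[-2])

history = [29, 30]
rw_guess = random_walk_forecast(history)
mo_guess = momentum_forecast(history)
```

Here rw_guess is 30 and mo_guess is 31; the two models only disagree when the most recent change is nonzero.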

The LOESS model that Nate is currently using is a momentum model, at short timescales. (The point of LOESS is that you find a linear or quadratic fit over each "reasonably small" local interval, and require each fit to meet up with its neighbors.) LOESS can do a lot of great things, fitting trends when you have no a priori reason to expect a particular functional form. And I'm fine with a LOESS fit for the great majority of data points that are not at the extreme outer edges of the dataset being fitted.

But in this case, the outermost point, today's point on the trendline, has a huge influence on the projection. Every single poll gets adjusted according to today's point -- the accuracy of today's point is as important as the accuracy of all other points on the trendline combined. If today's point is determined by a momentum model -- such as LOESS -- and a momentum model is inappropriate because public sentiment in fact moves as a random walk, then the projections based on today's momentum-influenced point won't be optimal. They'll likely swing, sometimes high, sometimes low, but not optimal.

I don't actually know if political races are better described by random walk or by momentum. My hunch is that random walk is the better model, but that's just a hunch. It maybe ought to be discernable from the data? Needs some thought....

Anyway. In terms of who's going to win the election it probably makes less than half a point's difference. Hell, probably a lot less than half a point's difference. It's a concern for those of us who get our dopamine charge out of finding the best possible mathematical models for the observable world. Nerds unite!

peace,
lilnev

such sweet thunder said...

I'm no statistician. And what makes me so pleased with the gradual updates to the projection method is that they come in little increments that I can understand (but could not replicate on my own). It's kind of wonderful reading each day as your projection slowly becomes refined. Thanks for all of your hard work.

As to Obama in Iowa, I wonder if his appeal there has to do with his Iowa accent. Iowanian(?) has a specific sound that I can recognize, but is much different from the accents in other Midwestern states. Obama nails it when speaking to Iowa audiences.

lilnev said...

Heh, I think it's just "Iowan". And I don't know an Iowan accent, but I can tell Wisconsin from Minnesota. It's all in the long "O". Ask 'em to say "soda", and if the "O" is really, really long, that's Minnesota.

Juris said...

Kubla,

Nate pointed out on his latest thread that for most states the contrast between states will become starker over time (the reds become redder the blues become bluer) so that the true "toss-up" states will remain the ones that a good model may do better in projecting than one would obtain from naive assumptions or a naive model (e.g., that the 2008 electoral map will look like the 2004 election outcome).

Nate also noted yesterday that the pace of daily state poll production is picking up, and in states with a lot of polls the polling averages are going to converge to the trend adjusted figures AND to the 538 regression -- if he's doing the regression right.

You use the term "prediction." This was never a "prediction" site. To do that, you'd also have to use your intuition or a crystal ball to predict future contingent events in the campaign. Instead this site was and remains a "projection" site, attempting to systematically forecast the implications of currently available, and updated, information on the electoral college vote shares.

The latest set of improvements to the models takes those projections a step further by, in effect, dampening the implications of current popular preferences on future outcomes -- based on past evidence of "mean regression" of the leading candidate.

There was and is nothing magical or prophetic about this system. But it's extremely clever in how it has been built up step by step, so that now it's a more believable system than it was in the beginning. And it's been built up in interaction with his "panel" of experts (us -- his readers).

To my knowledge nobody else is doing it, at least not in real time on an active blog, with updating as each round of new polls becomes available. Political scientists will be doing it about the 2008 election long about 2009 or 2010 -- after the election is over. That's their usual way. One wag once described political science pejoratively as "slow journalism." Everybody knows what happened already and we have all the data to retrofit the model and "postdict" the outcome of the election. It can be a very instructive exercise, but who wants to wait until 2009 or 2010? (I know that several political scientists are reading this blog; no offense intended!)

For sure some wiseguy is going to come along and tell us that if we know the median polling estimate of the vote share in each state using the polls taken in the last week prior to election day, we can almost perfectly predict which candidate will win each state on election day.

Well BFD. Let's all wait til Mickey Mantle's birthday to pay serious attention to the polls. But that is merely a prediction exercise, and waiting til October 20th wouldn't be any fun and would provide no understanding of how the process unfolded. You don't have to be an astrophysicist to carry out such an exercise.

Here we have the adventure (IMO) of watching that little red LOESS line wriggle onward (discounted in the little yellow line), but as smart as Nate is he (and his model) doesn't know where those lines are going next week, let alone in November.

What he's doing mainly is trying to efficiently summarize and update the likely implications of the best currently available information for where each state's voters will end up in November -- and, in some cases (if you've really been paying attention the last week) anticipating what the next state polls are likely to reveal -- or, if there are no such polls, come up with a good estimate of what the polls would show if they were taken. And he's doing it in real time.

Frank from Germany said...

@ lilnev: "Ask 'em to say "soda", and if the "O" is really, really long, that's Minnesota." If that's true, then many Minnesotians (or whatever they call themselves) must have roots in Hamburg and Schleswig-Holstein. And I always thought, they all ended up in Chicago ...

Juris said...

But do they say "soda" or "pop" or "coke" in Minnesota? Check on this site, and also be sure to click on the detailed map. I grew up saying "sodapop." Must be a hybrid.

http://popvssoda.com:2998/

IA Staffer said...

The IA difference for McCain/Obama is that Obama built a HUGE organization in Iowa that never stopped working because it was built so heavily on local activists and electeds, whereas McCain skipped the state and never built a structure here.

Obama still has gigantic name ID here. It is a safe blue state for him.

Anonymous said...

kubla000, you're ranting against the second law of thermodynamics. Information cannot be conjured out of thin air. All Nate does is to extract as much information as possible, aiming to contaminate it, in the process, as little as possible. And he's really, really good about that.
The diminishing differences between scores are a mark of improved information, in fact, as more recent, reputable polling is available.

Mark Nelson said...

So it looks like you've basically narrowed the bandwidth in your estimator, but you don't mention how you chose either the older one (with a higher degree of smoothing) or the new one (with a lower degree of smoothing). In my field at least (machine learning), the standard way of doing that these days is with leave-one-out cross-validation. Did you just eyeball it or something?
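
For readers unfamiliar with the technique, here is a minimal sketch of leave-one-out cross-validation for picking a smoothing bandwidth. It is not the site's actual procedure: it swaps in a simple Gaussian-kernel weighted average for LOESS, and the synthetic "daily margin" data, the candidate bandwidths, and all function names are assumptions made up for illustration.

```python
import math
import random


def kernel_estimate(x0, xs, ys, bandwidth, skip=None):
    """Gaussian-kernel weighted average of ys at x0, optionally leaving out one index.

    Returns None if every weight underflows to zero.
    """
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        num += w * y
        den += w
    if den == 0.0:
        return None
    return num / den


def loocv_error(xs, ys, bandwidth):
    """Mean squared error when each point is predicted from all the other points."""
    total = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        pred = kernel_estimate(x, xs, ys, bandwidth, skip=i)
        if pred is None:
            return math.inf  # bandwidth too narrow to see any neighbor
        total += (y - pred) ** 2
    return total / len(xs)


def choose_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the lowest leave-one-out error."""
    return min(candidates, key=lambda h: loocv_error(xs, ys, h))


# Synthetic example (assumed data): a flat margin, a quick bounce around day 30, plus noise.
rng = random.Random(0)
days = list(range(60))
margin = [2.0 + 4.0 / (1.0 + math.exp(-(d - 30) / 2.0)) + rng.gauss(0.0, 1.0) for d in days]
candidates = [1, 2, 4, 8, 16, 32]
best = choose_bandwidth(days, margin, candidates)
print("bandwidth chosen by leave-one-out CV:", best)
```

The idea is that a bandwidth that is too wide smears out the bounce and predicts held-out points badly, while one that is too narrow chases noise and also predicts badly; the cross-validated error is lowest somewhere in between.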

Matthew H said...

Much, much improved IMHO. The red states look red, the blue states look blue, and you don't contradict states where you have recent polls. Since what I mainly want is an accurate poll regression and trends, your site is perfect for me.

Now my question is, can we have a top 10 list of the states most affected and least affected by national trends?

Allen said...

In my experience (http://election-projection.net), 10K iterations gives you a different-looking electoral vote distribution every time. To get a stable (and therefore accurate) curve, 500K iterations looks like a safe number.
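
A rough way to see the effect of iteration count is to rerun a simulation several times at each size and look at how much the answer moves between runs. The sketch below does that on a made-up ten-state map; the electoral votes, win probabilities, independence between states, and function names are all assumptions for illustration and have nothing to do with either site's real model. Iteration counts are kept small so it runs quickly in plain Python.

```python
import random
import statistics

# Hypothetical toy map: (electoral votes, probability the candidate carries the state).
STATES = [
    (55, 0.90), (34, 0.20), (31, 0.85), (27, 0.50), (21, 0.60),
    (20, 0.55), (17, 0.45), (15, 0.40), (13, 0.50), (11, 0.70),
]
TOTAL_VOTES = sum(ev for ev, _ in STATES)


def win_probability(iterations, rng):
    """Share of simulated elections in which the candidate wins a majority of the toy map."""
    wins = 0
    for _ in range(iterations):
        # Each state is decided independently (a simplifying assumption).
        won = sum(ev for ev, p in STATES if rng.random() < p)
        if 2 * won > TOTAL_VOTES:
            wins += 1
    return wins / iterations


def run_to_run_spread(iterations, repeats=20, seed=0):
    """Standard deviation of the estimated win probability across repeated simulations."""
    rng = random.Random(seed)
    estimates = [win_probability(iterations, rng) for _ in range(repeats)]
    return statistics.stdev(estimates)


for n in (500, 2000, 8000):
    print(n, "iterations -> run-to-run spread", round(run_to_run_spread(n), 4))
```

The spread should shrink roughly in proportion to 1/sqrt(iterations), so each 4x increase in iterations about halves the noise. Note that more iterations only reduce simulation noise; they do nothing about errors in the inputs themselves.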

Modeler said...

Obsessed,

Here's a quick explanation of a random walk:

Imagine a bug on a string. The bug takes steps that are 1mm long. The probability that the bug takes a step to the right is p, and the probability that the bug takes a step to the left is (1-p). This bug is making a "random walk" in 1 dimension. The path of the bug along the line will look a bit like a drunk stumbling around. After t steps, we don't know exactly where the bug will be, but we do know the probability of the bug being at a given location.

In fact, we can state that after t steps, the expected (average) displacement of the bug is, in mm:

pt - (1-p)t = (2p-1)t,

where negative numbers indicate a net leftward movement, and positive numbers indicate a net rightward movement.

The variance of the distribution of positions after time t is given by:

4*t*p*(1-p)

So the standard deviation of the distribution is proportional to the square root of t.

This problem can be extended to higher dimensions and more complicated rules regarding steps. The math gets more complicated, but the basic idea is the same. Random walks are used to model a wide range of processes, from financial markets to atomic diffusion.

A simple theory of voter preferences would be that they evolve over time as a random walk. That is, each day the probability that a voter will vote for Obama increases with probability p or decreases with probability (1-p). In reality the model would be a little more complex than this, but the result is that the prediction error due to evolving voter preferences should grow over time as roughly sqrt(t), which is what Nate empirically observes.
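
For anyone who wants to check the mean and variance formulas above numerically, here is a minimal simulation sketch of the bug-on-a-string walk. The function names, the value of p, and the number of trials are arbitrary choices for illustration; this is not a model of voter preferences, only of the 1mm-step walk described above.

```python
import random
import statistics


def simulate_walk(steps, p, rng):
    """Final displacement (in mm) after `steps` unit steps, each to the right with probability p."""
    position = 0
    for _ in range(steps):
        position += 1 if rng.random() < p else -1
    return position


def summarize(steps, p, trials=5000, seed=0):
    """Simulated mean and variance of the final displacement over many independent walks."""
    rng = random.Random(seed)
    finals = [simulate_walk(steps, p, rng) for _ in range(trials)]
    return statistics.fmean(finals), statistics.pvariance(finals)


p = 0.55
for t in (25, 100, 400):
    mean, var = summarize(t, p)
    print(
        f"t={t:3d}  mean: simulated {mean:7.2f}, formula {(2 * p - 1) * t:7.2f}   "
        f"variance: simulated {var:7.1f}, formula {4 * t * p * (1 - p):7.1f}"
    )
```

Each 4x increase in t should roughly quadruple the variance and therefore double the standard deviation, which is the sqrt(t) growth mentioned above.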
