6.14.2008

We know more than we think (Big Change #2)

The other major change to our methodology (which I am surprised nobody guessed in the teaser thread) is that we are now making adjustments to the results of all states based on a time trend.

One of the problems with our previous way of doing things is that polling data tends to roll in at different times in different states. Both state and national polls conducted since the conclusion of the Democratic nomination process have reflected a bounce of a few points for Barack Obama. For example, we know that Barack Obama has experienced a bounce in his polling results in states like Wisconsin, Michigan and New Jersey, as well as in both the Rasmussen and Gallup national tracking polls. It would be naive to assume that Obama won't also experience a bounce in other states like Pennsylvania and Ohio where new polling data has yet to come out. However, we've had no way to account for these changes in states where the polling data is not fresh.

Our objective, then, is to infer what is likely to happen in states where we don't have fresh polling data based on those states where we do. In order to make such an inference, I apply a four-step process. A version of this process was suggested to me by Professor Robert Erikson of Columbia University, who has spent his lifetime studying polling and public opinion, and who is also a family friend.

Step 1: All polls are placed into groups based on (i) the week of the election; and (ii) the state-pollster unit. A state-pollster unit is a combination of a particular state and a particular pollster; for example "Alabama-SurveyUSA" or "New York-Quinnipiac". The current week is defined as having begun seven days before the current date, with weeks progressing backward from there to the start of the calendar year 2008. One very important note: we treat national polls as a "state". For example, there are units for "USA-Rasmussen Tracker" and "USA-Gallup Tracker". One of the most useful elements of national polls, and particularly national tracking polls, is that they provide a robust baseline for measuring changes in candidate support. We do not include national polls directly in our averages. We do use them, however, to help infer trends, which in turn can inform our state-by-state projections.
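A minimal sketch of this grouping, assuming polls arrive as simple tuples. The function names, the sample polls, and the choice to number weeks backward from the current date (week 0 being the most recent seven days) are illustrative assumptions, not the site's actual code.

```python
from collections import defaultdict
from datetime import date


def week_index(poll_date, today):
    """Weeks counted backward from today: 0-6 days ago is week 0 (assumed convention)."""
    return (today - poll_date).days // 7


def group_polls(polls, today):
    """Group poll margins by (week index, state-pollster unit)."""
    groups = defaultdict(list)
    for state, pollster, poll_date, margin in polls:
        unit = f"{state}-{pollster}"  # e.g. "Alabama-SurveyUSA"
        groups[(week_index(poll_date, today), unit)].append(margin)
    return dict(groups)


# Hypothetical polls: (state, pollster, date, Obama margin in points).
# National polls are treated as a "state" called USA.
polls = [
    ("Minnesota", "Rasmussen", date(2008, 6, 11), 9.0),
    ("Minnesota", "Rasmussen", date(2008, 4, 22), 6.0),
    ("USA", "Gallup Tracker", date(2008, 6, 12), 4.0),
]
groups = group_polls(polls, today=date(2008, 6, 14))
```

Keeping the pollster in the key means each poll is only ever compared against earlier readings from the same pollster in the same state.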

Step 2: We run a linear regression with a large number of dummy variables. Specifically, we include one dummy variable for each week, and one dummy variable for each state-pollster unit. The coefficients of the weekly dummy variables give us an inkling of a time trend. Specifically, the time trend looks like this:



Let me explain exactly what is going on here. Suppose that in Week 15, Rasmussen shows Barack Obama 6 points ahead in Minnesota. Then, in Week 22, it shows him 9 points ahead in Minnesota. This is a piece of information implying that Obama's standing was 3 points better in Week 22 than in Week 15. If we apply this process to all state-pollster units, we get quite a lot of information about which way the polls are changing. That's all that this process is doing. It's taking the changes that we see in each poll where we have a baseline for comparison, and inferring an overall time trend based on those changes.
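To make the idea concrete, here is a rough sketch of extracting week effects from state-pollster units. Rather than building the dummy-variable matrix explicitly, it alternates between averaging by unit and averaging by week, which for an additive model like this should settle on the same week-to-week differences as the least-squares fit. The function name and all data other than the Minnesota example are hypothetical.

```python
def fit_week_effects(observations, iterations=200):
    """Estimate additive week and unit effects by alternating averages.

    observations: list of (week, unit, margin) tuples.
    Returns week effects measured relative to the earliest week.
    """
    weeks = sorted({w for w, _, _ in observations})
    units = sorted({u for _, u, _ in observations})
    week_eff = {w: 0.0 for w in weeks}
    unit_eff = {u: 0.0 for u in units}

    for _ in range(iterations):
        # Each unit's level: its average margin after removing week effects.
        for u in units:
            resid = [m - week_eff[w] for w, uu, m in observations if uu == u]
            unit_eff[u] = sum(resid) / len(resid)
        # Each week's effect: average margin after removing unit levels.
        for w in weeks:
            resid = [m - unit_eff[u] for ww, u, m in observations if ww == w]
            week_eff[w] = sum(resid) / len(resid)

    base = week_eff[weeks[0]]
    return {w: week_eff[w] - base for w in weeks}


observations = [
    (15, "Minnesota-Rasmussen", 6.0),  # from the example above
    (22, "Minnesota-Rasmussen", 9.0),  # from the example above
    (15, "USA-Gallup Tracker", 1.0),   # hypothetical
    (22, "USA-Gallup Tracker", 4.0),   # hypothetical
    (22, "Ohio-SurveyUSA", 2.0),       # hypothetical; no baseline, so no trend information
]
trend = fit_week_effects(observations)
```

Note that the Ohio poll, which has no earlier reading from the same pollster, does not move the estimated trend; only units with a baseline for comparison contribute.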

Step 3: The time trend is smoothed by means of a LOESS regression. You probably don't think you know what a LOESS regression is, but if you've ever been over to Pollster.com, you have seen one. A LOESS regression is a way to create smooth curves through time series data. In our case, that curve looks like this:



When running a LOESS regression, one may choose a "smoothing parameter" that determines how sensitive the regression line is to changes in the data. I use a fairly conservative smoothing parameter, tending toward a smoother rather than a jerkier curve. Nevertheless, we can make out a few fairly clear trends. Obama's numbers surged in February, when he was winning one primary after another. They slumped in March and early April, as stories like bittergate and Jeremiah Wright dominated the landscape. They have since been gradually improving, but particularly so in the last two weeks since he wrapped up the nomination.
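For readers curious about the mechanics, the following is a bare-bones LOESS sketch: at each point it fits a weighted straight line to the nearest neighbors, using tricube weights. The `frac` argument plays the role of the smoothing parameter (larger values give a smoother curve). The weekly values are made up for illustration, and a real analysis would more likely use a statistics package than this hand-rolled version.

```python
def loess(xs, ys, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))  # neighbors used per local fit
    smoothed = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # bandwidth: distance to the k-th nearest point
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted least-squares line through the neighborhood.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical weekly trend estimates (points, relative to week 1).
weeks = list(range(1, 13))
raw = [0.0, 1.5, 3.0, 2.0, 0.5, -1.0, -0.5, 0.0, 1.0, 0.5, 2.0, 3.0]
smooth = loess(weeks, raw, frac=0.6)
```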

Step 4: Polls from previous weeks are adjusted to match the LOESS estimate from the current week. For example, our LOESS regression line tells us that an average poll in the current week has been about 2.5 points stronger for Barack Obama than a poll in the week ending 5/17. Thus, the Quinnipiac poll of Florida taken on 5/17, which showed John McCain ahead by 4 points, is treated as though it had shown McCain ahead by 1.5 points (i.e. 2.5 points better for Obama). The idea, simply put, is to make all old data match the current polling landscape.
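The arithmetic of the Florida example can be written out as a short sketch. Margins are expressed from Obama's point of view, so McCain +4 is -4.0. The function name is hypothetical, and the two LOESS levels are placeholders; only the 2.5-point gap between them comes from the example above.

```python
def adjust_margin(poll_margin, loess_at_poll_week, loess_current_week):
    """Shift an old poll by the change in the smoothed trend since it was taken."""
    return poll_margin + (loess_current_week - loess_at_poll_week)


# Quinnipiac Florida poll of 5/17: McCain ahead by 4, i.e. Obama margin of -4.0.
adjusted = adjust_margin(-4.0, loess_at_poll_week=0.0, loess_current_week=2.5)
# adjusted is -1.5: McCain ahead by 1.5
```

A poll from the current week has a gap of zero and passes through unchanged.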

* * *

From there, everything proceeds as it always has. We still run a demographic regression, although it is based on the trend-adjusted polls rather than the original ones. (Also, I am now referring to our result in each state as a "projection" rather than an "average", as that nomenclature is more consistent with our process.)

This adjustment presently results in an increase of about 2 points in Barack Obama's projected popular vote margin. Because a large number of states in this election are very close, this results in a somewhat dramatic-seeming change in Obama's win percentage and electoral vote projection. Interestingly, Obama's current win percentage of 64.7 percent almost exactly matches the price of Democratic contracts on Intrade, which also has the Democrats with a 64 percent chance of winning the election.

169 comments

Brandon said...

I don't think your statistics should be based on assumptions though. For example, Obama lost 6 points in his Rasmussen poll in Oregon after he locked up the nomination.

Slack said...

What establishes the 0/0 baseline in the super tracker? An average of all polls, or something like that?

Nate said...

He did lose 6 points in Rasmussen's Oregon poll. And that result is accounted for in our extraction of his time trend.

Anonymous said...

Very interesting - I looked at this for the first time since yesterday and wondered what the hell had happened.

If you get a chance, one thing that would be interesting to see is how this new methodology would affect the old Clinton-McCain stats. I ask because it seems to me (not that I know a lot about this) that none of this inherently helps the Democrat, and it would say something about the relative chances both have vs McCain (for people still arguing about it, including myself).

Benjamin Johnstone-Anderson said...

Nate,

Interesting changes, although as a purist, I almost wish there were a "classical version." Any way to account for primary-season related bumps, clearly visible in OR & IN polling (in different ways)? Or are you just waiting for those to "roll off"?

Anonymous said...

Brandon, from what I understand, I don't think that's quite what Nate is doing.

He's basically trying to account for periods like the middle of Wrightgate when Obama's numbers decreased dramatically. The goal is for all polling to correlate to the current environment. If there was another crisis for Obama's campaign, the model would reflect THAT environment as well.

All in all, the change IS rather dramatic. I will be curious to see if future state polls reflect the map as Nate currently has it.

Anonymous said...

Could you maybe add a column for the LOESS adjustment factor for each poll in the state-by-state polling detail?

Preyanka said...

Nate, this might be a stupid question for people who are much more knowledgeable about what you're doing, but for someone who just happens upon your site and doesn't understand the methodology, are the numbers at the upper left (piecharts) a snapshot of the current race, or are they predicting the outcome?

Thanks:)

Silifi said...

I liked the old methodology better. This just ruins the fact that it is a state-by-state analysis.

How do you know that Obama's uptrend is going to be the same in all states? When we move forward into a general election mode, are you telling me that if Obama campaigned really hard in one state, that would somehow uptick his ratings in other states?

For example, if Obama spent a lot of time in Virginia, we would expect that he goes up in the Virginia polls. However, under this new formula, you're assuming that because he went up in Virginia, he's also going to go up in Ohio. When, in fact, the voters in Ohio don't care that he's campaigned in Virginia.

A state trend is not a national trend, and it shouldn't be treated like one.

I would prefer it if you returned to the old projection formula.

Icyclemort said...

Hi Nate,

You use both state and national polls to infer the time trend. (placing national polls as "their own state")

What you didn't elaborate on is how much weight you assign state vs national polls for your time trend?


-----------

It is also likely that states don't react identically to "mood changes" in the whole country.

a) This does introduce another element of uncertainty for individual state projections.

b) Do you plan to adjust for that? E.g. observing during the GE that Florida is not nearly as volatile as (e.g.) Ohio or Virginia?


Regards,
Icyclemort

Anonymous said...

I tend to agree with Silifi, you are assuming that all states are equal in demographics, exposure to ads, local races, etc. In making the projections more sophisticated, you risk actually oversimplifying the model.

Josh said...

Great Work! Do you think that you will put others in your data? Barr could change the results in states like AK, MT, ND, and SD.

Anonymous said...

If Silifi's objection is valid (I myself got lost in the ins and outs of the model, so I can't say for sure) then the globalizing of state-specific trends robs the model of much of its resolution.

james said...

Hmm, interesting. I think this is a good change. There just aren't enough polls for a lot of states now. However, closer to the election, that might change. If it does, will you drop factoring in the Super Tracker?

Slack said...

Echoing thoughts of others, I think this model is a simplification, but one that's very strongly shown in the results.

If Obama does something that's perceived as rejecting African-Americans over the coming months, your model will show the hit to his numbers as a small one in all states, when in fact the change will vary tremendously from state to state.

In other words, the changes you model have reasons behind them, and those reasons cannot be equally applied nationally.

I think the data you have is useful, but I'm not sure that messing with poll results from a general poll analysis is wise/honest, especially for newcomers to the site.

Joel said...

I agree with others that some trends won't play out the same in all states.

Also, assuming the pollsters are just lazy, but will probably get around to polling PA, FL, CO, NM, NV, etc... why not just wait for those polls and keep each state its own block? It's nice to be able to look at a state by itself, and say "this is safe" or "this is still within a close margin" on its own ground.

Even states like ND, SD, MT, AK are sure to get some polling if Obama can make them interesting, and based on all your other data, wouldn't each state's individual and very recent polling data be a better prediction? Just a thought.

We have five months until the election. No reason to get ahead of ourselves.

DU said...

Have you done any regression testing with these features? Gone back and retrodicted past elections (either from the 2008 cycle or previous) to see how well you do? Because these numbers smell terrible but I'm open to being convinced.

Mac Z said...

If I were to suggest one change in this methodology, it would be to make states affected more by trends in states demographically similar to them and less by states demographically dissimilar to them. For instance, a polling uptick in Minnesota would suggest a polling uptick in Iowa, but not in Texas. Imagine if McCain started campaigning really hard to the conservative core of his party and then a bunch of polls in places like Utah, South Dakota, and Oklahoma came out showing him increasing his performance in those states by 20 points (not unreasonable, since he's underperforming in all those states). The process of becoming more palatable to those states would likely cost him some points in Oregon, Wisconsin, and Iowa, but your model would give him a bump in those states. That's why similarity should be taken into account.

Icyclemort said...

I am not really sure about this change.

There is something to be said for both sides.

A) A national trend does indicate that one candidate is rising/falling in the whole country, therefore also in the individual states.

B) You cannot predict which states are more or less affected by a national trend. (or put in another way: which statewide trends contribute the most/ the least to the national trend)

->
I'd be very careful about the whole thing. I might reduce the effect somewhat ("flattening the time trend")

Nate said...

Guys,

If you have polls in 20 states showing a 5-point bounce for Obama, what is the best default assumption for what happens in the 21st state? That his numbers improve by 5 points, or that they won't change?

I think the answer to that question is fairly obvious.

For people concerned about the changes being driven by demographic conditions that are peculiar to a particular state: first of all, an improvement in just one state is not going to change the overall tracking trend very much at all. We're getting a dozen or a couple dozen polls each week from which we can infer trends (including both state and national polls), and we already hedge against sharp changes by smoothing our time trend with the LOESS regression. And secondly, it should be possible to run a regression on the residuals on the LOESS curve to see what demographics are driving those changes. That is something for version 1.1.

Anonymous said...

wow. for a layperson, this is all clear as mud. and here i was worried that this story at tpm has gallup showing BO is losing his bounce:

http://tpmelectioncentral.talkingpointsmemo.com/2008/06/gallup_obamas_lead_narrowing.php

But i guess that doesn't matter, unless it does. :)

Hllray Clonton said...

I'm dubious about this new model, for a few reasons. But the most glaring thing about this seems to be that the regression is done based on absolute changes in support, rather than proportional to the demographics in the state in which the poll was taken.

For example: Let's say we see an average of about a 6% swing towards Obama. Most of that swing will come from the moderate-to-liberal portion of the spectrum. But what about states in which there just aren't very many moderates/liberals?

A similar problem exists with regard to levels of Clinton support - we should expect a bigger bounce from states where Clinton had a strong support base, as once-angry voters follow Clinton's lead and back Obama.

Anyway - I'd be much more comfortable with this new methodology if it included some form of proportionality.

Perhaps change in % for Obama relative to his previous performance? This way, a jump from 40% in week 7 to 44% in week 8 for Obama in one state could be interpreted as a +10% (=4/40) relative change, and polls could be adjusted to account for Obama having 110% the support he previously had. Then, a 4th week poll by another pollster showing Obama at 50% could be adjusted to show 55%.
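As a rough illustration of the proportional scheme suggested here (a sketch of the commenter's idea, not anything the site implements), the adjustment multiplies an old poll by the ratio of the new baseline to the old one. The function name is hypothetical.

```python
def proportional_adjust(old_share, baseline_before, baseline_after):
    """Scale an old poll share by the relative change seen in a baseline poll."""
    return old_share * baseline_after / baseline_before


# Baseline moved from 40% to 44% (a +10% relative change),
# so an older poll showing 50% is scaled up by the same ratio.
adjusted_share = proportional_adjust(50.0, baseline_before=40.0, baseline_after=44.0)
# adjusted_share is 55.0
```

The contrast with an additive shift is that the same baseline change would move a 50% poll by 5 points here, rather than by the 4 points the baseline itself moved.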

Silifi said...

mac z:

The problem with that is that it'd be hard to scientifically document that sort of thing. You'd be making way too many assumptions by just basing things off regions, or even based on partisan alignment.

If McCain campaigns hard to conservatives, you're likely to see an uptick in a lot of places. You might see an uptick in Indiana, for example, which is typically conservative. However, in the same region, Ohio, you wouldn't see the same thing, because you're more likely to alienate moderates.

And the only way you can discriminate the data based on stuff like partisan alignment or regional affinity is if you try to explain why there's an uptick: which is not scientific. It would be intellectually dishonest, in this site's setting, to include that kind of subjective analysis in the projections.

Silifi said...

Nate:

Yes, if you see that kind of broad trend, you could reliably guess the same thing.

What I'm concerned about is the fact that this new formula seems to also take into account a single state's uptick.

A hypothetical example: say we see, one day, that Virginia gets some 20 point boost in the polls. (extreme and unlikely, I know, but bear with me) In the meantime, you have 10 other states that show little or no change.

If I'm understanding your formula correctly, this would result in something like a +2% bump in national polls. Even though the real trend was only in Virginia, you're still bumping other states based on something that has nothing to do with their populations.

Modeler said...

Nate,

I think you have this backwards in terms of your regression. From what I understand, you currently do the following:

1) Use recent top-line polling results to adjust the top-line results of previous polls.
2) Using the adjusted polls to determine your regression coefficients.

Instead, you should:

1) Weight recent polls more heavily than previous polls.
2) Use the weighted polls to model the time-evolution of regression coefficients.
3) Smooth the time-evolution of coefficients.
4) Adjust previous polls to reflect the changes in your regression coefficients.

For example, say that Obama starts doing very well among evangelicals, and this is reflected in recent polls, giving him a +3 boost. Your current method, as I understand it, would give him approximately a +3 boost across the board. So increased support among evangelicals will help him just as much in Nevada as in Georgia. Hmmm...

Instead, you should re-run your regression with more weight on the most recent polls, and your regression should show a higher correlation between percentage of evangelicals and Obama support. You can then apply the new coefficients to your model on a state-by-state basis, and see how Obama's performance has changed compared to the old coefficients. You can then adjust all the polls in the state based on that change.

I think the general idea of modeling the time-evolution of regression coefficients is very appealing. You could use it to get a more refined model of future uncertainty.

In terms of the regression, treating the USA as another state is the right thing to do. But make sure your variables are defined so that they scale linearly with respect to population. Otherwise your model is logically inconsistent.

Bob said...

I know you strive to perfect your methodology but as a casual observer of the site it looks pretty ridiculous when you go in and tinker with the rules and then suddenly Obama's win percentage has jacked up 10 points. Nothing fundamental changed between yesterday and today except you changed the methodology.

I also know you're a numbers purist but think of a hypothetical 2nd trip to CNN on Monday and they asked you why your predictions changed so dramatically on the site despite a lack of fundamental change in the race. I don't think even John King himself would have a clue what you'd be saying.

Matthew H said...

But...but...but...this doesn't make any sense.

Nevada has had two polls in the last 4 months. McCain was ahead by 5 and 6.

And yet, you have it as pale blue!

Obama has had a national bounce since 5/20, but it's hardly 6 points. I'd wager good money that if they had another state poll today, that Obama wouldn't win it. So why is he projected as +2.4?

tunmel said...

silifi et al,

as I understand it, the new model only affects the polls one week or older. it treats newer polls the same way that the prior model did, resulting in a national projection that's more time sensitive, instead of waiting for changes to take effect state by state as the numbers come in. When the state changes do come in, they "correct" the model. So in the final days leading up to the election, with sufficiently heavy polling, there should be virtually no difference in the prediction between this new model and the old one. All this model does is provide more time sensitive information in the meantime.

Avo said...

I'm in the "unconvinced" camp. All this massaging of state results using national and other-state data might work, but shouldn't you back-test it first?

And what's that weird bump from 0 to around 60 electoral votes for Obama?

Modeler said...

Nate,

I just saw your comment above. The problem is that in your current approach, you assume there are only universal time trends. If we were only interested in universal trends, the demographic model would be completely unnecessary.

Even worse, taking universal trends, applying them to state-specific data, and then updating the regression model only makes the regression model worse. You are actually introducing noise into the model.

And hllray makes a great point about weighting. If you track the time trends of the regression coefficients, the weighting is automatic.

Ben C said...

This new model is much better. The concerns are misplaced - if Obama spends a lot of time in Virginia and the polling goes up only there, well that will have a negligible effect on the other states due to the 50 state + national regression.

I have one recommendation, however. You should somehow link up nearby states (or demographically similar states) in your regression.

Anonymous said...

I like your site, so please consider my constructive criticism.

My friend you can not "create" data by applying nationwide trends to update states in the interim of their polls (well of course you may do whatever you please, but ... wise ...).

Unfortunately we are living in an increasingly bi-polar society, one in which most states fall on one side or the other (if you don't like it then get the hell out of here, etc...).

Hence while a trend in the nation obviously indicates that many states have gone further in one direction, other states on the opposite side may react with anger and be more fervent in their position.

This is dividing our country like an ax.


RxRxRxRx

Modeler said...

Ben,

The way to link up demographically similar states is to simply use the regression model. ;-)

I do like the idea of experimenting with "regional" variables in the model though. What if, for example, there were a way to physically measure average distance from Appalachia? This might help account for regional trends that are not captured in available demographic data.

Peter B Fitzgerald said...

There are clear reasons to include the time-scale & to include national polls to infer changes at the state level. The net effect of this particular change should be that the accuracy of your national projections (especially popular vote) increases relative to the accuracy of state specific information. This naturally follows that you treat all states equal (w/o regard to differing demographics) when applying the effects of the time scale & national polling.

But I would consider the latter's accuracy much more important for two reasons. 1) No matter how sophisticated the national projection, too many unforeseen events can come between June & November, causing the national projection to be of little value at this stage. 2) Only about a dozen states are likely swing states--the electoral system places outsize importance on state specific demographic & ideological trends. E.g, the data regarding New York and Utah is largely irrelevant, while the data regarding Missouri and Nevada is of exceptional importance to those currently involved in campaigns; thus by weighting your projections of relevant states with data from irrelevant states, you should be decreasing the value of your projections.

Logic Penalty Box said...

"Interestingly, Obama's current win percentage of 64.7 percent almost exactly matches the price of Democratic contracts on Intrade, which also has the Democrats with a 64 percent chance of winning the election."

Is this sarcasm from a statistics wonk? Why should this be surprising? Normalization suggests that chaos of the Intrade market should indeed produce a result similar to LOESS regression result, right? :)

LPB

APoxOnBoth said...

What might be interesting would be to attempt some "postdiction" of the Intrade market with your new methods. If your methods matched the historical Intrade market, that would be an indirect indication you're on the right track, since Intrade is a completely different methodology.

Mike H in Cali said...

The assumption at the beginning of this post that a 5-point national bounce is basically uniform throughout the states not only undermines the whole point of a website like this but also is contradicted by this week's state numbers.

Look at this site's chart of polls for NJ and OR. Obama got no bounce at all in those states this week and is doing less good than he was in recent months. MN also showed no bounce. Meanwhile, Obama did show some kind of bounce in WI depending on what poll you focus on and also went up to lock status in WA.

So, I think the new model is based on a false premise. The truth is that different states have different size moves in response to events.

Also, much of Obama's 4-5 point national lead right now comes from "wasted" votes in CA, NY, and IL -- 3 of the 5 largest states. McCain's margins in the other 2 (TX and FL) are smaller.

Anonymous said...

I would love to see this method applied to historical trend data from previous elections. In order to be trusted, we need to see that it can be applied. Otherwise, it seems like too much projection based on something that can change with each news cycle.

Anonymous said...

Nate,

The problem here is your changes assume you "know the game." You may know baseball but you obviously don't understand politics. Sorry, to be blunt but your lack of political acumen is taking your "science" off course.

You said:
"If you have polls in 20 states showing a 5-point bounce for Obama, what is the best default assumption for what happens in the 21st state? That his numbers improve by 5 points, or that they won't change?

I think the answer to that question is fairly obvious."

Well, no it isn't obvious. All elections are local. Floods in Iowa may have a different impact on the race than a terrorist alert in New York.

You also are trying to account for the long term now but have no way to predict either how many bounces each candidate will get or how long they might last.

Finally, an Obama bounce in a state he is already leading or where he will win easily may vary significantly in both depth and length from one in a state where he and McCain are close. Your method can't predict that, or the other changes, so it is useless.

I will choose to believe you just made an error in judgement rather than believe, as one other has suggested, that you are using this to gin the numbers for your candidate. Still, this is why political pollsters stay neutral, and why even you suggested internal polls are unusable.

George W said...

Nate, it seems to me that the bounce is largely attributable to a "unity" effect of Dems lining up behind Obama. So the bounce is going to be less in states with lower Dem self-id than in states with larger Dem self-ID. Wouldn't this trend play itself out and be predictable as well? In other words, regress for Dem ID as a variable as well.

Modeler said...

Hi Nate,

Sorry for the flood of posts. My mind is racing around this right now. I hope it's clear that most posters are not trying to pile on; rather we are trying to make this site as good as it can be. A lot of us love this place, check it every day, and we just want to make sure we're comfortable with what we're seeing. It's more fun that way.

I also want to amend my post above. (9:21 pm). I don't think you should change the regression model coefficients to update prior polls in the state. A better approach is to just make your prediction for each state based on a weighted average of the regression model and polls. The longer it has been since we've seen a poll, the more the prediction will be based on the regression model, which in turn is heavily dependent on recent polls in other states.
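One way to picture this blending idea is the sketch below. The comment does not say how the weight should depend on poll age, so the exponential decay and the 30-day half-life are purely hypothetical choices, as is the function name.

```python
def blend(poll_margin, regression_margin, days_since_poll, half_life_days=30.0):
    """Weighted average of a state's last poll and the regression estimate.

    The poll's weight halves every `half_life_days` (an assumed decay rule),
    so stale polls lean more heavily on the regression.
    """
    poll_weight = 0.5 ** (days_since_poll / half_life_days)
    return poll_weight * poll_margin + (1.0 - poll_weight) * regression_margin


fresh = blend(4.0, 0.0, days_since_poll=0)    # all weight on the poll
stale = blend(4.0, 0.0, days_since_poll=30)   # half poll, half regression
```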

If you want to consider trends within a given pollster, just include regression variables for each of the most common polls.

On a similar note, it would be cool if you could show the uncertainty for each state. You could then suggest to your new friends at Rasmussen that they consider polling states with a high level of uncertainty. :-)

Finally, to help new readers on this site, it would be cool if you could publish two maps: One that shows the expected distribution if the election were held today, and another that shows the predicted distribution for the real election.

Modeler said...

Sorry, the above post should read:

I don't think you should use the change in the regression model coefficients to update prior polls in the state.

Mike (sweetjazz3) said...

Nate,

An issue that came to my mind when I saw that your projection is more one-sided is the question of how you are computing win percentage. When you do your simulations, are you assuming all states are independent? It is pretty intuitively obvious that the probability that Obama wins MS given he wins LA is much higher than the probability that Obama wins MS given that he loses LA. Right now you project Obama wins MS 11% of the time and wins LA 18% of the time. If the outcomes in the two states were independent, we could project that Obama will win one of MS or LA with a probability of 1 - (1 - 0.11)(1 - 0.18) = 0.27. But in reality, even if your state win projections are accurate, if Obama loses Louisiana he will almost certainly lose Mississippi. So the chance he wins one of MS or LA is much closer to 0.18.
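The two ends of this comparison can be checked with a few lines. This is only a sketch of the arithmetic in the comment: the function names are hypothetical, and "fully linked" stands for the extreme case in which Obama never wins Mississippi without also winning Louisiana.

```python
def p_either_independent(p_a, p_b):
    """Chance of winning at least one state if the outcomes are independent."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)


def p_either_fully_linked(p_a, p_b):
    """Chance of winning at least one state if the less likely win never
    happens without the more likely one."""
    return max(p_a, p_b)


p_ms, p_la = 0.11, 0.18  # win probabilities quoted in the comment
independent = p_either_independent(p_ms, p_la)    # about 0.27
fully_linked = p_either_fully_linked(p_ms, p_la)  # 0.18
```

The true figure would fall somewhere between the two, depending on how strongly the state outcomes move together.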

If you treat the states as independent, you will overestimate the winning chances of whoever is currently ahead in the popular vote. The relatively rare chances that Obama wins strong Republican states like TX, MS, LA, GA, etc. are not independent, so in most of your simulations he should lose all of these, while in some rare simulations he should win multiple (possibly all) of these, corresponding to the times when his national strength is stronger than what the polls currently indicate and he wins in an electoral landslide a la 1984.

I suspect you've already taken care of this, but if not, it is essential that you do so if you want a meaningful prediction of the electoral college outcome.

FWIW, as far as your second methodology change goes, I think it will more accurately account for the current public mood, but because public mood is pretty fickle, it is going to result in your projections becoming much more volatile. This is probably right, as your discussion in your first methodology post indicates.

lilnev said...

At Modeler: Developing a separate timeline for each demographic variable may look better in theory, but in practice the data wouldn't support it. You're asking for a regression model containing 16 (dem. factors) x 20 (timepoints) = 320 factors -- the data available would be seriously overfit.

At Nate: Overall I like the idea of making the model time-sensitive, and this seems like a logical approach. I think it's worth trying to explicitly describe the model's purpose and structure. The state numbers look like, "If the election were held today, this is our best estimate of the likely outcome in this state, and our best estimate of the probability of Obama winning this state." For example, if someone offered you an over-under bet that McCain would win AZ by 9.4, you would be indifferent as to which side of the bet you took. Likewise if he offered 5:1 against Obama winning AZ.

As for the EV distribution and overall win %, this seems to have a different interpretation: our current best guess of the outcome in November, because the "noise" added to each of the 10000 simulation runs is based on the square root of the # of days until then.

Do I have those right?

Also, this form of the model is certainly more responsive to broad changes in public opinion (a failure of the old model, which wouldn't catch up to such changes until months later). I worry that this may come at the cost of volatility, however. Could we see confidence intervals on the weekly Super Tracker points, and/or on the LOESS fit?

As always, thanks for the great site.

Anonymous said...

Hi, everyone!
I understand the concern a lot of people seem to have about Nate's new approach assuming that national trends are happening uniformly across the country -- obviously, they're not, and to that extent this model will get things wrong sometimes.
... but the previous model assumed that national trends made no difference at all to the results in any particular state.
Look at PA, for example. There are 18 polls listed there, with 10 from between TX/OH on March 4 and the PA primary in late April, then 4 in early May, and nothing since May 21, almost a month ago. So there's a lot of evidence (not surprisingly) about how Pennsylvanians felt in the six weeks before their primary, and nothing about how they've felt in the past three weeks.
That would be misleading anyway, but of course those six weeks were also relatively bad weeks for Obama, thanks to the Rev. Wright and so on. And, indeed 6 of the 10 polls that had McCain winning PA were taken in that time.
Nate's old model was gradually discounting the value of those polls over time, but it couldn't do anything about the fact that there aren't any recent polls in PA. The new model may not be the perfect way of compensating for the lack of recent polls, but at least it does something -- and as soon as more recent polls come along in PA (and other close states) those will weigh heavily in the results.
Sorry that I went on so much for a first post -- I just think that the other point of view should be heard on Nate's changes!

Ray said...

Hi Nate.

I think the change is for the best. Some people just don't understand genius when they see it:) In fact I was thinking of suggesting something similar myself.

One problem though. Why don't the scenarios match the data on the left. eg ohio + Obama wins/loses ohio - McCain Wins Loses ohio does not equal Obama wins.

Also, is there really a 15% or greater chance that different candidates will win the electoral and popular vote? iirc that has only happened twice in 50+ elections. Is this year really so different?

Anonymous said...

Why don't you give some stats on how well the model does at predicting results of polls?

Patrick said...

Nate, I humbly agree with those who have said that national polling is weighted too much. It's just too chaotic. Those votes could have been picked up anywhere, and there is no reason to assume that they were picked up everywhere.

You're trying to augment the relevant data of statewide polling with the perhaps illuminating but in the end irrelevant national polling data, and I think that in the end it just decreases the overall integrity.

The problem that you're trying to solve is old data getting out of date, but there already is a system to make data last longer: the regression analysis, which projects what theoretical new polling data would be. Instead of messing with old data, mess with the regression, which will have a side effect of only drastically affecting states that don't have any new data.

My suggestion (borrowed from someone else) would be treating the nation as a state that you can run regression on, and then include it in the analysis and increase the weight of the 538 regression accordingly. That'll make sure that at least the regression is up to date with national polling, without tainting the actual state polling data.

Charles said...

I was under the impression that as new polling data came in for some states, the 538 Regression adjusted in other states based on their demographics. I thought that this had been seen where a poll in Washington would come out and BO's win percentage in Alaska would increase (or something like that). So it seems that this demographic adjustment is already happening, and that this is supposed to represent a national movement instead of a demographic movement.

The only thing that I wonder might be worth testing is to model some type of coefficient on how much a state follows this movement. I.e. Maybe Ohio generally follows the movement well so it has a coefficient near 1, but some states might be very sluggish and they might have low coefficients. But, if it wasn't working out that most states moved fairly well with this, I'd imagine it probably would have never been implemented.

I just worry about the sudden jump in BO's numbers, I don't want people to become complacent :)

Anonymous said...

Any suggestion that Nate ginned the model to favor a particular candidate is absurd on its face and ought to be dismissed as such. In the end the test of his model is going to be whether he got the election outcome right. Except for the possibility that the actual election results might be falsified or corrupted in some way (not a totally outlandish possibility given recent history), the model is going to be proved right or wrong based on reality on the ground.

Similarly, I think the arguments that Nate shouldn't estimate the vote preferences in states where the polling data are thin don't hold up. After all, the previous approach did exactly that by, in effect, reverting back to the 2004 vote outcome where the recent polling data was thin or unreliable. However this version of the model has the advantage of taking into account information about national trends in place of relying on what is, in effect, just "historical" data (including outdated polls in a given state).

On the other hand, I think some of the suggestions here may be helpful in improving the modeling, in particular those by Modeler. And I like the idea of having two versions of your projections, one based on your old methodology and one based on the new. They ought to converge toward one another as election day approaches.

Nate, I think that after you've given this some more thought to revisions, you might want to have a "chat" or equivalent discussion session for a couple of hours. I don't think you have true "chat" facility on "Blogger," but perhaps you can use your baseball site to run such a chat.

Modeler said...

lilnev,

I think you may have misunderstood my post. The regression model would still only be based on 16 factors. However, as new polls get added, the regression model would be updated. Each update will still only include 16 variables. The change in the model with respect to time gives you an idea of the time trends of the coefficients.

Once we have a sense of how much each of the coefficients fluctuates with respect to time, this information can be used for more accurate simulations of future events.

If you look at my 10:37 post, I've decided that it doesn't make sense to retroactively adjust previous polls. Simply updating the regression model and decreasing the weights of old polls should suffice.

Joel said...

Nate,

The strange thing with all these numbers is that in most cases they correspond to my gut feelings, the completely non-scientific observations I've made. However, I liked having my feelings checked by real, hard data.

I was simply waiting for new polling to update the data, which I still think is a safer bet. As new polls came out, your projections would have been fine. Why extrapolate data when you don't need to?

If the new polls coincide with your predictions, then I suppose there isn't much of a problem, but then isn't it redundant?

Even if you are right, I don't see a single reason to change the data at this point in time. True, you might be understating Obama's popularity but that would soon correct itself when new polls are done.

I still beg the question, "why not wait until the pollsters do more polls?" That doesn't seem to harm your projections in the least.

If more polls are done contradicting some of this new data, please reconsider and revert to version 1.0

Ray said...

Nevermind. I just realized why both my objections were wrong:

The projections for Ohio are conditional, not joint.

Popular vote is the expected popular vote (if the election were held today -- or perhaps it doesn't matter since you're modeling national trends as a random walk meaning polls are as likely to go up as down.) NOT the probability that each candidate wins the popular vote. I would however argue that you are overestimating both candidates' vote shares by neglecting third parties -- but then it's hard to make a pie chart if you're just giving a margin of victory as your number.

Also can you retroactively compute winning percentage and expected electoral vote given the new methodology, and graph them?

Thanks
and keep up the good work.

asmodeus said...

Much better now Clinton is off my screen.

obsessed said...

One virtue of the new system, if I'm understanding it correctly, is that since it tangentially makes use of the daily tracking polls, all the charts will change daily even if there are no new state-specific polls.

Right?

Better for addicts, and, like the Rasmussen Approval poll (our chimp is down to an all-time low of 31!), even if you disagree with absolute weighting, you can still follow the trend.

Slack said...

After reading through the thread, I suppose I'll rephrase my critique as well.

Stats only go so far as the information they communicate. I worry that presenting edited poll results skews the nature of polls, in trying to hide trends within a state.

If this system worked perfectly, then all polls in an ideal state would match. This is all well and good, but it doesn't mimic the actual results, which do display trends.

You may want, instead, to use the same regression to have a "poll projection" to go along with your "538 regression" in weighting out a state. That would present your analysis more clearly as analysis rather than straight data.

I also favor keeping polls as is because of the staying factor - we saw in NH that although Senator Obama led hugely post-Iowa, those gains were not "solid," and his support quickly evaporated on election day.

A poll projection from the super tracker combined with the 538 regression would provide two different and useful methods of analysis.

Thoughts from the gallery?

Anonymous said...

As I understand it, if the trendline is showing a +2 Obama bounce, the past polls of all states will get a +2 bounce?

It makes more sense to me to give a bounce based on the percentage increase relative to his previous share of the vote. So if a trendline is showing a jump from 49 to 51 for Obama for a +2 bounce, a state that has him at 49 would get +2, but a state that has him at 33 would get about +1.35. Or something like that to account for the fact that +2 in New York (or nationally) doesn't equate to a +2 in Alabama.

Or maybe that is what the new method is doing? Any insight or comment would be appreciated.
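For illustration only, the two readings of the bounce can be compared in a few lines. This is a sketch of the commenter's proposal, not a description of what the site does; both function names are hypothetical.

```python
def additive_bounce(state_share, national_before, national_after):
    """Add the national change in points to the state's share."""
    return state_share + (national_after - national_before)


def proportional_bounce(state_share, national_before, national_after):
    """Scale the state's share by the national ratio of after to before."""
    return state_share * (national_after / national_before)


for share in (49.0, 33.0):
    add = additive_bounce(share, 49.0, 51.0)
    prop = proportional_bounce(share, 49.0, 51.0)
    print(f"state at {share}: additive {add:.2f}, proportional {prop:.2f}")
```

Under the proportional rule, a state at 33 is scaled by 51/49 and gains 33 × 2/49, or roughly 1.35 points, while the additive rule gives every state the same +2.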

Anonymous said...

Most of us - me included - are upset with your homogeneous application of the time trend.

I think you also lose something through the application of the time trend itself. Your previous model had the virtue of sobriety: it rejected the lurid headlines. A series of Obama victories lifting his support? A Wright controversy? Just you wait - your model implicitly said - things will settle down to where they were, more or less - which we have by still keeping, albeit at a reduced level, the old results.

But now you caved in and let the headlines dictate your story as they dictate everyone else's.

If anything, I think your model was not sticky enough, because after all it was so responsive to individual polls.

I agree there is something improbable about the current stickiness. We all feel the 6/3 transition was real, a defining moment that would not simply "settle down" (did our opponents, however, not hope the same for the Wright affair?). I would even prefer a radical ad-hoc solution, suggesting that all pre-6/3 polls are to be offered at a 20% reduced value (with perhaps a similar wholesale reduction after the conventions?).

Those are just idle thoughts. What you do definitely need to do is to discuss in more detail why you went for this particular methodology, rejecting others offered in your previous open methodological thread as well as in the current discussion - which is, you may note, something of the first open rebellion ever in 538. You need to offer something to quiet down the masses!

Anonymous said...

Shouldn't the line "Win%" read "Obama Win%"?

Aranae said...

I seem to find myself torn between supporting the change outright and agreeing with many of the concerns listed here. The answer may lie somewhere between the before and after Big Change #2.

The problem the change is correcting is fairly clear and I think the need for it has been outlined well. Nevertheless there is the concern that a shift in the national trend may be driven by specific voter types or regions. Correct me if I'm wrong, but since you're running a simulation, sample sizes aren't that much of a concern (other than a possible need to do more reps) and so the presence of a lot of added variance is tolerable (if not desirable) because the simulation will sample across that variance.

So what if you were to look at states, regions, and the nation through a nested effect. You can adjust for the state-specific effects albeit with an admittedly large variance. Higher nesting levels may allow for some precision, while lower levels may be more accurate.

Hypothetical scenario: NH has a lot of swing voters that have the potential to shift the state wildly. When the national trend is Obama +1, NH shows Obama +3; when it's Obama -1 nationally, it's Obama -3 in NH. Your new correction will undersell Obama's chances in NH when he's up and oversell them when he's down. Meanwhile, pretend PA has very few swing voters. A national +3 may only be a PA-wide +1. A sensitivity correction may be in order based on how far off a specific state is to the national estimate. This can help determine how much weight the national trend should add to the individual state estimates. That correction estimate may have a lot of variance, but the simulation samples across that variance (right?).
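One simple way to put a number on such a sensitivity correction, offered as a sketch under assumptions: regress a state's observed changes on the national changes over the same periods, with the line forced through the origin. The function name and the choice of a no-intercept least-squares fit are assumptions; the inputs below are the hypothetical NH and PA figures from the scenario.

```python
def sensitivity(national_changes, state_changes):
    """Least-squares slope through the origin: state change per point of
    national change over the same periods."""
    if len(national_changes) != len(state_changes):
        raise ValueError("series must have the same length")
    denom = sum(n * n for n in national_changes)
    if denom == 0:
        raise ValueError("national series shows no movement")
    numer = sum(n * s for n, s in zip(national_changes, state_changes))
    return numer / denom


nh = sensitivity([1.0, -1.0], [3.0, -3.0])  # swingy state
pa = sensitivity([3.0], [1.0])              # sluggish state
print(f"NH sensitivity: {nh:.2f}, PA sensitivity: {pa:.2f}")
```

A coefficient near 1 would mean the state tracks the national trend point for point. With only a handful of polls per state the estimate would be noisy, which is the variance concern raised in the comment.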

Likewise, specific national crests and troughs will have statewise and regional effects. People in CA didn't care what Obama said about Pennsylvanians and guns and religion. People in PA and neighboring states cared so much that it had a real influence on national numbers. Meanwhile Obama's current surge is largely due to bringing in Clinton supporters. That's going to have a much bigger effect in PA than in UT (where the handful of Democrats were already voting Obama anyway). Including a national, regional, and statewide (using a sliding window perhaps?) component, might tease out these local effects yet still allow for the ups and downs to be incorporated.

It's interesting to see how many commenters seemed to be shaken by the influence that input parameters have on these analyses. The notion seems to have been that Poblano has the answers because Poblano knows math and you can't argue with math. As I said in Big Change #1 comment section, I do think some of the flipping out could be helped by graphically representing just how much variance surrounds these estimates. Adding some pink and light blue to the pie charts on the top would help. As would error reporting around statewide margins. McCain is projected to win AK by a larger margin (4.1) than FL (3.0), yet he has a higher win percentage in FL (37.5%) than AK (33.5%). Why? Because the AK estimate has more variance. That does not mean that AK is more of a swing state than FL, it's just an artifact of greater variance around the AK estimate. I think that's worth reporting.
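A small sketch of how the variance around a margin estimate feeds into a win percentage, assuming the final margin is normally distributed around the projection. The standard deviations used here (10 and 5 points) are made up for illustration and are not the site's figures; `win_prob` is a hypothetical helper.

```python
from statistics import NormalDist


def win_prob(projected_margin, sd):
    """P(actual margin > 0) when the actual margin is normal with the
    given projected mean and standard deviation."""
    return 1.0 - NormalDist(mu=projected_margin, sigma=sd).cdf(0.0)


# Trailing candidate's view: negative margins, hypothetical uncertainty.
noisy_state = win_prob(-4.1, sd=10.0)  # bigger deficit, noisier estimate
tight_state = win_prob(-3.0, sd=5.0)   # smaller deficit, firmer estimate
print(f"noisy: {noisy_state:.3f}, tight: {tight_state:.3f}")
```

Under these assumed inputs the trailing candidate has the better chance in the state with the larger deficit, purely because that estimate is less certain. That is the argument for reporting the variance alongside the margin.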

Modeler said...

Slack,

I think we're all getting at the same thing, which is that previous poll results should not be modified, as long as we can use the regression model to include recent polling data in state-by-state predictions.

I think that the current method of listing the various weights of the polls and the regression model makes a lot of sense. For example, in SD the result is heavily weighted by the regression model, as it should be. Would your "poll projection" just be the weighted average of polls without including the regression model?

Aranae, I agree that reporting the variance would be useful.

Anonymous said...

I know very little about stats. But I've been watching politics for a long time. This "feels" right to me. Today, it feels like Obama has a 65% chance of winning the election. It feels like he has a 40% shot at Florida and a 60% shot at Michigan. Most states line up pretty well on a gut level.

Yes, the new model overweights the "current headline trends." Two weeks ago, for example, it didn't feel like Obama had a 65% chance of winning. But I don't think that's a bad thing. The November election is a snapshot, and it will overweight "current headline trends," too. Nate's new model is less sticky and more volatile than the old one, but you know what? Election days are not very sticky either.

In any case, the new time trend will be less and less of a factor going forward, because as we get more state polls, the weight of the 538 regression becomes less. So if you are among the group who think we should put more emphasis on a straightforward state poll average, have no fear. When there is enough data to go by the state polls, the model will adjust itself. Right now, we have only scattered state polls -- some taken at the heights of February and some taken in the lows of April -- and we need the 538 regression to make some educated guesses about where the ball is today.

Anonymous said...

Nate,

Your explanation is ambiguous. Are you making this adjustment only in the 538 regression? Or are you also using it in the weighted average projection?

The former makes more sense to me than the latter.

If you are doing the latter, I would appreciate an additional column in the state poll tables showing what kind of adjustment is applied to each poll as a result of the time trend.

Zolo said...

How would Barr, Paul and Nader come into the equation?

Anonymous said...

Obama is winning 300+ electoral votes! Finally the public has wised up and is ready to vote the way I knew they would all along.

Why am I not surprised that trend extrapolation begins with Obama's first surge in the polls? Should I be surprised if a future McCain surge coincides with a refinement to "reduce the noise" and "take the long view"? I think I'll skip the kool-aid and go back to pollster.com and RCP for my averages.

APoxOnBoth said...

I have to say that I have to fall into the "feels right" camp, as well. There was something wrong with the old way, it was predicting way too damned much of a "coin flip" in a year when GOP fundamentals suck. It also didn't predict momentum, which felt wrong from a dynamics viewpoint.

It was almost as if it was over-weighting the 2004 results, which would fit with the "underpolling" explanation provided, the less a state got polled, the more it reverted to the regression based off Kerry's numbers.

Which tied into the "Battleground" vs. "50 State" argument, if you define certain states as the only ones you're competitive in, it's a self-fulfilling prophecy because they're the only ones you *will* compete in. Kerry lost a lot of states by only a few points that he never visited or spent a dime in, and lost the two states he spent the most time and money trying to win. If you start from the assumption that past results are the only meaningful guide to future strategy, you get exactly what you measure.

Jallenrule said...

Nate: your site has just become my #1 bookmark. I heart normal distribution... keep up the outstanding work sir!

aaron said...

First time poster but devoted reader since I found you.
I don't know how I feel about this new methodology. I do understand that the goal is to try and deal with the fact that the lack of polling in some states will miss the present trends, occurring in other states and at the national scale.

However, I have several suggestions (many of which have been brought up by other posters):
1) Even assuming that the underlying assumptions are valid, the adjustments should not be additive but rather relative to the previous support within each state

2) Since the 538 Regression already took into account the problem you are concerned with (though an attempt to include national polling is justified) and in fact being demographic specific

Modeler said...

Nate,

Thinking about this even more, it seems that you have developed a method to effectively further decrease the weighting of old polls. When you modify an old poll based on recent polling data, you are approximately replacing

("Week x" margin)

with

("Week x" Margin) + (Avg current week margins) - (Avg "week x" margin)

Accounting for smoothing and pollster bias makes it slightly more complicated, but this is effectively what is going on.

You are keeping the overall weight the same, thus effectively diluting the impact of ("Week x" margin). In fact, you are practically forcing the average margin of "Week x" polls to be very close to the average margin of polls taken in recent weeks. Correct me if I'm wrong.
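The approximation above can be checked with toy numbers. This is a sketch of the simplified formula as written in this comment, ignoring smoothing and pollster effects; the poll margins and the current-week average are invented for illustration and `adjust_poll` is a hypothetical name.

```python
from statistics import mean


def adjust_poll(poll_margin, avg_margin_poll_week, avg_margin_current_week):
    """Shift an old poll by the change in the weekly average margin."""
    return poll_margin + (avg_margin_current_week - avg_margin_poll_week)


week_x_polls = [2.0, 4.0, 6.0]  # invented margins from "week x"
current_week_avg = 7.0          # invented average margin this week

week_x_avg = mean(week_x_polls)
adjusted = [adjust_poll(m, week_x_avg, current_week_avg) for m in week_x_polls]
print(adjusted, mean(adjusted))
```

The adjusted polls keep their spread around the weekly average, but their mean equals the current-week average by construction. In this simplified form the old polls therefore contribute their within-week differences and none of their level.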

I think the reason this sort of dramatic shift in the weighting seems appealing right now is because we have had a major recent event: Clinton's concession. Intuitively, the weight of polls before this event should be even lower than normal. However, I'm afraid that in order to more effectively capture the intuition that Obama is broadly outperforming old polls now, you have damaged the general usefulness of your model.

There are several ways to capture rapid shifts in opinion that don't increase the general variance of your method. If you consider the "random-walk" model of the evolution of voter preference (which is consistent with the square-root-of-time trend), you can consider periods of major interest in the campaign to be an accelerant for this walk. Thus instead of using cumulative time to determine weighting, you can use cumulative interest (interest integrated over time.) One way to measure interest would be to use Google Trends:

Obama, McCain interest in the USA

Or even by state:

Obama, McCain interest in Ohio

You could even estimate how far we are from the election in "interest space" by looking at prior election results:

Kerry, Bush interest in the USA in 2004

The recent high level of interest in the election would be a signal that there has been a great deal of recent movement in voter opinion, and thus old polls should be weighted less than usual.
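A rough sketch of the "interest clock" idea, under the assumption that opinion follows a random walk whose variance grows with cumulative interest rather than with calendar days. The daily interest values, the `sd_per_unit` scale, and the function name are all hypothetical; no real Google Trends data is used.

```python
import math


def drift_sd(daily_interest, sd_per_unit=1.0):
    """Std dev of opinion drift accumulated over the given days, for a
    random walk whose clock advances by each day's interest level."""
    return sd_per_unit * math.sqrt(sum(daily_interest))


quiet_stretch = drift_sd([1.0] * 16)  # 16 days at baseline interest
busy_stretch = drift_sd([4.0] * 4)    # 4 days at four times baseline
print(quiet_stretch, busy_stretch)
```

With interest fixed at 1.0 per day this reduces to the usual square-root-of-days rule. Four days of heavy interest then age a poll as much as sixteen quiet days, which would be the justification for discounting polls taken before a major event more steeply.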

Anonymous said...

This adjustment is a good one. My assumption is that most of the criticism on this board stems from people who support McCain, and just wet themselves seeing how poor his chances (at this moment) are in the fall.

asmodeus said...

Dear Nate Dog,
If you fly a time machine into November and then fly back to the present time you can weight your numbers according to the actual election results! (or is that cheating?)

Mard said...

You know, this new map reminds me of an older one...

It seems that fate/karma/God has a sick sense of humor.

KQuark said...

I have to admit I am never happy with poll data that is single point and has no error included numerically or graphically. I think you should publish 3 scenarios. The average poll data should be one map and the other maps should be shown based on propagating the error to one standard deviation. This would give you a maximum and minimum possibility of winning for each candidate.

LP said...

Someone correct me if I'm wrong, but I think we can say this looks to be more of a snapshot of what is going on now than a true prediction.

If that is true, will this get more accurate as it gets closer to the election?

Anonymous said...

One of the best things about this site was that it looked at the race from the long term perspective and didn't overly focus on the most recent polls. For example, if you had implemented this method during the Rev. Wright scandal, Obama's win percentage would have unrealistically plummeted. I liked the old method better.

Anonymous said...

"This adjustment is a good one. My assumption is that most of the criticism on this board stems from people who support McCain, and just wet themselves seeing how poor his chances (at this moment) are in the fall."

Stupid comments like this make me sorry I read the comments section. You offer no reason why you think the adjustments are better yet make wild assumptions about people who dare to disagree with your conclusion. Perhaps there is a reason why you like the adjustments so much yet can't say anything about it other than it's "good?"

Anonymous said...

Poblano:

After thinking it over I agree with your intent in making this change. I agree with the results you got in this case. I even agree with your concept behind getting from A to B. But I severely disagree with how you are using the data you have to project what current polls would say in unpolled states. To illustrate, let me offer the following scenario to your previous:

Let's say that CA, HI, NY, FL, ND, NJ, IL, and IA all polled up exactly 5% one week over the prior in which they all happened to also be polled and AZ and KS both polled up exactly 1% in the same fashion. The average of these ten states would be an increase of 4.2%. So are you saying, then, that you would expect Utah to poll up 4.2% over the previous week? I say this is madness. Indeed, if country-wide polling showed Obama up only 2% over the previous week, I'd expect him to either not have improved in Utah or to actually be doing worse.

Effectively what you should be doing is trying to model what the changes in the state-space have been over the past week based on only a few data points, and what you are doing is using a flat value to model the entirety of the state space. What I suggest you do instead is to consider all of the states, DC, and the country in their demographic space. For instance, if you were to consider 10 demographic variables as being significant to your regression, then you should plot each of these 52 as a point in the demographic space (so California would be at x% registered democrat, y% college graduate, z% latino, etc). Then, assuming major shifts happen in demographics proportionally (i.e. a change in the latino vote would have twice the effect in a state with twice the percentage of latinos), you should instead of a flat increase use the best-fit linear relationship that also passes through the graph point for the country-wide increase. Specifically, like you would use a best fit line (a one-dimensional subspace) over a one-dimensional space on a two-dimensional graph to model the changes with respect to one demographic, you should use a ten-dimensional linear sub space over your ten-dimensional demographic space in your eleven-dimensional graph to model the changes here. This may sound complicated, but the mathematics are essentially the same as with the best-fit line, only now you're using vectors for the x-values rather than scalars. The other complication is that in certain weeks you might not get the ten state contests in addition to the national polls you need here to make the subspace you are looking for well-defined. In such weeks where you have fewer data points than significant variables, you would instead take the subspace through your available points which has the minimal gradient magnitude.

In this way you would somewhat account for similar demographics without discounting either the country-wide polling nor the changes in groups of similar state which happen to not have a recorded change in a particular week.
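The core of this proposal can be sketched with a least-squares fit, assuming NumPy is available. The sketch maps each polled state's demographic vector to its weekly change and leaves out the extra constraint that the fit pass through the national point. The data are invented, and both function names are hypothetical. `numpy.linalg.lstsq` returns the minimum-norm coefficient vector when there are fewer polled states than variables, which corresponds to the "minimal gradient magnitude" fallback described above.

```python
import numpy as np


def fit_demographic_shift(demographics, changes):
    """Fit weekly change as a linear function of demographic shares.

    demographics: one row per polled state, one column per variable.
    changes: observed week-over-week change for each polled state.
    """
    X = np.asarray(demographics, dtype=float)
    y = np.asarray(changes, dtype=float)
    # For underdetermined systems lstsq returns the minimum-norm solution.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_shift(coef, demographics):
    """Predicted change for a state that was not polled this week."""
    return np.asarray(demographics, dtype=float) @ coef


# Invented example: two demographic variables, three polled states.
coef = fit_demographic_shift([[1, 0], [0, 1], [1, 1]], [2.0, 3.0, 5.0])
print(coef, predict_shift(coef, [0.5, 0.2]))

# Fewer polls than variables: the smallest-norm coefficients are chosen.
sparse_coef = fit_demographic_shift([[1, 1]], [2.0])
print(sparse_coef)
```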

Also, I'll vote for not changing any polls. Just do your normal regression, show that value, and then add a line which shows your recent change fudge factor for states not recently polled and the results of that fudge factor as well. Keep all of the polls and regressions constant and separate from recent-change projections and only change their weights over time.

As for all of the other stuff going on in the model, well, you should really explain it all over from scratch in full like you've done in the past so people stop asking the same questions over and over and so people can actually critique your methods properly. Have a full report on my browser by Monday. :)

Jason K said...

Just wanted to add my support to what some other commenters have suggested. It should be possible to go back on a state by state basis and see how well the past polling changes over time have correlated to the national trend line over the same time period.

That way if there is a state by state difference in how much they respond to the national trend line it can be factored.

On a separate matter I have a question. It seems that polling is extremely dependent on the pollster's turnout model. It seems like the turnout model is something that pollsters might be tweaking over time to try and better model the atypical turnout that has been seen this cycle. When looking at these various polls is it possible to tell if they are using the same turnout model in current polling as they were using a few months ago?

Anonymous said...

It's ironic that idiotic comments like:

>>>
Stupid comments like this make me sorry I read the comments section. You offer no reason why you think the adjustments are better yet make wild assumptions about people who dare to disagree with your conclusion. Perhaps there is a reason why you like the adjustments so much yet can't say anything about it other than it's "good?"
>>>

tickle me to no end, and contribute in no small part to why I do read the comments.

I think Nate has made a very smart assumption of factoring in the national polling numbers as a valid data point.

My earlier comment, however, was more of an observation on how (unnecessarily) personally readers of this blog seem to be taking the methodology. I have been an avid reader of this blog since the day Nate founded it (and a loyal reader of his diaries on Kos for months before that.) Also, I have been a card carrying, dues paying member of Baseball Prospectus for years as well. If Nate's political statistical acumen serves him half as well as PECOTA has served my fantasy baseball team's performance over the past few years, you can start printing the "Re-elect Obama 2012" bumper stickers now.

Clem G. said...

10 states go up 5 points, does the 11th state go up 5 points?

No, not if it is DC and Obama already leads 95/5. That last 5% aren't all going to move.

It's not clear from the explanation if you're applying a % of a percent (after the LOESS fit?) or just tacking it on.

Short of the full-blown demographic fit, a prudent way of applying the effect is to take a % of supporters moving between candidates (e.g., 5% of 10% in a 90/10 split vs. 5% of 50% in a 50/50 split).

To be even more precise, the 5% would no longer be 5%, it would be based on whatever the inferred % of supporters who moved were--subject to the other adjustments, of course.
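
To make the distinction concrete, here is a minimal sketch of the two ways of applying a bounce that this comment contrasts. The function names and the cap at the opponent's share are illustrative assumptions, not anything taken from the 538 model.

```python
def additive_shift(leader, trailer, points):
    """Tack the same number of points onto a state.

    The cap at the trailer's share is an assumption added here so that
    shares stay between 0 and 100.
    """
    moved = min(points, trailer)
    return leader + moved, trailer - moved


def proportional_shift(leader, trailer, rate):
    """Move a fixed fraction of the trailing candidate's supporters."""
    moved = rate * trailer
    return leader + moved, trailer - moved


# A 95/5 state: the additive version converts every last opponent,
# while the proportional version moves only a sliver.
print(additive_shift(95, 5, 5))
print(proportional_shift(95, 5, 0.05))
print(proportional_shift(50, 50, 0.05))
```

In the proportional version the same 5% rate moves 2.5 points in a 50/50 state but only a fraction of a point in a lopsided one, which is the behaviour the comment argues for.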

Anonymous said...

I don't think this change will matter much the closer we get to the election. Right now it seems a big deal, but you have to accept that we are now just under five months from the election and polling is still rather sparse. We'll know how effective this is in a few months.

Chill, guys. Let Nate do what he thinks is accurate, and if you don't like it, well, create your own site with your own model and see if yours is better.

anjiaoshi said...

That "super tracker" graph for Obama reminds me of some of the graphs I saw charting Bush's popularity rating, which showed that in the absence of some kind of major national security event that bumped up people's confidence in him, he lost popularity points at a steady rate. Obama seems to be the opposite: in the absence of some kind of news event that damages people's confidence in him, he gains popularity at a steady rate.

TokyoPat said...

I agree with the changes in theory but agree with the posters saying there will be too much variation due to short term news cycles. An obvious solution is to make the LOESS parameter even more conservative to iron these out.

Anonymous said...

Nate,
The answer for validity concerns is data.

Can you pull some old data to show the movement-based method predicting the results of states in newer polls? And yeah, the magnitude of the effect of attitude-movement will depend strongly on things like the number of undecideds.

Anonymous said...

Damn, I would kill for a Pennsylvania poll right about now. Why in the hell hasn't anyone polled that state in the last month?!?

Riley Murray said...

I'm not worthy!!! A stats geek nirvana.

Finally stumbled onto your blog, I'm in love and all shook up!

Here's my humble take on projection modeling. It takes a long view of historical trends since 1948.

http://blueridgedata.blogspot.com/2008/06/its-economy-and-8-years-of-increasingly.html

Brandon said...

Presidential Polls

AR- Rasmussen

McCain 48%
Obama 39%

NV- Mason-Dixon

McCain 44%
Obama 42%

Anonymous said...

Hey, Obama Pollster Pimp (OPP) nice work! It looks like you compressed the scale and put a 7th order polynomial in your "Super Tracker" to get it to snake through all the points and give Obama a big upturn at the end. What is the correlation coefficient--0.1?

You are the OPP!

VOR said...

In a lot of ways, I like to conceive of this exercise as trying to plot the course to November 4th based on past and current information, but discounting or dampening bumps in the road and using indirect information when direct information is lacking.

If all you want is a reading of what the recent polls tell you, you can follow the RealClearPolitics (RCP) average of polls. You can take them raw, without any discounting for sample size, date of the poll, or credibility of the survey organization. It's there for you.

If you want to make up a bit for the shortage of polls in some (many) states, you can substitute some baseline for the polls, such as the 2000 and 2004 election averages for the R and D candidates or state demographic characteristics. That's a good bit of work and takes some statistical know-how. That's what 538 did in his "538 regression" in its earlier iteration.

Keep in mind that the "538 regression" isn't the main factor in the prediction. It's treated as just "one poll" that is averaged together with the adjusted polling averages for each state (adjusted for sample size, recency, and reliability of the pollster).

But Nate found some major limitations nonetheless. One was that he was underestimating the uncertainty in the regression model by relying only on recent elections as a baseline. So he added more elections -- from 1988 to 2004 -- for this purpose.

Another limitation to his previous method was in its total reliance on state polls for the other part of his projection. There was still a paucity of polling in many states, and moreover as this polling "aged" it became a less up to date indicator of the trends in candidate preferences.

At the same time, the major polling organizations were regularly conducting national polls, and he sought some way to take this newer information into account to update his state estimates. Without repeating what he wrote in his columns yesterday, I would only say that this is a worthwhile effort, and it was bound to bring the state estimates more into line with recent national trends. In the short term, that meant that some state McCain-Obama polling results, which were indirectly affected by the fact that Obama and Clinton were still competing against one another, would now be discounted more heavily because they were "old" and the national polls were new.

In a few weeks, almost all the state horserace polls will hopefully come from a time after Clinton suspended her campaign. At that time, neither the 538 regression nor the national polling will play as strong a role in the 538 projections as they do today. We're in a transition period in the campaign, in the polling data, and in the method.

I think it's a stronger approach that will steer a straighter course toward November 4, with fewer random bumps and potholes in the data roadway.

Anonymous said...

Hmmm....I'm not convinced this works properly. Main issue is that not all states are equal and it risks introducing errors rather than clearing them up.

Still, there might be a way around it. Easiest (although that doesn't automatically mean most accurate) way would seem to be to split the electorate into 3 rough groups, self-id'd dems, reps and inds. The bounce post Obama's win should mostly be in the dem section, so States with higher dem representation are likely to experience more of a bounce than those with low. I'm assuming you have figures on the percentage of each group in each state.

Secondly, I'd lower the weighting of the "new" poll you introduce so it acts as just one new poll for each state. It will therefore have the effect of adding an extra guesstimate while not overpowering the data. I'd also not use it in any state that actually has a new poll that week.

Pete (from uk but following this election)

C Ryan King said...

Nate,
Another statistical concern is how sensitive your simulation calculation is to the variance of your perturbation term. Given asymmetrically distributed weak/strong states, when you apply a factor which increases the variance of your perturbations (for example, the trend-tracker looks to have some pretty significant variance attached to it), even if it pushes the mean in the right directions, you will dramatically change the results.

I.e., McCain leads barely in more states but has fewer strong EVs; when you apply a bigger perturbation you obliterate the effect of the narrow-lead states and switch the win to Obama.

Looking at the epdf posted today, I would guess that the increase in variance accounts for the big swing in %Win_Obama. Fundamentally, this is the result of the difficulty in interpreting the meaning of the simulation results. Most people see 64% and think "Wow, that's pretty strong confidence!" whereas the right interpretation is "Wow, a huge chunk of EVs are in the air and Obama has more solid EVs."

C Ryan King said...

To back up my assertion that it's the variance and not the effect on the average of the trend that everyone's paying attention to, look at the landslide percentages. You have near a 40% chance of a landslide! A joint 30%-10%! Your readers are not grokking that the interpretation is "we have no idea what's going to happen"; they're just seeing "65% Obama! Awesome!"

Anonymous said...

The bounce post Obama's win should mostly be in the dem section, so States with higher dem representation are likely to experience more of a bounce than those with low.

Actually, from what I've seen, a lot of the change has been in the independent section. While there's been some consolidation of Dem support for Obama, the battleground is likely to be in the middle. I agree with the idea in principle of trying to model the shifts of the three groups, however. Not sure that kind of data is always available, at least not in state level polls.

lilnev said...

I'm tentatively in favor, but would like to see some validation. Specifically, go date-by-date from early in the season. At each date, generate the full model, with both the old methodology and the new. Treat the old "weighted ave" and the new "projection" as a prediction for about-to-be-released polls. Which method would have been more accurate over the course of the season so far?

And, one of the drawbacks of the new methodology is that it makes it hard to see the impact of new polls, or even to figure out whether they're good news or bad. Mason-Dixon now has McCain +2 in NV. They haven't polled there before, so that won't impact the Super Tracker. It's better (for an Obama supporter) than any of the actual recent polls, but not as good as the 538 regression or projection. So he underperformed? His projected bounce here didn't fully materialize and/or he didn't live up to the demographics. But what effect will this have on the 538 regression in other NV-like states?

The new Rasmussen poll from AR is even more confusing. McCain +9, vs. +24 on 5/12, +29 on 3/18. So this has got to be an improvement. It will show up in AR obviously, in AR-like states via the regression, and also in every state that was polled the week of 5/12 or 3/18 (although through the smoothing of LOESS, this effect will be spread among neighboring weeks). 15 and 20 point jumps are pretty striking. How much will this individual poll move the LOESS?

p smith said...

The proof that the new methodology is flawed can be found by looking at the regression figures for FL and NV. In FL McCain has been ahead of Obama in every poll and yet the regression has a small lead for Obama. In NV McCain has led by 5 points in the only two polls taken in the last month or two but the regression shows a large lead for Obama. Let's not forget both states were won by Bush in 2000 and 2004. There is therefore no basis for these regression figures.

I don't disagree that the polls may end up here but that is pure guesswork. The attraction of this site was that it relied on cold hard data and changes to the map had to be earned by the currency of cold hard polls. Sorry Nate but you've overreached here.

I want Obama to win but wishing is not the same as seeing.

VOR said...

The proof that the new methodology is flawed can be found by looking at the regression figures for FL and NV. In FL McCain has been ahead of Obama in every poll and yet the regression has a small lead for Obama.

That's not proof. Have you looked at the dates of those polls, most of which are from February and March? Yes, it's true that none of the published state polls has shown an Obama majority, but none have been published since Clinton suspended her campaign either.

Steven said...

Nate, CONGRATS, you were proven Genius today by Rasmussen

Where a state isn't polled, things certainly change over time... leaving a state Solid McCain while the landscape changes makes no sense. You're trying to be a leading-edge indicator... and you've succeeded.

On 5/12, Rasmussen had Arkansas +24 McCain
538 Regression has the state +11 McCain
Rasmussen today has state +9 McCain


The Regression ended up being closer to reality than very old polls... That's the point of this method. CONGRATS, detractors can now shut up

Modeler said...

I'm also concerned about the self-consistency of this new method. If I understand correctly, the new method dramatically underweights old polls, but assumes that the accuracy of current polls decays as a function of sqrt(t). If the accuracy of current polls decays at this rate, shouldn't it be assumed that the accuracy of old polls decays at the same rate?

Modeler said...

Steven,

I think you misunderstand many of the detractors. The problem isn't the use of regression; I think most of us are in favor of it. The problem is the way in which the regression is developed. It is now very highly dependent on the most recent polls.

For example, this new method most likely would have done much worse in IN and NC, because the polls in those states the week prior to the election showed Clinton doing much better than she did. However, because the model assigned reasonable weights to "old" data, it was able to make a more accurate prediction.

VOR said...

Modeler: I assume that Nate applies a decay function to old polls, too, as he did before. They eventually roll off the ledger as their weight falls below .05. Though I admit I'm not quite sure the FAQ has been updated enough to clarify this point; it just seems likely he's still doing this.

Anonymous said...

Steven, I think you're wrong.
The tracker correction projects a hypothetical predicted current poll. This is not made explicit but can be inferred from the 538 and the overall projection.

It is clear that Nate's AK hypothetical predicted poll is about McCain +18, not McCain +9.

This is precisely the worry people had with Nate's new approach: by applying his trend homogeneously, he overpredicts here and underpredicts there. In this case the new AR result is a direct confirmation of the widespread hypothesis (which Ras themselves mention) that the current bounce is correlated with past Clinton support. (Good news for FL?)

Anonymous said...

Meant "AR" above.

Modeler said...

Vor: I'm pretty sure he does apply a decay function. But before he does this, he now updates the old polls to bring them in line with more recent polls. This, combined with the decay function, results in very little effective weight on old polls.

The end result may seem more intuitive because it's more heavily based on recent data; this is especially true now given the recent major shift in the election (Clinton's concession).

However, Nate's strength in the past has been proving short-term conventional wisdom wrong by examining long-term trends. He now dramatically discounts old polls and dilutes the state-by-state demographic information with general trends.

As a side note, I think the decay function should not be exponential; rather, it should be:

1 / (sig^2 + t*sigt^2)

Where sig^2 is the variance due to pollster error and sampling size and sigt^2 represents the increase in variance with time. This is more consistent with both a "random walk" of voter preferences and his observation that error increases as a function of sqrt(t).
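
As a rough illustration of the weighting proposed in this comment, the sketch below computes the inverse-variance weight 1 / (sig^2 + t*sigt^2) and uses it in a weighted average. The poll tuples and parameter values are made-up examples, and nothing here is claimed to be how 538 actually weights polls.

```python
def poll_weight(sigma, sigma_t, t):
    """Inverse-variance weight: 1 / (sig^2 + t * sigt^2).

    sigma   -- poll error from pollster quality and sample size
    sigma_t -- extra error picked up per unit of time (random-walk drift)
    t       -- age of the poll, in the same time unit as sigma_t
    """
    return 1.0 / (sigma ** 2 + t * sigma_t ** 2)


def weighted_margin(polls, sigma_t):
    """Average poll margins; each poll is (margin, sigma, age)."""
    weights = [poll_weight(sigma, sigma_t, age) for _, sigma, age in polls]
    total = sum(weights)
    return sum(w * margin for w, (margin, _, _) in zip(weights, polls)) / total


# Hypothetical example: a fresh poll and a 12-day-old poll of equal quality.
polls = [(10.0, 2.0, 0), (0.0, 2.0, 12)]
print(weighted_margin(polls, sigma_t=1.0))
```

Under this form the total variance grows linearly in t, so the standard error grows roughly like sqrt(t) once the time term dominates, and an old poll never drops to exactly zero weight the way it can under a hard cutoff.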

Gypo said...

Making up numbers will hurt the credibility of this site.

Election Day is 5 months away. What's the rush? Can't we be patient and wait for poll numbers rather than making them up based off of assumptions?

This site should be run off of hard, raw data, not assumptions.

It's OK to be pro-Obama, but this is obviously a case of twisting and spinning numbers in his favor.

hosertohoosier said...

Many people here think using a trend involves "making up numbers". Many of these people obviously have little background in stats. Articles in respectable journals "make up numbers" all the time.

If you have a large survey, for instance, and people drop out, you may bias that survey by ignoring those that drop out. Thus often values are imputed for the missing people.

Similarly, Bayesian methods (well a Gibbs sampler) in econometrics (I'm generalizing here), involve drawing values from a probability density function that is probable (combining priors, which are assumed, and what you have based on your data).

Anonymous said...

It has been shown that 73.5% of all statistics are made up.

Anonymous said...

Even in red states Obama's numbers have bounced up since Hillary left. Let's see how this plays out in the next few weeks before calling Nate's new method flawed or saying he is making up numbers.

I think he's right, and his numbers seem to be playing out in recent polling.

Considering we are five months out, there is no harm in him experimenting with a new system. He can always go back to the old one if he finds too many flaws.

Anonymous said...

Plus his data seems to be pretty close to what I've seen on Electoral-vote's and MyDD's maps.

Anonymous said...

So much hostility in these threads!

Nate, eliciting criticism and hostility from people is one of the surest ways to know that you've arrived ;)

Rex said...

Keep pimping those Obama polls!

Your site is a joke.

Can't wait to see what your regression says when the 527's swiftboat the tar out of Obama.

Umeeksk said...

Here's an interesting idea of how to illustrate this data on a state-by-state basis:

First produce a LOESS curve like the national one you have produced, but do one for each state.

Then take the first derivative of the LOESS curve and use it to get a "momentum" figure for Obama and McCain.

Then put a little pair of red and blue arrows/bars on each state on the map, pointing up for each candidate's positive momentum and down for negative momentum, proportional to the size of the momentum figure.

That way we can start to visualise which way the polling is going in different parts of the country and guess which states may flip next, etc...
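
A minimal sketch of the "momentum" idea, assuming the LOESS fit has already been done elsewhere and we simply have one smoothed support value per week. The slope is approximated with a central finite difference; the function name and the sample numbers are hypothetical.

```python
def momentum(smoothed):
    """Approximate the first derivative of an already-smoothed weekly series.

    Uses a central difference at interior points and a one-sided
    difference at the two ends. Units are points per week.
    """
    n = len(smoothed)
    if n < 2:
        raise ValueError("need at least two points to estimate a slope")
    slopes = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slopes.append((smoothed[hi] - smoothed[lo]) / (hi - lo))
    return slopes


# Hypothetical smoothed support for one candidate in one state, by week.
print(momentum([44.0, 45.0, 47.0, 46.0]))
```

The sign of the last value would set the direction of the arrow on the map and its size would set the arrow's length.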

Will said...

It seems that Nate’s weighting of old polls vs new polls agrees pretty well with the observed data. If you apply a similar analysis to the recent Dem primaries in OH, TX and PA, where a lot of polling data is available, the prediction seems to closely match the actual results.

Great job Nate.

PS I quite like umeeksk’s idea of a derivative to show the momentum vector.

Slack said...

Modeler-

The projection would compensate for the trends seen in the super tracker, and provide an estimate of how a state might be currently polling even if there's no recent data.

In other words, it would provide the same kind of analysis as is currently present, but explicitly presented as an educated guess, as opposed to changing the data based on the current trend. It could also show where Obama is in a particular state in relation to his status nationally.

However, it's Nate's call, and I certainly don't have the background in stats to argue well. If the new polling bears the new method out, then I'll have to eat my words.

Benjamin Schak said...

My main criticism of this site was that it failed to account for national surges in states that happened to lack recent polls. I'm glad to see that you've done something to address that (although I admit I haven't read closely enough yet to understand it).

(I'm also glad to see that your prediction is now closer to my own estimate of 312.1 electoral votes.)

Brian said...

I'm trying to understand the new method. Can someone tell me if this is right?

Suppose there was a Rasmussen poll in Alabama two weeks ago where Obama was at 35%. Today a new Rasmussen poll comes out that shows that Obama is at 37%. Suppose also that over the same two-week period, Obama's nationwide support has gone from 51% to 55%. According to the new method, because Obama gained two points less in Alabama than he did nationwide, would 538 conclude that Obama has _lost_ support in Alabama?

Anonymous said...

I'm glad Nate did this. It's just logical. I live in Florida and there is no way that McCain is as strong now as before Hillary dropped out.

I personally can't wait for new polls from FL and other major states, but part of the merit of this site is that it reflects the current environment as it changes every day... To give McCain such a substantial lead in some states while leaving him with a nearly 50/50 chance of winning if the election were today is simply misleading and disingenuous.

Anonymous said...

It seems to me that the underlying assumption is that the change in support is uniform across the country. I'd like to see solid data before accepting this as true. For example, if Obama goes up by 3% in a national poll, is it not possible that this is because there is a larger surge in support in blue states like WA, CA, MA (like we have seen recently) compared to FL or TX?

Using old data you should be able to see how correct the assumption is.

Anonymous said...

Brian,
No... it's more like this:
Let's say the last poll from Alabama was a month ago, and showed Obama at 36%. Then, let's say that the average of national and state polls over that month had shown Obama gaining five points. Then this new method would estimate that Obama's support in Alabama was around 41%. If new polls come out that show that Obama's still at 36 in Alabama, then those count for more, but if there isn't a lot of recent polling, then this method gives us a chance to estimate what might be happening in a particular state.
Am I getting this more or less right?

Anonymous said...

Nate,

to convince people that your new methodology is better than the old one, it might be interesting to show in a post the old win percentage tracker together with a win percentage tracker based on the new methodology applied to data from march onwards. my guess is that one would clearly see that obama's win percentage decreased quite dramatically at the beginning of march and then stayed stable for a while, whereas the old one showed a slow decrease that lasted until end of april. this would mean that the new methodology is able to get the changes at the time they happen, and not just weeks later, as you could clearly relate the dramatic changes to events that happened. moreover, this would show that the new methodology does not have a candidate bias.

Anonymous said...

Not to go off topic, but Mason-Dixon has a new Nevada poll out, 44% McCain, 42% Obama.

bemused said...

Nate, your changes are looking good so far. To PSmith, check the latest NV poll out (not yet in Nate's quiver).

"As the presidential candidates square off for the general election, Nevadans are closely divided between Democrat Barack Obama and Republican John McCain, according to a statewide poll.

"If the election were held today, 44 percent would vote for McCain, 42 percent for Obama, while 14 percent of likely voters remain undecided, according to the poll of 625 likely voters, conducted Monday through Wednesday by Washington, D.C.-based Mason-Dixon Polling & Research Inc. for the Review-Journal and reviewjournal.com."

John H said...

it might be interesting to show in a post the old win percentage tracker together with a win percentage tracker based on the new methodology applied to data from march onwards.

I would second this, Anonymous @ 1.27. It's an excellent idea. If it does show a swift structural change rather than a drift, we have evidence that this is a better way to do it.

Those who are worried about assuming a national increase of two points means the same increase in all 50 states - surely you will agree it's more likely to mean that than to mean no increase in any states.

Benjamin Schak said...

Anonymous of 2008-06-15 14:05 said: "It seems to me - the underlying assumption is that the change in support is uniform across the country. I'd like to see solid data before accepting this as true."

The old tacit assumption was that changes in opinion in different states were 0-correlated, which goes against common sense.

The new assumption is, as you stated, that the change is uniform across the nation. That is, if the LOESS regression line says that polls are 2.5 points stronger today than they were on 5/17, then Step 4 simply adds 2.5 points to every poll on 5/17. In correlation and variance terms, this corresponds to an assumption that different states are 100% correlated and have equal volatilities.

The new assumption has much more common sense to it than the old one. That is, if you were 100% sure that Obama's numbers went up 2 points yesterday in Michigan, and you had to guess whether Obama's numbers went up 0 points or 2 points in Ohio, you would reasonably guess that Obama gained 2 points in Ohio as well.

I think that something even slightly better would be to try to measure (or make explicit assumptions about) the correlations between different states. If movements in OH and MI are 98% correlated, then you would guess that Obama gained 98%*2 = 1.96 points in OH if he gained 2 points in MI. However, if you believed that AK and MI are 75% correlated, then you would guess that Obama gained only 75%*2 = 1.5 points in AK. (This assumes still that all states are equally volatile.)
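
The arithmetic in this comment can be sketched in a couple of lines. The correlation values below are the commenter's illustrative figures, not measured numbers, and the function assumes (as the comment does) that all states are equally volatile.

```python
def inferred_shift(observed_shift, correlation):
    """Expected move in an unpolled state given a move in a polled one.

    Assumes equal volatility in both states, so the expected move is
    simply correlation * observed_shift.
    """
    return correlation * observed_shift


# Obama gains 2 points in MI; what do we guess elsewhere?
for state, corr in [("OH", 0.98), ("AK", 0.75)]:
    print(state, inferred_shift(2.0, corr))
```

Setting the correlation to 1.0 for every state reproduces the uniform national shift described for Step 4, and setting it to 0.0 reproduces the old tacit assumption that states move independently.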

genetastic said...

I think it would make more sense to calculate the inter-state correlation on historical polls, and then use that correlation to weight how much polls in the other states affect the predicted polling in the unpolled state.

For example, a recent poll in Alabama is likely more predictive of what the polling will be like in Mississippi than a recent national poll or a recent California poll.

genetastic said...

I see someone else just said the exact same thing. That'll teach me not to read to the bottom...

Modeler said...

John h,

The question isn't whether the national increase should be reflected in all of the states; it's how the increase should be reflected.

Currently Nate does it by retroactively changing old polls.

I believe that he should do it by changing the regression model to reflect the national change and updating the states using the new regression model. From the comments here, others have similar opinions.

To understand the difference, imagine there hasn't been a poll of Utah for a long time, but recent polls show that Obama is doing increasingly well in states with large Latino populations and about the same elsewhere. The current method would say that these recent polls are evidence that Obama is doing better among Mormons.

Basically Nate has removed much of the impact of demographics in favor of broad national trends. Right now, this makes intuitive sense because the "unity bounce" seems to be broad. But if Obama disproportionately gains support in one group, the new method won't pick it up as well. In essence, the demographic model has been largely replaced by a model that says:

Performance in state = last poll in state + national trend since last poll.

It's better than basing the performance entirely on the last poll, but I liked the demographic analysis much more. In my opinion, it is what has made this site so successful.

Genetastic and Benjamin Schak: what you propose re: correlations between states is exactly what the demographic model would capture. The closer the demographics of two states, the more they would be correlated.

Charles Pluckhahn said...

Question and a comment:

1. Did you backtest this new methodology against past presidential elections?

2. I'm always skeptical of any new methodology that tells me what I want to hear. This doesn't mean that I'm critical of you; it means that I'll have to kick the tires on this over time and see how things go.

Charles Pluckhahn said...

Oh, and an observation. Someone described this new model as a "simplification." In fact, by definition a model is a simplification of reality. Otherwise, it's not a model.

Fielding said...

Awesome job, Nate!

I think that your new approach is more accurate, but I worry that it is too "hypothetical" for some people. Accordingly, I think you should consider running two charts side-by-side, one based on your previous approach which uses only "hard" data, and one based on the newer, more analytic approach. This would satisfy both camps, and give you an opportunity to better show off your chops.

Keep up the terrific work!

Charles Pluckhahn said...

Specifically:

1. How can you have Obama winning NV when, in the latest poll, he's down by 2%? Are you straight-lining the improvement in his standing there and assuming the trajectory will continue? As someone who used to predict corporate earnings as a major part of his job, that strikes me as being a bit dicey.

2. You've got him winning VA, and while I would love for it to turn out that way I've noticed that he went backwards in GA and IN.

3. The projections for OH, MI and IN seem pretty optimistic, too.

Please believe me when I say that I'd be overjoyed if this were to come true, but it'll be a while before I'll buy this new methodology. I know all about tweaking models to make them spit out a number you'd like to see.

In my old business, that would work for, oh, one quarter. Sometimes you'd be right when you threw one of those hail mary passes (in either direction), but there were more than a few times when I was "right" and knew that I had done the equivalent of placing a winning roulette wager.

p.s.: While I (and others) have suggested backtesting, that's no panacea either. The past and the future are two different things.

Anonymous said...

Benjamin Schak:

I've thought about your suggestion before but there is a small problem with it: If PA and WI are 66% correlated and WI polls up 6% one week over the previous, how do you know if PA went up 4% because of this, or PA in fact went up 9%? Since these are each properly 66% correlated with 6% we can't know. Further, choosing the lower value would always underestimate, choosing the higher would always overestimate, and choosing the average would also always overestimate.

At June 15, 2008 4:00 AM I proposed a way around this by finding where states without data lie on the linear trends connecting all of the other states. In this way we also allow our extrapolations to the present to remain accurate in weeks where we don't have a wide distribution of polls. So, if only solid blue states and swing states are polled in a given week, your model doesn't seem to give a good prescription for what the red states are doing. In mine, if both are up 5% then it is reasonable that the red states would be up 5% too. If instead the blue states are up 5% and the red swing states are up 1% or less, then it is reasonable that the solid red states would actually be doing worse for Obama.

Your and my models can be combined somewhat by using a Poisson kernel in the demographics space rather than the plane I suggest, but I'm not sure that that'd be more valid and it would certainly be much more complicated.

Anonymous said...

Can we start banning people who say "I live in state X and I know that candidate Y is going to win here." Ugh.

John H said...

Modeler - my comment was more aimed at those who seemed to think this was "making up numbers".

The methodology you suggest certainly seems a more straightforward way to go about it, and I've always been a fan of straightforwardness.

However, shouldn't this have been happening already? The regression model always gave more weight to more recent polling. That said, on the whole I was always convinced by those who said it was foolish to neglect the information provided by the nationals.

Mike (sweetjazz3) said...

It'll be interesting to see what the model predicts as new data comes in. For example, the new Nevada projection comes in 4 points below where Nate predicted based on extrapolating from recent trends. I think this will cause his model to dampen the overall estimate of Obama's "bounce" and decrease his winning chances in other states besides Nevada. Of course, the latest Arkansas poll has Obama doing 2 points better than what Nate projected, so this will have the opposite effect.

Basically, because this model projects changes in polling data, a candidate's winning chances will change not based on whether he improves in the polls but based on whether his change in the polls is better or worse than what is projected.

Also, in the end I don't think this change will make much of a difference in November when the key swing states will all have lots of recent data available.

To me, the biggest issue I have is assigning an absolute change across the board rather than a relative one. For example, suppose state X has 20% undecided voters and state Y has 10% undecided voters. (The other voters, for the sake of simplicity in the argument, have already settled on a candidate and will never change their preference.) A new poll comes out in which candidate A's support increased by 2 points in state X, which now has only 18% undecided. (Candidate B's support, therefore, is unmoved.) What would be the reasonable projection for state Y? In my view, it would be that candidate A wins 10% of state Y's undecided voters, just as in state X, which would result in an increase of 1 point and not 2 points.
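
The arithmetic in that example can be written out as a tiny sketch. The function names are invented for illustration, and it keeps the simplifying assumption that only undecided voters move.

```python
# Hypothetical sketch: absolute vs. relative transfer of a poll gain
# from a polled state to an unpolled one.

def absolute_projection(gain_points):
    """Apply the same point gain everywhere."""
    return gain_points

def relative_projection(gain_points, undecided_source, undecided_target):
    """Assume the candidate wins the same share of undecideds in both states."""
    share_won = gain_points / undecided_source
    return share_won * undecided_target

# State X: 20% undecided, candidate A gains 2 points. State Y: 10% undecided.
print(absolute_projection(2.0))               # 2.0
print(relative_projection(2.0, 20.0, 10.0))   # about 1.0
```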

Benjamin Schak said...

Charles Pluckhahn:
You wrote, "How can you have Obama winning NV when, in the latest poll, he's down by 2%?" I think it is quite reasonable to have Obama winning NV in spite of the most recent polls (indeed, in spite of the most recent three polls). The poll that came out today (and the other two polls) all had a fairly large MoE. When you weigh that against significant evidence of an across-the-board Obama surge,... well, I don't know what you get -- it really depends on the extent to which you believe states tend to move in unison. I believe (and Nate seems to believe) that states do tend to move in unison, in which case, you get that NV is fairly even or slightly pro-Obama despite the most recent polls there. (Taking into account the latest NV poll, I have Obama at 51.3 in NV with a 4.8 MoE for the election-day results and a 2.1 MoE for how things stand now.)

Anonymous of 2008-06-15 16:11: I'd make an exception for people from DC or UT.

genetastic: No problem, we've all done the same thing. ;-)

Anonymous said...

Mike (sweetjazz3):

Again, this retains the problem of assuming you know what demographic is in fact shifting in the polls. In a given week it could be undecideds, it could be Republicans switching, it could be people previously committed to a third-party candidate, or it could only be Latino voters over the age of 50. By scaling things according to only one demographic you are implying that a linear trend should be used and that only that one demographic is significant. But, with multiple data points among states with varying demographics each week we can do better and consider things along many more demographic scales yet still retain the same logical linear relationships.

Heck, one thing I hadn't even considered yet is that many polls do in fact break things down by useful demographics, and that could even serve as a sort of check on the process.

Anonymous said...

Benjamin Schak:

I think that Poblano is likely overstating the bump in NV, but because the reality of the situation is more uniform here than it would be on a random week, he isn't overstating it by much. Simply put, Obama received the biggest bumps in states where Hillary had a clear victory over him. NV is not one of those states, and the bump Poblano is giving everywhere is affected heavily by those Hillary states as they happened to be polled more heavily than Obama's in the past week. As such, NV is closer to where it should be than before, but I'd still put it at 1%-3% more for McCain.

APoxOnBoth said...

I think Nate may need to add some poll basics to the FAQ, now that he's attracting attention from non-stats geeks (not to mention the McCain partisans).

The most fundamental fact you have to understand is that an election poll is *always* a prediction; specifically, it is a prediction by the potential voters of how they will vote come election day. Some of those will change their minds, and those who are "undecided" are the most volatile. So the actual election result is always going to be different from even the most recent polling.

What Nate is trying to create is a modeling system for how likely it is that the numbers are going to change in one direction or another enough to change the electoral vote result. No more, no less. As we get closer to Election Day, given his methods, the range of possibilities will get refined down, and that curve at the top will become narrower; there simply won't be time for states to change enough to flip.

It's a given of statistics that you get what you measure, and of management that you see only what the numbers tell you. We saw this in the 2000 and 2004 elections, the perception of which states were close enough to be considered "battlegrounds" became a self-reinforcing process, a state was considered a battleground, so it was polled more, so it looked more volatile, so more money was spent there, which meant more polling and more volatility....

Nate's methods are genuinely new, and this entire season is going to be a test of their utility.

Benjamin Schak said...

Anonymous of 2008-06-15 16:00: From reading your first paragraph, I think you're saying something like the following: "When two things are 66% correlated, it means that one thing changes 66% as much as the other." That's not the right understanding of correlation.

Suppose you have two intelligence tests (let's call them the SAT and the ACT) that are 75% correlated, and I get a +4 standard deviation score on the SAT. Common sense suggests that I do well on tests in general, but that I also had a little bit of luck on my side to get such a great result; therefore, you would expect that I'd do quite well on the ACT, but not 4sd well. Conversely, if I took the ACT first and got a +4sd score, then you'd predict that I'd get a good score (but not as good as 4sd) on the SAT.

(More formally, if you take two random variables with a joint normal distribution with correlation r, and take the cross section of their distribution where the one variable is at n standard deviations, then the other variable is normally distributed with a mean of r n.)

Here's how this applies here. Before a new poll comes out, you have an a priori notion of how PA's and WI's results are distributed: They are normal, and they are positively correlated with correlation r, just like the a priori distribution of my SAT/ACT scores. Then if you measure PA to be n standard deviations above your a priori estimate of its results with certainty, you should think that WI is r n s.d. above your a priori estimate of its results. Likewise, if you instead measured WI to be n s.d. above your a priori estimate with certainty, then you should think that PA is r n s.d. above your a priori estimate.

(I like this write-up of regression to the mean.)

It should get a little more complicated, since
1) A typical PA poll doesn't tell you the PA results for sure. This means that your a posteriori estimate of PA is less (probably substantially so) than n s.d. above your a priori estimate, and also that your a posteriori estimate of WI is less than r n s.d. above your a priori estimate. There are ways to formalize how to trade off old information against uncertain new information, but that requires further assumptions about how volatile public opinion is over time.
2) States might have different a priori levels of uncertainty. If your uncertainty indicated a s.d. of 3% for PA and 5% for WI, then a +3% certain measurement in PA should translate into a +r*5% estimate in WI.
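
To make the rule concrete, here is a minimal sketch of the "certain measurement" case only, so it ignores caveat 1 above. The function name and the sample numbers are assumptions for illustration.

```python
# Hypothetical sketch: regression to the mean between two correlated states,
# assuming joint normality and a measurement with no error.

def expected_shift(r, observed_shift, sd_observed, sd_target):
    """Expected shift in the unpolled state given a known shift in the polled one."""
    z = observed_shift / sd_observed   # observed shift in standard deviations
    return r * z * sd_target           # the target moves r * z of its own s.d.

# WI up 6 points, equal uncertainty in both states, r = 0.66:
print(expected_shift(0.66, 6.0, 3.0, 3.0))   # roughly 4 points for PA

# PA up 3 points with s.d. 3 for PA and 5 for WI:
print(expected_shift(0.66, 3.0, 3.0, 5.0))   # r * 5, roughly 3.3 points for WI
```

The relationship is symmetric in the sense that whichever state is measured, the other is expected to move by less in standard-deviation terms, which is why the "4% or 9%?" question has a definite answer once you fix which state was actually polled.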

(By the way, anyone who's suggesting that Nate is doing this to get the results he wants is crazy. There are many easier ways to get the results you want than to implement a new four-step process that happens to fix a large flaw in the old method.)

michael said...

Other than the idiot troll who must spew his partisan Obama hate, I am really impressed by the thoughtfulness and insight of the posters. Any way we can get rid of the far right Obama-hating trolls and keep this a site about the numbers?

Juris said...

Regression to the mean: here's another "practical" illustration by our blogmaster.

http://www.baseballprospectus.com/article.php?articleid=1897

Anonymous said...

Benjamin Schak:

I realize all that you said in your last post, but what you are saying seems to be based wholly on the following assumption (correct me if I'm wrong):

The mean for a state's distribution should be at "unchanged". That is, by going r n s.d. above where you were, you are starting with the mean at wherever it was the last time a poll was done, and then increasing that number by r n s.d. So, by "regressing to the mean", you are essentially regressing to things not changing.

This assumption seems contrary to Poblano's intentions with this whole extrapolation and seems like it would always produce very conservative results. If five states each increased by one standard deviation for Obama and a state with 5% correlation with those states wasn't polled, would you assign a 5% s.d. increase there? a 25% s.d. increase? If the whole country in fact did increase a standard deviation, you would be missing most of the change in most of the states. And the problem with going with 25% there is that a state which was highly correlated with those five states, say 60% with each, would then have to increase 3 standard deviations for Obama.

I feel the test analogy is faulty because standardized tests are based around the assumption that the whole population doesn't make sudden shifts, and that for each person that scores high there is one that scores low. But elections don't fit under unchanging (or even slowly changing) bell curves. The whole population or any portion can move one way or the other suddenly meaning you can never predict where the mean is likely to be.

Now, having said all of that, I think there is a way that what you are proposing could be modified slightly to produce very accurate results, and that is if we simply had a way to determine the means separate from the correlations, and to determine the correlations separate from the means. I'm not a stats person (MS in pure math only) so I wouldn't know how, but my guess is if you subtract the country-wide average change from each state change before determining their correlations to each other and then only use the correlations to determine changes on top of the country-wide averages, you might be able to model what you want with country-wide polling change + "unchanged" as your new expected mean.
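
One way to read that suggestion, as a sketch only: treat the country-wide change as the new expected mean and apply the correlation to whatever a polled state did beyond it. The function name, the correlation value and the numbers are all assumptions.

```python
# Hypothetical sketch: regress toward the national change instead of toward
# "unchanged". Correlations are assumed to be measured on demeaned changes.

def projected_change(national_change, r, polled_change, sd_polled, sd_target):
    """National change plus a correlated share of the polled state's residual."""
    residual = polled_change - national_change
    return national_change + r * (residual / sd_polled) * sd_target

# Nation up 3 points, polled state up 5, r = 0.6, equal s.d. of 2 points:
print(projected_change(3.0, 0.6, 5.0, 2.0, 2.0))   # about 4.2

# A polled state that moved exactly with the nation adds nothing extra:
print(projected_change(3.0, 0.6, 3.0, 2.0, 2.0))   # 3.0
```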

Something like this would be good, but I'm still not sure it achieves what Poblano's intentions are (especially considering the stark contrast it holds to his current method). In the end, though, your model (with the modified mean) and mine would predict roughly the same thing if you assume that the correlations between states are accounted for mostly by demographics in the states.

Charles Pluckhahn said...

benjamin, I hear you, and perhaps that will be the case. But I think one impact of the new methodology is likely to be an increase in volatility of the projections here.

Because I'm not a mathematician, I'm going to use an analogy here that might or might not work. (Please forgive me if this is basic knowledge to you.) In the bond world, there's a concept called "duration," which is the average maturity of interest payments.

Imagine a 10-year bond with a face value of $1,000, paying 5% interest. Every six months, you get a "coupon" (interest) payment of $25. At the end of 10 years, you get your principal (the $1,000) plus your final coupon payment of $25. If you throw all those payments into a spreadsheet, you'll find that, due to the time value of money, the average maturity of the "10 year" bond is actually 7 years, because a bunch of those coupons were paid early.

Now, let's imagine two of these bonds. One is the 10-year bond I've mentioned. The other is a 10-year "zero coupon" bond, which pays all of its interest, plus the principal, at the end.

If interest rates should happen to change before the maturity date, the current price of the zero-coupon bond will fluctuate a lot more than the current price of the standard bond. This is because, by putting everything at the end, changes in the time value of money are focused on one point, way in the future.
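
For anyone who would rather check this in code than in a spreadsheet, here is a small sketch of the two bonds. It assumes semiannual compounding and a market yield equal to the 5% coupon, which the comment does not specify.

```python
# Hypothetical sketch: Macaulay duration and price sensitivity of a
# 10-year 5% coupon bond versus a 10-year zero-coupon bond.

def cash_flows(face, annual_coupon_rate, years, periods_per_year=2):
    """List of (time_in_years, payment) pairs for a plain coupon bond."""
    n = years * periods_per_year
    coupon = face * annual_coupon_rate / periods_per_year
    flows = [(t / periods_per_year, coupon) for t in range(1, n + 1)]
    last_time, last_payment = flows[-1]
    flows[-1] = (last_time, last_payment + face)   # principal repaid at the end
    return flows

def price(flows, annual_yield, periods_per_year=2):
    """Present value of the payments at the given yield."""
    y = annual_yield / periods_per_year
    return sum(cf / (1 + y) ** (t * periods_per_year) for t, cf in flows)

def macaulay_duration(flows, annual_yield, periods_per_year=2):
    """Present-value-weighted average time of the payments, in years."""
    y = annual_yield / periods_per_year
    pv = [(t, cf / (1 + y) ** (t * periods_per_year)) for t, cf in flows]
    return sum(t * v for t, v in pv) / sum(v for _, v in pv)

coupon_bond = cash_flows(1000.0, 0.05, 10)
zero_bond = [(10.0, 1000.0)]   # a single payment at maturity

print(macaulay_duration(coupon_bond, 0.05))
print(macaulay_duration(zero_bond, 0.05))

def pct_drop(flows):
    """Percentage price fall when the yield rises from 5% to 6%."""
    return 100 * (1 - price(flows, 0.06) / price(flows, 0.05))

print(pct_drop(coupon_bond), pct_drop(zero_bond))
```

Under these assumptions the coupon bond's duration comes out just under 8 years rather than 7 (the exact figure depends on the yield you assume), the zero's duration is the full 10 years, and the zero loses more of its value when rates rise by a point.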

Nate's new model still allows for the equivalent of "coupons," i.e., new poll results, but it seems to me that he's using some manipulations to extrapolate change patterns farther out in time than his old model did.

So, for example in Nevada, if a new poll comes out showing Obama 6 points back, or a new national poll shows a significant slippage, you'll trigger a cascade of new trend assumptions, state by state.

Maybe I'm wrong about this, but I'm thinking the new methodology is going to be giving us all somewhat (how much, I'm not sure, because I'm rotten at geektitude) more thrills 'n chills along the way. It'll be interesting to watch.

Charles Pluckhahn said...

Correction: Duration is the average maturity of bond cash flows, i.e., including principal, not just interest payments.

Charles Pluckhahn said...

For non-finance types, a bit of trivia. Because a 10-year bond has a duration of 7 years, and because the average household mortgage borrower moves every 7 years, the 30-year fixed rate mortgage has historically been tied to the 10-year Treasury bond.

They start with the Treasury bond because it's the so-called "risk free rate," i.e., because the government creates the currency, you'll always get paid. The difference, or "spread," between 10-year Treasury rates and mortgage rates is a function (in theory) of default rates, competitive factors, and costs such as underwriting and servicing.

The current residential mortgage crisis notwithstanding, that relationship endures. For how long is another question, of course, and beyond the scope of this exercise in Trivial Pursuit.

Modeler said...

John H,

You are right in that national polls should be used, and Nate is doing the right thing by treating the nation as a state.

The regression model still gives more weight to recent polling, but in my opinion it actually overweights recent polling. Instead of being based on high-weighted recent polls and low-weighted older polls, it is based on high-weighted recent polls and low-weighted older polls that have been adjusted to more closely resemble recent polls.

One of the problems is that by running a regression on polls that have been adjusted based on national trends, the regression will be trying to fit noise that has nothing to do with the underlying demographics of the states. The predictive power of the resulting regression is very likely reduced as a result.

lilnev said...

As I see it now, the underlying assumptions:

Support in a state is the sum of a demographic component, a state-specific residual from the demographic component, and a time trend that is the same for all states. We're thus trying to fit a two-dimensional space -- states and times. (Previously, we collapsed across time and only tried to fit the states.)

The new approach is to remove the demographic and state-specific components in order to estimate the time trend. Hence the dummy variables to isolate the tracking polls and the pollster-state pairs. (Incidentally, is this a tacit admission that the expected value of a Ras. poll is not necessarily the same as the expected value of a SUSA poll in the same state, i.e. that pollsters may be biased?)

Then, the time trend is removed by adding the LOESS figure to each poll, and we try to estimate the regression model and state-specific components.
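
As I read it, the adjustment amounts to something like the sketch below. I am assuming "the LOESS figure" means the difference between the trend's value now and its value in the week the poll was taken; the function name and numbers are invented.

```python
# Hypothetical sketch: bring an old poll up to date by adding the movement
# in the smoothed national trend since the poll was taken.

def detrend(poll_margin, trend_at_poll_week, trend_now):
    """Poll margin adjusted to the present (Obama minus McCain, in points)."""
    return poll_margin + (trend_now - trend_at_poll_week)

# An old poll showing McCain +2, taken when the trend stood at +1,
# adjusted to a current trend value of +4:
print(detrend(-2.0, 1.0, 4.0))   # 1.0, i.e. Obama +1
```

That is how a state can show an Obama lead even though the most recent poll there has him behind.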

The concerns people have voiced seem to be twofold (well, threefold, but I don't count "Poblano is hacking the numbers because he's an Obama shill." He, and most of us, are here because we want the clearest insight we can get from the available data. If we wanted cherry-picked hackery, there are plenty of other sites available....).

First, that state data shouldn't be contaminated with national data. This is basically asking to go back to the zeroeth-order approximation, where it was assumed that there was no time trend.

Second, that the current approach isn't sophisticated enough, because it's only a first-order approximation of time trends and therefore assumes that the trend will be uniform across all states/demographics. These people are asking for a model that includes interaction terms between time and states/demographics.

I don't know. Overall I think the changes yesterday are an improvement, though we need to be clear that we're now explicitly trying to predict the results of a poll/election held today, rather than a backward-looking indicator of past support. The additional time-demographic interaction terms that some people would like are both appealing and dangerous. They would add a lot of new variables to the model, increasing the risk of overfitting and hence too much volatility in response to each new datapoint. I'm already concerned about the volatility added yesterday. Let's see some tests of the current model's stability and predictive power before we call for even more to be added to it.

Nate: You should take the amount of concern in this thread as a compliment (excepting those of the "Nate is Obama's BFF lol" flavor). It means that you've managed to attract an intelligent and critical audience.

Anonymous said...

lilnev:

I may be wrong (I haven't run any simulations and I have to run right now so I probably won't) but I think my method (proposed at June 15, 2008 4:00 AM) would keep the same overall volatility of Poblano's current method, but just assign more of that volatility to states similar to those which show the greatest changes in the polling. Benjamin Schak's proposed method (and even the one with the modified correlations and means), on the other hand, actually reduces the volatility by regressing to a mean.

Modeler said...

lilnev,

A couple of points:

-- We are not trying to predict the results of an election held today. We're trying to predict the results of one held in November (see Big Change #1)

-- Using the demographic model to track changes over time adds no new variables to the model. The "time" component is automatically included through the weighting mechanism.

Also, could you explain how mixing state and national data has anything to do with time trends? They are separate problems, especially if you consider the nation to simply be another state (as Nate does).

I agree that the concern on this thread is a good indication of how successful Nate has been, and how much we like his work.

Benjamin Schak said...

Anonymous

Yes, I do assume that the expected value of the signed change of a state's distribution, given no other information, is 0, or "unchanged." I think this is a reasonable assumption -- after all, with no other information, why would one expect a state to move towards or away from Obama?

And yes, by "regressing to the mean," I do mean "regressing to not changing," in that, when one state turns out to be more pro-Obama than previously expected, one's expectation about other states becomes commensurately more pro-Obama, but not by as much as the state that we know moved.

Let me try to unpack your hypothetical, since it's a compelling question.

Your hypothetical was: Suppose two states (A and B) each increased by 1sd, and another state (X) with 50% correlation to the others wasn't polled. (I increased your 5% to 50% since I believe that correlations between states are rather high, and I changed five polled states to two for simplicity.) The right thing to do in this case depends on the correlation between A and B. For example, if they are 100% correlated to each other, then the right thing to do is clearly to increase the estimate of X by just 0.50sd, since the extra knowledge about B provides no useful information.

Suppose, though, that A and B have correlation r with each other. Let me introduce some notation. m_A, m_B, m_X, s_A, s_B, s_X are the means and s.d.'s of the three states before taking any new information into account. m' and s' are the means and s.d.'s after taking new knowledge about A into account. m'' and s'' are the means and s.d.'s after taking new knowledge about A and B into account.

After taking information about A into account, the new estimate of A is 1sd above the old estimate, and the new uncertainty about A is 0. That is, m'_A = m_A + s_A, and s'_A = 0. By what I was saying before about regression to the mean, the new estimate of B is m'_B = m_B + r s_B, and the new uncertainty about B is s'_B = s_B * sqrt(1-r^2). The new estimate of X is m'_X = m_X + 0.5 s_X, and the new uncertainty about X is s'_X = s_X * sqrt(1-0.5^2).

Now let's take the information about B into account. We are given that m''_B = m_B + s_B (i.e., B had a 1sd jump in terms of the a priori distribution). Thus,
m''_B = (m'_B - r s_B) + s_B
= m'_B + (1-r) * s_B
= m'_B + (1-r) * s'_B / sqrt(1-r^2)
= m'_B + s'_B * sqrt((1-r)/(1+r)).
In other words, B turns out to be sqrt((1-r)/(1+r)) standard deviations (post-A standard deviations, that is) better than our post-A expectations. To pass this on to X we need the correlation between X and B after conditioning on A (the partial correlation), which is no longer 50% but (0.5 - 0.5 r) / (sqrt(1-0.5^2) * sqrt(1-r^2)). Since s'_X = s_X * sqrt(1-0.5^2), we get
m''_X = m'_X + [(0.5 - 0.5 r) / (sqrt(1-0.5^2) * sqrt(1-r^2))] * sqrt((1-r)/(1+r)) * s'_X
= (m_X + 0.5 s_X) + 0.5 * ((1-r)/(1+r)) * s_X
= m_X + s_X / (1+r).

This result fits intuition well, and the same result would be achieved if you applied B's result and then A's. When r=1, B provides no extra information from A, so m''_X simply equals m_X + s_X * 50%, as expected. When r=0.5 (a reasonable enough number), B contributes some information but not much, so X rises 67% of a standard deviation. When r=-1, we get nonsense, which is right, since it's impossible for A and B both to rise if they're perfectly negatively correlated.

Another interesting if unrealistic boundary case is when X = A+B and s_A=s_B and r=-0.5 (the minimum possible value for r). In this case, A's and B's anticorrelated movements are expected to cancel, so that it's quite unexpected for both of them to rise together. And indeed, the above formula has X rising by 2sd (exactly the sum of A's and B's 1sd rises, since s_X = s_A here), even though it would only rise by 0.5sd given information about either A or B alone.
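
One way to sanity-check a two-state update like this is to compute the conditional mean directly from the standard formula for jointly normal variables, working in standardized units (shifts divided by their a priori s.d.). The function below is a sketch with invented names; it solves the 2x2 system by hand so it needs no libraries.

```python
# Hypothetical sketch: expected standardized shift in X given standardized
# shifts in A and B, for jointly normal variables with known correlations.

def conditional_shift(r_xa, r_xb, r_ab, z_a, z_b):
    """E[z_X | z_A, z_B] = [r_xa, r_xb] * inverse([[1, r_ab], [r_ab, 1]]) * [z_a, z_b]."""
    det = 1.0 - r_ab ** 2
    if det <= 0.0:
        raise ValueError("A and B are perfectly correlated; use one of them only")
    w_a = (r_xa - r_ab * r_xb) / det   # weight placed on A's shift
    w_b = (r_xb - r_ab * r_xa) / det   # weight placed on B's shift
    return w_a * z_a + w_b * z_b

# X is 50% correlated with A and with B; A and B each rise by 1 s.d.
for r in (0.5, 0.0, -0.5):
    print(r, conditional_shift(0.5, 0.5, r, 1.0, 1.0))
```

With both correlations to X equal to 50% and both observed shifts equal to 1sd, this reduces to 1/(1+r) standard deviations, and it does not depend on the order in which A and B are taken into account.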

All this is implicitly incorporated into the technique I use on my own site (I hate to shill, but here goes). I took assumed correlations from an analysis that got me a formula based on distance between states and racial composition. (I think I got the correlations in a pretty crappy way, but the main point is that I believe they're positive and fairly high, 50-100%.) For the expected values, I essentially use Bush/Kerry as a starting point (as, IIRC, Nate here does), and tweak the expected value as each new poll comes along (and there have been enough polls that the current expected values don't look much like Bush/Kerry any more).

You're right that I fundamentally assume that movements in elections follow a normal distribution, and that this is probably an incorrect assumption. After all, a major scandal might cause a 15sd event. I recognize that this is a flaw, and have been thinking about how to fix it, maybe using one of the leptokurtic (fat-tailed) distributions people sometimes use for the stock market.

I don't think your comment about the test analogy makes sense. I wasn't talking about multiple people taking tests; I was talking about one person taking multiple tests. The two test results for one person are analogous to two states' movements for one election. In the one case, one test result implies something about the same person's other test result. In the other case, one state's result implies something about the other state's result.

I do agree, or at least suspect, that correlations between states are accounted for mostly by demographics in the states.

BTW, I also have little background in stats, other than some absorption here and there from being in math undergrad and grad.

Benjamin Schak said...

charles pluckhahn:

I agree that the new changes will lead to more volatility. Under the old regime, big movements in the underlying popular opinion got factored into the 538 prediction rather randomly, as new polls appeared for each state. Under the new regime, movements in underlying popular opinion should appear in all states' estimates as soon as a decent handful of polls appear in any states.

I don't see how the new method extrapolates out in time; it seems to extrapolate out in space -- the results of one state applied to another state.

(BTW, I do know all about bond coupons and duration, but didn't quite follow the analogy between them and polls.)

Charles Pluckhahn said...

Benjamin, the point is that the new methodology appears have a 1:1 correlation with current data as before, but rather now takes a combination of current and past data and extrapolates a future result, as in showing Obama ahead in NV when the polls show him behind.

In doing so, it's as if you've combined some coupons into a later payoff, i.e., in figurative terms you've lengthened the duration, which will increase volatility.

I know it's a really imperfect analogy, so let me go straight at it: Will this method introduce more volatility in response to results that do not conform to a projection? Take NV as an example. If the next result (or average of them) is meaningfully different from what the new algorithm expects, will there be a bigger swing in the NV results than under the old formula, and if so, why?

Charles Pluckhahn said...

Let me try that again, 'cause the first paragraph got garbled. It seems like the old methodology was closely correlated with current polling data. It seems as if the new methodology incorporates more of a future forecast, and that in doing so, it relies on data not from the state in question but on national numbers and (?) states thought to be paired or otherwise linked to each other.

In doing so, it's as if you've combined some coupons into a later payoff, i.e., in figurative terms you've lengthened the duration, which will increase volatility.

I know it's a really imperfect analogy, so let me go a different way at it: Will this method introduce more volatility in response to results that do not conform to an existing projection?

Take NV as an example. If the next result (or average of them), either in NV or a combo of NV and the exogenous variables not previously included in the algorithm, is meaningfully different from what the new algorithm expected when it produced that optimistic reading on NV even though the polls show Obama 2 points behind, will there be a bigger swing in the NV results than there would have been under the old formula?

In the last couple days, we've seen Obama's post-Hillary Concession bounce starting to wear off. Am I more likely than before to be waking up to a really gloomy-looking picture in, say, a week?

Charles Pluckhahn said...

Arghh! I should have read your original response more thoroughly, so feel no need to answer my last attempt at a posting. You already did. Sorry about that.

homunq said...

This is a great methodology, and I endorse the change - with one, significant reservation. As Benjamin Schak says (and as I thought BEFORE I skimmed this comment thread) you cannot possibly assume that your average state is correlated to the national movement with a regression coefficient of 1. The easiest way to fix this would be to just multiply your national movement by a single, global regression coefficient taken for the average state. I am sure you could figure out some way to measure this correlation; in fact, your model already implicitly has such a correlation, it is the ratio between the national error standard deviation and the state-by-state error standard deviation when you do your simulation runs.

(Most of the suggestions in this thread of how to use demographics would also need to be scaled back by such a correlation factor, although correcting for demographics is a way to get a higher correlation it is still not 1.)

Please - you already had by far the best methodology, and, with this fix, your time-based correction would be making it even better. Do not let this one mistake ruin your numbers. Even if you do not believe me that the math requires you to fix this, just apply your corrections at 50% strength. Besides being mathematically more correct, it is the conservative and sensible thing to do from a non-mathematical standpoint.
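
In code, the fallback I am asking for is about as simple as it gets. This is a sketch with invented names; the 0.5 default is just my "50% strength" suggestion, not a measured coefficient.

```python
# Hypothetical sketch: apply the national trend adjustment at reduced strength.

def adjust_poll(poll_margin, national_shift, coefficient=0.5):
    """Add only a fraction of the national movement to an old state poll."""
    return poll_margin + coefficient * national_shift

# Old poll at McCain +2, national trend has moved 3 points toward Obama:
print(adjust_poll(-2.0, 3.0))                    # -0.5 at 50% strength
print(adjust_poll(-2.0, 3.0, coefficient=1.0))   # 1.0 at full strength
```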

homunq said...

Also, I know nothing about the math of your LOESS regression, but looking at the graph it certainly appears to lose about half of its smoothing power for the points at the edge, which have half the neighbors. This is particularly worrisome for your use of the methodology, because the current week (recent edge of history) is used in every single correction factor. Perhaps you could just add one "next week" point, using this week, before you did your regression, in order to be a bit more conservative. Or even more conservative - you could move "next week" one this-week-standard-error (for total this week effective sample size - should be around 1%) in the direction of last week, i.e., always assume that this week is an outlier until you get evidence to the contrary.

But these suggestions are just tinkering really. The suggestion in my last post is much more important, serious, and, in my opinion, mathematically justified.

lilnev said...

Re: Predicting now vs. predicting November. Consider AK. It's currently McCain +4.1. Presumably that's our best guess of what a poll taken today would show. It's also our best guess for the November outcome. Obama's win chance there is listed as 33.5%. I'm not sure what this number is supposed to mean. Is it the chance that Obama would win an election today (based on sample sizes and PIEs), or the chance that he will win in Nov (based on sqrt of # of days)?

At Anonymous 4:00 AM -- fitting a 10-dimensional hyperplane in demographic space requires adding 10 parameters. I'm afraid of overfitting.

At Modeler -- It sounds like you want to use the time series of regression coefficients as they were determined by the old method on previous dates, yes? But the coefficients determined on, say, Apr 30 would be only partly based on April's data, and partly on March and Feb (though discounted). One of those coefficients would be the "constant" term, reflecting the national margin (more technically, the margin at the demographic mean of the polls included -- the nation hasn't been uniformly sampled). Hmm, not sure where this is going; needs more thought.

Actually, it would be nice just to see what the history of the regression coefficients is, to get a sense for how stable the regression model is from week to week. Are some variables jumping in and out of the model, or are weights swinging wildly? In general, regressions onto many variables can be tricky, especially when some of them are strongly correlated. Very different looking models can sometimes produce nearly equally good fits. I wouldn't be surprised if the first few factors (the ones that capture the most variance) are stable but the last few jump around quite a bit.

'Nuff for now. G'night y'all.

Anonymous said...

I understand numbers, and statistics to a point; that is what made me intrigued by this site. But to be fair, I am a political animal, not a statistician.

I said earlier and I have to reiterate, your numbers are only as good as your understanding of the events that create them. It was Tip O'Neill, a likely Obama supporter were he still alive, who said "all politics is local." I just can't support a statistical analysis which makes an assumption that a bounce in one state will be reflected in any other state (even a bordering state) with any degree of certainty. That just isn't plausible and certainly not supported by any statistics I know of.

I'd join those who are asking you to prove that theory using past polling, or drop this methodology change.

And, for the record, my dislike of the new methodology has nothing to do with whom I support.

(Oh, and I do think you need to include third-party candidates. While they will have no impact nationally, they may well have an impact in one or two states, enough to possibly make a difference.)

Anonymous said...

Hi Poblano,

I think it's a mistake to assume that all states or all demographic groups will evolve with time uniformly. It's not difficult to imagine cases in which demographic groups diverge and only track national opinion on average. Assuming uniform movement may well lead to inaccurate predictions in states with "atypical" demographics.

What you're doing is certainly an improvement, but it's 0th order when what you should be shooting for is 1st order.

Dan

Anonymous said...

Benjamin Schak:

Thanks very much for your thorough explanation of the mathematics. I have never had the opportunity to take a class on that material, so it explains a lot. I think I appreciate what you are getting at now, even though I'm not fully convinced that leaving the means static is the best that we can do. My issue with the test analogy wasn't that different people would be taking the tests when matching it up to states in the electoral map. My issue was that the mean you would be regressing toward when you predict the person's score on the second test would be predicted by the average scores of a huge population taking the second test, which hypothetically doesn't change rapidly. But what if we had nation-wide studies showing that the average change in the mean scores for the two tests went up 3% this week? Would you still use the mean from last week for predicting what the person's score on the test would be this week? I'm unintentionally still mixing up the analogy here because I want to focus on projecting the changes in the means between tests or states and not necessarily changes over time, but I'm having trouble explaining it exactly and I hope this is clearer.

Also, thanks for the link to your blog. I think it is worthwhile to have multiple models being run by different people so I'll probably be checking it from time to time.

lilnev: Thanks for the warning about over-fitting. I always forget to avoid that in modelling, but in my line of work I rarely have to take statistics into account so it often works out anyway. I guess if we had the amount of polling data that would justify 10 parameters then we wouldn't have any states that needed projecting anyway.

APoxOnBoth said...

@Anon 9:37

"I just can't support a statistical analysis which makes an assumption that a bounce in one state will be reflected in any other state (even a bordering state) with any degree of certainty. That just isn't plausible and certainly not supported by any statistics I know of."

Okay, let me see if I can explain this: He's trying to account for the "dogs that aren't barking", filling in the gaps for states that haven't been polled based on the data he does have, national and neighboring state polls.

Let's take an example: Connecticut. Some places are rating it as a "tossup" state because the most recent public poll gave Obama only a 3-point edge there. That makes no sense. Not only is the poll far out of line with the earlier (but much older) polls for that state, and Connecticut normally a very "blue" state, but since that poll was taken he's been improving everywhere that has been polled, as well as nationally. That data can't have a direct correspondence with what we don't have, a recent poll for CT, but it has to have some relationship to it.

Another way to look at it is that he's now making a projection of what a good poll for a given state would look like right *now*, and using that for his model. And every time a new poll comes out, he's got a chance to check the projection, as just happened in NV and AR, where he was very close, well within margin of error.

Anonymous said...

I have a funnel. I drop marbles through it onto a target. I decide to adjust the aim of the funnel: if the last drop was a little south of the target, I move the funnel a little north. If it was west of the target, I move it east.

Results: worse accuracy.
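This is the setup usually known as Deming's funnel experiment. A minimal one-dimensional sketch, assuming each drop scatters with independent normal noise around the funnel position (the function name and parameters are invented for illustration), shows the effect:

```python
import random
import statistics


def drop_marbles(n_drops, compensate, sigma=1.0, seed=0):
    """Return landing positions relative to a target at 0."""
    rng = random.Random(seed)
    funnel = 0.0
    landings = []
    for _ in range(n_drops):
        landing = funnel + rng.gauss(0.0, sigma)
        landings.append(landing)
        if compensate:
            # Move the funnel opposite to the last miss, as in the comment.
            funnel -= landing
    return landings


fixed = drop_marbles(20000, compensate=False)
adjusted = drop_marbles(20000, compensate=True)

var_fixed = statistics.pvariance(fixed)
var_adjusted = statistics.pvariance(adjusted)
print(f"variance, funnel left alone: {var_fixed:.2f}")
print(f"variance, funnel adjusted:   {var_adjusted:.2f}")
```

With pure noise and nothing real to chase, each adjusted drop carries the previous drop's error as well as its own, so the variance roughly doubles. The analogy only applies to the polling adjustment to the extent that poll-to-poll movement is noise and not a real trend, which is exactly what is in dispute in this thread.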

jdk said...

http://election.princeton.edu/

What lessons can we learn?
