As I've hinted a couple of times, we now have a new version of the model running, which attempts to account for the interrelationships between polling movement in different states. Before I can work up the energy to fully describe that, let me first tell you about the new Tipping-Point States metric that I've developed to accompany it.
A Tipping-Point State is now defined as a state that would be most likely to alter the outcome of a close election if it were decided differently. More specifically, a Tipping-Point State is among the closest states -- taken alone or in combination -- that would give the losing candidate at least 270 electoral votes if transferred to him from the winner’s column, with no wasted electoral votes.
Let me give you a couple of examples. First, 2004, which is an easy one. The closest states won by Bush were New Mexico and Iowa. However, these would not have given John Kerry enough electoral votes even if he had won them. So the third-closest state, Ohio, was the lone Tipping-Point State for that election, since it would have gotten Kerry to 270 all on its own.
Now a somewhat more complicated example: 1960. Richard Nixon was 51 electoral votes shy of winning that election. The closest Kennedy states were as follows:
Hawaii (3) -0.06%
Illinois (27) -0.19%
Missouri (13) -0.52%
New Mexico (4) -0.74%
New Jersey (16) -0.80%
So, we start with a 51-EV gap and begin whittling those numbers down for Nixon. Giving him Hawaii makes it 48, Illinois makes it 21, Missouri cuts Kennedy's lead to 8, and New Mexico to 4. And then we hit New Jersey, which gives Nixon the election. But it also gives him 12 extra electoral votes.
So we go back through the list in reverse order and see if there is any wastage. In this case, there is. If we place New Mexico back in the Kennedy column, Nixon still has 8 electoral votes to spare. In fact, while we must keep Missouri and Illinois, we can also eliminate Hawaii. So the Tipping-Point states for Nixon in 1960 were Illinois, Missouri, and New Jersey. This was the most efficient possible combination of states that would give him a winning electoral margin.
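For readers who would rather see the procedure than read it, here is a minimal sketch of the two passes described above, run on the 1960 numbers. The function name `tipping_point_states` and the data layout are made up for illustration; this is not the model's actual code.

```python
# Closest Kennedy states in 1960, closest margin first: (state, electoral votes).
KENNEDY_1960 = [
    ("Hawaii", 3),
    ("Illinois", 27),
    ("Missouri", 13),
    ("New Mexico", 4),
    ("New Jersey", 16),
]

def tipping_point_states(gap, closest_states):
    """Return the states the loser needs, with no wasted electoral votes.

    gap: electoral votes the losing candidate still needs.
    closest_states: (name, electoral_votes) pairs, closest margin first.
    """
    # Forward pass: hand over states in order of closeness until the gap closes.
    flipped = []
    remaining = gap
    for name, ev in closest_states:
        flipped.append((name, ev))
        remaining -= ev
        if remaining <= 0:
            break
    else:
        return []  # the listed states are not enough to change the outcome

    # Reverse pass: give back any state whose votes fit inside the surplus.
    surplus = -remaining
    kept = []
    for name, ev in reversed(flipped):
        if ev <= surplus:
            surplus -= ev  # wasted votes, so return this state to the winner
        else:
            kept.append(name)
    kept.reverse()
    return kept

print(tipping_point_states(51, KENNEDY_1960))
# -> ['Illinois', 'Missouri', 'New Jersey']
```

The state that clinches the election can never be handed back, because the surplus it creates is always smaller than its own electoral votes.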
But I tried to slip one by you there. What do I mean by a "close election"? I mean one in which the electoral math matters. That was the purpose of the graph that I posted this morning:
The Tipping-Point calculation is weighted across each simulation based on the popular vote result predicted in each election. Specifically, it is weighted according to the probability that a candidate with that popular vote share will lose the Electoral College. So a simulation in which the popular vote is divided exactly evenly will be weighted at .5 -- the highest possible weighting. A simulation in which the popular vote margin is 3 points -- that gives the popular vote leader about a 97 percent chance of winning the election -- will be weighted at .03. Basically, most of the calculation is derived from elections that are decided by a point or two.
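To make the weighting concrete, here is a rough sketch that reproduces the two data points given above (.5 at a tie, .03 at a 3-point margin). The normal-curve shape, the constant `SIGMA`, and the name `simulation_weight` are assumptions chosen for illustration; the actual curve comes from the simulations themselves.

```python
from statistics import NormalDist

# Assumption: the popular-vote leader's chance of winning the Electoral College
# follows a normal CDF in the popular-vote margin. SIGMA is calibrated so that
# a 3-point margin gives the leader a 97 percent chance, as quoted in the text.
SIGMA = 3.0 / NormalDist().inv_cdf(0.97)

def simulation_weight(popular_margin_pts):
    """Weight for one simulation: the chance the popular-vote leader loses."""
    leader_wins = NormalDist(mu=0.0, sigma=SIGMA).cdf(abs(popular_margin_pts))
    return 1.0 - leader_wins

for margin in (0.0, 1.0, 2.0, 3.0):
    print(f"{margin:.0f}-point margin -> weight {simulation_weight(margin):.2f}")
```

Under this assumed curve the weight drops off quickly, which is why nearly all of the calculation comes from simulations decided by a point or two.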
So this definition is rather complicated mathematically -- but at the same time, I think it's more intuitive than the previous version. It's definitely a lot more robust.
The hot-off-the-presses Tipping Point numbers are as follows:
Michigan and Ohio will each prove to be decisive in a close election about 30 percent of the time. After that are Colorado and Virginia, which serve as gateways to their respective regions.
The really interesting thing is to compare the Tipping Point states with and without the intrastate (or should that be interstate?) correlations. A state like North Carolina is punished, for instance, for reasons that most of you can probably figure out. But we'll save that for later.
Thursday, July 10, 2008
Tipping Point v2.0
-- Nate at 11:53 AM 48 Comments...
Labels: electoral math, meta, methodology, site
Wednesday, July 9, 2008
Leaners
Some pollsters -- Rasmussen has recently gotten into this habit -- list separate polling results with and without "leaners" included. Where this occurs, our policy will be to include leaners in the tallies.
Why? Because most polls push undecided voters to some degree or another -- such as by asking a follow-up question or not providing the 'undecided' option for them -- meaning that their results already implicitly include leaners. Most of the time, you'll have no idea that this process occurred; the topline results have the leaners rolled into them. Other times, as in Rasmussen's case, you'll have the data broken out for you with and without leaners included.
But I've never seen a case where a pollster pushed leaners and then didn't include them in their press release. You'll either get the "with leaners" version in the topline numbers, or you'll get both versions, but you'll almost never get only the "without leaners" version and then have to sort through the cross-tabs for the leaner data.
In other words, listing the results with leaners -- however the pollster chooses to define them -- is generally the default industry standard. It will therefore be our standard as well.
-- Nate at 2:20 PM 57 Comments...
Labels: methodology, site
Quibble
Very interesting read from Justin Wolfers at the New York Times' Freakonomics blog, who cites research from friend-of-538 Robert Erikson which suggests that polls tend to understate the performance of the incumbent.
"Political scientists Robert Erikson (of Columbia) and Christopher Wlezien (of Temple) have recently mined daily polling reports from the last half-century of elections, mapping the relationship between early polling numbers and final election returns. At this point in the race, they find that around half of any lead should be discounted, as early advantages tend to dissipate. (You can read the full paper here, or an ungated version here.)"
I just looked at this too. Although I doubt that my methodology is anywhere near as robust, I found the same thing that Erikson did: you would need to discount a polling lead more heavily for an incumbent than for a challenger.
"Profs. Erikson and Wlezien point to another reason to be wary of Sen. Obama’s early polling lead: On average, the voting public tends to be more strongly anti-incumbent three-and-a-half years into an administration than they are on Election Day. Based on patterns in previous cycles, the professors suggest that this exaggerated anti-incumbent feeling is boosting Sen. Obama’s lead by around three percentage points.
"So if you first halve Obama’s six point lead, then subtract a three point anti-incumbency bias, you are left with a dead heat."
The problem is that while McCain comes from the incumbent party, McCain himself is not an incumbent. He is not even a pseudo-incumbent (a.k.a. the sitting Vice President), the first time this situation has occurred since 1952.
Things get fairly complicated once you look at the interaction effects between a candidate's lead in the polling, the number of days until the election, and incumbency, especially since there are different degrees of incumbency and various other sorts of non-linearity among all these variables.
But as best I can tell, the incumbent effect that Wolfers and Erikson have identified is smaller when we're dealing with the incumbent party rather than an actual, incumbent President -- probably more like one point rather than three at this stage of the cycle. And in '52, when you had neither an incumbent President nor a sitting Vice President running, the large lead that Dwight Eisenhower had in the polling held up quite well.
My general philosophy behind my modeling is to make everything "candidate-neutral". The model knows that there are two candidates and that they have certain polling numbers, but it doesn't know who is a Republican or who is a Democrat, or who is an incumbent and who isn't. So I won't say "the polling numbers will move toward Candidate X", although I do say "the polling numbers are likely to move toward whichever candidate happens to be trailing".
If I did look at that stuff, the numbers might be moved a couple points in McCain's direction, because of this incumbent issue I just described and also because there has been some tendency for the polling to overstate the performance of the Democrat. On the other hand, if one starts to consider candidate-related variables, there is a whole other set of 'meta' variables that one might want to evaluate too, such as the condition of the economy, the presence of a war, incumbent approval ratings, and party registration figures. Most of those factors would tend to favor Mr. Obama.
Making things candidate neutral is partially a marketing decision -- I can't imagine how much yelling and screaming there would be if I said "let's give McCain 2 bonus points because he's a Republican" or "let's give Obama 2 bonus points because the economy stinks". But also, the set of past polling data is not that robust -- about 14 elections that were polled scientifically, in only the last several of which did you have multiple agencies releasing polling data at regular intervals. If you want to look at something like "Democrat challenging Republican quasi-incumbent in wartime with crappy economy", your dataset gets down to zero pretty quickly.
-- Nate at 11:42 AM 47 Comments...
Labels: meta, methodology
Zogby Interactive Data Dump
Zogby Interactive has polling detail out in 34 states.
Zogby's regular polls rate as middle-of-the-pack and get beat up on a little more than they probably should. His interactive polls, however, are not so good. We give them the most skeptical treatment possible by not regressing Zogby Interactive's results to the mean when computing their pollster rating. But they will be included, weighted approximately one-quarter as much as regular polls. Some of his states have huge sample sizes, but as you can see from the chart above, our ratings have figured out that big sample sizes can't cover for questionable methodology.
For what it's worth, I'm actually more sympathetic than you might think to the idea of Internet-based polling. Something like 20 percent of Americans are cellphone-only now, and many others use all sorts of call screening technology that pretty much ensures that they will never be contacted by a pollster. These problems are going to get worse before they get better, and Internet-based polling may eventually need to be part of the solution.
But I don't know. There sure are some strange results in here, such as Bob Barr's numbers being fairly high in many states, as well as Obama's performance over parts of the South, where Internet usage rates tend to be lower, thereby making a quasi-representative sample more difficult to obtain.
-- Nate at 7:55 AM 111 Comments...
Labels: methodology, pollsters, zogby
Monday, July 7, 2008
Senate Polling Weekly Update, 7/7

There's been very little in the way of Senate polling this week (Presidential polling, as you may have noticed, also slowed to a crawl over the weekend). But the handful of exceptions are as follows:
- In Louisiana, a Southern Media & Opinion Research poll shows Mary Landrieu's lead tightening to 6 points, down from 12 points in early April. We still think the fundamentals of this race look fairly good for Landrieu: she remains relatively popular, and is running against a largely undefined opponent with marginal fundraising numbers.
- Public Policy Polling has polled North Carolina again, and shows Elizabeth Dole expanding her lead to 14 points, up from 8 points last month. It looks like Kay Hagan might have gotten a bit of a bounce surrounding her primary victory which is now receding, and the Democrats are in danger of seeing this one downgraded to the Kansas/Idaho tier of pickup opportunities.
- Rhode Island has been polled for the first time, with Rasmussen showing incumbent Jack Reed with a huge 72-20 lead over Bob Tingle, who is a Pit Boss at Foxwoods Casino. Tingle would probably be a pretty fun guy to get a beer with, but this race will not be competitive.
And that's pretty much it. There are also Rasmussen numbers out in states like Massachusetts, Alabama and Georgia, but nothing that would lead us to change our characterization of those races as being non-competitive.
There are, however, a couple of small methodological refinements. Per the rule that we instituted on the Presidential side of things last week, the most recent poll of a state is now guaranteed a minimum weighting of 0.25. This brings a November Idaho poll into play. There was also an internal Idaho poll released last week by Democratic candidate Larry LaRocco that showed very similar numbers, but we do not include it because it violates our ban on internal polls.
Speaking of which, I have delisted the three Public Opinion Strategies polls that were included before, which had been conducted on behalf of the U.S. Chamber of Commerce. Our rule will be to exclude any polls conducted on behalf of (i) candidates for office, or (ii) registered Political Action Committees (which things like the DNC and USCC are, but something like Daily Kos isn't) unless they are conducted with a bipartisan partner or the organization has a long and demonstrable history of reporting all its polling to the public record.
Finally, a state's partisan ID index (based on 2004 exit polling) is now showing up as statistically significant and has been added to the regression model.
Overall, we now project the Democrats to hold 55.5 seats after the election (not counting Sanders and Lieberman), down slightly from 55.7 last week.
-- Nate at 10:01 PM 16 Comments...
Labels: methodology, senate, senate polls
Friday, July 4, 2008
No Fireworks, But A Few Small Changes
This is the closest thing I've taken to a day off in quite some time, but briefly:
As I hinted yesterday, the amount of variance in each simulation run now differs from state to state. There are actually two different components to this. The first is how responsive a state is to national trends. We had already figured out a way to estimate this. However, we now apply it specifically to each simulation. For example, let's say that New Hampshire polls move 120 percent as much as national polls. If in simulation #3,268, Obama's national trend has moved downward by 5 points, New Hampshire's polls will move 120 percent that much, or down 6 points. If in simulation #7,008, Obama's national trend has moved upward by 10 points, New Hampshire's polls will move up by 12 points. It's as simple (or as complicated) as that.
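In code, the first component is a single multiplication per state per simulation. A minimal sketch using the New Hampshire example; `state_shift` and `NH_RESPONSIVENESS` are hypothetical names, and 1.20 is the illustrative figure from the paragraph above rather than an estimated value.

```python
# Illustrative only: New Hampshire is assumed to move 120 percent as much
# as the national polls, per the example in the text.
NH_RESPONSIVENESS = 1.20

def state_shift(national_shift_pts, responsiveness):
    """Scale one simulation's national movement by a state's responsiveness."""
    return responsiveness * national_shift_pts

# Simulation where the national trend moves down 5 points: NH moves down 6.
print(state_shift(-5.0, NH_RESPONSIVENESS))
# Simulation where the national trend moves up 10 points: NH moves up 12.
print(state_shift(10.0, NH_RESPONSIVENESS))
```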
Separately, however, there is also a question of how much variation there is within a given state's polling, period. You could have a state where the polls are relatively uncorrelated with national trends, but where the polls nevertheless seem to fluctuate wildly, marching somewhat to their own drummer. The way we account for this type of variance is to take the standard deviation across all polls conducted in a state after having stripped out the national trendline. Then, we run our demographic regression against these standard deviations to see whether anything systematic seems to be driving the amount of volatility in a given state's polling. It turns out that there are a few such things: variance tends to be lower in states with large numbers of African-Americans, for instance, but higher in states with large numbers of elderly voters.
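Here is a rough sketch of the first step, the state-level standard deviation with the national trendline stripped out. It assumes that "stripping out" means subtracting the national margin on the same date from each state poll; the function name and the numbers are invented for illustration, and the demographic regression that follows is not shown.

```python
from statistics import stdev

def residual_volatility(state_margins, national_margins):
    """Standard deviation of a state's polls after removing the national trend.

    Both lists hold margins in points, one entry per poll, where
    national_margins[i] is the national trendline on the date of state poll i.
    """
    residuals = [s - n for s, n in zip(state_margins, national_margins)]
    return stdev(residuals)

# Hypothetical polls: the residuals are 1, 1, 1 and 3 points.
state = [2.0, 4.0, 3.0, 7.0]
national = [1.0, 3.0, 2.0, 4.0]
print(residual_volatility(state, national))  # -> 1.0
```

A state whose polls simply track the national numbers at a fixed offset would score zero here, however much the national picture moves.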
The most important implication of this is that the polls are liable to be more stubborn in the Deep South (excluding Florida) than they are elsewhere in the country. So even though Obama has whittled down McCain's lead to the single digits in states like Georgia, Mississippi, and South Carolina, those are going to be tough points for him to make up. On the other hand, the polls have been quite volatile in Appalachia, where you have a lot of conflicted, downscale voters who are not particularly fond of either of these candidates. So, even though we show McCain with the same 8-point lead in both Georgia and West Virginia, our model gives him a 91 percent chance of hanging on to win Georgia, but just a 75 percent chance of winning West Virginia.
You will notice, by the way, that this second adjustment doesn't distinguish between cases where the polls vary a lot over time, and cases where they vary a lot at the *same* time (as they do in Florida right now, for instance). That's perfectly okay, because both things increase our degree of uncertainty about exactly what's going to happen in November.
Lastly, I have swapped out a couple of variables in the 538 regression analysis. The 'partisan' variable has been replaced by a liberal-conservative Likert scale for each state drawn from 2004 exit polling. This seems to provide slightly more unique information to the model than the partisan ID index, particularly as partisan identification tends to change more quickly over time than one's political philosophy. I have also added a variable for Hillary Clinton's performance in the primaries (the results from caucus states are adjusted). Yes, all else being equal, Obama does worse where Hillary had done better.
p.s. Happy 232nd birthday, America!
-- Nate at 8:46 PM 43 Comments...
Labels: methodology, site
Thursday, July 3, 2008
Speaking of North Dakota...
...it's one of several states that haven't been polled in months and where the highest-ranked poll is in danger of dropping below the 0.05 weighting threshold and entirely out of our averages. The states in this group are Delaware, Hawaii, Idaho, Illinois, Maryland, North Dakota, South Dakota and Vermont. Rhode Island would have been too, but is getting new numbers today.
From among this group, the most annoying omission is North Dakota, which Obama is visiting today and may actually hope to compete in. For good measure, it's probably also worth polling South Dakota. As to the rest of these states -- it's fairly obvious in which direction they're going to go, although Maryland is very bourgie and would therefore be helpful for calibrating demographics, and it's surprising that Illinois, which has the fifth-largest population in the country, has been polled so little when California and New York have been polled so much.
For the time being, however, what we're going to do is establish a requirement that the highest-rated poll in each state will be assigned a minimum weight of 0.25, with any other polls in that state calibrated to that number. This is admittedly a little ad-hoc, but all it's doing is affecting the extent to which we weight the polling as opposed to the regression, which was an ad-hoc decision to begin with. It doesn't feel right to completely ignore polling that has taken place in a state when that polling is the best thing we have, even if that polling is a little (or a lot) out of date.
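One way to read "calibrated to that number" is a proportional rescaling, sketched below. That reading is an assumption, as are the names `apply_weight_floor` and `MIN_TOP_WEIGHT`; only the 0.25 minimum comes from the rule itself.

```python
MIN_TOP_WEIGHT = 0.25  # minimum weight for the best poll in a state

def apply_weight_floor(weights, floor=MIN_TOP_WEIGHT):
    """Rescale a state's poll weights so the highest one is at least `floor`.

    Assumption: the other polls are scaled by the same factor, which keeps
    their weights relative to the top poll unchanged.
    """
    if not weights:
        return []
    top = max(weights)
    if top <= 0 or top >= floor:
        return list(weights)  # nothing to rescale
    scale = floor / top
    return [w * scale for w in weights]

# A stale state whose best poll has decayed to 0.04.
print(apply_weight_floor([0.04, 0.02]))
# A well-polled state is left alone.
print(apply_weight_floor([0.90, 0.40]))
```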
This precedent will also be applied to our Senate polling numbers, where it is somewhat more consequential, as some Senate races are polled quite rarely.
-- Nate at 7:47 AM 78 Comments...
Labels: methodology, site
Wednesday, July 2, 2008
A Small Change
A relatively low-impact and presumably noncontroversial change to our model. Up until now, we have been estimating the national popular vote by taking our projected margin in each state, and multiplying it by turnout figures from 2004.
Henceforth, we will be accounting for population growth, by using the current Voting Eligible Population estimates determined by George Mason University, and assuming that the same proportion of the eligible population will turn out in each state as did in 2004.
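The new calculation amounts to a turnout-weighted average of the state margins. A minimal sketch with two made-up states; the field names and every number below are hypothetical, not real VEP or turnout figures.

```python
def national_margin(states):
    """Turnout-weighted average of projected state margins, in points.

    Each state supplies its projected margin, its current Voting Eligible
    Population (VEP), and the share of eligible voters who turned out in 2004.
    """
    total_votes = 0.0
    weighted_margin = 0.0
    for s in states:
        expected_votes = s["vep"] * s["turnout_rate_2004"]
        total_votes += expected_votes
        weighted_margin += expected_votes * s["margin"]
    return weighted_margin / total_votes

# Two hypothetical states: 600,000 and 1,000,000 expected voters.
example = [
    {"margin": 10.0, "vep": 1_000_000, "turnout_rate_2004": 0.60},
    {"margin": -4.0, "vep": 2_000_000, "turnout_rate_2004": 0.50},
]
print(national_margin(example))  # -> 1.25
```

A state that has grown since 2004 now carries more weight in the average even though its assumed turnout rate is unchanged.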
This is by no means the fanciest assumption we could make. There seems to be a pretty clear relationship, for instance, between how close a state is, and what turnout winds up being. We'll likely see much better turnout in Indiana this year, for example, since whether or not the state remains a dead-heat in the polling, it will certainly be much closer than it was in 2004.
But whereas those things are somewhat speculative, the basic demographic reality on the ground is not. States like Texas and Arizona have gained substantially in population since 2004, whereas Louisiana has lost it as a result of Hurricane Katrina. It is straightforward to account for these things.
As most of the population growth is concentrated in red states, this change winds up boosting John McCain's projected share of the popular vote by about four-tenths of a percentage point. However, the way the model is designed, it should not really change the projected electoral vote much at all. It does though reinforce the idea that John McCain is more likely to win the popular vote and lose the election rather than the other way around.
-- Nate at 8:53 AM 68 Comments...
Labels: methodology, popular vote, site, turnout models
Tuesday, July 1, 2008
Senate Projections Methodology Primer, Part I
This entry will focus on the methodology we use to generate our regressi