Monday, April 21, 2008

It's the turnout, stupid! [UPDATED]

Note: The post below was written before the latest SurveyUSA poll came out; that poll is now more consistent with other polls of the state.

I think I've finally gotten it figured out.

Why have we seen such wildly disparate results in the Pennsylvania polling? It may all have to do -- as it so often does -- with likely voter models.

The key to unraveling all of this is the Franklin & Marshall poll, the only poll I am aware of that has published separate results for likely and registered voters. In F&M's most recent poll, Clinton led Obama by 10 points among registered voters, but by just 6 points among likely voters. In their March poll, Clinton led by 22 points among registered voters, but by 16 points among likely voters. So the likely voter numbers run roughly 5 points more favorable to Obama than the registered voter numbers (4 points in the latest poll, 6 in March), which is a relatively large gap as these things go.
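The "roughly 5 points" figure is just the registered-minus-likely difference in Clinton's lead in each F&M poll, averaged. A minimal Python sketch using only the margins quoted above (the variable names are illustrative, not anything F&M publishes):

```python
# Clinton's lead over Obama, in points, in the two Franklin & Marshall polls
# cited above: registered voters (rv) vs. likely voters (lv).
fm_polls = {
    "March": {"rv": 22, "lv": 16},
    "latest": {"rv": 10, "lv": 6},
}

# How many points the likely voter screen takes off Clinton's lead in each poll.
gaps = {name: poll["rv"] - poll["lv"] for name, poll in fm_polls.items()}
average_gap = sum(gaps.values()) / len(gaps)

print(gaps)         # {'March': 6, 'latest': 4}
print(average_gap)  # 5.0
```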

SurveyUSA, the pollster that has shown the most favorable results so far for Clinton, is notorious for using a very lax likely voter screen. And we can see this in their Pennsylvania results as well. In its latest survey, SurveyUSA reported that, of the 1401 registered adults that it contacted, 638 (45.5%) were likely to vote in the Democratic primary. For comparison's sake, 50.4% of Pennsylvanians who are registered to vote are registered as Democrats, according to the very latest figures from the PA Secretary of State.

So SurveyUSA has 45.5% of registered adults voting ... out of only 50.4% who theoretically could vote, since this is a closed primary. In other words, their model assumes turnout among registered Democrats of more than 90% (45.5 / 50.4 ≈ 0.903)! This assumption, taken in a vacuum, is almost certainly wrong. It would imply turnout of about 3.8 million of the state's 4.2 million registered Democrats. For comparison's sake, Ohio's Democratic primary had turnout of 2.2 million -- in an election in which independents and Republicans were eligible to vote. For that matter, there were only 2.9 million Kerry voters in Pennsylvania in 2004.
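The chain of arithmetic behind the 90% and 3.8 million figures can be written out step by step. A minimal sketch, assuming (as the post reads it) that SurveyUSA's 1401 respondents are all registered adults rather than registered Democrats only; the 4.2 million registered Democrats is the post's round figure. Using the unrounded 638/1401 gives 90.4% rather than the 90.3% you get from the rounded 45.5/50.4, but either way it is above 90%.

```python
# Inputs quoted in the post.
contacted = 1401                 # registered adults SurveyUSA contacted
likely_dem = 638                 # of those, likely Democratic primary voters
dem_share_of_registered = 0.504  # Democrats as a share of all PA registrants
registered_dems = 4.2e6          # registered Democrats, approximately

# Share of *all* registrants that SurveyUSA treats as likely primary voters.
likely_share = likely_dem / contacted

# Only Democrats can vote in this closed primary, so divide by their share
# of registrants to get the implied turnout rate among registered Democrats.
implied_turnout_rate = likely_share / dem_share_of_registered
implied_turnout = implied_turnout_rate * registered_dems

print(f"{likely_share:.1%}")            # 45.5%
print(f"{implied_turnout_rate:.1%}")    # 90.4%
print(f"{implied_turnout / 1e6:.2f}M")  # 3.79M
```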

What is a more realistic assumption for turnout? Pennsylvania is a closed primary, which will take a big chunk out of turnout vis-a-vis Ohio. On the other hand, the level of campaign activity in the state has been very intense, even more so than in Ohio. The latest version of my turnout model projects turnout at 2.09 million -- almost exactly half of Pennsylvania's registered Democrats -- with a 95% confidence interval of 1.83 million to 2.35 million.
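To compare the projection directly with the pollsters' screens, it helps to restate it as a turnout rate among the 4.2 million registered Democrats. The final step assumes, hypothetically, that the 95% interval is a symmetric normal interval (projection plus or minus 1.96 standard errors); the post does not say how the interval was constructed, so treat the implied standard error as a back-of-envelope figure only.

```python
registered_dems = 4.2e6
projection = 2.09e6               # turnout model's central projection
ci_low, ci_high = 1.83e6, 2.35e6  # its 95% confidence interval

# Express the projection and interval as turnout rates among registered Democrats.
rate = projection / registered_dems
rate_low = ci_low / registered_dems
rate_high = ci_high / registered_dems
print(f"{rate:.1%} ({rate_low:.1%} to {rate_high:.1%})")  # 49.8% (43.6% to 56.0%)

# ASSUMPTION (not stated in the post): the interval is a symmetric normal
# interval, projection +/- 1.96 standard errors. Under that assumption:
implied_se = (ci_high - ci_low) / (2 * 1.96)
print(f"implied standard error: {implied_se / 1e6:.2f}M")  # 0.13M
```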

Franklin & Marshall assumes that roughly 67% of registered Democrats are likely voters, which would imply turnout of about 2.8 million if all its likely voters voted. This is a lot closer to my estimate -- a little high, indeed, but a pollster probably should fudge upward, both because "likely voter" does not mean "certain voter", and because there is probably some non-response bias toward likely voters.
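Putting the two screens side by side against the model's 2.09 million projection shows why F&M is "a lot closer." A minimal sketch: the SurveyUSA rate is the post's "more than 90%" rounded down to 0.90, and the helper name `implied_turnout` is hypothetical. Like the post, it assumes every screened-in likely voter actually votes, which is an upper bound rather than a forecast.

```python
REGISTERED_DEMS = 4.2e6    # registered Democrats in Pennsylvania
MODEL_PROJECTION = 2.09e6  # turnout model's central projection

# Share of registered Democrats each pollster's screen counts as likely voters.
screen_rates = {
    "SurveyUSA": 0.90,            # "more than 90%", rounded down
    "Franklin & Marshall": 0.67,
}


def implied_turnout(screen_rate, registered=REGISTERED_DEMS):
    """Turnout implied if every screened-in likely voter actually votes."""
    return screen_rate * registered


for pollster, rate in screen_rates.items():
    turnout = implied_turnout(rate)
    overshoot = turnout / MODEL_PROJECTION - 1
    print(f"{pollster}: {turnout / 1e6:.2f}M, {overshoot:.0%} above the model")
# SurveyUSA: 3.78M, 81% above the model
# Franklin & Marshall: 2.81M, 35% above the model
```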

Apart from SurveyUSA and Franklin & Marshall, I can't find any other polls that have disclosed this level of detail about their turnout assumptions. But the conventional wisdom is that, apart from SurveyUSA, which eschews likely voter models as a matter of philosophy, the other robopollsters like Rasmussen and PPP tend to have tightish voter screens, since the marginal cost of making additional calls is low for them. Those are indeed the pollsters with the most favorable results for Barack Obama.

However, that is all assuming that likely voter models work -- which is a big assumption. If you guess wrong at who the likely voters might be, you might very well be better off not having a likely voter model at all. SurveyUSA, certainly, has had a great deal of success throughout the primaries with its very lax (almost non-existent) likely voter screen.

In this case, however, I am somewhat more inclined to trust the pollsters that are applying tighter likely voter screens, the main reason being the nature of the undecided vote in the state. Namely, while it appears to me that undecided voters are indeed leaning toward Clinton, it may be that what they're really leaning toward is not voting at all. For example, Mason-Dixon finds that 17% of gun owners are undecided, as compared to just 3% of non-gun owners, and that 11% of voters in the "T" (rural and small-town Pennsylvania) are undecided, as compared to 6% of voters in Southeastern Pennsylvania (e.g. Philadelphia). In other words, when you do include these less-certain, largely rural and blue-collar voters, a lot of them are not ready to make a decision. That suggests that you perhaps should not have included them in the first place.

If Obama is to stay within a few points of Clinton on Tuesday, what he'll need is for a lot of those unlikely/undecided voters in the central portion of the state to decide they're fed up with the whole thing and not vote. So, Obama should probably be rooting for low turnout overall. For Obama to actually win on Tuesday -- not just stay close -- he will probably also need high turnout in Philadelphia, and maybe among a couple of other select groups like newly-registered voters (who favor Obama 3:2 according to Franklin & Marshall) and students.
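The turnout argument above is, at bottom, a weighted average: the statewide margin is each region's margin weighted by the votes that region casts. A minimal sketch with explicitly hypothetical, made-up round numbers (they are not estimates, poll results, or the author's projections) shows the direction of the effect: fewer votes from the T narrows Clinton's margin, and a stronger Philadelphia turnout narrows it further.

```python
def statewide_margin(regions):
    """Clinton-minus-Obama margin in points, weighting each region by votes cast.

    regions maps a region name to (votes_cast, clinton_margin_in_points).
    """
    total_votes = sum(votes for votes, _ in regions.values())
    return sum(votes * margin for votes, margin in regions.values()) / total_votes


# HYPOTHETICAL round numbers for illustration only; not estimates or poll data.
baseline = {
    "Philadelphia area": (600_000, -20),  # Obama +20
    "the T": (900_000, 25),               # Clinton +25
    "rest of state": (600_000, 5),        # Clinton +5
}

# Scenario: a third of the T's voters stay home.
low_t = {**baseline, "the T": (600_000, 25)}
# Scenario: the same, plus stronger Philadelphia turnout.
low_t_high_philly = {**low_t, "Philadelphia area": (800_000, -20)}

for name, scenario in [("baseline", baseline), ("low T", low_t),
                       ("low T + high Philly", low_t_high_philly)]:
    print(f"{name}: Clinton {statewide_margin(scenario):+.1f}")
# baseline: Clinton +6.4
# low T: Clinton +3.3
# low T + high Philly: Clinton +1.0
```

With these invented inputs Clinton still leads in every scenario, which is consistent with the point that Obama needs both effects, not just one, to win outright.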

I don't think this scenario is entirely off the table -- but Obama also does not control his own destiny. He both has to win his enthusiasm/GOTV game, and have Clinton lose hers -- and if anything, the latter is probably more important than the former. If Obama were to win on Tuesday, the headlines would probably be that (i) the negative tone of the campaign depressed turnout outside of Philadelphia, and/or that (ii) the Clinton ground game was compromised by financial problems and internal dissent.

Additional Thought: One of the things I suppose I am suggesting is that, all else being equal, the negative tone of the campaign may have been a net positive for Obama. The largest single variable in this primary is probably Clinton's ability to turn out voters outside of the major metropolitan areas, including people who are not used to voting in primaries, since Pennsylvania has not had an important Presidential primary since 1984. Attacking Barack Obama might not be particularly helpful to her if these voters are really choosing between voting for Clinton and not voting at all.

UPDATE: Also, the relationship between Clinton support and undecided voters has now completely broken down.



[Chart: Trendlines for all agencies that released data both before and after last Wednesday's debate]



10 comments

unertl said...

Are the 1401 voters contacted by SUSA registered Democrats or just registered in general? You interpreted it as the latter, but if it is the former, then 45.5% turnout seems pretty likely to me.

538/poblano said...

Just registered, period, is my reading of it.

Ben said...

Great blog! Just reading it makes me feel smarter.

What do you make of the latest PPP poll:
http://www.publicpolicypolling.com/pdf/PPP_Release_042108.pdf
Statistically, it looks like a great survey, with a sample size of 2,338 and a MoE of only +/-2.0%. Most importantly, it's showing Obama leading 49-46 with only 5% undecided.
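The quoted +/-2.0% is the standard worst-case (share = 0.5) 95% margin of error for a single candidate's share with n = 2,338. The margin on the gap between two candidates drawn from the same sample is nearly twice as large, so under these assumptions a 49-46 lead is within sampling error. A minimal sketch, assuming simple random sampling; real polls' weighting typically inflates these figures somewhat, and the function names are illustrative.

```python
import math


def margin_of_error(share, n, z=1.96):
    """95% margin of error for one candidate's share, assuming simple random sampling."""
    return z * math.sqrt(share * (1 - share) / n)


def lead_margin_of_error(p1, p2, n, z=1.96):
    """95% margin of error for the lead p1 - p2 when both shares come from the same sample."""
    return z * math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


n = 2338
print(f"+/-{margin_of_error(0.5, n):.1%}")              # +/-2.0%
print(f"+/-{lead_margin_of_error(0.49, 0.46, n):.1%}")  # +/-3.9%
```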

538/poblano said...

IIRC, PPP uses a rather tight voting screen conditioned on previous voter behavior. It's definitely something endemic to the way they're conducting their polls, since they've now had three polls showing Obama in the lead, whereas no other pollster has shown any Obama leads.

Basically, this is the "Obama voters turn out, Clinton voters don't" scenario. Under those conditions, I think this election is winnable for Obama. I wouldn't dismiss this scenario -- PPP is a good pollster. But I don't think it's the most likely scenario -- lots of other good pollsters have surveyed this state too.

Ben said...

I guess we'll just have to wait and see, which is the final answer for all polls. I checked PPP's performance, and it's startlingly impressive. They nailed the last three big contests: Wisconsin, Texas, and Ohio.

Rasmus said...

True, and they also had good results in NY and the best in SC (their 20-point margin for Obama was still 7 points too low, but better than all other pollsters). Maybe they have improved their basics since 2004?

Anyone interested in comparing the pollster results of 2004 to those in 2008? xD
I am not...

Joseph Eisenberg said...

I like the new chart. SUSA's new poll was a surprise; I wonder if they changed their methodology.

If we throw out PPP (and ARG?) as outliers, the trend toward Clinton reappears. But perhaps we shouldn't cook the data to fit our assumptions.

I'm still saying Clinton by 11 points (range 6 to 16 points).

She won't get the 20-point victory she needs in the delegate and popular vote race, but she's not going to drop out with anything less than a single-digit win (or an actual loss, perhaps).

DaveStew said...

Even though the ARG trend is interesting, ARG (IIRC) has been thoroughly bashed for being an outlier. They are pulling Clinton's numbers up quite a bit, since their result is six points higher than the nearest other poll.

Is there a reason to include them in a non-corrected average like you did at the end of your post?

Derek said...

SUSA shows high Other and Undecided figures outside of the metro areas (i.e., "the T"). Likewise, among people who say that terrorism or immigration is the most important issue, many are responding "Other" or undecided.

If these people stay home, then Clinton's margin is roughly 6 points. If these conservative, undecided voters stay home, and Obama's better-funded and better-staffed ground game in the SE of the state turns out the Obama vote, then the margin is 2-4 points. I don't see Obama winning.

But if the conservative undecided voters turn out and vote for Hillary rather than Obama, then the margin will be at least six points.

unertl said...

poblano,

Those 1401 voters are probably registered as Democrats. It wouldn't make sense for SUSA to call up all voters when only Dems can vote.

This dKos article compares the PA polls and notes that the wide range of results comes from the Philly and Southeast PA region. I wonder if the polls accounted for the newly registered Dems, who tend to break for Obama. New registrations in the SE region are up by a huge amount.