Note: The post below was written before the latest SurveyUSA poll came out; that poll now shows results more consistent with other polls of the state.
I think I've finally gotten it figured out.
Why have we seen such wildly disparate results in the Pennsylvania polling? It may all have to do -- as it so often does -- with likely voter models.
The key to unraveling all of this is the Franklin & Marshall poll, which is the only poll that I am aware of that published separate results for likely and registered voters. In F&M's most recent poll, Clinton led Obama by 10 points among registered voters, but just 6 points among likely voters. In their March poll, Clinton led by 22 points among registered voters, but 16 points among likely voters. So, there's roughly a 5-point gap between the likely voter and registered voter numbers, which is relatively large insofar as these things go.
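The gap arithmetic above can be sketched in a few lines of Python. The margins are the ones quoted in this post; the variable names are just illustrative.

```python
# Clinton's lead over Obama, in points, in the two Franklin & Marshall polls
# quoted above (registered voters vs. likely voters).
fm_polls = {
    "March": {"registered": 22, "likely": 16},
    "most recent": {"registered": 10, "likely": 6},
}

# Gap = how many points the likely voter screen takes off Clinton's lead.
gaps = {name: p["registered"] - p["likely"] for name, p in fm_polls.items()}
average_gap = sum(gaps.values()) / len(gaps)

for name, gap in gaps.items():
    print(f"{name}: {gap}-point gap")
print(f"average: {average_gap:.1f} points")
```

Two polls are a very small sample, so the average is best read as a rough indication rather than a precise estimate.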
SurveyUSA, the pollster that has shown the most favorable results so far for Clinton, is notorious for using a very lax likely voter screen. And we can see this in their Pennsylvania results as well. In its latest survey, SurveyUSA reported that, of the 1401 registered adults that it contacted, 638 (45.5%) were likely to vote in the Democratic primary. For comparison's sake, 50.4% of Pennsylvanians who are registered to vote are registered as Democrats, according to the very latest figures from the PA Secretary of State.
So SurveyUSA has 45.5% of registered adults voting ... out of only 50.4% who theoretically could vote, since this is a closed primary. In other words, their model assumes turnout among registered Democrats to be more than 90%! This assumption, taken in a vacuum, is almost certainly wrong. It would imply turnout of about 3.8 million of the state's 4.2 million registered Democrats. For comparison's sake, Ohio had turnout of 2.2 million -- in an election in which independents and Republicans were eligible to vote. For that matter, there were only 2.9 million Kerry voters in Pennsylvania in 2004.
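Here is a minimal sketch of that calculation, using only the figures quoted above. The 4.2 million count of registered Democrats is the rounded number from this post, so the output is approximate.

```python
# Figures quoted in the post.
registered_contacted = 1401          # registered adults SurveyUSA contacted
likely_dem_primary_voters = 638      # classified as likely Democratic primary voters
dem_share_of_registered = 0.504      # share of PA registered voters who are Democrats
registered_democrats = 4.2e6         # rounded count of registered Democrats

# Share of all registered adults that SurveyUSA treats as likely primary voters.
likely_share = likely_dem_primary_voters / registered_contacted

# In a closed primary only registered Democrats can vote, so the implied
# turnout rate among Democrats is the likely share divided by the Democratic share.
implied_dem_turnout_rate = likely_share / dem_share_of_registered
implied_turnout = implied_dem_turnout_rate * registered_democrats

print(f"likely share of registered adults: {likely_share:.1%}")
print(f"implied turnout among registered Democrats: {implied_dem_turnout_rate:.1%}")
print(f"implied turnout: {implied_turnout / 1e6:.1f} million")
```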
What is a more realistic assumption for turnout? Pennsylvania's primary is closed, which will take a big chunk out of turnout vis-a-vis Ohio. On the other hand, the level of campaign activity in the state has been very intense, more so than even in Ohio. The latest version of my turnout model projects turnout at 2.09 million -- almost exactly half of Pennsylvania's registered Democrats -- with a 95% confidence interval of 1.83 million to 2.35 million.
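As a rough sanity check on those numbers, the sketch below recovers the midpoint and half-width of the interval and the share of registered Democrats it represents. The standard error is an assumption on my part: it holds only if the interval is a symmetric, normal-theory interval, which the post does not state.

```python
# Figures quoted in the post, in millions of voters.
projected_turnout = 2.09
ci_low, ci_high = 1.83, 2.35
registered_democrats = 4.2   # rounded

midpoint = (ci_low + ci_high) / 2
half_width = (ci_high - ci_low) / 2

# ASSUMPTION: a symmetric normal interval, so half-width = 1.96 standard errors.
implied_standard_error = half_width / 1.96

share_of_democrats = projected_turnout / registered_democrats

print(f"midpoint: {midpoint:.2f} million, half-width: {half_width:.2f} million")
print(f"implied standard error (if normal): {implied_standard_error:.3f} million")
print(f"projected turnout as share of registered Democrats: {share_of_democrats:.1%}")
```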
Franklin & Marshall assumes that roughly 67% of registered Democrats are likely voters, which would imply turnout of about 2.8 million if all its likely voters voted. This is a lot closer to my estimate -- a little high, indeed, but a pollster probably should fudge upward, both because "likely voter" does not mean "certain voter", and because there is probably some non-response bias toward likely voters.
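The same back-of-the-envelope approach works for the Franklin & Marshall screen. This sketch uses the rounded figures from this post; the "follow-through rate" is a name I made up for the fraction of likely voters who would actually have to vote for turnout to match the model's projection.

```python
# Figures quoted in the post.
registered_democrats = 4.2e6     # rounded
fm_likely_share = 0.67           # share of registered Democrats F&M treats as likely voters
model_turnout = 2.09e6           # turnout model's point projection

# Turnout if every F&M likely voter actually voted.
fm_implied_turnout = fm_likely_share * registered_democrats

# Hypothetical "follow-through rate": the fraction of F&M's likely voters
# who would need to show up for turnout to equal the model's projection.
follow_through_rate = model_turnout / fm_implied_turnout

print(f"F&M implied turnout: {fm_implied_turnout / 1e6:.1f} million")
print(f"follow-through needed to match the model: {follow_through_rate:.0%}")
```

Under these figures, roughly three in four of F&M's likely voters would need to vote, which fits the point that "likely voter" does not mean "certain voter".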
Other than SurveyUSA and Franklin & Marshall, I can't find any polls that have disclosed this level of detail about their turnout assumptions. But the conventional wisdom is that, with the exception of SurveyUSA, which eschews likely voter models as a matter of philosophy, the robopollsters like Rasmussen and PPP tend to have tightish voter screens, as the cost of making additional calls is cheapest for them at the margins. Those are indeed the pollsters with the most favorable results for Barack Obama.
However, that is all assuming that likely voter models work -- which is a big assumption. If you guess wrong at who the likely voters might be, you might very well be better off not having a likely voter model at all. SurveyUSA, certainly, has had a great deal of success throughout the primaries with its very lax (almost non-existent) likely voter screen.
In this case, however, I am somewhat more inclined to trust the pollsters that are applying tighter likely voter screens, the main reason being the nature of the undecided vote in the state. Namely, while it appears to me that undecided voters are indeed leaning toward Clinton -- it may be that what they're really leaning toward is not voting at all. For example, Mason-Dixon finds that 17% of gun owners are undecided, as compared to just 3% of non-gun owners, and that 11% of voters in the "T" (rural and small-town Pennsylvania) are undecided, as compared to 6% of voters in Southeastern Pennsylvania (e.g. Philadelphia). In other words, when you do include these less-certain, largely rural and blue-collar voters -- a lot of them are not ready to make a decision. That suggests that you perhaps should not have included them in the first place.
If Obama is to stay within a few points of Clinton on Tuesday, what he'll need is for a lot of those unlikely/undecided voters in the central portion of the state to decide they're fed up with the whole thing and not vote. So, Obama should probably be rooting for low turnout overall. For Obama to actually win on Tuesday -- not just stay close -- he will probably also need high turnout in Philadelphia, and maybe among a couple of other select groups like newly-registered voters (who favor Obama 3:2 according to Franklin & Marshall) and students.
I don't think this scenario is entirely off the table -- but Obama also does not control his own destiny. He has to both win his enthusiasm/GOTV game and have Clinton lose hers -- and if anything, the latter is probably more important than the former. If Obama were to win on Tuesday, the headlines would probably be that (i) the negative tone of the campaign depressed turnout outside of Philadelphia, and/or that (ii) the Clinton ground game was compromised by financial problems and internal dissent.
Additional Thought: One of the things I suppose I am suggesting is that it may have been a net positive for Obama, all else being equal, for the tone of the campaign to have been negative. The largest single variable in this primary is probably Clinton's ability to turn out voters outside of the major metropolitan areas -- including people who are not used to voting in primaries, since Pennsylvania has not had an important Presidential primary since 1984. Attacking Barack Obama might not be particularly helpful to her if these voters are really deciding between voting for Clinton and not voting at all.
UPDATE: Also, the relationship between Clinton support and undecided voters has now completely broken down.
Trendlines for all agencies that released data both before and after last Wednesday's debate: