

In Pennsylvania and Indiana, the previous times that we conducted this exercise, the results from our regression model were closely in line with the composite polling averages in those states. In North Carolina, however, while most polls show a tightening, single-digit race, our model steadfastly forecasts a solid, double-digit victory for Barack Obama.
If you'll recall, the way that I produce these projections is to rely purely on demographic data from previous primaries. So the unstated assumption is this: if voters in North Carolina behave like demographically-aligned voters in other states, this is about what we should expect. On the other hand, if something has changed in the way that some groups of voters view the candidates -- our model may be inaccurate.
There does appear to me to be some evidence that Hillary Clinton is overperforming the position she has generally held throughout most of the recent primaries. But there is also some strong evidence that the current polling in North Carolina may be understating Barack Obama's support in that state.
But let's set that discussion aside for a moment and first look at the model's projections on a Congressional District by Congressional District basis.
CD-1 (Northeast / Albemarle): Without meaning to sound reductionist, this is one district where you can pretty much tell who somebody is going to vote for when you see them walk through the door. Most voters will be African-American, as the black population exceeds 50 percent in the district. However, the white population is impoverished and tends to be fairly old. The delegate split will almost certainly be 4-2 in Obama's favor. Prediction: Obama 68.1, Clinton 31.9; Obama 4-2 Delegate Win.
CD-2 (East Central / Raleigh): The Congressional Districts in North Carolina tend to be a little amorphous, as the major cities are divided up between two or three CDs. CD-2 includes the Raleigh city center, but then spiders out into some of the poorer areas of the state. However, it's an equal opportunity kind of poor, as there are plenty of blacks, plenty of whites, and a decent number of Hispanics. It should be a safe Obama win, but it will be a close battle to see whether he can get 4 out of 6 delegates. The model notes that CD-2 is the youngest district in the state and the most male and gives him the delegate majority. Prediction: Obama 61.5, Clinton 38.5; Obama 4-2 Delegate Win.
CD-3 (East / Cape Hatteras): The North Carolina coastline tends to be rugged and difficult to navigate, so you do not have so many of the wealthy, resort-type towns like you get in SC-1. Nevertheless, the white population in NC-3 is far more middle-class than the two Inner Banks districts that we just finished describing, and the district tends young and male, which should help Obama to make up for a relatively small black population. Overall, it's one of the more heterogeneous regions in the state and should track the state's returns fairly closely. Heavily Republican area, so expect a low turnout. Prediction: Obama 56.3, Clinton 43.7; 2-2 Delegate Split.
CD-4 (Central / Research Triangle): Among the best educated districts in the country, but also having a decent-sized African-American population and lots of collegians, the rapidly-growing Research Triangle area around Durham and Chapel Hill is Obama's best opportunity to run up the score. The district has nine delegates and turnout should be in the 170,000 voter range; Obama will get the lion's share of both, but a 7-2 delegate split looks unlikely. Prediction: Obama 69.8, Clinton 30.2; Obama 6-3 Delegate Win.
CD-5 (Northwest / Boone): White and lower-middle class, this Appalachian region most certainly favors Clinton, but the demographics aren't nearly as extreme as the neighboring regions in Tennessee. I have a hunch that she'll outperform our popular vote projection, but Clinton has no realistic chance to hold Obama to fewer than two delegates. Prediction: Clinton 55.9, Obama 44.1; Clinton 3-2 Delegate Win.
CD-6 (Central / Asheboro): With otherwise average demographics but a black population below 10%, this district leans Clinton, but Obama could steal the odd-numbered delegate from her if he does well with unaffiliated voters, as this is the least Democratic district in the state. Prediction: Clinton 51.8, Obama 48.2; Clinton 3-2 Delegate Win.
CD-7 (Southeast / Wilmington): Once fairly rough and tumble, this is a rapidly-growing district that is beginning to look something like suburban Atlanta, with an emphasis on service-sector employment. It has a high enough black population to lean Obama, but with an even number of delegates, we are most likely looking at a split. Prediction: Obama 52.8, Clinton 47.2; 3-3 Delegate Split.
CD-8 (South Central / Kannapolis): With a 30% African-American population, Obama will almost certainly win CD-8, but the white population is relatively working-class. It should break down about 60:40, almost exactly in line with a 3-2 delegate split. Prediction: Obama 60.7, Clinton 39.3; Obama 3-2 Delegate Win.
CD-9 (Charlotte - Gastonia): CD-9 comprises the ritzy and professional suburban areas immediately south of Charlotte. That is a demographic that has favored Obama over the course of most of this primary season, but performed less well for him in Pennsylvania. The model is giving Obama a 4-2 delegate win on the strength of independent voters, but I'd feel a little more comfortable for him if the African-American population were slightly larger. Prediction: Obama 60.2, Clinton 39.8; Obama 4-2 Delegate Win.
CD-10 (West / Hickory): The demographics are nearly identical to CD-5, and so naturally our model expects a nearly identical result. Prediction: Clinton 55.6, Obama 44.4; Clinton 3-2 Delegate Win.
CD-11 (West / Asheville): Artsy and eccentric, the Asheville/Smoky Mountains area is different from anything else in the state. Although the black population is small, there would otherwise be enough of a latte liberal crowd to keep things fairly balanced, except for the fact that the high quality of life tends to attract a large retirement community. As such, Clinton is bound to win, and will be right on the brink of winning a 4-2 delegate split. Prediction: Clinton 57.6, Obama 42.4; 3-3 Delegate Split.
CD-12 (I-85 Corridor): This gerrymandered district hugs Interstate 85 and includes portions of Charlotte, High Point and Winston-Salem. It's predominately middle class, but nearly half the district is African-American, which heavily favors Obama. In fact, we have this as his best district in the state, with the voting margins lining up almost exactly with a 5-2 delegate win. Prediction: Obama 70.6, Clinton 29.4; Obama 5-2 Delegate Win.
CD-13 (North / Greensboro): The demographics here are similar to some of Obama's better districts in Virginia, with a mix of blacks and relatively well-educated whites in Greensboro and suburban Raleigh. On a good day, Obama is within reach of the 64.3% vote share he'd need to take a fifth delegate from this district. Prediction: Obama 63.4, Clinton 36.6; Obama 4-3 Delegate Win.
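The delegate thresholds quoted in these district write-ups can be reproduced with a little arithmetic. The sketch below is a simplification, not the party's official allocation procedure: it assumes a two-candidate race in which each district's delegates are split in proportion to the vote and rounded to the nearest whole delegate. The function names are made up for illustration.

```python
import math

def delegate_split(leader_pct, delegates):
    """Split a district's delegates between two candidates.

    Assumption: simple proportional allocation, rounded to the nearest
    whole delegate (an exact half rounds up for the leader).
    """
    leader = math.floor(leader_pct / 100 * delegates + 0.5)
    return leader, delegates - leader

def threshold_pct(target, delegates):
    """Vote share (in percent) needed to win `target` of `delegates`."""
    return (target - 0.5) / delegates * 100

# CD-13: seven delegates, Obama projected at 63.4 percent.
print(delegate_split(63.4, 7))         # (4, 3)
print(round(threshold_pct(5, 7), 1))   # 64.3, the share needed for a fifth delegate

# CD-11: six delegates, Clinton projected at 57.6 percent.
print(delegate_split(57.6, 6))         # (3, 3)
print(round(threshold_pct(4, 6), 1))   # 58.3, just above her projected share
```

Under these assumptions, a district is "close to a threshold" whenever the projected share sits within a point or so of `threshold_pct` for the next delegate, which is why CD-11 and CD-13 are flagged as fluid.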
Statewide Results
(Note: Typographical error fixed in table -- statewide popular vote totals were neglecting to include CD-12 and CD-13).
The model expects Barack Obama to pick up 66 delegates to Clinton's 49. Relative to Indiana, the delegate situation is a little more fluid: we are very close to the delegate thresholds in CD-6, CD-9, CD-11 and CD-13, and a delegate could also easily change hands in CD-2.
However, at this point it's not the delegate math that counts so much as the margin of victory, and that's showing up impressively for Barack Obama: a 17-point win, which would net him in excess of 200,000 popular votes.
From a 30,000-foot level, it should not be so difficult to see how the model might come up with results like these. Obama won by 29 points in South Carolina and 28 points in Virginia, the two states that share a long border with North Carolina. Although North Carolina is somewhat whiter than South Carolina, and somewhat less wealthy than Virginia, this projection nevertheless looks perfectly reasonable next to those numbers.
And indeed, this projection did match the polling data until relatively recently. On April 22, the morning before the Pennsylvania primary, Obama led Clinton 51-36 in the Real Clear Politics average in North Carolina, a 15-point margin that matches our regression output almost perfectly.
If North Carolina were North Dakota, I'd probably just caveat this as "Clinton may presently be outperforming her long-term demographic trends" and leave it at that. However, North Carolina is a Southern state -- and pollsters have had an awful lot of trouble polling the South. Let me bring up two interesting facts.
Interesting Fact #1: The polls have significantly underestimated Barack Obama's margin of victory in Southern states with substantial black populations.
That is a graph I drew up a few weeks ago comparing the actual results in a state to the final Pollster.com averages. As you can see, Obama outperformed his polling averages in almost all Southern states: by 3 points in Tennessee, 5 in Florida, 6 in Missouri, 9 in Mississippi, 11 in Virginia, 15 in Alabama, 18 in South Carolina, and 22 in Georgia (the only exception among Southern states that received widespread polling was in Texas). Moreover, the discrepancies appear to be related to the number of African-Americans in the population. In North Carolina, which is about 22 percent African-American, the regression line would forecast an underestimate of about 10 points. Add those 10 points to the 6- or 7-point lead that Obama presently holds in the North Carolina polling consensus -- and you get right to my model's estimate of a 17 point win.
Interesting Fact #2: Early voting data in North Carolina suggests that the pollsters may be significantly underestimating the proportion of African-Americans in the voting population.
As we discussed over the weekend, we have a unique wealth of data in North Carolina, the likes of which I don't recall ever having seen before. The state's Board of Elections has not only released the number of early voters, but also provided significant information about their demographics. According to the
The difference between these two estimates is significant. As you've probably found if you've played around with our North Carolina prediction tool, an increase of 1 percent in the fraction of the electorate that is African-American translates to roughly a 1-point increase in Barack Obama's margin over Hillary Clinton. So -- if the pollsters are assuming 33% black turnout when it will actually be 40%, that would add 7 points to Obama's margin -- putting us in the 13-14 point range, again fairly close to my model's estimates.
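As a rough check on that arithmetic, here is a minimal sketch of the adjustment. It assumes the one-point-per-percentage-point rule of thumb stated above holds linearly across the whole range, and it uses the 6- to 7-point polling lead cited earlier in this post; the function and parameter names are hypothetical.

```python
def adjusted_margin(poll_margin, assumed_black_share, actual_black_share,
                    points_per_share=1.0):
    """Shift a polled margin for a different African-American turnout share.

    Assumption: each extra percentage point of black turnout adds
    `points_per_share` points to Obama's margin (about 1, per the text).
    """
    return poll_margin + points_per_share * (actual_black_share - assumed_black_share)

# Pollsters assume 33 percent black turnout; early voting suggests 40 percent.
for poll_margin in (6, 7):
    print(poll_margin, "->", adjusted_margin(poll_margin, 33, 40))
# 6 -> 13.0
# 7 -> 14.0
```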
As Blumenthal notes, the Obama campaign has tended to place more of an emphasis on early voting -- and so it's possible that his voters are just getting to the polls sooner. On the other hand, studies in other states suggest that people who vote early tend to be of higher socioeconomic status -- whereas black voters as a group tend to be of lower socioeconomic status, particularly in a state like North Carolina. Moreover, 61% of the early voters are female, which is not typically an Obama demographic. (Although, there are a couple of alternate hypotheses for this: the black electorate tends to be disproportionately female, so this could be another reflection of the higher African-American turnout. Also, men tend to procrastinate.)
But fundamentally -- would 40% black turnout be a reasonable number in North Carolina? Or is it just completely out-of-bounds? The short answer is that yes, 40% is a perfectly defensible estimate -- and very probably a better estimate than 33%.
I had worked on this problem a little bit before, attempting to estimate African-American turnout as a proportion of the state's underlying African-American population. But I recognized that there is one alternate metric that might be helpful in this regard too: the proportion of a state's John Kerry voters who were African-American. This is reasonably easy to infer based on 2004 exit polling data.
Below are a set of statistics for each state that has held a primary so far and in which at least 5% of that state's population is African-American (I exclude the disputed contests in Florida and Michigan). This compares the size of the African-American electorate in the primary against (i) the percentage of African-Americans in that state's population, and (ii) the percentage of Kerry voters in that state who were African-American.
Note that in North Carolina, 52 percent of John Kerry's voters were African-American. While 40 percent black turnout sounds high compared to a 22 percent population baseline, it does not sound so high compared to this figure.
In fact, by running a simple regression model, we can come up with a relatively good estimate of African-American turnout based on these two figures. The regression model finds that it's helpful to draw from both sources of data, and comes up with the following best-fit equation:
Turnout = .65 * Population + .57 * Kerry
If you run these numbers through for North Carolina, you come up with African-American turnout of 44% (you also come up with 15% in Indiana).
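For anyone who wants to plug in their own numbers, the equation is easy to evaluate. The sketch below uses the two North Carolina inputs quoted in this post (about 22 percent of the population, 52 percent of Kerry voters); the function name is just for illustration, and Indiana is left out because its inputs are not listed here.

```python
def predicted_black_turnout(population_pct, kerry_pct):
    """Best-fit equation from the post: .65 * Population + .57 * Kerry."""
    return 0.65 * population_pct + 0.57 * kerry_pct

# North Carolina: ~22% of population, 52% of Kerry voters were African-American.
nc = predicted_black_turnout(22, 52)
print(round(nc, 2))  # 43.94
print(round(nc))     # 44
```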
Now, I don't necessarily expect that African-American turnout is going to be quite that high -- and I should point out that the standard error of the forecast is fairly large (roughly +/- 5 points). But it does seem to me that 40 percent -- which falls within that standard error interval -- is a pretty reasonable estimate. I am not so sure about 33 percent. That would imply turnout of approximately 63 percent of the black share of the Kerry vote. Only one other state (Oklahoma, where Obama did not really wage a campaign) came in that low. Other Southern states have been in the range of 64 percent (Arkansas) to 126 percent (Missouri).
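The "percent of the black share of the Kerry vote" comparison is a simple ratio, sketched below for the three turnout figures discussed in this post. The 52 percent Kerry figure is the one quoted above for North Carolina; the function name is hypothetical.

```python
def share_of_kerry_vote(turnout_pct, kerry_pct):
    """Black primary turnout as a percentage of the black share of Kerry voters."""
    return turnout_pct / kerry_pct * 100

# Pollster assumption (33), early-voting figure (40), regression estimate (44).
for turnout in (33, 40, 44):
    print(turnout, "->", round(share_of_kerry_vote(turnout, 52)))
# 33 -> 63
# 40 -> 77
# 44 -> 85
```

Only the 33 percent assumption falls below the 64 to 126 percent range cited for other Southern states.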
I did play around with some more sophisticated versions of the model that looked at things like the amount of campaign activity in a state, or combined the data you see above with pre-election polling from SurveyUSA. These alternate methods produced turnout estimates ranging from about 37 percent to 43 percent -- but orienting around the 40 percent number. Considering that nearly 400,000 votes have already come in at a turnout rate of 40 percent and that other polls in Southern states have substantially underestimated Barack Obama's victory margin, I would be surprised if African-American turnout came in lower than about 37 percent in North Carolina. In fact, that figure is probably conservative. If I am right about this, then a double-digit win for Obama in North Carolina is not only possible, but perhaps somewhat likely.
By the way -- I think I know what the pollsters are doing wrong. They're calibrating black turnout to a proportion of the population, but not to a proportion of the Kerry vote. This is a significant mistake, because in some states, the vast majority of the available white voters will vote as Republicans -- meaning that black voters make up a larger share of what remains in the Democratic electorate. It may even be that the Obama campaign recognizes all of this, which is why they have devoted a disproportionate share of their resources to Indiana. But, we will know soon enough. I am prepared to be spectacularly wrong on Tuesday.
EDIT: Public Policy Polling's Tom Jensen suggests that there is some history in North Carolina of black voters tending to make up a larger share of early voters. Still, the presidential primary in North Carolina was not competitive in either 2000 or 2004, so I'm not sure how much can be inferred from that. If you want to pin me to a number, I'll take 38% black turnout.
Monday, May 5, 2008
North Carolina Prediction: Obama by Double Digits
at 8:26 AM
Labels: african-americans, bradley effect, north carolina, primaries, turnout models
35 comments
Obama will clean up Watauga County (Boone). He won the county straw poll up there something like 75-25, and the county Dems are, IMO, the most well organized county party in the state.
In the last 4 years they've flipped the county commission from 5R-0D to 0R-5D, taken the town council with a 4-1 progressive majority, and beaten a longtime incumbent state house member, and a state senator. They also beat Virginia Foxx by 1,000 votes in her own county in an off-year election. Watauga County Dems are the perfect model of progressive grassroots organizing, and they mean business.
peace,
faithfull
My only concern with your otherwise brilliant analysis is the political context. I mean the climate is different now than it was when all those other primaries took place, so just projecting the NC results from them seems to miss the fact that there are a handful of facts different in NC.
In that regard, I see the high number of AA undecideds and I see Hillary in the high teens in the latest IA poll and I start to wonder if some of those AA undecideds could be Hillary voters that don't want to admit it because they don't want to be seen as voting against the AA candidate.
How plausible is that theory ?
I am not quite as optimistic as you, but I still see a 6 percent underprediction. I predict 12% to Obama.
I wouldn't read much into the Insider Advantage numbers. Two other polls -- Zogby and Research 2000, had black support for Clinton at something like 7%, and SurveyUSA had it at 11%.
Insider Advantage gets decent enough results overall, but it does very, very strange things with its cross-tabs and I think the best policy is probably just to ignore their demographic breakouts.
The NC race will be a very interesting test of your methodology. Kudos to you for putting it all out there for examination.
I will stand on my head and then dance a jig if Obama holds CD 11 to a 3-3 tie. HRC has been here twice, Bill twice, Chelsea once - outside of Asheville, it is poor pickings indeed.
But encouraging to see that your model, which I believe includes the 'Appalachian influence' believes otherwise...
Thanks for posting. Been waiting with bated breath to see what your model spat out...
Great analysis.
"Congressional Districts in North Carolina tend to be a little amorphous"
That's a nice way to put it. The less nice way is "North Carolina is one of the most, if not the most gerrymandered state in the nation."
Very interesting analysis, thanks for the great work as usual.
"I think I know what the pollsters are doing wrong. They're calibrating black turnout to a proportion of the population, but not to a proportion of the Kerry vote."
Really hard to believe pollsters are that stupid.
Aren't your projected turnout numbers really low when compared to PA results? Seems like you're projecting 50% of the PA turnout yet NC has 72% of PA's population.
Interesting analysis, but please re-check your numbers. According to exit polls, only 50% of primary voters in MS were black.
The population of VA is only about 19% black.
Anecdotally, not only African-Americans but Hispanics are turning strongly in support for Obama.
For some, there is a strong convergence between race-baiting and immigrant-bashing.
I can only guess how this is going to play out in places like Texas...and Arizona.
- cskendrick
A question for you, jpm:
Is the population of NC Democrats 72% of the population of PA Democrats?
Interesting.
My- very simple- prediction of NC fits not so bad with yours.
I predict a 57,5-42,5 win for Obama, based on my formula margin=((McCain-Obama Average)-(McCain-Clinton Average)*3)
For Indiana, it would be 52,7 Obama- 47,3 Clinton based on your numbers, I use a different weighting system and list the SUSA/Downs Center poll as Downs Center, not as SUSA polls, so I get a prediction of
51,9 Obama
49,1 Clinton
I am 101% certain that your Indiana numbers are in error, Rasmus.
I am also, but nevertheless that´s my prediction. And I got Pennsylvania within 0,5% with that ( I predicted a 9,6% margin, but that numbers changed, since the Rasmussen and Quinnipac polls from 24 and 26. April were good for Obama. Now I would predict a 6,2% margin. )- and Texas with the polls that were out on 4/3 also within 2%, even if I would now predict a 50,1-49,9 Clinton split.
And for Ohio I don´t want to calculate the prediction for 3/4, that would be too complicated, but for today I´d predict a 9,8% Clinton margin- that is also close to the election results.
You see, that method works not bad, and the NC numbers look reasonable also- I don´t know why I get so strange numbers for Indiana. Maybe the Republicans there love him- or we will see a big upset tomorrow.
Very interesting. If, instead of a multiple regression, you run two linear regressions against (1) Kerry vote, and (2) black population, you get two distinctly different predictions in NC: Kerry vote predicts 53%, while population predicts 33%. So I think you're right, the pollsters are using population only. The mean of these numbers is 43%, which is close to what one would expect based on early voting, and matches your multiple regression too (as expected).
In Indiana, single-regression on Kerry gives 16%, and on population gives 9%, with a mean of 13%.
Also interesting: plugging 43% into your spreadsheet below gives a 17% Obama margin of victory.
Anon @ 10:59 - ROFL
Rasmus...read your post again to realize what Anon was talkin about.
Oo
I realize.
51,9-48,1.
Sorry, and thanks Anon, I love your sense of humor.
The one problem I see with using the Kerry vote in NC is that NC is weird in that it votes for Dems at the state level and Reps at the federal level. NC primary voters will be more conservative than you would expect from the Presidential election, so a better comparator would be the results from the 2004 governor's race, in which blacks comprised around 41% of the Dem vote.
anonymous,
Independents can vote in NC.
I would guess that 72% of NC Dems
and Indep are quite close to 72%
of PA Dems.
Another demographic-based regression pretty much agrees with you. The NC election will offer a very interesting test case on the accuracy of polling methods in this election cycle.
Pretty much agrees!? Those results are nearly identical:
538: 58.6% O, 41.4% C; 1.15 million
SSS: 58% O, 42% C; 1.2 million
Any prediction for Indiana?
My Indiana prediction is
Clinton 52.3
Obama 47.7
Though I'd much prefer that it were flipped around.
I honestly think Obama will overperform in Indiana (especially northwest Indiana), as well as NC. The main reason here is that Indiana borders Illinois, and NW IN is in the Chicago suburbs / media market. I know this isn't new news, and the disproportionate support in the region is reflected to a certain extent in the polls, but the home state advantage thing here throws the turnout projections out of whack. Not only are they more likely to vote for someone to whom they feel a home state connection, it MAKES THEM MORE LIKELY TO VOTE AT ALL than they might be otherwise. The people who don't feel the home state connection, on the other hand, feel less obligated to vote. So he'll overperform statewide due to high turnout (mostly due to Obama supporters) in NW IN.
So the pollsters are estimating only a 32% to 33% AA turnout. Do you suppose they forgot to include the effects of the significant voter suppression that Rendell's minions pulled off in Philly? While we're talking about disinformation (or non-information) promulgated (or covered up) by the privatized propaganda ministry, how many noticed that the Gallup poll that shows Clinton +7 CHANGED to a "likely voter" scam from another class of algorithm? Wouldn't we like to know what the results would be if they'd treated their raw data consistently? Bet it wouldn't favor Clinton, though.
Polls, and statistics, and predictions and pundits, are always wrong. And if they are once in a blue moon right it's not because they were right, it was because reality happened to coincide with their guess. Zogby used to be spot on. No more. Rasmussen is the go-to guy now, but only for a while. I stopped believing the polls and stats after drinking the kool aid too many prior elections in the past.
Anon@5:04PM:
NW IN being in Chicago suburbs could be a double-edged sword. According to some talking head on MSNBC, that area has been inundated with stories about Rev. Wright from the chicago media. Implication was that it was actually helping Clinton.
Just wondering: could the counter-Bradley effect be manifesting itself indirectly via the turn-out model (and not directly via the expressed preference), i.e. pollsters stick to a low AA turn-out model because they often hear from AA respondents "won't vote", meaning "I rather not have this conversation with *you*, thank you".
Empirical question: can we go back to the data and see how much of the Obama underprediction in Southern polls is due to AA turn-out models, how much due to wrong reading of AA preferences, and how much to wrong reading of White preferences?
Has this model been used ex-post-facto on prior states to judge how it would have fared in prior contests?
If so, in what circumstances has it tended to over estimate which candidate, etc?
Obviously there are some issues in applying the model to previous states, such as changing perceptions of the candidates over time, but it would still seem useful.
CD-01 in NC will go at the very least 75-25 for Obama and likely it will be something like 80-20. You will see. :)
rcp has it at 49.7-42.7.
Given the trend of late breakers favoring clinton, a double digit victory suggests that the late breakers will go 2 to 1 obama. rcp had it at 49.5 clinton, obama 43.4-final #? clinton gained 5.1 to bo's 2% gain.
I don't see= as I have yet to see it.
Obama's base is maxed at this point(aa/higher ed), while hillary has a lot broader demographic to draw from.
If those poor white hicks take time to stop acting 'clingy', put down their guns and bibles long enough to vote, hillary will still lose, but by 6 or less.
Indiana is more likely to go double digit, the other way.
"rcp had it at 49.5 clinton, obama 43.4-final #? clinton gained 5.1 to bo's 2% gain."
referencing PA
Great analysis. I applaud your willingness to make predictions and lay out your methodology before the voting starts. We'll be following your story at Babbledog.