In Pennsylvania and Indiana, the previous times that we conducted this exercise, the results from our regression model were closely in line with the composite polling averages in those states. In North Carolina, however, while most polls show a tightening, single-digit race, our model steadfastly forecasts a solid, double-digit victory for Barack Obama.
If you'll recall, the way that I produce these projections is to rely purely on demographic data from previous primaries. So the unstated assumption is this: if voters in North Carolina behave like demographically-aligned voters in other states, this is about what we should expect. On the other hand, if something has changed in the way that some groups of voters view the candidates -- our model may be inaccurate.
There does appear to me to be some evidence that Hillary Clinton is overperforming the position she has generally held throughout most of the recent primaries. But there is also some strong evidence that the current polling in North Carolina may be understating Barack Obama's support in that state.
But let's set that discussion aside for a moment and first look at the model's projections on a Congressional District by Congressional District basis.
CD-1 (Northeast / Albemarle): Without meaning to sound reductionist, this is one district where you can pretty much tell who somebody is going to vote for when you see them walk through the door. Most voters will be African-American, as the black population exceeds 50 percent in the district. However, the white population is impoverished and tends to be fairly old. The delegate split will almost certainly be 4-2 in Obama's favor. Prediction: Obama 68.1, Clinton 31.9; Obama 4-2 Delegate Win.
CD-2 (East Central / Raleigh): The Congressional Districts in North Carolina tend to be a little amorphous, as the major cities are divided up between two or three CDs. CD-2 includes the Raleigh city center, but then spiders out into some of the poorer areas of the state. However, it's an equal opportunity kind of poor, as there are plenty of blacks, plenty of whites, and a decent number of Hispanics. It should be a safe Obama win, but it will be a close battle to see whether he can get 4 out of 6 delegates. The model notes that CD-2 is the youngest district in the state and the most male and gives him the delegate majority. Prediction: Obama 61.5, Clinton 38.5; Obama 4-2 Delegate Win.
CD-3 (East / Cape Hatteras): The North Carolina coastline tends to be rugged and difficult to navigate, so you do not have so many of the wealthy, resort-type towns like you get in SC-1. Nevertheless, the white population in NC-3 is far more middle-class than the two Inner Banks districts that we just finished describing, and the district tends young and male, which should help Obama to make up for a relatively small black population. Overall, it's one of the more heterogeneous regions in the state and should track the state's returns fairly closely. Heavily Republican area, so expect a low turnout. Prediction: Obama 56.3, Clinton 43.7; 2-2 Delegate Split.
CD-4 (Central / Research Triangle): Among the best educated districts in the country, but also having a decent-sized African-American population and lots of collegians, the rapidly-growing Research Triangle area around Durham and Chapel Hill is Obama's best opportunity to run up the score. The district has nine delegates and turnout should be in the 170,000 voter range; Obama will get the lion's share of both, but a 7-2 delegate split looks unlikely. Prediction: Obama 69.8, Clinton 30.2; Obama 6-3 Delegate Win.
CD-5 (Northwest / Boone): White and lower-middle class, this Appalachian region most certainly favors Clinton, but the demographics aren't nearly as extreme as the neighboring regions in Tennessee. I have a hunch that she'll outperform our popular vote projection, but Clinton has no realistic chance to hold Obama to fewer than two delegates. Prediction: Clinton 55.9, Obama 44.1; Clinton 3-2 Delegate Win.
CD-6 (Central / Asheboro): With otherwise average demographics but a black population below 10%, this district leans Clinton, but Obama could steal the odd-numbered delegate from her if he does well with unaffiliated voters, as this is the least Democratic district in the state. Prediction: Clinton 51.8, Obama 48.2; Clinton 3-2 Delegate Win.
CD-7 (Southeast / Wilmington): Once fairly rough and tumble, this is a rapidly-growing district that is beginning to look something like suburban Atlanta, with an emphasis on service-sector employment. It has a high enough black population to lean Obama, but with an even number of delegates, we are most likely looking at a split. Prediction: Obama 52.8, Clinton 47.2; 3-3 Delegate Split.
CD-8 (South Central / Kannapolis): With a 30% African-American population, Obama will almost certainly win CD-8, but the white population is relatively working-class. It should break down about 60:40, almost exactly in line with a 3-2 delegate split. Prediction: Obama 60.7, Clinton 39.3; Obama 3-2 Delegate Win.
CD-9 (Charlotte - Gastonia): CD-9 comprises the ritzy and professional suburban areas immediately south of Charlotte. That is a demographic that has favored Obama over the course of most of this primary season, but performed less well for him in Pennsylvania. The model is giving Obama a 4-2 delegate win on the strength of independent voters, but I'd feel a little more comfortable for him if the African-American population were slightly larger. Prediction: Obama 60.2, Clinton 39.8; Obama 4-2 Delegate Win.
CD-10 (West / Hickory): The demographics are nearly identical to CD-5, and so naturally our model expects a nearly identical result. Prediction: Clinton 55.6, Obama 44.4; Clinton 3-2 Delegate Win.
CD-11 (West / Asheville): Artsy and eccentric, the Asheville/Smoky Mountains area is different from anything else in the state. Although the black population is small, there would otherwise be enough of a latte liberal crowd to keep things fairly balanced, except for the fact that the high quality of life tends to attract a large retirement community. As such, Clinton is bound to win, and will be right on the brink of winning a 4-2 delegate split. Prediction: Clinton 57.6, Obama 42.4; 3-3 Delegate Split.
CD-12 (I-85 Corridor): This gerrymandered district hugs Interstate 85 and includes portions of Charlotte, High Point and Winston-Salem. It's predominately middle class, but nearly half the district is African-American, which heavily favors Obama. In fact, we have this as his best district in the state, with the voting margins lining up almost exactly with a 5-2 delegate win. Prediction: Obama 70.6, Clinton 29.4; Obama 5-2 Delegate Win.
CD-13 (North / Greensboro): The demographics here are similar to some of Obama's better districts in Virginia, with a mix of blacks and relatively well-educated whites in Greensboro and suburban Raleigh. On a good day, Obama is within reach of the 64.3% vote share he'd need to take a fifth delegate from this district. Prediction: Obama 63.4, Clinton 36.6; Obama 4-3 Delegate Win.
(Note: Typographical error fixed in table -- statewide popular vote totals were neglecting to include CD-12 and CD-13).
The model expects Barack Obama to pick up 66 delegates to Clinton's 49. Relative to Indiana, the delegate situation is a little more fluid: we are very close to the delegate thresholds in CD-6, CD-9, CD-11 and CD-13, and a delegate could also easily change hands in CD-2.
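For anyone who wants to check the district-level delegate arithmetic, here is a minimal sketch in Python. It assumes a two-candidate race in which each district's delegates are split in proportion to the vote and rounded to the nearest whole delegate, which is consistent with the thresholds quoted above (for example, 64.3% for a fifth delegate out of seven). The function names are illustrative, the per-district delegate counts are inferred from the splits listed above, and none of this is the model's actual code. The statewide delegates that make up the rest of the 66-49 total are not covered.

```python
import math

# Obama's projected vote share (percent) and the delegates at stake,
# read off the district-by-district predictions above.
DISTRICTS = {
    "CD-1": (68.1, 6), "CD-2": (61.5, 6), "CD-3": (56.3, 4),
    "CD-4": (69.8, 9), "CD-5": (44.1, 5), "CD-6": (48.2, 5),
    "CD-7": (52.8, 6), "CD-8": (60.7, 5), "CD-9": (60.2, 6),
    "CD-10": (44.4, 5), "CD-11": (42.4, 6), "CD-12": (70.6, 7),
    "CD-13": (63.4, 7),
}

def split_delegates(obama_pct, n_delegates):
    """Assumed rule: round Obama's proportional share to the nearest delegate."""
    obama = math.floor(obama_pct / 100 * n_delegates + 0.5)
    return obama, n_delegates - obama

def threshold_pct(k, n_delegates):
    """Smallest vote share (percent) that rounds up to k of n delegates."""
    return (k - 0.5) / n_delegates * 100

obama_total = sum(split_delegates(p, n)[0] for p, n in DISTRICTS.values())
clinton_total = sum(split_delegates(p, n)[1] for p, n in DISTRICTS.values())

print(obama_total, clinton_total)     # district delegates only
print(round(threshold_pct(5, 7), 1))  # share needed for 5 of 7 delegates
```

Under these assumptions the districts sum to 44 delegates for Obama and 33 for Clinton, and the threshold for a fifth delegate in a seven-delegate district comes out to 64.3%, matching the CD-13 figure.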
However, at this point it's not the delegate math that counts so much as the margin of victory, and that's showing up impressively for Barack Obama: a 17-point win, which would net him in excess of 200,000 popular votes.
From a 30,000-foot level, it should not be so difficult to see how the model might come up with results like these. Obama won by 29 points in South Carolina and 28 points in Virginia, the two states that share a long border with North Carolina. Although North Carolina is somewhat whiter than South Carolina, and somewhat less wealthy than Virginia, this projection nevertheless looks perfectly reasonable next to those numbers.
And indeed, this projection did match the polling data until relatively recently. On April 22, the morning of the Pennsylvania primary, Obama led Clinton 51-36 in the Real Clear Politics average in North Carolina, a 15-point margin that matches our regression output almost perfectly.
If North Carolina were North Dakota, I'd probably just caveat this as "Clinton may presently be outperforming her long-term demographic trends" and leave it at that. However, North Carolina is a Southern state -- and pollsters have had an awful lot of trouble polling the South. Let me bring up two interesting facts.
Interesting Fact #1: The polls have significantly underestimated Barack Obama's margin of victory in Southern states with substantial black populations.
That is a graph I drew up a few weeks ago comparing the actual results in a state to the final Pollster.com averages. As you can see, Obama outperformed his polling averages in almost all Southern states: by 3 points in Tennessee, 5 in Florida, 6 in Missouri, 9 in Mississippi, 11 in Virginia, 15 in Alabama, 18 in South Carolina, and 22 in Georgia (the only exception among Southern states that received widespread polling was Texas). Moreover, the discrepancies appear to be related to the number of African-Americans in the population. In North Carolina, which is about 22 percent African-American, the regression line would forecast an underestimate of about 10 points. Add those 10 points to the 6- or 7-point lead that Obama presently holds in the North Carolina polling consensus -- and you get right to my model's estimate of a 17-point win.
Interesting Fact #2: Early voting data in North Carolina suggests that the pollsters may be significantly underestimating the proportion of African-Americans in the voting population.
As we discussed over the weekend, we have a unique wealth of data in North Carolina, the likes of which I don't recall ever having seen before. The state's Board of Elections has not only released the number of early voters, but also provided significant information about their demographics. According to the
The difference between these two estimates is significant. As you've probably found if you've played around with our North Carolina prediction tool, an increase of 1 percent in the fraction of the electorate that is African-American translates to roughly a 1-point increase in Barack Obama's margin over Hillary Clinton. So -- if the pollsters are assuming 33% black turnout when it will actually be 40%, that would add 7 points to Obama's margin -- putting us in the 13-14 point range, again fairly close to my model's estimates.
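The back-of-the-envelope adjustment in that paragraph can be written out as a tiny Python sketch. It assumes the roughly one-for-one relationship described above (each extra point of African-American share adds about a point to Obama's margin) and uses the 6- to 7-point polling lead mentioned earlier; the function and parameter names are illustrative, not part of the prediction tool.

```python
def adjusted_margin(poll_margin, assumed_black_share, actual_black_share,
                    points_per_pct=1.0):
    """Shift a polled margin for a different African-American share of the electorate.

    points_per_pct is the assumed sensitivity: about 1 point of margin
    per 1 percentage point of electorate share, per the text above.
    """
    return poll_margin + points_per_pct * (actual_black_share - assumed_black_share)

# Polls assume a 33% black electorate; early voting suggests about 40%.
low = adjusted_margin(6, 33, 40)
high = adjusted_margin(7, 33, 40)
print(low, high)
```

With those inputs the adjusted lead lands at 13 to 14 points, the range quoted above.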
As Blumenthal notes, the Obama campaign has tended to place more of an emphasis on early voting -- and so it's possible that his voters are just getting to the polls sooner. On the other hand, studies in other states suggest that people who vote early tend to be of higher socioeconomic status -- whereas black voters as a group tend to be of lower socioeconomic status, particularly in a state like North Carolina. Moreover, 61% of the early voters are female, which is not typically an Obama demographic. (Although, there are a couple of alternate hypotheses for this: the black electorate tends to be disproportionately female, so this could be another reflection of the higher African-American turnout. Also, men tend to procrastinate.)
But fundamentally -- would 40% black turnout be a reasonable number in North Carolina? Or is it just completely out-of-bounds? The short answer is that yes, 40% is a perfectly defensible estimate -- and very probably a better estimate than 33%.
I had worked on this problem a little bit before, attempting to estimate African-American turnout as a proportion of the state's underlying African-American population. But I recognized that there is one alternate metric that might be helpful in this regard too: the proportion of a state's John Kerry voters who were African-American. This is reasonably easy to infer based on 2004 exit polling data.
Below is a set of statistics for each state that has held a primary so far and in which at least 5% of that state's population is African-American (I exclude the disputed contests in Florida and Michigan). This compares the size of the African-American electorate in the primary against (i) the percentage of African-Americans in that state's population, and (ii) the percentage of Kerry voters in that state who were African-American.
Note that in North Carolina, 52 percent of John Kerry's voters were African-American. While 40 percent black turnout sounds high compared to a 22 percent population baseline, it does not sound so high compared to this figure.
In fact, by running a simple regression model, we can come up with a relatively good estimate of African-American turnout based on these two figures. The regression model finds that it's helpful to draw from both sources of data, and comes up with the following best-fit equation:
Turnout = .65 * Population + .57 * Kerry
If you run these numbers through for North Carolina, you come up with African-American turnout of 44% (you also come up with 15% in Indiana).
Now, I don't necessarily expect that African-American turnout is going to be quite that high -- and I should point out that the standard error of the forecast is fairly large (roughly +/- 5 points). But it does seem to me that 40 percent -- which falls within that standard error interval -- is a pretty reasonable estimate. I am not so sure about 33 percent. That would imply turnout of approximately 63 percent of the black share of the Kerry vote. Only one other state (Oklahoma, where Obama did not really wage a campaign) came in that low. Other Southern states have been in the range of 64 percent (Arkansas) to 126 percent (Missouri).
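As a quick check on the numbers above, here is a minimal Python sketch that plugs North Carolina's figures (22 percent of the population, 52 percent of Kerry voters) into the best-fit equation. The function name is illustrative, and the +/- 5 point band is only the rough standard error mentioned above, not a formal confidence interval.

```python
def predicted_black_turnout(population_pct, kerry_pct):
    """Best-fit equation from the text: .65 * Population + .57 * Kerry."""
    return 0.65 * population_pct + 0.57 * kerry_pct

nc_estimate = predicted_black_turnout(22, 52)

# Rough one-standard-error band (about +/- 5 points, per the text).
band_low, band_high = nc_estimate - 5, nc_estimate + 5

# The pollsters' apparent assumption as a share of the Kerry benchmark.
ratio_at_33 = 33 / 52 * 100

print(round(nc_estimate))             # point estimate, in percent
print(band_low <= 40 <= band_high)    # is 40% inside the band?
print(band_low <= 33 <= band_high)    # is 33% inside the band?
print(round(ratio_at_33))             # 33% turnout as a percent of the Kerry share
```

The point estimate rounds to 44 percent, 40 percent falls inside the band while 33 percent does not, and 33 percent works out to about 63 percent of the black share of the Kerry vote.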
I did play around with some more sophisticated versions of the model that looked at things like the amount of campaign activity in a state, or combined the data you see above with pre-election polling from SurveyUSA. These alternate methods produced turnout estimates ranging from about 37 percent to 43 percent -- but orienting around the 40 percent number. Considering that nearly 400,000 votes have already come in at a turnout rate of 40 percent and that other polls in Southern states have substantially underestimated Barack Obama's victory margin, I would be surprised if African-American turnout came in lower than about 37 percent in North Carolina. In fact, that figure is probably conservative. If I am right about this, then a double-digit win for Obama in North Carolina is not only possible, but perhaps somewhat likely.
By the way -- I think I know what the pollsters are doing wrong. They're calibrating black turnout to a proportion of the population, but not to a proportion of the Kerry vote. This is a significant mistake, because in some states, the vast majority of the available white voters will vote as Republicans -- meaning that black voters make up a larger share of what remains in the Democratic electorate. It may even be that the Obama campaign recognizes all of this, which is why they have devoted a disproportionate share of their resources to Indiana. But, we will know soon enough. I am prepared to be spectacularly wrong on Tuesday.
EDIT: Public Policy Polling's Tom Jensen suggests that there is some history in North Carolina of black voters tending to make up a larger share of early voters. Still, the presidential primary in North Carolina was not competitive in either 2000 or 2004, so I'm not sure how much can be inferred from that. If you want to pin me to a number, I'll take 38% black turnout.
by Nate Silver @ 9:26 AM