Showing posts with label bradley effect. Show all posts

Monday, May 5, 2008

North Carolina Prediction: Obama by Double Digits





In Pennsylvania and Indiana, the previous times that we conducted this exercise, the results from our regression model were closely in line with the composite polling averages in those states. In North Carolina, however, while most polls show a tightening, single-digit race, our model steadfastly forecasts a solid, double-digit victory for Barack Obama.

If you'll recall, the way that I produce these projections is to rely purely on demographic data from previous primaries. So the unstated assumption is this: if voters in North Carolina behave like demographically-aligned voters in other states, this is about what we should expect. On the other hand, if something has changed in the way that some groups of voters view the candidates -- our model may be inaccurate.

There does appear to me to be some evidence that Hillary Clinton is overperforming the position she has generally held throughout most of the recent primaries. But there is also some strong evidence that the current polling in North Carolina may be understating Barack Obama's support in that state.

But let's set that discussion aside for a moment and first look at the model's projections on a Congressional District by Congressional District basis.

CD-1 (Northeast / Albemarle): Without meaning to sound reductionist, this is one district where you can pretty much tell who somebody is going to vote for when you see them walk through the door. Most voters will be African-American, as the black population exceeds 50 percent in the district. However, the white population is impoverished and tends to be fairly old. The delegate split will almost certainly be 4-2 in Obama's favor. Prediction: Obama 68.1, Clinton 31.9; Obama 4-2 Delegate Win.

CD-2 (East Central / Raleigh): The Congressional Districts in North Carolina tend to be a little amorphous, as the major cities are divided up between two or three CDs. CD-2 includes the Raleigh city center, but then spiders out into some of the poorer areas of the state. However, it's an equal-opportunity kind of poor, as there are plenty of blacks, plenty of whites, and a decent number of Hispanics. It should be a safe Obama win, but it will be a close battle to see whether he can get 4 out of 6 delegates. The model notes that CD-2 is the youngest and most male district in the state, and gives him the delegate majority. Prediction: Obama 61.5, Clinton 38.5; Obama 4-2 Delegate Win.

CD-3 (East / Cape Hatteras): The North Carolina coastline tends to be rugged and difficult to navigate, so you do not have so many of the wealthy, resort-type towns like you get in SC-1. Nevertheless, the white population in NC-3 is far more middle-class than in the two Inner Banks districts that we just finished describing, and the district tends to be young and male, which should help Obama to make up for a relatively small black population. Overall, it's one of the more heterogeneous regions in the state and should track the state's returns fairly closely. It's a heavily Republican area, so expect low turnout. Prediction: Obama 56.3, Clinton 43.7; 2-2 Delegate Split.

CD-4 (Central / Research Triangle): Among the best-educated districts in the country, but also having a decent-sized African-American population and lots of collegians, the rapidly-growing Research Triangle area around Durham and Chapel Hill is Obama's best opportunity to run up the score. The district has nine delegates and turnout should be in the 170,000 voter range; Obama will get the lion's share of both, but a 7-2 delegate split looks unlikely. Prediction: Obama 69.8, Clinton 30.2; Obama 6-3 Delegate Win.

CD-5 (Northwest / Boone): White and lower-middle class, this Appalachian region most certainly favors Clinton, but the demographics aren't nearly as extreme as the neighboring regions in Tennessee. I have a hunch that she'll outperform our popular vote projection, but Clinton has no realistic chance to hold Obama to fewer than two delegates. Prediction: Clinton 55.9, Obama 44.1; Clinton 3-2 Delegate Win.

CD-6 (Central / Asheboro): With otherwise average demographics but a black population below 10%, this district leans Clinton, but Obama could steal the odd-numbered delegate from her if he does well with unaffiliated voters, as this is the least Democratic district in the state. Prediction: Clinton 51.8, Obama 48.2; Clinton 3-2 Delegate Win.

CD-7 (Southeast / Wilmington): Once fairly rough and tumble, this is a rapidly-growing district that is beginning to look something like suburban Atlanta, with an emphasis on service-sector employment. It has a high enough black population to lean Obama, but with an even number of delegates, we are most likely looking at a split. Prediction: Obama 52.8, Clinton 47.2; 3-3 Delegate Split.

CD-8 (South Central / Kannapolis): With a 30% African-American population, Obama will almost certainly win CD-8, but the white population is relatively working-class. It should break down about 60:40, almost exactly in line with a 3-2 delegate split. Prediction: Obama 60.7, Clinton 39.3; Obama 3-2 Delegate Win.

CD-9 (Charlotte - Gastonia): CD-9 comprises the ritzy and professional suburban areas immediately south of Charlotte. That is a demographic that has favored Obama over the course of most of this primary season, but performed less well for him in Pennsylvania. The model is giving Obama a 4-2 delegate win on the strength of independent voters, but I'd feel a little more comfortable for him if the African-American population were slightly larger. Prediction: Obama 60.2, Clinton 39.8; Obama 4-2 Delegate Win.

CD-10 (West / Hickory): The demographics are nearly identical to CD-5, and so naturally our model expects a nearly identical result. Prediction: Clinton 55.6, Obama 44.4; Clinton 3-2 Delegate Win.

CD-11 (West / Asheville): Artsy and eccentric, the Asheville/Smoky Mountains area is different from anything else in the state. Although the black population is small, there would otherwise be enough of a latte liberal crowd to keep things fairly balanced, except for the fact that the high quality of life tends to attract a large retirement community. As such, Clinton is bound to win, and will be right on the brink of winning a 4-2 delegate split. Prediction: Clinton 57.6, Obama 42.4; 3-3 Delegate Split.

CD-12 (I-85 Corridor): This gerrymandered district hugs Interstate 85 and includes portions of Charlotte, High Point and Winston-Salem. It's predominantly middle class, but nearly half the district is African-American, which heavily favors Obama. In fact, we have this as his best district in the state, with the voting margins lining up almost exactly with a 5-2 delegate win. Prediction: Obama 70.6, Clinton 29.4; Obama 5-2 Delegate Win.

CD-13 (North / Greensboro): The demographics here are similar to some of Obama's better districts in Virginia, with a mix of blacks and relatively well-educated whites in Greensboro and suburban Raleigh. On a good day, Obama is within reach of the 64.3% vote share he'd need to take a fifth delegate from this district. Prediction: Obama 63.4, Clinton 36.6; Obama 4-3 Delegate Win.
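The 64.3 percent figure follows from how vote shares turn into whole delegates. A minimal Python sketch, assuming that in a two-candidate race each candidate's delegate count is simply vote share times the district's delegates, rounded to the nearest whole number with halves rounded up. That is a simplification of the DNC's actual allocation rules, which also impose a 15 percent viability threshold and include additional adjustment steps; the helper names are hypothetical. Under this assumption, winning k of n delegates takes a share of at least (k - 0.5) / n.

```python
import math


def delegates_won(share: float, n_delegates: int) -> int:
    """Delegates won under the simplified rule (an assumption, not the
    official DNC procedure): round share * n_delegates to the nearest whole
    number, with halves rounded up. `share` is a fraction, e.g. 0.634."""
    return math.floor(share * n_delegates + 0.5)


def share_needed(k: int, n_delegates: int) -> float:
    """Smallest vote share that wins at least k delegates under the same rule,
    i.e. the point where share * n_delegates reaches k - 0.5."""
    return (k - 0.5) / n_delegates


# CD-13 elects seven delegates; a fifth one needs (5 - 0.5) / 7 of the vote.
print(f"{share_needed(5, 7):.1%}")  # 64.3%
print(delegates_won(0.634, 7))      # 4: the projected 63.4% falls just short
```

Rounding half up with `math.floor(x + 0.5)` is a deliberate choice here: Python's built-in `round` uses round-half-to-even, which would hand exact ties to whichever candidate lands on an even count.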

Statewide Results



(Note: Typographical error fixed in table -- statewide popular vote totals were neglecting to include CD-12 and CD-13).

The model expects Barack Obama to pick up 66 delegates to Clinton's 49. Relative to Indiana, the delegate situation is a little more fluid: we are very close to the delegate thresholds in CD-6, CD-9, CD-11 and CD-13, and a delegate could also easily change hands in CD-2.
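The district-level delegate split and the list of districts sitting near a threshold can be reproduced from the thirteen projections above. A minimal sketch, reusing the same simplified rounding rule (an assumption) and inferring each district's delegate count from its projected split; it covers only the district delegates, not those allocated on the statewide vote, and the 2-point cutoff for "close" is an arbitrary illustrative choice.

```python
import math

# District projections from the post: (district, delegates, Obama share in %).
# Delegate counts are inferred from each projected split (e.g. 4-2 -> 6).
PROJECTIONS = [
    ("CD-1", 6, 68.1), ("CD-2", 6, 61.5), ("CD-3", 4, 56.3),
    ("CD-4", 9, 69.8), ("CD-5", 5, 44.1), ("CD-6", 5, 48.2),
    ("CD-7", 6, 52.8), ("CD-8", 5, 60.7), ("CD-9", 6, 60.2),
    ("CD-10", 5, 44.4), ("CD-11", 6, 42.4), ("CD-12", 7, 70.6),
    ("CD-13", 7, 63.4),
]


def delegates_won(share_pct, n):
    """Simplified rule (assumption): round share * n to nearest, halves up."""
    return math.floor(share_pct / 100 * n + 0.5)


def distance_to_threshold(share_pct, n):
    """Gap in points between a projected share and the nearest delegate
    threshold; thresholds sit at (k - 0.5) / n for k = 1..n."""
    return min(abs(share_pct - 100 * (k - 0.5) / n) for k in range(1, n + 1))


obama = sum(delegates_won(share, n) for _, n, share in PROJECTIONS)
total = sum(n for _, n, _ in PROJECTIONS)
print(f"District delegates: Obama {obama}, Clinton {total - obama}")

close = [cd for cd, n, share in PROJECTIONS if distance_to_threshold(share, n) < 2.0]
print("Within 2 points of a threshold:", close)
```

Under this rule the districts come out 44-33 for Obama, which together with the 66-49 total above would leave 22 and 16 delegates respectively to come from the statewide vote. The 2-point cutoff flags CD-6, CD-9, CD-11 and CD-13; CD-2 sits about three points above its nearest threshold.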

However, at this point it's not the delegate math that counts so much as the margin of victory, and that's showing up impressively for Barack Obama: a 17-point win, which would net him in excess of 200,000 popular votes.
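As a rough check on the popular-vote figure, the net vote gain is just the margin times total turnout. A minimal sketch, assuming total turnout of roughly 1.2 to 1.6 million, which is what North Carolina's roughly 400,000 early voters would imply if they make up one-third to one-quarter of the electorate (the estimate given in the early-voting discussion of this post); `net_votes` is a hypothetical helper.

```python
def net_votes(margin_pts, turnout):
    """Net popular-vote gain for a statewide margin given in points."""
    return round(margin_pts / 100 * turnout)


# Assumed total turnout: ~400,000 early voters as one-third to one-quarter
# of the electorate, i.e. roughly 1.2 to 1.6 million voters.
for turnout in (1_200_000, 1_600_000):
    print(f"{turnout:,} voters -> net {net_votes(17, turnout):,} votes")
```

Both ends of that range put a 17-point win above 200,000 net votes (about 204,000 and 272,000 respectively).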

From a 30,000-foot level, it should not be so difficult to see how the model might come up with results like these. Obama won by 29 points in South Carolina and 28 points in Virginia, the two states that share a long border with North Carolina. Although North Carolina is somewhat whiter than South Carolina, and somewhat less wealthy than Virginia, this projection nevertheless looks perfectly reasonable next to those numbers.

And indeed, this projection did match the polling data until relatively recently. On the morning of April 22, the day of the Pennsylvania primary, Obama led Clinton 51-36 in the Real Clear Politics average in North Carolina, a 15-point margin that closely matches our regression output.

If North Carolina were North Dakota, I'd probably just caveat this as "Clinton may presently be outperforming her long-term demographic trends" and leave it at that. However, North Carolina is a Southern state -- and pollsters have had an awful lot of trouble polling the South. Let me bring up two interesting facts.

Interesting Fact #1: The polls have significantly underestimated Barack Obama's margin of victory in Southern states with substantial black populations.



That is a graph I drew up a few weeks ago comparing the actual results in a state to the final Pollster.com averages. As you can see, Obama outperformed his polling averages in almost all Southern states: by 3 points in Tennessee, 5 in Florida, 6 in Missouri, 9 in Mississippi, 11 in Virginia, 15 in Alabama, 18 in South Carolina, and 22 in Georgia (the only exception among Southern states that received widespread polling was in Texas). Moreover, the discrepancies appear to be related to the number of African-Americans in the population. In North Carolina, which is about 22 percent African-American, the regression line would forecast an underestimate of about 10 points. Add those 10 points to the 6- or 7-point lead that Obama presently holds in the North Carolina polling consensus -- and you get right to my model's estimate of a 17 point win.
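The adjustment in the paragraph above amounts to fitting a straight line through the state-level points and reading it off at North Carolina's value. A minimal Python sketch, assuming an ordinary least-squares fit of polling miss against black population share; the function names are hypothetical, and the state-by-state population shares behind the graph are not reproduced here, so the fit has to be supplied with that data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x. Returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predicted_miss(a, b, black_pct):
    """Polling underestimate of Obama's margin (points) predicted for a state."""
    return a + b * black_pct


# Combining the post's numbers: a ~10-point predicted miss for North Carolina
# added to a 6- or 7-point polling lead.
for lead in (6, 7):
    print(f"poll lead {lead} + 10 -> {lead + 10}-point projected win")
```

Calling `a, b = fit_line(black_shares, misses)` on the per-state data and then `predicted_miss(a, b, 22)` should land near the 10-point figure, provided the inputs match those behind the graph.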

Interesting Fact #2: Early voting data in North Carolina suggests that the pollsters may be significantly underestimating the proportion of African-Americans in the voting population.

As we discussed over the weekend, we have a unique wealth of data in North Carolina, the likes of which I don't recall ever having seen before. The state's Board of Elections has not only released the number of early voters, but also provided significant information about their demographics. According to the Center for Congressional and Presidential Studies, fully 40 percent of North Carolina's roughly 400,000 early voters (probably one-third to one-quarter of the eventual turnout) are African-American. For comparison's sake, Mark Blumenthal has found that most pollsters are assuming 32-33 percent black turnout.

The difference between these two estimates is significant. As you've probably found if you've played around with our North Carolina prediction tool, an increase of 1 percent in the fraction of the electorate that is African-American translates to roughly a 1-point increase in Barack Obama's margin over Hillary Clinton. So -- if the pollsters are assuming 33% black turnout when it will actually be 40%, that would add 7 points to Obama's margin -- putting us in the 13-14 point range, again fairly close to my model's estimates.
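Because the rule of thumb above is linear, the adjustment is simple arithmetic. A minimal sketch, assuming a constant slope of one point of margin per percentage point of black share, as stated in the post; the relationship in the prediction tool need not be exactly linear, and `adjust_for_black_share` is a hypothetical name.

```python
def adjust_for_black_share(poll_margin, assumed_black_pct, actual_black_pct,
                           pts_per_pct=1.0):
    """Shift a polling margin (points) for a different black share of the
    electorate, using the post's rule of thumb of about one point of margin
    per percentage point of black share (a linear approximation)."""
    return poll_margin + pts_per_pct * (actual_black_pct - assumed_black_pct)


for lead in (6, 7):
    print(f"{lead}-point lead -> {adjust_for_black_share(lead, 33, 40):.0f}")
```

Moving from an assumed 33 percent to 40 percent adds 7 points, turning a 6- or 7-point polling lead into 13 or 14 points.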

As Blumenthal notes, the Obama campaign has tended to place more of an emphasis on early voting -- and so it's possible that his voters are just getting to the polls sooner. On the other hand, studies in other states suggest that people who vote early tend to be of higher socioeconomic status -- whereas black voters as a group tend to be of lower socioeconomic status, particularly in a state like North Carolina. Moreover, 61% of the early voters are female, which is not typically an Obama demographic. (There are, however, a couple of alternative explanations for this: the black electorate tends to be disproportionately female, so this could be another reflection of the higher African-American turnout. Also, men tend to procrastinate.)

But fundamentally -- would 40% black turnout be a reasonable number in North Carolina? Or is it just completely out-of-bounds? The short answer is that yes, 40% is a perfectly defensible estimate -- and very probably a better estimate than 33%.

I had worked on this problem a little bit before, attempting to estimate African-American turnout as a proportion of the state's underlying African-American population. But I recognized that there is one alternate metric that might be helpful in this regard too: the proportion of a state's John Kerry voters who were African-American. This is reasonably easy to infer based on 2004 exit polling data.

Below is a set of statistics for each state that has held a primary so far and in which at least 5% of that state's population is African-American (I exclude the disputed contests in Florida and Michigan). This compares the size of the African-American electorate in the primary against (i) the percentage of African-Americans in that state's population, and (ii) the percentage of Kerry voters in that state who were African-American.



Note that in North Carolina, 52 percent of John Kerry's voters were African-American. While 40 percent black turnout sounds high compared to a 22 percent population baseline, it does not sound so high compared to this figure.

In fact, by running a simple regression model, we can come up with a relatively good estimate of African-American turnout based on these two figures. The regression model finds that it's helpful to draw from both sources of data, and comes up with the following best-fit equation:

Turnout = .65 * Population + .57 * Kerry
If you run these numbers through for North Carolina, you come up with African-American turnout of 44% (you also come up with 15% in Indiana).
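The best-fit equation can be applied directly. A minimal sketch: all three quantities are percentages, the coefficients are the ones reported above, and the equation as given has no intercept term. The North Carolina inputs (22 percent black population, 52 percent of Kerry voters) come from this post; the function name is hypothetical.

```python
def predicted_black_share(population_pct, kerry_pct):
    """Black share of the primary electorate (%) from the post's best-fit
    equation: Turnout = .65 * Population + .57 * Kerry.
    population_pct: black share of the state's population (%).
    kerry_pct: black share of the state's 2004 Kerry voters (%)."""
    return 0.65 * population_pct + 0.57 * kerry_pct


nc = predicted_black_share(22, 52)
print(f"North Carolina: {nc:.1f}%")
```

For North Carolina this gives 14.3 + 29.64, or about 43.9 percent, which rounds to the 44 percent quoted.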

Now, I don't necessarily expect that African-American turnout is going to be quite that high -- and I should point out that the standard error of the forecast is fairly large (roughly +/- 5 points). But it does seem to me that 40 percent -- which falls within that standard error interval -- is a pretty reasonable estimate. I am not so sure about 33 percent. That would imply turnout of approximately 63 percent of the black share of the Kerry vote. Only one other state (Oklahoma, where Obama did not really wage a campaign) came in that low. Other Southern states have been in the range of 64 percent (Arkansas) to 126 percent (Missouri).
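Both checks in the paragraph above are one-line calculations: the ratio of an assumed black share to North Carolina's 52 percent Kerry baseline, and whether that share falls inside the 44 percent forecast plus or minus one standard error of roughly 5 points. A sketch using those figures from the post; the helper name is hypothetical.

```python
def share_of_kerry_baseline(black_share_pct, kerry_black_pct):
    """Black share of the primary electorate relative to the black share of
    the state's 2004 Kerry voters (1.0 means the two are equal)."""
    return black_share_pct / kerry_black_pct


FORECAST, STD_ERR, NC_KERRY = 44, 5, 52   # figures quoted in the post
low, high = FORECAST - STD_ERR, FORECAST + STD_ERR

for assumed in (33, 40):
    ratio = share_of_kerry_baseline(assumed, NC_KERRY)
    inside = low <= assumed <= high
    print(f"{assumed}% black share: {ratio:.0%} of Kerry baseline, "
          f"inside {low}-{high}%: {inside}")
```

The 33 percent assumption comes out at 63 percent of the Kerry baseline and outside the 39-49 percent interval; 40 percent comes out at 77 percent and lands inside it.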

I did play around with some more sophisticated versions of the model that looked at things like the amount of campaign activity in a state, or combined the data you see above with pre-election polling from SurveyUSA. These alternate methods produced turnout estimates ranging from about 37 percent to 43 percent -- but orienting around the 40 percent number. Considering that nearly 400,000 votes have already come in with African-Americans making up 40 percent of them, and that polls in other Southern states have substantially underestimated Barack Obama's victory margin, I would be surprised if African-American turnout came in lower than about 37 percent in North Carolina. In fact, that figure is probably conservative. If I am right about this, then a double-digit win for Obama in North Carolina is not only possible, but perhaps somewhat likely.

By the way -- I think I know what the pollsters are doing wrong. They're calibrating black turnout to a proportion of the population, but not to a proportion of the Kerry vote. This is a significant mistake, because in some states, the vast majority of the available white voters will vote as Republicans -- meaning that black voters make up a larger share of what remains in the Democratic electorate. It may even be that the Obama campaign recognizes all of this, which is why they have devoted a disproportionate share of their resources to Indiana. But, we will know soon enough. I am prepared to be spectacularly wrong on Tuesday.

EDIT: Public Policy Polling's Tom Jensen suggests that there is some history in North Carolina of black voters tending to make up a larger share of early voters. Still, the presidential primary in North Carolina was not competitive in either 2000 or 2004, so I'm not sure how much can be inferred from that. If you want to pin me to a number, I'll take 38% black turnout.

Tuesday, April 1, 2008

The Reverse Bradley Effect: Fact or Fiction?

Over at The Stump, Noam Scheiber tracked down a Pew Research study that purports to identify the presence of both a Bradley Effect and a Reverse Bradley Effect in the results to date in the Democratic primaries. For those of you who are not familiar with the Bradley Effect, the long and short of it is the idea that black candidates tend to systematically underperform their polls on election day, perhaps because people lie to interviewers in surveys so as to be politically correct. The Reverse Bradley Effect would be -- well, just the reverse -- a black candidate systematically overperforming his poll numbers for one reason or another.

The gist of the Pew study can be found in the chart below, which looks at the extent to which Barack Obama overperformed or underperformed his polls in a number of primary states, and compares that to the African-American population in the state. The authors report that Obama has outperformed his polls in states with high African-American populations, and underperformed them in states with low ones. They find a correlation of .74, which is quite high -- squared, it comes to about .55, which would imply that more than half of the variation in state-by-state polling errors is explained by the racial composition of each state alone.


I have a couple of issues with the way this study was conducted. Actually, just one issue, but it's a pretty big one. The authors seem to have cherry-picked their states. They exclude states like Connecticut, Maryland, and New York that had closed primaries, but give no explanation as to why. They do not include Vermont, even though it appears to have met their standard of having three polls conducted in the week before the election. They don't include Florida which, officially-sanctioned primary or no, would seem to be as useful as any other data point insofar as the Bradley Effect goes. They don't include the caucus states of Iowa and Nevada (there is actually a decent argument for that, since caucuses are conducted in public rather than the privacy of a voting booth, but I would tend to be inclusive rather than exclusive when testing my own hypothesis). They do include the "outlier" of Wisconsin, but seem to be annoyed by it -- as though it's Wisconsin's fault for not conforming to their hypothesis.

So, I attempted to recreate their analysis, pulling my own numbers from Pollster.com, and adding back in the blacklisted states. My imitation of their graph is below:



Putting these other states back in turns out not to make all that much difference; the correlation drops from .74 to .62. Still, we notice the presence of a few more states, like Vermont and Iowa, that don't seem to fit the hypothesis.
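The step from a correlation to "more than half of the polling errors" is the squared correlation, the share of variance a straight-line fit accounts for. A minimal sketch with a plain Pearson correlation helper (hypothetical name) and the two coefficients discussed above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Share of variance "explained" is the square of the correlation.
for r in (0.74, 0.62):  # Pew's figure, then the figure with all states restored
    print(f"r = {r:.2f} -> r^2 = {r * r:.2f}")
```

Restoring the excluded states thus cuts the variance explained by race from roughly 55 percent to roughly 38 percent.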

You'll notice I've done something else too, which is to color code the graph. The states in blue are those that we define as Southern (this includes Florida and Virginia, but not Maryland or Missouri), whereas those in red are the rest of the country. The claim for the Reverse Bradley Effect is really just based on the strong pull exerted by five states: Virginia, Alabama, South Carolina, Georgia, and Mississippi. All those states have high black populations, but they also have another thing in common, which is that they are all Southern.

But what if we look at other states that have relatively high black populations, but are not in the South? New York and Illinois have fairly substantial black populations -- but the polls were spot-on in each of those locations. In Maryland -- which I consider a Northern state, and which demographically has much more in common with other Northern states than anything in the South -- Obama outperformed his polls, but only barely so. New Jersey has a relatively large black population and Obama slightly underperformed his polls there.

So instead of drawing one regression line, let's draw two: one to represent the South and the other to represent the rest of the country.



Well, it looks like we are dealing with two completely different sets of behaviors. The relationship in the South is quite strong -- just as strong as the Pew authors found originally. But there is virtually no relationship between race and Obama's performance at the ballot box elsewhere in the country -- the slight correlation you see is nowhere near statistically significant. The polls overestimated Obama's performance in New Hampshire and Massachusetts -- but underestimated it in Vermont and Wisconsin. They were largely accurate in states like New York and Maryland that have substantial black populations. There is just nothing happening with the Bradley Effect outside of the South, at least so far as this data can tell us.
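Drawing two lines instead of one amounts to fitting a separate least-squares regression for each region and checking whether each slope is distinguishable from zero. A minimal sketch, assuming data as (region, black population percent, signed polling error) tuples; region labels and function names are hypothetical, and no state data from the post is included. The t-statistic is the slope divided by its standard error: an absolute value below about 2 is not significant at the 5 percent level, and with only a handful of states the cutoff is higher still.

```python
import math
from collections import defaultdict


def fit_with_t(xs, ys):
    """OLS fit of y = a + b * x. Returns (a, b, t), where t is b divided by
    its standard error. Needs at least three points that do not lie exactly
    on a line (otherwise the standard error is zero)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(resid_ss / (n - 2) / sxx)
    return a, b, b / se_b


def fit_by_region(points):
    """points: iterable of (region, black_pct, polling_error) tuples.
    Returns {region: (intercept, slope, t_statistic)}."""
    groups = defaultdict(lambda: ([], []))
    for region, black_pct, error in points:
        groups[region][0].append(black_pct)
        groups[region][1].append(error)
    return {region: fit_with_t(xs, ys) for region, (xs, ys) in groups.items()}
```

With region labels such as "South" and "Other", `fit_by_region(data)` returns one line per region, mirroring the two-line chart; a large t in one region and a small one in the other is the pattern described above.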

Also of note: my study identifies the presence of a Reverse Bradley Effect in the South (Obama outperforming his polls in states with higher percentages of black voters), but not the presence of the plain ol' Bradley Effect in either the North or the South. At no point do my regression lines for either region run substantially above zero, which is the point at which Obama would begin to underperform his polls.

Put differently: there is nowhere in the country where we have reason to subtract from Obama's poll numbers based on the Bradley Effect. (Yes, Obama has underperformed his polls in some "white" states -- but he has overperformed them in others that are whiter than Kurt Rambis). On the other hand, there is one specific group of states where we might want to add to Obama's polls based on the Reverse Bradley Effect, which are Southern states with high African-American populations.

If the Reverse Bradley Effect is real, what is its raison d'être? There is a fair amount of academic literature on the effects of the race of the interviewer on survey results. People can guess quite accurately the race of the person on the other end of the line, and they might respond differently depending on that perception. It is probably safe to assume that the majority of interviewers are white, and correctly perceived as white. A white voter might not want to tell a (presumably) white interviewer that they're voting for the black guy. Or, a black voter might feel intimidated by a (presumably) white interviewer, and not want to tell him they're voting for the black guy. These effects may be far more tangible in the South, in which race is a much more explicit consideration in everyday life. Three other quick comments on this:

1. If I had to guess, I would guess that black voters might be more likely not to want to reveal their true candidate choice than white voters. This is because I have noticed that a lot of black voters tend to be classified as 'undecided' in pre-primary surveys, which might indicate the hedging of bets.

2. As Scheiber and others have noted, there appears to be less of a Reverse Bradley Effect in polls conducted by agencies like Rasmussen, Survey USA and PPP that use automated calling scripts (robopolls). This requires further research -- but if true it would tend to validate my hypothesis.

3. Something else that is worth mentioning: Americans tend to think that other people have more racial hangups than they claim to have themselves. This might be why there is a Reverse Bradley Effect, rather than a Bradley Effect. You don't think you're a racist -- but you think the person on the other end of the line might be, and so you lie about your candidate choice so as not to offend them.

With all that having been said, I may be overselling the Reverse Bradley Effect. What we know is that in Southern states with large black populations, Obama has outperformed his polls by a statistically significant margin. But we're just guessing at why this is the case. It could be because the pollsters' turnout models are screwed up. It could be because minority voters are more likely to use cellphones as their primary line, and therefore won't make it into surveys. It could be because of something I call the Frontrunner Effect, which is that there is some tendency for candidates who are already ahead in the polls to run up the score on election day.

Nor do we know if we have identified a universal effect, or whether it's something specific to Obama and Clinton. Perhaps Southern voters feel bad about voting against Hillary Clinton, who claims some heritage in the region. But the same might not be true when Obama is matched against John McCain, whom Southerners tend to feel lukewarm about. Or, it could be that there is a Reverse Bradley Effect among Democrats but a real Bradley Effect among Republicans, and the two things will cancel one another out in the general election. If you had to bet against the spread, however, you might want to take the over on Obama's numbers in states like North Carolina and Virginia.

Friday, March 28, 2008

Today's Polls: Bad news for both Dems in Virginia

If the old adage about burying bad news on a Friday afternoon is correct, then perhaps Scott Rasmussen was trying to soften the blow for the Democrats. His new poll of Virginia -- presently available only on video -- shows McCain with an 11-point lead over Barack Obama and a 22-point lead over Hillary Clinton.

Virginia is one of Obama's more important states; it's an essential part of any kind of Plan B that involves him winning the election without carrying a state like Pennsylvania. Previous polling of the state had been more encouraging to him, including a Survey USA poll released last week that showed him a percentage point ahead of McCain. Even so, with this survey included, his win percentage there has dropped from 46% to 35%.

Clinton, meanwhile, looks like she will not win Virginia except in a landslide; the state has dropped entirely off of her Swing State list.

It does seem, by the way, that Rasmussen has generally had harsher news for the Democrats than Survey USA. Chris Bowers has an interesting theory about this, suggesting that McCain may have a hidden advantage in robopolls like Rasmussen because they are less likely to be susceptible to the Wilder Effect (the existence of which, for the record, I am at most agnostic about). This doesn't quite work when comparing Rasmussen to Survey USA, since Survey USA is a robopoll too, but Chris's column is worth a read.

BTW, I've now included a small box along the left-hand side of the page indicating the last time the charts and graphs were updated, as well as the number of days until the election.
