Showing posts with label best-of. Show all posts

Tuesday, April 1, 2008

The Reverse Bradley Effect: Fact or Fiction?

Over at The Stump, Noam Scheiber tracked down a Pew Research study that purports to identify the presence of both a Bradley Effect and a Reverse Bradley Effect in the results to date in the Democratic primaries. For those of you who are not familiar with the Bradley Effect, the long and short of it is the idea that black candidates tend to systematically underperform their polls on election day, perhaps because people lie to interviewers in surveys so as to be politically correct. The Reverse Bradley Effect would be -- well, just the reverse -- a black candidate systematically overperforming his poll numbers for some or another reason.

The gist of the Pew study can be found in the chart below (larger version here), which looks at the extent to which Barack Obama overperformed or underperformed his polls in a number of primary states, and compares that to the African-American population in the state. The authors report that Obama has outperformed his polls in states with high African-American populations, and underperformed them in states with low ones. They find a correlation of .74, which is quite high -- it would imply that more than half of the variance in the state-by-state polling errors is explained by the racial composition of the state alone.


I have a couple of issues with the way this study was conducted. Actually, just one issue, but it's a pretty big one. The authors seem to have cherry-picked their states. They exclude states like Connecticut, Maryland, and New York that had closed primaries, but give no explanation as to why. They do not include Vermont, even though it appears to have met their standard of having three polls conducted in the week before the election. They don't include Florida which, officially-sanctioned primary or no, would seem to be as useful as any other data point insofar as the Bradley Effect goes. They don't include the caucus states of Iowa and Nevada (there is actually a decent argument for that, since caucuses are conducted in public rather than the privacy of a voting booth, but I would tend to be inclusive rather than exclusive when testing my own hypothesis). They do include the "outlier" of Wisconsin, but seem to be annoyed by it -- as though it's Wisconsin's fault for not conforming to their hypothesis.

So, I attempted to recreate their analysis, pulling my own numbers from Pollster.com, and adding back in the blacklisted states. My imitation of their graph is below:



Putting these other states back in turns out not to make all that much difference; the correlation drops from .74 to .62. Still, we notice the presence of a few more states, like Vermont and Iowa, that don't seem to fit the hypothesis.
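
For a sense of scale, the share of variance explained by a simple linear fit is the square of the correlation coefficient. The snippet below is nothing more than that arithmetic applied to the two correlations quoted above; the function and variable names are my own, not anything from the Pew study.

```python
def variance_explained(r):
    """Share of variance explained by a simple linear fit with correlation r."""
    return r ** 2

pew_r = 0.74       # correlation reported by the Pew authors
expanded_r = 0.62  # correlation after adding the excluded states back in

pew_share = variance_explained(pew_r)
expanded_share = variance_explained(expanded_r)

print(f"Pew sample:      r = {pew_r:.2f}, r^2 = {pew_share:.2f}")
print(f"Expanded sample: r = {expanded_r:.2f}, r^2 = {expanded_share:.2f}")
```

In other words, racial composition accounts for a bit more than half of the variance in the Pew sample, and a bit less than 40 percent once the other states are included.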

You'll notice I've done something else too, which is to color code the graph. The states in blue are those that we define as Southern (this includes Florida and Virginia, but not Maryland or Missouri), whereas those in red are the rest of the country. The claim for the Reverse Bradley Effect is really just based on the strong pull exerted by five states: Virginia, Alabama, South Carolina, Georgia, and Mississippi. All those states have high black populations, but they also have another thing in common, which is that they are all Southern.

But what if we look at other states that have relatively high black populations, but are not in the South? New York and Illinois have fairly substantial black populations -- but the polls were spot-on in each of those locations. In Maryland -- which I consider a Northern state, and which demographically has much more in common with other Northern states than anything in the South -- Obama outperformed his polls, but only barely so. New Jersey has a relatively large black population and Obama slightly underperformed his polls there.

So instead of drawing one regression line, let's draw two: one to represent the South and the other to represent the rest of the country.



Well, it looks like we are dealing with two completely different sets of behaviors. The relationship in the South is quite strong -- just as strong as the Pew authors found originally. But there is virtually no relationship between race and Obama's performance at the ballot booth elsewhere in the country -- the slight correlation you see is nowhere near statistically significant. The polls overestimated Obama's performance in New Hampshire and Massachusetts -- but underestimated it in Vermont and Wisconsin. They were largely accurate in states like New York and Maryland that have substantial black populations. There is just nothing happening with the Bradley Effect outside of the South, at least so far as this data can tell us.
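
For anyone who wants to try the split-sample approach themselves, here is a minimal sketch of fitting one least-squares line per region rather than a single pooled line. The data points are made up purely for illustration (they are not the Pollster.com numbers), and the helper names are hypothetical. It follows the convention used in this post: values above zero mean Obama underperformed his polls, values below zero mean he outperformed them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# MADE-UP points: (region, black share of population in %, poll error in points).
points = [
    ("South", 10, -4.0), ("South", 20, -8.0), ("South", 30, -12.0),
    ("Other", 2, 1.0), ("Other", 10, -1.0), ("Other", 18, 1.0),
]

def fit_by_region(rows):
    """Fit a separate line for each region label found in rows."""
    fits = {}
    for region in sorted({r for r, _, _ in rows}):
        xs = [x for r, x, _ in rows if r == region]
        ys = [y for r, _, y in rows if r == region]
        fits[region] = fit_line(xs, ys)
    return fits

fits = fit_by_region(points)
for region, (slope, intercept) in fits.items():
    print(f"{region}: error = {intercept:.2f} + {slope:.2f} * black_share")
```

With these invented numbers the Southern line slopes steeply downward while the other line is flat, which is the qualitative pattern described above. Real data would of course be noisier.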

Also of note: my study identifies the presence of a Reverse Bradley Effect in the South (Obama outperforming his polls in states with higher percentages of black voters), but not the presence of the plain ol' Bradley Effect in either the North or the South. At no point do my regression lines for either region run substantially above zero, which is the point at which Obama would begin to underperform his polls.

Put differently: there is nowhere in the country where we have reason to subtract from Obama's poll numbers based on the Bradley Effect. (Yes, Obama has underperformed his polls in some "white" states -- but he has overperformed them in others that are whiter than Kurt Rambis). On the other hand, there is one specific group of states where we might want to add to Obama's polls based on the Reverse Bradley Effect, which are Southern states with high African-American populations.

If the Reverse Bradley Effect is real, what is its raison d'être? There is a fair amount of academic literature on the effects of the race of the interviewer on survey results. People can guess quite accurately the race of the person on the other end of the line, and they might respond differently depending on that perception. It is probably safe to assume that the majority of interviewers are white, and correctly perceived as white. A white voter might not want to tell a (presumably) white interviewer that they're voting for the black guy. Or, a black voter might feel intimidated by a (presumably) white interviewer, and not want to tell him they're voting for the black guy. These effects may be far more tangible in the South, in which race is a much more explicit consideration in everyday life. Three other quick comments on this:

1. If I had to guess, I would guess that black voters might be more likely not to want to reveal their true candidate choice than white voters. This is because I have noticed that a lot of black voters tend to be classified as 'undecided' in pre-primary surveys, which might indicate the hedging of bets.

2. As Scheiber and others have noted, there appears to be less of a Reverse Bradley Effect in polls conducted by agencies like Rasmussen, Survey USA and PPP that use automated calling scripts (robopolls). This requires further research -- but if true it would tend to validate my hypothesis.

3. Something else that is worth mentioning: Americans tend to think that other people have more racial hangups than they claim to have themselves. This might be why there is a Reverse Bradley Effect, rather than a Bradley Effect. You don't think you're a racist -- but you think the person on the other end of the line might be, and so you lie about your candidate choice so as not to offend them.

With all that having been said, I may be overselling the Reverse Bradley Effect. What we know is that in Southern states with large black populations, Obama has outperformed his polls by a statistically significant margin. But we're just guessing at why this is the case. It could be because the pollster's turnout models are screwed up. It could be because minority voters are more likely to use cellphones as their primary line, and therefore won't make it into surveys. It could be because of something I call the Frontrunner Effect, which is that there is some tendency for candidates who are already ahead in the polls to run up the score on election day.

Nor do we know if we have identified a universal effect, or whether it's something specific to Obama and Clinton. Perhaps Southern voters feel badly voting against Hillary Clinton, who claims some heritage in the region. But the same might not be true when Obama is matched against John McCain, whom Southerners tend to feel lukewarm about. Or, it could be that there is a Reverse Bradley Effect among Democrats but a real Bradley Effect among Republicans, and the two things will cancel one another out in the general election. If you had to bet against the spread, however, you might want to take the over on Obama's numbers in states like North Carolina and Virginia.


Thursday, March 20, 2008

Wright and the Obamacans

One of the many nice things about Survey USA is the extensive set of interactive cross-tabulations that they release with every poll. Survey USA has now released polls in fifteen states that were taken at the height of the Jeremiah Wright controversy (this past Friday through Sunday). We can compare the demographic groups in these polls to Survey USA's previous set of polls, which were conducted in the last couple days of February.

I will be keeping track of five sets of demographic characteristics that Survey USA included among both sets of polls: gender, age, race, party ID, and orientation on the political spectrum (conservative/moderate/liberal). Another demographic that would be nice to look at -- income levels -- was tracked by Survey USA in their February polls but not in their March polls, so we have no choice but to ignore it.

One other methodological annoyance: Survey USA used different age brackets between the different surveys. Although the 18-34 age group was common to both polls, Survey USA handled the other groups of voters differently. I will be lumping the brackets together as follows:

Age          February    March
"Young"      18-34       18-34
"Mid-Age"    35-54       35-49, 50-64
"Old"        55+         65+

Otherwise, this is a rather straightforward exercise: I'm merely comparing Obama's net advantage against McCain between the February and the March surveys. If Obama was leading among whites in Oregon by 6 points in February, but he trailed by 2 points in March, that would be recorded as a "-8".
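
As a minimal sketch of the bookkeeping (the dictionary layout and function name are my own shorthand, not anything Survey USA publishes), the bracket mapping and the net-change calculation look like this:

```python
# How each survey's age brackets are lumped into the three groups used here.
AGE_GROUPS = {
    "Young":   {"February": ["18-34"], "March": ["18-34"]},
    "Mid-Age": {"February": ["35-54"], "March": ["35-49", "50-64"]},
    "Old":     {"February": ["55+"],   "March": ["65+"]},
}

def net_change(feb_margin, mar_margin):
    """Change in Obama's net margin over McCain from February to March, in points."""
    return mar_margin - feb_margin

# The example from the text: up 6 among Oregon whites in February, down 2 in March.
oregon_whites = net_change(feb_margin=6, mar_margin=-2)
print(f"{oregon_whites:+d}")  # prints -8
```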

Here come the numbers:



(That chart might be a little hard to read, so I've temporarily created a larger version of it along the right-hand sidebar. Yes, we're the only website in the world that devotes more space to its sidebar than its main column).

Let's pick through these demographic groups one by one:

Gender: Obama's margin declined by a slightly larger amount among women (7 points) than men (4 points), but the differences are small enough that they're probably not worth worrying about. The gender gap was most noticeable in the Midwestern states. Between Missouri, Kansas, Iowa, Minnesota, Ohio and Wisconsin, Obama declined by 12 points among women but 4 points among men. Elsewhere in the country, the declines were roughly equal for men and women.

Age: There do seem to be some age-related effects, with the Wright story tending to have done more damage to Obama among older voters, but because of the ambiguities of Survey USA's age brackets, it would be dubious to come to too many conclusions. Taking the 55-64 year olds and shifting them from "Old" to "Mid-Age", as we had to do here, would likely have a deleterious effect on Obama's numbers irrespective of the Wright controversy.

Race: Perhaps unsurprisingly, Obama lost ground in every state except Oregon amongst whites. He gained ground amongst blacks in all states except the three Southern states: Alabama, Kentucky, and Virginia, where he actually backtracked a little bit. The polling data on Hispanic voters is mixed ... the only two states with a whole lot of Hispanic voters were California and New Mexico, and Obama gained a bunch of ground amongst them in California (+24), while losing a bit in New Mexico (-5). Notice, by the way, how the media seems to have completely forgotten about Hispanic voters now that we have a good ol' fashioned white-black racial controversy to kick around.

Party ID: Obama lost the most ground -- an average of 9 points -- amongst Republicans. This is actually fairly hard to do, because there weren't that many Republicans voting for him to begin with. But for the time being, the Obamacans appear to be in hibernation. The interesting piece of news for Obama is that he lost hardly any ground at all amongst independents, although the results bounced around rather radically from state to state. (Survey USA tends to have fewer self-identified independents in their surveys than other pollsters, and so the sample sizes are a little smaller).

Political Orientation: But here's the weird thing. While Obama lost the most ground amongst Republicans on the party spectrum, he lost the most ground among liberals on the political spectrum: 11 points among libs, as compared to 5 points among moderates and just 2 among conservatives (among whom he had little of the vote to begin with).

What to make of these seemingly contradictory results?

Actually, I don't have a great answer for you. Let's try and get a discussion going. But it certainly looks as though liberal and moderate Republicans -- not independents, but specifically voters who identify as Republican -- who were willing to indulge the idea of an Obama vote before have at least temporarily reverted to their base.

Now, if you accept that this is what has gone on, there are a couple of takes you might have on this. Shall we spin the wheel?



Spin #1. These voters were inherently soft, vulnerable supporters of Obama anyway. It is asking a lot for Republicans to cross over and vote for a Democrat -- only 6 percent of them voted for Kerry in 2004. The way Obama -- or Clinton for that matter -- was always going to win this election was by turning out the base, winning over independents, and taking advantage of the blue-leaning shifts in party identification throughout the country. You do two out of those three things (and one of them is really a gimme), and you'll probably win the election. You do all three, and you win big.

On the other hand, if these voters were soft supporters of Obama, that likely now means that their support for John McCain is also fairly soft. There are really relatively few swing voters in the general election -- 80% of the country is voting reflexively by party ID -- which is why polling numbers in general elections are much more stubborn than polling numbers in primaries. But Linc Chafee / Olympia Snowe-style Republicans, and perhaps also libertarian-leaning Republicans, are a group of voters that must feel authentically conflicted about what to do. It doesn't take a lot to shift them from one group to the other -- nor might it take a lot to shift them back.

These may also have been voters who were intrigued by Obama's unity message, a brand which was at least temporarily damaged by Jeremiah Wright.

Spin #2. On the other hand, perhaps this also has to do with media consumption habits. The Wright story was handled very differently from media outlet to media outlet, from a full-frontal assault on Obama on FOX News, to a relatively benign treatment at the New York Times. (This effect is even more noticeable in the wake of Obama's speech on Tuesday, which has acted as a depth charge of sorts for partisan conservative pundits). Do Obamacans still watch FOX News and listen to Rush Limbaugh? My hunch is that they do -- that it forms their sort of home base for media coverage, even if they often disagree with its conclusions. When the conservative media went from playing relatively nice with Obama to bashing him non-stop, there was going to be an effect; the Wright incident may have catalyzed it.

Actually, now that that's written, these are really part and parcel of the same explanation. Swing Republicans were vulnerable to being swung -- and the Wright story, amplified by the conservative media, managed to swing some of them. Will Obama's speech swing them back? I don't know. As I mentioned above, it is inherently an uphill battle to ask a voter to cross party lines for you. On the other hand, I would guess that these folks are fairly sophisticated political animals -- you have to have a fairly well-thought-out political philosophy to maintain an identification as a Republican these days, but ponder voting for a Democrat for President. And that means they might have been among the 2.3 million Americans and counting who have seen the director's cut of Obama's speech, rather than the sound bite version. It is likely to take at least a couple of weeks before we know for sure.


Monday, March 17, 2008

The Six Types of Voters

Although the story hasn't quite yet broken through to the dead-tree media, some influential online writers are starting to pick up on the fact that strength in the primaries is not necessarily so strongly related to strength in the general election. Noam Scheiber noticed that the same Strategic Vision poll that had Clinton leading by 18 points in the Pennsylvania primary had Obama performing a couple of points better against McCain in the general election. MSNBC's First Read discovered this too.

Meanwhile, Rasmussen Reports has put out a whole number of polls this week that show Obama running at least as well as Clinton in so-called "Clinton states":

State    Date    Obama vs McCain    Clinton vs McCain
CA       3/13    Obama +15          Clinton +7
OH       3/13    McCain +6          McCain +6
FL       3/12    McCain +4          McCain +7
NY       3/11    Obama +13          Clinton +12
MI       3/10    McCain +3          McCain +3
PA       3/10    McCain +1          McCain +2

Each of these is a state that Clinton won (California), "won" (Michigan), or presumes to win (Pennsylvania). And yet Obama is running better than she is in four of the six polls, and is tied with her in the other two. In the interest of full disclosure, it should be pointed out that most of these Rasmussen polls were taken mid-week, before the Jeremiah Wright story really hit the media. As of this writing, ironically, Obama outperforms Clinton in both the Rasmussen and Gallup primary tracking polls -- but Clinton slightly outperforms him against McCain (by small margins in all cases).

It might help to abstract the situation and consider the rest of the electoral cycle from the perspective of Instant Runoff Voting. Suppose you asked each voter to rank her choices -- Clinton, Obama and McCain -- from one to three. There are six possible permutations -- and each of them can reasonably be associated with one or another demographic group, or at least a stereotype thereof.

1. Clinton-Obama-McCain. These are likely to be the choices of mainline, establishment Democratic voters, especially women, Hispanics, and working-class voters outside of the South. If you look at the choices of Democratic voters so far in the primaries, they form a sort of donut hole: Obama does better among voters on the far left, and he also does better among moderates and independents. But Clinton does better with the "median", more traditional Democratic voter in the middle of the donut.

2. Obama-Clinton-McCain. The most likely order of preference for blacks, young voters, and progressives and other "latte liberals".

3. Clinton-McCain-Obama. This ordering may be fairly common among two groups: Southern Baptists and other evangelical Democrats, and some older voters.

4. Obama-McCain-Clinton. Probably a common ranking for independents, as well as some anti-establishment (and anti-Clinton) voters to the left and center of the political spectrum.

5. McCain-Clinton-Obama. According to Mark Blumenthal, this was a common ordering in Mississippi. A lot of Republicans may have voted tactically in the Democratic primary -- but unless they were lying to exit pollsters, it also appears that they genuinely preferred Clinton to Obama as their #2 choice. I think we'd find the majority of social conservatives in this group, especially in the South, as well as many national security conservatives, and a significant minority of suburban women.

6. McCain-Obama-Clinton. On the other hand, I'd guess that most economic conservatives and many libertarians end up here, and certainly most right-leaning independents; also perhaps some anti-Bush and anti-war Republicans.

So, these are the six fundamental classes of voters. Of course, you could get more technical than this if you'd like. For example, we've neglected those voters who might vote for one and only one candidate; I suspect the decision rule for some black voters, for instance, is "Vote Obama, otherwise sit out". But sticking with these groups, it is fairly easy to manipulate the numbers so that we can produce pretty much any result we like -- including primary results that differ substantially from general election results. We might hypothesize the following distribution of voters in Pennsylvania, for instance:



In this case, we have a plurality (30%) of Type 1 voters -- mainline, working-class Democrats who prefer Clinton to Obama, but Obama to McCain. That sounds a lot like Pennsylvania. And what happens? We have Clinton winning 60-40 in a match-up between Clinton and Obama. But both Democrats draw 50 percent of the vote against McCain. (The eagle-eyed among you will notice that we have McCain-preferring voters voting in the Democratic 'primary', but since we have them splitting their votes evenly, it shouldn't matter very much).
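
The Pennsylvania hypothetical can be reproduced in a few lines. The six rankings come straight from the list above, but the shares are an assumption on my part: they are one distribution consistent with the figures quoted in the text (30% Type 1, Clinton over Obama 60-40, each Democrat at 50 against McCain, McCain-first voters split evenly), not necessarily the exact numbers in the chart. The helper name `head_to_head` is likewise just for illustration.

```python
# The six rankings, most-preferred candidate first.
RANKINGS = {
    1: ("Clinton", "Obama", "McCain"),
    2: ("Obama", "Clinton", "McCain"),
    3: ("Clinton", "McCain", "Obama"),
    4: ("Obama", "McCain", "Clinton"),
    5: ("McCain", "Clinton", "Obama"),
    6: ("McCain", "Obama", "Clinton"),
}

def head_to_head(shares, a, b):
    """Percent of voters who rank candidate a above candidate b."""
    return sum(
        share
        for voter_type, share in shares.items()
        if RANKINGS[voter_type].index(a) < RANKINGS[voter_type].index(b)
    )

# ASSUMED shares (percent of voters by type), chosen to match the text's figures.
pennsylvania = {1: 30, 2: 10, 3: 10, 4: 10, 5: 20, 6: 20}

primary = head_to_head(pennsylvania, "Clinton", "Obama")
clinton_general = head_to_head(pennsylvania, "Clinton", "McCain")
obama_general = head_to_head(pennsylvania, "Obama", "McCain")
print(primary, clinton_general, obama_general)  # prints 60 50 50
```

Changing the shares and rerunning the three match-ups is a quick way to see how a state can split one way in the primary and another way in the general.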

On the other hand, in Mississippi -- where Obama beat Clinton in the primary, but Clinton has done better against McCain in general election polls -- perhaps the electorate looks like this:



To get this result, you need a lot of Type 2 voters, and Type 5 voters. Our Type 2 classification includes most African-Americans, whereas our Type 5 classification includes most religious conservatives. Sounds a lot like Mississippi.

Each state is going to have its own unique fingerprint. In Nevada -- where Obama significantly outperforms Clinton in general election polls -- you probably have a lot of Type 1's (Clinton-Obama-McCain) but also some Type 4's (Obama-McCain-Clinton). In Illinois, where Obama beat Clinton easily, but both Democrats trounce McCain, there might be a lot of Type 2's.

This is not to say that there is no relationship between voting in the general election and the primary. If we compare the Obama-Clinton margin in the primaries in the states that have voted so far (counting Florida but not Michigan), to the current differences in general election polling in the fivethirtyeight.com modified polling averages, we certainly can see some correlation:



However, there are also enough differences that a state can wind up in one column in the primaries and another in the general -- including particularly those states that are close enough to begin with where such differences are likely to matter. Following are our present win percentage estimates in all the states where Clinton claims victory (including Florida and Michigan, as well as Texas, but not states like Pennsylvania that haven't voted yet):




We see Clinton running more strongly by notable margins in Florida, Tennessee, Arkansas, and New Jersey, and slightly stronger in Massachusetts and New York. On the other hand, Obama does considerably better in New Mexico, Nevada, and New Hampshire; a reasonable amount better in Michigan, and a little bit better in California.

If we multiply the win percentage by the number of electoral votes in each state, we come up with 151.9 for Obama and 156.3 for Clinton -- that is, Clinton has an overall advantage of 4 or 5 electoral votes in the states that she won. That's a trivial difference, one that, for example, can be explained by Arkansas alone (although we shouldn't dismiss Arkansas; those are six electoral votes that Clinton will get and Obama won't).
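
The calculation is just a probability-weighted sum. Here is a minimal sketch: the electoral-vote counts are the real ones for these three states, but the win probabilities are placeholders I made up to show the mechanics, not our actual estimates.

```python
def expected_electoral_votes(states):
    """Sum of (win probability * electoral votes) over all states."""
    return sum(win_prob * votes for win_prob, votes in states.values())

# state: (PLACEHOLDER win probability, electoral votes)
example = {
    "Florida": (0.40, 27),
    "New Jersey": (0.80, 15),
    "Arkansas": (0.50, 6),
}
total = expected_electoral_votes(example)
print(f"Expected electoral votes in the example: {total:.1f}")

# The gap between the two totals quoted in the text.
clinton_states_gap = 156.3 - 151.9
print(f"Clinton's edge in Clinton states: {clinton_states_gap:.1f}")
```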

Now, let's look at the Obama states:



There are some Obama states, like Georgia and Utah, in which neither Democrat really has a chance. And there are also a couple -- like Mississippi -- where the general election advantage tips to Clinton. However, there is also a group of about a half-dozen states -- Virginia, Washington, Minnesota, Wisconsin, Colorado, Iowa -- in which Obama has a substantial advantage over Clinton in general election polling, as well as some longer shots like North Dakota and Alaska. All together, Obama projects to get 112.7 electoral votes out of "Obama states" to Clinton's 74.1 -- close to a 40-EV advantage.

For the time being -- and that is an important caveat, because these numbers may be changing even as we speak -- Obama has at least a 15% advantage in win percentage over Clinton in 18 states totaling 113 electoral votes:

Virginia (13), Washington (11), Wisconsin (10), Minnesota (10), Colorado (9), Iowa (7), Connecticut (7), Oregon (7), New Mexico (5), Nevada (5), Nebraska (5), New Hampshire (4), Maine (4), Hawaii (4), Alaska (3), North Dakota (3), South Dakota (3), Montana (3)

And Clinton has at least a 15% advantage over Obama in five states totaling 64 electoral votes:

Florida (27), New Jersey (15), Tennessee (11), Arkansas (6), West Virginia (5)
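
Those tallies are easy to double-check. The snippet below simply adds up the electoral votes listed in the two groups above; nothing in it goes beyond those lists.

```python
obama_edge = {
    "Virginia": 13, "Washington": 11, "Wisconsin": 10, "Minnesota": 10,
    "Colorado": 9, "Iowa": 7, "Connecticut": 7, "Oregon": 7,
    "New Mexico": 5, "Nevada": 5, "Nebraska": 5, "New Hampshire": 4,
    "Maine": 4, "Hawaii": 4, "Alaska": 3, "North Dakota": 3,
    "South Dakota": 3, "Montana": 3,
}
clinton_edge = {
    "Florida": 27, "New Jersey": 15, "Tennessee": 11,
    "Arkansas": 6, "West Virginia": 5,
}

obama_total = sum(obama_edge.values())
clinton_total = sum(clinton_edge.values())
print(len(obama_edge), obama_total)      # prints 18 113
print(len(clinton_edge), clinton_total)  # prints 5 64
```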

For Clinton to win an electability argument based on the polls, she'd have to be much more likely than Obama to win states like Michigan, Ohio, Pennsylvania, California and New York. But presently, that is not what the numbers show; the Democrats either perform about equally in those states (as in Ohio and Pennsylvania), or the states are unlikely to be contested in a competitive election (as in California -- where Obama is actually outpolling her -- or New York).

Clinton, certainly, can make other sorts of electability arguments not based on the polls -- but those are really the only arguments available to her for right now. And one argument she can't make is to point reflexively to the primary results; we have a robust enough set of general election polling now that such arguments are out of date.


Thursday, March 13, 2008

Does Obama benefit from an active primary campaign?

I decided to look at something relatively basic tonight that I should have looked at a long time ago. Do the Democratic candidates fare better in the polls in the states in which there has been an active primary campaign?

It turns out that the answer appears to be yes for one of the candidates.

Consider the following:

In Ohio, where there has been a primary campaign, Barack Obama leads McCain by 2.6 points in our weighted average of polls. In Pennsylvania, where there hasn't yet been a campaign, Obama trails McCain by 1.3 points.

In Washington, where there has been a primary campaign, Obama leads McCain by 9.7 points. In Oregon, where there hasn't been a campaign, Obama leads McCain by 6.3 points.

In North Dakota, where there has been a campaign, Obama led McCain by 4 points in the only poll. In South Dakota, where there hasn't yet been a campaign, he trails McCain by an average of 6.6 points.

In Tennessee, where there has been a campaign, Obama trails McCain by 15.4 points. In Kentucky, where there hasn't been a campaign, he trails McCain by 23.3 points.
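
The four comparisons above reduce to simple subtraction. This sketch uses only the margins quoted in the list (Obama minus McCain, in points); the variable names are mine.

```python
# ((state with a campaign, margin), (comparison state without one, margin))
pairs = [
    (("Ohio", 2.6), ("Pennsylvania", -1.3)),
    (("Washington", 9.7), ("Oregon", 6.3)),
    (("North Dakota", 4.0), ("South Dakota", -6.6)),
    (("Tennessee", -15.4), ("Kentucky", -23.3)),
]

# Gap = margin in the campaigned state minus margin in its comparison state.
gaps = [campaigned[1] - other[1] for campaigned, other in pairs]
average_gap = sum(gaps) / len(gaps)

for (campaigned, other), gap in zip(pairs, gaps):
    print(f"{campaigned[0]} vs. {other[0]}: {gap:+.1f}")
print(f"Average gap: {average_gap:+.2f}")
```

All four gaps come out positive, ranging from roughly 3 points to roughly 11. Four pairs is a tiny sample, though, and the pairings are a judgment call.
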
These numbers -- apples-to-apples comparisons of apparently similar states -- seem to lead to the conclusion that Obama might benefit from having had the opportunity to campaign in a state, perhaps on the order of about 5 points in his general election polls. And if we plop a 'campaign' dummy variable into our regression analysis -- which is set to '1' in every state that has voted in a primary or caucus so far, and '0' in the ten states (including Florida and Michigan) that haven't -- it turns out to be highly statistically significant:

Variable     Coef.    St. Err.    t-score    P>|t|
Campaign      5.70      2.11        2.70     0.010
Kerry         0.43      0.65        6.65     0.000
Baptist      -0.53      0.12       -4.36     0.000
AfAm          0.21      0.14        1.49     0.144
$_Obama       8.29      2.19        3.78     0.000
$_Clinton    -6.38      1.96       -3.25     0.002
$_McCain     -6.77      3.50       -1.94     0.059

Constant      2.68      2.31        1.16     0.252

OK -- so I know that you didn't come here to read regression output. But what that is saying is that Obama is polling about 5.7 points better in states that have participated in the primary process so far, all else being equal. And it's saying that this finding is highly unlikely (around 100-to-1 against) to be the result of chance alone. This is a robust finding too. If you remove other variables that might be related to the presence of a campaign -- the fundraising numbers, for instance -- the campaign variable continues to show up at about the same level of significance.
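
If you do want to read the regression output, the t-score column is just the coefficient divided by its standard error. The sketch below recomputes it from the rounded figures in the table. The 0.1 tolerance for rounding is my own arbitrary choice, as are the names.

```python
# variable: (coefficient, standard error, t-score as printed in the table)
rows = {
    "Campaign":  (5.70, 2.11, 2.70),
    "Kerry":     (0.43, 0.65, 6.65),
    "Baptist":   (-0.53, 0.12, -4.36),
    "AfAm":      (0.21, 0.14, 1.49),
    "$_Obama":   (8.29, 2.19, 3.78),
    "$_Clinton": (-6.38, 1.96, -3.25),
    "$_McCain":  (-6.77, 3.50, -1.94),
    "Constant":  (2.68, 2.31, 1.16),
}

def t_score(coef, std_err):
    """t-statistic for a regression coefficient."""
    return coef / std_err

recomputed = {name: t_score(coef, se) for name, (coef, se, _) in rows.items()}

# Flag rows where the recomputed value is off by more than rounding error.
mismatches = [
    name for name, (_, _, printed) in rows.items()
    if abs(recomputed[name] - printed) > 0.1
]

for name, value in recomputed.items():
    print(f"{name:<10} {value:6.2f}")
print("Rows that do not reconcile:", mismatches)
```

Every row reconciles except Kerry: a standard error of 0.65 would give a t-score of about 0.66, not 6.65. That looks like a typo in the standard error column (0.065 would fit the printed t-score), but I can't confirm that from the table alone.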

This pattern does not show up for Clinton. Her campaign variable is not statistically significant.

But this has been one of the big themes with Barack Obama this year: the more voters get a chance to know him, the more they seem to like him. That's how he improved his standing in essentially every state that has voted so far as time wore on; check out the pollster.com graphs for more detail.

Now, there are two ways to interpret this data. Number one is that the states where Obama campaigned are still basking in his afterglow, and will eventually come back down to earth. That's the pessimistic interpretation. The optimistic interpretation is that these improvements in his poll standing are permanent, and that he can therefore expect to gain in general election polls in states like Pennsylvania, Indiana and North Carolina as the campaign wears on. If the optimistic interpretation is true, that means that both Florida and North Carolina should be highly competitive in the fall, and maybe Indiana too, and that Obama can expect to move back ahead of McCain in the Pennsylvania polling averages.


Tuesday, March 11, 2008

Pollster Ratings v2.0

I recently subscribed to PollingReport.com, and had some time today to backfit a whole bunch of polling data from previous election cycles into my pollster report card. I now have a database of over 150 competitive contests since 2000. This includes essentially every competitive presidential race, and most competitive primary races, Senate races, and Governor races. I also expanded the playing field