Thursday, March 13, 2008

Does Obama benefit from an active primary campaign?

I decided to look at something relatively basic tonight that I should have looked at a long time ago. Do the Democratic candidates fare better in the polls in the states in which there has been an active primary campaign?

It turns out that the answer appears to be yes for one of the candidates.

Consider the following:

In Ohio, where there has been a primary campaign, Barack Obama leads McCain by 2.6 points in our weighted average of polls. In Pennsylvania, where there hasn't yet been a campaign, Obama trails McCain by 1.3 points.

In Washington, where there has been a primary campaign, Obama leads McCain by 9.7 points. In Oregon, where there hasn't been a campaign, Obama leads McCain by 6.3 points.

In North Dakota, where there has been a campaign, Obama led McCain by 4 points in the only poll. In South Dakota, where there hasn't yet been a campaign, he trails McCain by an average of 6.6 points.

In Tennessee, where there has been a campaign, Obama trails McCain by 15.4 points. In Kentucky, where there hasn't been a campaign, he trails McCain by 23.3 points.
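
The four pairings above can be tallied directly. This minimal sketch subtracts each non-campaign state's Obama-minus-McCain margin from its campaigned neighbor's, using only the figures quoted in the post; the function name `campaign_gaps` is illustrative. With just four pairs, one of which (North Dakota) rests on a single poll, the simple mean of about 6.5 points should be read loosely. It is broadly in line with the rough 5-point figure rather than a precise estimate.

```python
# Obama-minus-McCain margins (points) quoted in the post for each state pair:
# (state with a primary campaign, margin), (neighbor without one, margin)
pairs = [
    (("Ohio", 2.6), ("Pennsylvania", -1.3)),
    (("Washington", 9.7), ("Oregon", 6.3)),
    (("North Dakota", 4.0), ("South Dakota", -6.6)),
    (("Tennessee", -15.4), ("Kentucky", -23.3)),
]

def campaign_gaps(pairs):
    """Return the campaigned-minus-not-campaigned margin gap for each pair."""
    return {f"{a[0]} vs {b[0]}": round(a[1] - b[1], 1) for a, b in pairs}

gaps = campaign_gaps(pairs)
for label, gap in gaps.items():
    print(f"{label}: {gap:+.1f}")

mean_gap = sum(gaps.values()) / len(gaps)
print(f"mean gap: {mean_gap:.2f}")
```
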

These numbers -- apples-to-apples comparisons of apparently similar states -- suggest that Obama may benefit from having had the opportunity to campaign in a state, perhaps on the order of 5 points in his general election polls. And if we plop a 'campaign' dummy variable into our regression analysis (set to '1' in every state that has voted in a primary or caucus so far, and '0' in the ten states, including Florida and Michigan, that haven't), it turns out to be highly statistically significant:

Variable     Coef.   St. Err.   t-score   P>|t|
Campaign      5.70     2.11       2.70    0.010
Kerry         0.43     0.65       6.65    0.000
Baptist      -0.53     0.12      -4.36    0.000
AfAm          0.21     0.14       1.49    0.144
$_Obama       8.29     2.19       3.78    0.000
$_Clinton    -6.38     1.96      -3.25    0.002
$_McCain     -6.77     3.50      -1.94    0.059
Constant      2.68     2.31       1.16    0.252

OK -- so I know that you didn't come here to read regression output. But what it says is that Obama is polling about 5.7 points better in states that have participated in the primary process so far, all else being equal, and that this finding is quite unlikely (roughly 100-to-1 against) to be the result of chance alone. It is a robust finding, too: if you remove other variables that might be related to the presence of a campaign -- the fundraising numbers, for instance -- the campaign variable continues to show up at about the same level of significance.
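
The P>|t| column can be reproduced from the Coef. and St. Err. columns: the t-score is the coefficient divided by its standard error, and the p-value is the two-sided tail probability of Student's t distribution at that score. The post does not state the degrees of freedom, so this sketch assumes 42 (roughly 50 state observations minus 8 estimated coefficients). It integrates the t density with Simpson's rule using only the standard library; `two_sided_p` is an illustrative helper, not part of the original analysis. A p-value of about 0.01 corresponds to odds of roughly 100-to-1 against chance.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central

# Assumption: ~50 states and 8 coefficients (7 variables + constant) -> 42 df.
DF = 42
coef, se = 5.70, 2.11  # Campaign row of the table
t = coef / se
p = two_sided_p(t, DF)
print(f"t = {t:.2f}, p = {p:.4f}, odds against chance ~ {(1 - p) / p:.0f}-to-1")
```

The same helper recovers the other rows under the same assumption, e.g. `two_sided_p(1.49, 42)` for AfAm comes out close to the 0.144 in the table.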

This pattern does not show up for Clinton. Her campaign variable is not statistically significant.

But this has been one of the big themes with Barack Obama this year: the more voters get a chance to know him, the more they seem to like him. That is how he improved his standing over time in essentially every state that has voted so far; check out the pollster.com graphs for more detail.

Now, there are two ways to interpret this data. Number one is that the states where Obama campaigned are still basking in his afterglow and will eventually come back down to earth. That's the pessimistic interpretation. The optimistic interpretation is that these improvements in his poll standing are permanent, and that he can therefore expect to gain in general election polls in states like Pennsylvania, Indiana and North Carolina as the campaign wears on. If the optimistic interpretation is true, then both Florida and North Carolina should be highly competitive in the fall, and maybe Indiana too, and Obama can expect to move back ahead of McCain in the Pennsylvania polling averages.
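
Under the optimistic reading, a rough projection is simply to add the campaign coefficient to the current margin in each state that has not yet voted. This sketch does that for the four non-campaign states whose margins are quoted in the post. It assumes the full 5.7-point effect carries over additively and permanently, which is exactly what the pessimistic reading disputes; `project` is an illustrative name.

```python
# Obama-minus-McCain margins quoted in the post for states with no campaign yet.
no_campaign = {"Pennsylvania": -1.3, "Oregon": 6.3, "South Dakota": -6.6, "Kentucky": -23.3}
CAMPAIGN_EFFECT = 5.70  # Campaign coefficient from the regression table

def project(margins, effect):
    """Add the full campaign effect to each margin (the 'optimistic' reading)."""
    return {state: round(m + effect, 1) for state, m in margins.items()}

projected = project(no_campaign, CAMPAIGN_EFFECT)
for state, margin in projected.items():
    leader = "Obama" if margin > 0 else "McCain"
    print(f"{state}: {leader} +{abs(margin):.1f}")
```

In this reading Pennsylvania flips from a 1.3-point McCain lead to a 4.4-point Obama lead, while South Dakota comes close to even.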

7 comments

Rasmus said...

Hi,
I've got two questions:
1) Did you update your averages in the last 2 days, and could you possibly add a "last updated" time?
I was so impressed by this site that I rebuilt your simulation (probably not as well, but my numbers come close to yours), and I need to update my numbers with your new averages from time to time.

2) I think we both use the same system: I take your average and modify it with two random numbers that stand for a) errors on this site and the MoE of the polls and b) future random events in the campaign.

I set these random numbers to +-4 and +-7, but that's not important, because the effect of the randomisation gets smaller and disappears when you simulate the election 500 times or more. (I still don't get how you are able to simulate it 5000x; I wrote a macro that simulates the election 25x and I have to copy & paste the results, so I need about 30 min for 5000 elections.)

So, do you bring in the 2nd random number BEFORE the states are simulated, or in every single state?
I did both, and the results change.
The popular vote stays the same, as it should, which indicates that the different results were not caused by chance: the Clinton and Obama popular vote difference was less than 0.1%.

But when you bring in the random number in each state separately, you get more wins for the Democrats (75% Obama, 55% Clinton), and the margins Obama-McCain/Clinton-McCain get smaller.
When you bring in the random number before you do the state simulation, you often get results like 80 EV or 450 EV, and I see those results in your graphic. When you calculate another random number for each state, the results stay within 170-330 EV 95% of the time.

I think it's more logical to bring the random number in before doing the state calculation. What do you think?
And what causes the difference?
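
The difference Rasmus describes comes from correlation. A shock drawn once per simulated election moves every state in the same direction, so states tend to flip together. Shocks drawn separately for each state partly cancel across states, so the electoral-vote totals bunch near the middle. This sketch uses a hypothetical 11-state toy map (not real polling data) and a hypothetical shock size of 7 points to show the effect under a winner-take-all rule; `simulate_ev` is an illustrative name.

```python
import random
import statistics

# Hypothetical toy map (not real polling data): 11 states worth 10 electoral
# votes each, with Democratic margins in points running from -10 to +10.
STATES = [(10, margin) for margin in range(-10, 11, 2)]

def simulate_ev(states, shock_sd, shared, n_sims=2000, seed=1):
    """Simulate Democratic electoral votes under winner-take-all.

    shared=True:  one random shock per simulated election, added to every state.
    shared=False: a fresh, independent shock drawn for each state.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        national = rng.gauss(0, shock_sd) if shared else 0.0
        ev = 0
        for votes, margin in states:
            shock = national if shared else rng.gauss(0, shock_sd)
            if margin + shock > 0:  # winner takes all of the state's votes
                ev += votes
        results.append(ev)
    return results

for shared in (True, False):
    evs = simulate_ev(STATES, shock_sd=7.0, shared=shared)
    label = "shock drawn once per election" if shared else "shock drawn per state"
    print(f"{label}: mean EV {statistics.mean(evs):.1f}, st. dev. {statistics.stdev(evs):.1f}")
```

Both versions have about the same average, but the shared shock produces a much wider spread, with blowouts in both directions. That mirrors the pattern of 80 or 450 EV results versus a tight 170-330 EV band, although the toy map is far smaller than the real one.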

Rasmus said...

Here are my results with the old averages and the new ones, with your new pollster weighting and the PA and MI polls. (I think the PA and MI polls had MUCH more effect than the pollster weighting, since in most states there is only the SUSA 50-state poll.)

My old results:
Vote% Obama: 50.94
Vote% Clinton: 49.74
EV Obama: 293.86
EV Clinton: 266.47
Elections Won Obama: 58.3%
Elections Won Clinton: 47.4%

My new results/your results from the homepage:
Vote% Obama: 50.85/50.9
Vote% Clinton: 49.56/49.7
EV Obama: 289.90/286.6
EV Clinton: 262.28/252.8
Elections Won Obama: 55.7%/59.5%
Elections Won Clinton: 47.4%/41.4%


I wonder why the popular vote is almost the same and the electoral vote is not. OK, the random effect matters more because of the winner-take-all system, but is that all?
I also noticed that there are very often elections where the loser of the popular vote wins the election... Maybe we'll get a "reverse Gore effect" ^^

Rasmus said...

Oh, and I did 1000 simulations for each of the old and new systems.

Beckylooo said...

Thanks so much for this. Glad to have numbers to back up something I've suspected for a while.

George said...

"OK -- so I know that you didn't come here to read regression output."


Actually... I did! I just discovered your site and I love it already. Keep up the good work!

538 said...

Rasmus,

I can definitely include an indication of when the most recent update was, going forward; that is a helpful suggestion.

With respect to your simulations, it's hard to determine *exactly* what you're doing without looking at your spreadsheet, but it does look like we're on largely the same path. Technically, however, I apply three random numbers rather than two, accounting for sampling error, state-specific movement, and national movement. All of these random numbers are applied at the same point in the calculation.

Frankyboy said...

Very interesting blog!

In one of your previous entries, you predicted the outcome of the post-Super Tuesday primaries, which in some cases (e.g. WI, MN) were well off target, i.e. grossly underestimating the Obama vote. This suggests that there may be something wrong with the variables that you are using in your regression.

Have a look at the following two maps: http://www.dailykos.com/story/2008/3/12/13453/0052/834/475116 and
http://en.wikipedia.org/wiki/Image:German1346.gif. The parallels are striking, except for Ohio. But even there, the share of people with German ancestry increases towards the North-West, as does the Obama vote share.

I therefore suggest that you test ancestry as an additional variable in your regressions on the primary results. Data should be available from the US Census. Aside from including German ancestry as a variable, it might also make sense to construct a "Germanic" (German, Dutch, Scandinavian) ancestry variable and test the two as alternatives.

If my assumption that Obama has strong appeal for voters of German (or Germanic) origin holds up in your tests, it might be attributed to two reasons:
First, Obama, with family roots in Kansas and his political base in Illinois, has culturally embraced voters of German/Germanic origin; otherwise he would never have been able to win Illinois. Thus, in spite of being half African, he may be seen more as "one of us" than Hillary Clinton. Second, he is politically taking up (Protestant) German values such as unity/non-partisanship, transparency, environmental consciousness, education, innovation/change and peaceful conflict resolution (the last one is arguable - Mr. Rumsfeld has German roots as well).

As to Ohio (and possibly PA), it may be that, as with Hispanics, what matters is when the immigration took place. The first wave of German emigration to the USA took place in the 18th century (helped by the British heavily recruiting Hessians to fight Washington's troops, many of whom decided to stay in the USA after independence). This emigration wave was quite strongly driven by religious motives (Amish), and should account for most of the German ancestry in OH and PA. The second wave of emigration from Germany to the USA started around 1840 and was to a much greater extent driven by economic and political motives. Most German ancestry in the North-West stems from that second wave.

If (what needs to be tested) German ancestry plays a significant role in presidential preferences, this could explain why Obama is likely to win several states in the Mid- and North-West against McCain, while Clinton fails to do so.

Some other European ancestries, judging by a visual comparison of the above map with ancestry maps, are also likely to have distinct preferences. French and Irish-Scotch ancestry, e.g., appear to go heavily for Clinton. I would suppose the same for Italian ancestry; however, this could partly be attributed to the Clinton home-state effect (New York). The Irish appear to be somewhat split, as do the English. Eastern European ancestry is either rather insignificant or, in the case of Polish, too closely co-located with German ancestry to draw a conclusion just by comparing maps.

I would be interested to learn whether my hypothesis holds up in your empirical tests. In any case, keep up the excellent work!