Monday, August 25, 2008

House Effects In Da House

Charles Franklin has a terrific article up at Pollster.com about "house effects": the tendency of certain polling firms' numbers to lean in the direction of one or another candidate. It is so terrific, in fact, that I have incorporated a house effect adjustment into our averages and projections.

Before we proceed, it is VERY important to distinguish house effects from either "bias" or "partisanship". Those things can cause house effects, but far more often they are, in Franklin's words: "[D]ifferences ... due to a variety of factors that represent reasonable differences in practice from one organization to another."

Nevertheless, house effects do present some problems for our model. Say you have a pollster like, oh, Mason-Dixon, that tends to have a fairly consistent lean toward McCain. We don't know whether Mason-Dixon is right or wrong -- and they very well could be right, since they are a pretty good pollster! But it is the case that, in states where you have a Mason-Dixon poll, the numbers are going to lean more toward McCain than they do in states where you don't. This has nothing to do with the states themselves -- rather, it's simply a matter of who polled them. It would be nice to be able to adjust for this somehow.

Likewise, say you have a pollster like Selzer, which is a very good polling firm, but has had a pretty strong Obama-leaning house effect so far. Selzer only polls a handful of states -- usually Iowa, Michigan and Indiana. If we have Selzer polls in those states and don't have them anywhere else, we may get a false impression of the relative ordering of different states. This is pretty important in Michigan right now, where Selzer's Obama +7 is really bringing his numbers up.

Of course, bad pollsters can have house effects too (I just wanted to list a couple of good pollsters first to debunk the notion that house effects mean 'bias'). Zogby Interactive has a pretty strong Democratic lean, for instance. TargetPoint has a pretty strong Republican lean.

I don't have quite as much time as I'd like right now to describe our process in detail, but the basic steps are as follows:

1) Each poll in our database is compared against the trend-adjusted average of all polls in that state. Adjusting for the time trend is important, because otherwise you could easily mistake a timing effect for a house effect, if a pollster happens to release a bunch of data at a particularly good time for one of the candidates.

2) We throw these +/- numbers into a regression model to produce both a house effect coefficient and a standard error for each pollster.

3) The house effect adjustment is enacted only in cases where we are at least 90% certain that there is a house effect. Even in these cases, we hedge our bets a little bit, by subtracting 166% of the standard error from the house effect coefficient. (If you have no idea what this means, don't worry about it. In plain English, it means we're being conservative, since house effects can sometimes appear to arise when they're in fact due to plain old luck).
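For readers who want to see the mechanics, here is a minimal sketch of steps 1 through 3. It is an illustration under stated assumptions, not the actual model: it assumes step 1 has already produced each poll's deviation from the trend-adjusted state average, treats the regression as ordinary least squares on pollster dummy variables with no intercept, uses a two-sided normal cutoff of 1.645 for "90% certain", and reads "subtracting 166% of the standard error" as shrinking the coefficient toward zero. The function name `house_effects` and the constants are hypothetical.

```python
from collections import defaultdict
from math import copysign, sqrt

Z_90 = 1.645   # assumed two-sided 90% cutoff (normal approximation)
SHRINK = 1.66  # "166% of the standard error", as described in step 3


def house_effects(deviations):
    """deviations: list of (pollster, deviation) pairs, where deviation is the
    poll's margin minus the trend-adjusted average of all polls in that state."""
    groups = defaultdict(list)
    for pollster, dev in deviations:
        groups[pollster].append(dev)

    # OLS on pollster dummies with no intercept: each coefficient is simply
    # that pollster's mean deviation.
    coefs = {p: sum(v) / len(v) for p, v in groups.items()}

    n_obs = sum(len(v) for v in groups.values())
    dof = n_obs - len(groups)
    if dof <= 0:
        raise ValueError("need more polls than pollsters to estimate errors")

    # Pooled residual variance, shared by all pollsters as in a single regression.
    sse = sum((d - coefs[p]) ** 2 for p, v in groups.items() for d in v)
    resid_var = sse / dof

    results = {}
    for p, v in groups.items():
        coef = coefs[p]
        se = sqrt(resid_var / len(v))
        if coef != 0 and abs(coef) >= Z_90 * se:
            # Significant: hedge by pulling the estimate toward zero.
            adjustment = copysign(max(0.0, abs(coef) - SHRINK * se), coef)
        else:
            # Not significant: leave this pollster's numbers alone.
            adjustment = 0.0
        results[p] = {"coef": coef, "se": se, "adjustment": adjustment}
    return results
```

Under these assumptions, the `adjustment` value is the amount that would be subtracted from that pollster's margins. Note that a pollster with only one or two polls can still clear the bar if its deviation is large relative to the pooled error.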

That's basically it. Well, actually not quite. As Franklin notes, we also have to figure out where to 'center' the house effect. We know that pollsters may have a lot of different methodologies that produce consistently different results -- but we don't know which one is right.

So what we do is compare the averages given by the actual mix of pollsters that we have in our state-by-state numbers against those produced by an optimal basket of pollsters. How do we determine what is optimal? We combine the sample sizes from all the polls that a given firm has conducted in this election cycle -- including national polls -- and then assign it a weight based on our pollster ratings. So the pollsters that have the most say on where the averages stand are the best pollsters, provided that they've given us enough data such that we have a reasonable idea of where they stand. It turns out that the optimal mix of pollsters is just a tiny bit more favorable to Barack Obama than the actual one we have, so his numbers have gotten bumped up by a fraction of a percentage point.
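The recentering idea can be sketched in a few lines. This is only an illustration: the exact way sample size and pollster rating are combined is not spelled out here, so the sketch assumes a simple product of the two, normalized to sum to one. The function names and the inputs (`total_sample`, `rating`, `actual_share`) are hypothetical.

```python
def basket_weights(total_sample, rating):
    """Weight each pollster by (combined sample size) x (rating), normalized.
    The product form is an assumption made for illustration."""
    raw = {p: total_sample[p] * rating[p] for p in total_sample}
    total = sum(raw.values())
    return {p: w / total for p, w in raw.items()}


def recentering_shift(house_effect, actual_share, optimal_weight):
    """Difference between the average house effect implied by the optimal
    basket and the one implied by the actual mix of pollsters in the data."""
    actual = sum(actual_share[p] * house_effect[p] for p in house_effect)
    optimal = sum(optimal_weight[p] * house_effect[p] for p in house_effect)
    return optimal - actual
```

A positive shift here would mean the optimal basket is more favorable to whichever candidate positive house effects favor than the actual mix is, so the averages would be nudged in that direction by that amount.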

If we didn't do this -- and we weren't doing it before -- our averages would tend to be dominated by a relatively small number of pollsters:



Right now, our three most prolific pollsters -- Rasmussen, SurveyUSA and Quinnipiac -- collectively account for about 2/3 of all the data that forms our daily averages. Rasmussen and SurveyUSA alone account for just more than half of our data, and Rasmussen alone accounts for 37 percent. So, our recentering method gives more weight to the little guys at the expense of the big guys -- provided that the little guys are good pollsters. (We don't want to give more weight to Zogby Interactive -- we want to get it the hell out of our numbers).

What this process ends up doing for a pollster like Selzer is that it diffuses some of Selzer's impact over all states. The fact that Ann Selzer's polls think that this will be a very good election for Barack Obama is certainly something we should take notice of. But it really has nothing to do with the particular states that she's polled. So instead of giving Barack Obama a large bounce in states like Michigan and Iowa, we instead take some of that and give him a much smaller bounce spread out over a lot of states.

So which pollsters have a discernible house effect? Not necessarily the ones that you'd think. A lot of the pollsters that have a statistically significant house effect are tiny pollsters that might have released just one or two polls in one or two states. One really nice 'side effect' of this methodology, by the way, is that it will reduce the effect of particularly extreme outliers, in some cases even based on a single poll.

Rasmussen's polls have a slight, Republican-leaning house effect. But it's small -- less than one percentage point (Franklin finds a larger effect, but he's not looking at their state numbers, where the effect has been less pronounced). The effect is nevertheless statistically significant, mostly because we have so much Rasmussen data to work with, but it's not really anything worth getting worked up about.

Strategic Vision has a pretty recognizable Republican-leaning house effect. Mason-Dixon too, which we mentioned.

The pollsters with a Democratic lean tend to be national pollsters, which is one reason why our averages -- which are ultimately still based on state-by-state numbers -- have tended to be less favorable for Barack Obama than things like the RCP national average. Washington Post / ABC and New York Times / CBS have both had a little bit of a Dem-leaning effect. Quinnipiac's polls have been fairly Obama-friendly, but not enough to show up as statistically significant. PPP, a firm that has frequently been accused of (or assumed to have) a Democratic-leaning house effect, in fact does not have one.

To repeat, house effects are not necessarily bad -- but we can make our model even more robust by understanding and accounting for them.

n.b. In our poll detail chart, the house effects are considered part of the 'trendline adjustment' and take effect there. The 'polling average' line is still a pure, unadulterated weighted average, just as it was before.

178 comments

assmole said...

Mule Rider, where are you when we need you?

mr zogby himself said...

I love your work even though you are so very mean to me, nate.

assmole said...

Ok, seriously, dude how can they be termed a 'good pollster' if they're skewing the actuality and not aiming at truth.

tesaar said...

You got to look at it scalewise. A pollster that skews its results by 1 point and is otherwise correct is still better than most pollsters resultwise.

Michael said...

Fascinating. Thank you, Nate!

How will this affect your Senate vote predictions?

Mark said...

It's also often not a pure skew, as in sampling more Republicans or more Democrats, but subtle methodological differences that can result in a different outcome based on all sorts of unknown factors. For example, some pollsters push leaners harder than others. Does this produce a pro-Obama or pro-McCain skew? And is it more or less accurate than not pushing leaners? It depends on a few things. First, are there more marginal Obama or more marginal McCain supporters? Whichever there are more of is who you're likely to get more of by pushing leaners hard. And the other question is: what are leaners likely to do on election day?

There are dozens of subtle factors like that, which might skew one way or another in difficult to predict ways, or ways that might even vary election to election: a particular methodological choice that was pro-Kerry in 2004 might be pro-McCain in 2008. In that case it wouldn't be biased in a long-term sense (doesn't consistently favor Republicans or Democrats), but in a given situation might skew towards a particular candidate.

striatic said...

@assmole

A good pollster with a house effect might actually be spot on.

It is just that relative to all the other polls they tend to show consistently different results.

That isn't "skewing reality", it is just coming up with different results based on a different methodology. Malice need not enter the equation.

Heck, it could just be that most of the other pollsters are unknowingly making some small error.

striatic said...

@Brad

RCP is not a polling company.

Maybe they are biased in which polls they aggregate, but I'm unsure how it relates to this post.

Brad said...

RealClearPolitics is lying this morning. They kept McCain's +5 poll, and threw OLDER polls out of their average. This shows real bias on their part and is not a "house" effect.

BTW - they claim to use the most recent polls in their average, no matter who does them.

Brad said...

Striatic-

Very true that they are not a pollster. The real point to me is that polls have two effects, the first is to attempt to accurately snapshot the race, the second is to affect public opinion about the race. RealClearPolitics may be more important in influencing the race than any single pollster, thus they do at least have some aspects of a pollster - done simply through their polling average.

Brad said...

Nate-

I would love to see more info on how you handle the house effect based on the number of polls. Since Rasmussen is the largest pollster and they lean McCain, is their house effect "removed" across all the polls? Is it really fair to do that because they have such a huge number of polls, and does that create a "house effect" in the 538 number, as you might overcorrect for their bias?

On the other hand, if you do not correct for their bias, are you creating a Rasmussen created pro-McCain bias in the numbers, thus creating a pro-McCain house effect in the 538 number caused by your reliance on the Rasmussen firm for such a large number of polls.

striatic said...

That's not necessarily the same as "Lying" though.

If they use a consistent methodology for discarding older polls, then what they did this morning could be totally legitimate.

At any rate, the polling companies are in a significantly different position than poll aggregators. It is easier for an aggregator to cherry-pick polls than it is for a pollster to cherry-pick respondents.

dwbh said...

Nate: I just can't read dense stuff like this early in the morning, pre-coffee, so I'll just have to take your word on