This site has had a ban on listing internal polls for some time now. The logic behind this is that when a candidate for office commissions a poll, he is only liable to leak its results to the public if it contains good news for him, thereby encouraging donors, press persons, etc. This does not mean per se that the poll is "biased" -- many pollsters do very good and thorough work on behalf of campaigns and affiliated interest groups. But it does mean that there may be a bias in which information becomes part of the public record: we learn about a poll that has a candidate ahead by 10 points in a state, but not one where he is down by 2. For this reason, such polls have been excluded.
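To make the selective-release argument concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not the site's actual model: the race is truly tied, every internal poll is an honest estimate with normally distributed sampling noise of 4 points, and the campaign leaks only polls showing its candidate ahead by at least 2 points (the hypothetical `release_threshold`).

```python
import random
import statistics


def simulate_release(true_margin=0.0, sampling_sd=4.0, n_polls=10_000,
                     release_threshold=2.0, seed=538):
    """Simulate honest internal polls of which only favorable ones are leaked.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    # Every poll is an unbiased estimate of the true margin plus sampling noise.
    all_polls = [rng.gauss(true_margin, sampling_sd) for _ in range(n_polls)]
    # Hypothetical leak rule: release only polls showing the candidate
    # ahead by at least `release_threshold` points.
    released = [m for m in all_polls if m >= release_threshold]
    return statistics.mean(all_polls), statistics.mean(released), len(released)


mean_all, mean_released, n_released = simulate_release()
print(f"mean of all polls conducted: {mean_all:+.2f}")
print(f"mean of polls released:      {mean_released:+.2f} ({n_released} polls)")
```

Under these assumptions each individual poll is unbiased, yet the average of the released polls sits well above the true margin, which is the sense in which the public record, rather than the pollster, is biased.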
There have been an increasing number of surveys, however, particularly on the Senate side of things, that somewhat test our definition of an "internal poll". Where would you draw the line on the following spectrum?
1. Polls commissioned by the candidate himself.
2. Polls commissioned by another candidate for office in that state.
3. Polls conducted by a national campaign committee (e.g. RNC, DSCC).
4. Polls conducted by an interest group (Emily's List, US Chamber of Commerce), but formally unassociated with the candidate.
5. Polls that are private, but conducted on behalf of someone with no direct interest in the campaign, such as an outside lobbying group.
Presently, I have been drawing the line between #3 and #4. But I'm not sure that there's a major philosophical difference between, for instance, Emily's List commissioning a poll, and the DNC doing so. I'm also not so sure that I necessarily have things in the right order.
Anyway, I've come to very much trust in the wisdom of the 538 crowd -- so opinions are solicited and appreciated.
Monday, June 30, 2008
Internal Polls
-- Nate Silver at 2:11 PM
Labels: internal polls, meta, methodology, site
70 comments
I'm not a statistician, but I am a fan of this site, and also, quite a bit greedy for news on this election - mostly the presidential one. If internal polls were obtained from both camps and on any side, and averaged against each other, wouldn't that minimize the bias risk? Is it possible to have a toggle for with/without internal polling data?
QT
I'd have to say that under your logic (which I think is sound), there are two questions that should be answered:
1) Do we know for sure that all polls conducted by the org in question are made public? If not, then I'd say it should be excluded.
2) Is there any track record with which to judge the efficacy of the poll and assign some kind of rating/weight? If not, I'd think that even assigning a low-ish default weight would be problematic.
Since there's really no shortage of public polls AFAICT, I think it's wise to err on the side of exclusion with regard to any private polls.
Seems to me that you could look at the polls from different sources (obviously ones directly from candidates shouldn't be used), and compare their success in forecasting to other major nonpartisan polling organizations. If they are fairly close or even better, I would see no problem in using them, but if they consistently rely on only flukes in an attempt to portray a trend to the media and/or voters, then they shouldn't be used. That would be my framework. You being the smart stats guy could probably determine where to draw the line on how reliable they are compared to other polls. Anyway, I love the work you've done so far (in both this and my favorite Baseball Prospectus), and I'm sure it will continue to be great either way. Good luck.
The risk is in selective release, as you said...That means the poll is useful as a snapshot, if you believe it is properly conducted and such.
Therefore, it might be a good idea to give it less weight in the regression model, but it can certainly be useful for seeing the current standings if there are few recent polls. Some of the Senate races are quite sparse.
I tend to think that if #3 is excluded (and I think it should be) that #4 should also be excluded. I also think that #1 should unquestionably be excluded. #5 could lead to bias, but I generally think it should be included assuming there is no direct interest in the campaign. My only open question relates to #2. I can see arguments going both ways as to whether they should be included or not. The more risk averse I'm feeling the more inclined I am to exclude it. But why should a feeling decide this? Is there some sort of analysis that can be done on historical polls in this group to determine if they are worthy of inclusion?
Provided the organization actually *conducting* the poll has a decent track record, the question should only be: *why* do we know about this poll? Do we know because all polls conducted for/by these people are routinely disseminated? Or do we know because they have, in this instance, chosen to make the results public? If that information isn't available, I'd err on the side of caution and exclude it.
My preference would be to limit the polls to those commissioned by unaffiliated third parties. I would think that any group with a political agenda would be inclined to publish those poll results which favor their agenda. Probably best (and easiest) to avoid the issue altogether where possible.
The other option, absent a belief (or evidence) that one side publishes more polls, would be to use all available polls which meet methodological standards, on the premise that any agenda-based release bias would be averaged out with the larger sample of polls.
Correct me if I'm wrong, but, considering the relative aggregate similarity in wealth between the Republicans and Democrats during this particular year, would the selectively released internal polls not effectively cancel each other out? As in, the Republicans don't release polls showing Democrats ahead and the same holds true for the Democrats, but, over time, the noise cancels itself out?
We know that there will be no lack of polls, internal and otherwise, in the coming months. Given that, you can be very selective with the polls you use and still have more than enough polls to provide reliable data.
The standard should be clear from the question: Is the sponsor of the poll going to release all the results regardless of how they turn out?
If the sponsor allows itself any discretion about if and what to publish, then it is using polling for some sort of advocacy or public relations. And those polls should not be relied on.
=sh
I would forget about categorizing the types of internal polls and simply tackle the problem with internal polls directly: if the polling firm is willing to be transparent about the methods they used in polling and agrees to release all polls they commission (rather than being selective towards only polls with a certain outcome), you should include them.
Does the individual/organisation in question release all their polling or do they have a selection bias?
There, that's the relevant question. And I don't know the answer to that for the most critical choices... (#1 and #3 surely don't, but #2 most likely doesn't either; #4 and #5 probably depend on which group we are talking about...)
Nate -
Your current policy is good.
However, the beauty of running a high traffic site like this is that you can probably get some folks to hit you with more internal polling data than the public sees. You should factor this data into your analysis in your posts on the blog, but not your regressions or poll averages.
I think others here have hit on the answer.
In general, I'd exclude all 5 of the types of polls you list, Nate.
The only exceptions I'd allow are for those partisan groups that consistently release all the polls that they undertake. Fox News and Daily Kos, for instance.
Nate, I agree with your decision. Internal polls can certainly be valuable and interesting to discuss, but they are inherently biased and therefore not completely reliable.
Frankly, I have been surprised recently at how other blogs I read -- and especially news sources, such as The Politico -- post internal poll results. In fact, numerous news sources will discuss internal poll results but continue to refuse to post Survey USA and Rasmussen on the ground that they are automated and thus unreliable. I find this view ridiculous, particularly given the strong track records of each.
Part of the reason I like this site over RCP which just averages polls equally is your methodology and refusal to tabulate internals. Don't change.
I agree with JGabriel and others - these are all questionable.
If the rationale for keeping out polls of candidates is selection bias, then this rationale seems to apply equally to interest groups. Just like the candidate, the interest group has an interest in how the race turns out, and this gives them the same incentive as the candidate to selectively release polls. Absent some strong empirical evidence that internal polls add to the predictive power of the model, I would think the obvious potential for bias argues for exclusion of all private polls.
As to category 5, I'm not sure who you're thinking of here. If they are strongly associated with one candidate (or party), then I'd favor exclusion. If it's the state credit union trade association or something like that (seems like they did one in Texas a while back) that doesn't have a strong partisan reputation, then I'd think the model is better off including it. This does require a bit of a judgment call on your part.
If there's a particular (large) dataset that you'd like to see included, you could try to get into contact with the interest group and see if they release all of the internal polling they conduct. If they don't, or won't answer, then you can be pretty sure of selection bias. If the group claims to release all polling data and has a record that seems to indicate that is true, then you might think about putting the results in.
It's a lot of work to do that, though, and I'm not sure it's worth it. As stated above, you might consider such regular polling groups as Fox News and DailyKos.
Nate, you are the genius here. I would always desire to err against exclusion, promoting the use of every available asset, biased or not, in making a determination. You must have some method of giving these polls proper weight in your method, and I would trust you to do that. If the