One of the things I learned while exploring the statistical properties of the Iranian election, the results of which were probably forged, is that human beings are really bad at randomization. Tell a human to come up with a set of random numbers, and they will be surprisingly inept at trying to do so. Most humans, for instance, when asked to flip an imaginary coin and record the results, will succumb to the Gambler's Fallacy and be more likely to record a toss of 'tails' if the last couple of tosses had been heads, or vice versa. This feels right to most of us -- but it isn't. We're actually introducing patterns into what is supposed to be random noise.
Sometimes, as is the case with certain applications of Benford's Law, this characteristic can be used as a fraud-detection mechanism. If, for example, one of your less-trustworthy employees is submitting a series of receipts, and an unusually high number end with the trailing digit '7' ($27, $107, $297, etc.), there is a decent chance that he is falsifying his expenses. The IRS uses techniques like this to detect tax fraud.
Yesterday, I posed several pointed questions to David E. Johnson, the founder of Strategic Vision, LLC, an Atlanta-based PR firm that also occasionally releases political polls. One of the questions, in light of Strategic Vision, LLC's repeated failure to disclose even basic details about its polling methodology, is whether the firm is in fact conducting polling at all, or rather, is creating fake but plausible-looking results in order to increase traffic and attention to its core business as a PR and literary firm.
I posed that question largely as a hypothetical yesterday. But today, I pose it much more literally. Certain statistical properties of the results reported by Strategic Vision, LLC suggest, perhaps strongly, the possibility of fraud, although they certainly do not prove it and further investigation will be required.
The specific evidence in question is as follows. I looked at all polling results reported by Strategic Vision, LLC since the beginning of 2005. Results from 2008 onward are available at their website; the rest were recovered through archive.org. This is a lot of data -- well over 100 polls, each of which asked an average of about 15-20 questions.
For each question, I recorded the trailing digit for each candidate or line item. For instance, if Strategic Vision had Barack Obama beating John McCain 48-43 in a particular state, I'd record a tally in the 8 column and another in the 3 column. Or if they had voters opposing a particular policy 50-45, I'd record a tally in the 0 column (for 50) and another in the 5 column (for 45). I did not include "non-response responses" like "other" or "undecided", nor did I include a tally for third-party candidates in races between the two major parties. I also excluded party primaries in which more than two candidates were listed, and approval questions for which more than two choices were provided.
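For readers who want to try this themselves, the tallying rule can be sketched in a few lines of Python. This is an illustration only, not the tool used for the counts in this post. The function name and the input format (one list of whole-number percentages per question, with the exclusions above already applied) are assumptions.

```python
from collections import Counter

def tally_trailing_digits(questions):
    """Count the trailing digit of every percentage in every question.

    `questions` is a hypothetical input format: a list of lists of
    whole-number percentages, with "undecided", "other" and third-party
    lines already removed.
    """
    counts = Counter({digit: 0 for digit in range(10)})
    for percentages in questions:
        for pct in percentages:
            counts[pct % 10] += 1  # trailing digit of e.g. 48 is 8
    return counts

# The two examples from the text: a 48-43 horse race and a 50-45 policy question.
example = [[48, 43], [50, 45]]
counts = tally_trailing_digits(example)
print([counts[digit] for digit in range(10)])
# → [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
```

Each position in the printed list is one of the ten columns, 0 through 9.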
We might expect, as a default, the distribution of these trailing digits to be approximately random. Here, for instance, is what I get if I run the numbers for all Senate and Presidential polls -- more than 3,000 (!) of them -- in my 2008 database:
This data is arguably not perfectly random. There is a little bit of bunching around the middle values like 4 and 5, perhaps because most of the polling comes from swing states or national polls in which a typical figure might be something like Obama 46, McCain 45 (with 9 percent undecided). But it is close to random, and could fairly easily have occurred through chance alone.
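One common way to put a number on "close to random" is a Pearson chi-square statistic against a perfectly even split across the ten digits. The sketch below assumes that an even split is the right baseline, which the bunching described above suggests is only approximately true. The counts in it are made up for illustration and are not the real tallies.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic for digit counts against an even split."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5 percent level; a larger statistic is unlikely under an even split.
CRITICAL_9DF_05 = 16.919

# Hypothetical counts for illustration only; NOT the real polling tallies.
even_counts = [300] * 10
lumpy_counts = [300, 240, 300, 300, 300, 300, 300, 300, 360, 300]

print(chi_square_uniform(even_counts))                      # perfectly even
print(chi_square_uniform(lumpy_counts) > CRITICAL_9DF_05)   # one digit short, one long
```

With ten digits there are nine degrees of freedom, which is why the 9-df critical value is the relevant yardstick here.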
By contrast, here's what we get if we run the same tally for the Strategic Vision polls:
This data is not random at all. For instance, the trailing digit was '8' on 676 occasions, almost 60 percent more often than the 431 times that it was '1'. Over a sample of more than 5,000 data points, such an outcome occurring by chance alone would be an incredible fluke -- millions to one against. Bad luck can essentially be ruled out as an explanation.
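A rough back-of-the-envelope check on just those two counts, as a sketch rather than the calculation behind the odds quoted above: if '1' and '8' were equally likely, then among the observations ending in one or the other, the number of 8s should behave like a fair coin flip. The snippet uses a normal approximation, which is crude this far out in the tail, and it ignores the fact that these two digits were singled out after looking at the data, so the result is best read as an order of magnitude only.

```python
import math

eights, ones = 676, 431          # counts quoted in the text
n = eights + ones                # observations ending in either 1 or 8

# If '1' and '8' were equally likely, the number of 8s among these n
# observations would be roughly Binomial(n, 0.5).
mean = n * 0.5
sd = math.sqrt(n * 0.25)
z = (eights - mean) / sd

# Two-sided tail probability from the normal approximation.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"ratio = {eights / ones:.2f}, z = {z:.2f}, p = {p_value:.1e}")
```

A gap of more than seven standard deviations is far outside what chance alone would ordinarily produce, which is consistent with the conclusion above.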
One of two things seems to have happened, then.
One possibility is that there is some intrinsic, mathematical reason that certain trailing digits are more likely to come up than others. This is certainly possible -- and in fact, it would be somewhat likely if the polling data that we were looking at were homogeneous -- McCain versus Obama polls in Ohio, for instance.
But Strategic Vision's polls cover a wide array of topics: Presidential horse race numbers in any of a dozen or so states, Senate and gubernatorial polling, primary polling, approval ratings of various kinds, polling on issues like the war in Iraq, and more abstract questions such as whether voters think that 'experience' or 'change' is the more important quality in a Presidential candidate. No one type of question, and no one state, represents more than a relatively small fraction of the sample. Under those circumstances, I can't think of any reason why the trailing digit wouldn't approach being random -- although there absolutely might be reasons that I haven't thought of.
But this data is not random. It's not close to random. It's not close to close. Which brings up the other possibility: Strategic Vision is cooking the books. And whoever is doing so is doing a pretty sloppy job. They'd seem to have a strong, unconscious preference for numbers ending in '7', for instance, as opposed to those ending in '6'. They tend to go with round numbers that end in '5' or '0' slightly too often. And they much prefer numbers with high trailing digits like 49 and 38 to those with low ones like 51 and 42.
I haven't really seen anyone approach polling data like this before, and I certainly haven't done so myself. So, we cannot rule out the possibility that there is some mathematical rationale for this that I haven't thought of. But it looks really, really bad. There is a substantial possibility -- far from a certainty -- that much of Strategic Vision's polling over the past several years has been forged.
I recognize the gravity of this claim. I've accused pollsters -- deservedly I think in most cases -- of all and sundry types of incompetence and bias. But that is all garden-variety stuff, as compared against the possibility that a prominent polling firm is making up numbers out of whole cloth.
I would emphasize, however, that at this stage, all of this represents circumstantial evidence. We are discussing a possibility. If we're keeping score, it's a possibility that I would never have thought to look into if Strategic Vision had been more professional about their disclosure standards. And if we're being frank, it's a possibility that might actually be a probability. But it's only that. A possibility. An hypothesis -- as yet unproven.
In the meantime, I have a couple of relatively specific messages for people.
Firstly, if you've been polled by Strategic Vision at any point in the past several years -- this probably means that you're in a state like Georgia, Florida, Ohio, Michigan, Washington, Wisconsin or New Jersey that they tend to "poll" frequently -- now would be a good time to tell me about your experience. So click on the 'contact' button at the top of the page and do so. If you do, please provide as many details about the experience as possible. Please also provide reliable contact information, so that I could verify your identity if need be. I will not publish your name unless you specifically give me permission to do so -- but I do need to be able to confirm that you're not a sock puppet.
To the folks at Strategic Vision, LLC, the opposite holds. I don't care if you contact me by e-mail, by phone, by attorney, or by carrier pigeon. Whatever you tell me, whether the communication is solicited or not, whether you decide to play naughty or nice, it's on the record, and will almost certainly be revealed in full to the readers of this blog.