6.17.2009

Unconvincing (to me) Use of Benford's Law to Demonstrate Election Fraud in Iran

Benford's law is an amusing mathematical pattern in which the first digits of randomly sampled numbers tend to have a distribution in which 1 is the most common first digit, followed by 2, then 3, and so forth. It's the distribution of digits that arises from numbers that are sampled uniformly on a logarithmic scale.
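As a rough illustration of the pattern described above, here is a minimal Python sketch. It assumes the standard first-digit formula P(d) = log10(1 + 1/d) and checks it against numbers drawn uniformly on a log scale. The function names and the six-decade range are my own choices for illustration, not anything from the post.

```python
import math
import random

def benford_first_digit_probs(base=10):
    """P(first digit = d) = log_base(1 + 1/d) for d = 1 .. base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

def simulate_log_uniform_first_digits(n, decades=6, seed=0):
    """Sample n numbers uniformly on a log10 scale and tally their first digits.

    The number of decades (6) and the seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    counts = {d: 0 for d in range(1, 10)}
    for _ in range(n):
        u = rng.uniform(0, decades)        # log10 of the sampled number
        mantissa = 10 ** (u % 1)           # leading part, in [1, 10)
        counts[min(int(mantissa), 9)] += 1
    return {d: c / n for d, c in counts.items()}

if __name__ == "__main__":
    theory = benford_first_digit_probs()
    observed = simulate_log_uniform_first_digits(100_000)
    for d in range(1, 10):
        print(d, round(theory[d], 4), round(observed[d], 4))
```

Passing a different `base` gives the analogous distribution for first digits written in that base.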

In our Teaching Statistics book, Deb and I describe a classroom demonstration where we show how Benford's law applies to street addresses sampled randomly from the telephone book. In a more serious vein, Walter Mebane has written about the application of Benford's law to vote counts.

In the past several days, a few people have asked me about applying these ideas to the recent Iranian election. Today, Stephane Reissfelder pointed me to an article by Boudewijn Roukema, which states:

The results of the 2009 Iranian presidential election presented by the Iranian Ministry of the Interior (MOI) are analysed based on Benford's Law and an empirical variant of Benford's Law. The null hypothesis that the vote count distributions satisfy these distributions is rejected at a significance of p < 0.007, based on the presence of 41 vote counts for candidate K that start with the digit 7, compared to an expected 21.2-22 occurrences expected for the null hypothesis. A less significant anomaly suggested by Benford's Law could be interpreted as an overestimate of candidate A's total vote count by several million votes. Possible signs of further anomalies are that the logarithmic vote count distributions of A, R, and K are positively skewed by 4.6, 5.8, and 2.5 standard errors in the skewness respectively, i.e. they are inconsistent with a log-normal distribution with p ~ 4 × 10^-6, 7 × 10^-9, and 1.2 × 10^-2 respectively. M's distribution is not significantly skewed.


I don't buy it. First off, the whole first-digit-of-7 thing seems irrelevant to me. Second, the sample size is huge, so a p-value of 0.007 isn't so impressive. After all, we wouldn't expect the model to really be true with actual votes. It's just a model! Finally, I don't see why we should be expecting distributions to be lognormal.

Maybe there's something I'm missing here, but that's my quick take. This is not to say that I think the election was fair, or rigged, or whatever--I have absolutely zero knowledge on that matter--just that I don't find this analysis convincing of anything. I will say, though, that Roukema deserves credit for presenting the analysis clearly.

P.S. In response to comments: let me emphasize that I'm not saying that I think nothing funny was going on in the election. As I wrote, I'm commenting on the statistics; I don't know the facts on the ground. To move my comments in a more constructive direction (I hope), let me pull out this useful comment from Roukema's article: "One possible method to test whether this is just an odd fluke would be to check the validity of the vote counts for candidate K in the voting areas where the official number of votes for K starts with the digit 7." Further investigation could be a good thing here.

I did not find Roukema's argument convincing; that does not mean that I consider it a bad thing that the article was written. The article is a first draft of an analysis; it might end up leading to nothing, or it might be unconvincing as it stands now but lead to some important breakthroughs. We can see what further analysis turns up. Again, my verdict is not a Yes or a No, it's an "I'm not convinced."

P.P.S. A commenter on our other blog pointed out this analysis of the Iran vote counts by Walter Mebane, who's the expert in this area.

20 comments

Bradford said...

You do have absolutely zero knowledge on the matter, and this is real life and real lives. So, STFU!

Academics, fiddling while Rome burns...

Kippers said...

@Bradfor...
Yeah, people like Andrew with his silly little things called 'facts' and 'reason.'

Shane said...

I guess Bradford would want people to turn off their minds and accept any argument either way that comes along.

Sorry, that's not a viewpoint I can endorse. Besides, it's better to only put forward arguments that aren't easily dismissed (either because they're wrong, or there are weak points in the argument). There are many other things that could be used to show the election was fraudulent much better than the argument about Benford's Law.

As a question to Andrew: I am not really an empiricist, but are they applying the correct test? I'd think a Chi^2 test on the aggregate data is appropriate, rather than saying the 7 bin is filled through a Poisson process and getting a p-value based just on that bin. I don't know if that result would lead to a different conclusion. At least they perform a Bonferroni correction.
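To make the two options in this question concrete, here is a hedged Python sketch of a single-bin tail probability next to a chi-square test over all nine first-digit bins. It uses only the figures quoted in the abstract (41 counts starting with 7, and an expected count of about 21.2, which matches 366 voting areas). The full table of digit counts is not given in the post, so `chi_square_stat` is shown without data. The function names, the exact binomial tail (used here in place of a Poisson approximation), and the number of tests passed to `bonferroni` are my assumptions, not Roukema's procedure.

```python
import math

def benford_p(d):
    """Benford probability that the first digit equals d."""
    return math.log10(1 + 1 / d)

def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p): the single-bin test."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def bonferroni(p_value, number_of_tests):
    """Bonferroni-adjusted p-value; number_of_tests is chosen by the analyst."""
    return min(1.0, p_value * number_of_tests)

def chi_square_stat(observed, n):
    """Pearson statistic over all nine bins; observed maps digit -> count."""
    return sum(
        (observed.get(d, 0) - n * benford_p(d)) ** 2 / (n * benford_p(d))
        for d in range(1, 10)
    )

def chi_square_sf_even_df(x, df):
    """P(chi2_df > x), using the closed form that holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

if __name__ == "__main__":
    n_areas = 366                          # voting areas, as mentioned in the comments
    print("expected 7s:", n_areas * benford_p(7))
    p_single = binomial_upper_tail(n_areas, benford_p(7), 41)
    print("single-bin tail:", p_single)
    print("adjusted for 9 digits (assumed):", bonferroni(p_single, 9))
```

With nine bins and no fitted parameters, the aggregate test would use `chi_square_sf_even_df(stat, 8)`.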

Bradford said...

Kippers-

I will place facts and reason with folks who have enough knowledge of the ground in Iran to know the basic lay of the land. Nate has posts saying things as bad as "the Democratic win, by 90% of the vote, in Louisiana and Utah are completely unsurprising...". What a clueless and idiotic thing to say...

Herein Andrew argues that a non-random distribution of numbers is absolutely expected and that because he shows off to first year stats students that phonebooks do have a non-random distribution that somehow that is relevant, because, well, he is really fucking smart. He is playing with fire and arguing a point that is both arguable and not certain, when his argument can and will be used by a government that is killing people and putting other countries at risk with nukes. I, for one, believe that one must use their talents and analyze how their talents may be misused before spouting off.

Andrew may be right here; I disagree with him (go read the paper), but even if he is, his point is far from a certainty, and in this case his argument matters more than his normal academic BS arguments with other stats profs, so he should use a different standard.

Now, both of you please go pound sand. Start with Nate's ears around this Iran thing for many of the same reasons stated above.

Bradford said...

Shane-

Your name was a great film, as for your post, well, not so much.

"I guess Bradford would want people to turn off their minds and accept any argument either way that comes along."

Learn to read, and contemplate before you slam people randomly on the internet.

I want people to truly think before they spout off and act responsibly...

Bradford said...

Andrew-

You can quit changing the links and the post itself anytime based on my comments. Let's fight fair and do it in the comment string.

Pretty classless to post-hoc change a post to strengthen your weak argument AFTER folks comment.

STepper said...

I believe the point of Benford's Law is to deal with numbers which are simply made up, not tweaked. If so, I don't think your criticism is on point.

It is the hypothesis of Roukema that there was no vote count, merely a press release stating what the vote count was. With numbers that are supposed to look impressive but are actually more likely than not made up numbers.

I suspect that if the truth comes out it will show that the numbers released as the vote count were, in fact, simply made up out of whole cloth. Of course, you can sit back in your Ivory Tower and hide, but the real issue is whether Benford's Law has any predictive value.

Unless there are some laws named after you, Andy, at this point I'll vote for Benford. That's 1 vote. Starting with a "1."

Bradford said...

Finally, a real academic would link to a real paper and not a blog post wherein the academic paper underlying the blog post linked is not available.

http://macht.arts.cornell.edu/wrm1/pm06.pdf

Epic post fail on Iran at 538!!!!

eerac said...

Obviously there are lots and lots of convincing reasons to believe the Iranian election was fixed. Whether or not the analysis of this Roukema paper holds up is clearly not a deciding factor.

That said, while the surplus of 7's is definitely suspect, it would be very useful if the author provided data from other elections as a point of comparison.

More importantly, if you start with legitimate voting data, then manipulate it by say, scaling various vote totals, you would easily end up with fraudulent data that obeys Benford's law. Even though Benford's Law is sometimes a clever way to detect sloppily made up numbers, it's never a way to verify that something wasn't falsified.
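The scaling point can be checked numerically. The sketch below is a toy under stated assumptions: the "legitimate" totals are an idealized log-uniform grid over four decades rather than real election data, and every total is multiplied by the same arbitrary factor (1.37). All names and numbers are illustrative.

```python
import math

def first_digit(x):
    """Leading decimal digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])

def first_digit_freqs(values):
    """Relative frequency of each leading digit 1..9."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return {d: counts[d] / len(values) for d in range(1, 10)}

def log_uniform_grid(n, decades=4):
    """n values evenly spaced on a log10 scale over a whole number of decades."""
    return [10 ** (decades * (i + 0.5) / n) for i in range(n)]

if __name__ == "__main__":
    honest = log_uniform_grid(20_000)
    inflated = [1.37 * v for v in honest]   # hypothetical uniform inflation
    before = first_digit_freqs(honest)
    after = first_digit_freqs(inflated)
    for d in range(1, 10):
        print(d, round(math.log10(1 + 1 / d), 4), round(before[d], 4), round(after[d], 4))
```

Multiplying by a constant only shifts the values on the log scale, so the first-digit frequencies stay close to Benford's both before and after the inflation.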

A final observation is that a version of Benford's law exists for any base (i.e. not just base 10). Using a smaller base might make for a cleaner analysis.

Erik Swanson said...

It really frightens me how quickly this story of election fraud became unstoppable. This article in WaPo suggests that the election results were pretty much as expected:

http://www.washingtonpost.com/wp-dyn/content/article/2009/06/14/AR2009061401757.html

If Bush were still president, I think there would be a hell of a lot more skepticism of these fraud allegations. This is a "velvet revolution" right out of the CIA playbook. It's so transparent and obvious to me...haven't we seen this before?

Mousavi is no "reformer." He's a right-wing privatization champion. Yes, he's for more social liberalization, but his economic policy is hard right. He was instrumental in Iran-Contra and he is an unapologetic supporter of terrorism.

Only 1/3 of Iranians have internet access. Remember that as you hear these reports of the youtube revolution.

Bob X said...

The sentence "the sample size is huge, so a p-value of 0.007 is not that impressive" makes it sound that you have not a shred of a clue what a p-value is.

Tony C. said...

I am a statistician. Benford's law applies to second digits, third digits, etc. In fact this is documented on the Wikipedia article.

So, the chances of 0-9 occurring as the second digit are 11.97%, 11.39%, 10.88%, 10.43%, 10.03%, 9.67%, 9.34%, 9.04%, 8.76% and 8.50%. There is a slight differential on the 3rd digit (the chance of a zero is 10.18%, the chance of a 9 is 9.83%), and after that you might as well use a uniform distribution (10% each).
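These percentages follow from summing the Benford probability over every possible run of preceding digits. A minimal sketch (the function name is mine):

```python
import math

def benford_digit_probs(position):
    """Benford probabilities for the digit at a given position (1 = leading digit).

    For position >= 2, sum log10(1 + 1/(10*k + d)) over every prefix k made of
    the preceding (position - 1) digits.
    """
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))
        for d in range(10)
    }

if __name__ == "__main__":
    for pos in (2, 3, 4):
        probs = benford_digit_probs(pos)
        print(pos, [round(100 * probs[d], 2) for d in range(10)])
```

By the fourth digit the probabilities are nearly flat at 10% each.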

This can be used to double-check the first digit: If the 2nd digit, 3rd digit, etc follow their proper distribution under Benford's law, it is highly likely vote totals are logarithmically distributed. If they are, the entire number is probably a log distrib, and first digit fraud is likely.

To address Nate's concern, if the totals are "engineered" by precinct size to begin with certain numbers like 7, then Benford's law should apply in one of two ways: Either to the 2nd digit instead of the first, and so on (imagine if we just put '7' in front of every house number); or Benford's law should apply after we subtract some constant that represents the minimum threshold in Iran that constitutes a region. For example, if we demand at least 1000 eligible voters for a region, and 70% voted, the constant would be 700.

We don't have to know that threshold a priori; a simple computer program can easily find the constant for us by a brute force approach of testing all of them. There are only about a hundred constants to test, since only the first two digits will change their distribution.
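One way such a brute-force search might look is sketched below. Everything here is an assumption made for illustration: the candidate offsets, the use of a chi-square statistic divided by the sample size as the fit score (so that offsets which discard many totals are not favored), and the synthetic data, which are idealized Benford values shifted by 700 rather than real vote counts.

```python
import math

def first_digit(x):
    """Leading decimal digit of a positive number."""
    return int(f"{x:.15e}"[0])

def benford_misfit(values):
    """Pearson chi-square against Benford's first-digit law, divided by n."""
    n = len(values)
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    chi2 = sum(
        (counts[d] - n * math.log10(1 + 1 / d)) ** 2 / (n * math.log10(1 + 1 / d))
        for d in range(1, 10)
    )
    return chi2 / n

def best_offset(totals, candidates):
    """Try each candidate constant; return the one whose shifted totals fit best."""
    scores = {}
    for c in candidates:
        shifted = [t - c for t in totals if t - c >= 1]   # drop totals below the offset
        if shifted:
            scores[c] = benford_misfit(shifted)
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Synthetic check only: log-uniform values over four decades, plus a known offset.
    benford_like = [10 ** (4 * (i + 0.5) / 4000) for i in range(4000)]
    totals = [b + 700 for b in benford_like]
    offset, scores = best_offset(totals, range(0, 1000, 100))
    print("recovered offset:", offset)
```

Trying many offsets and keeping the best one is itself a form of multiple testing, so any p-value computed after the search would need adjusting.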

ewan said...

Surely 366 isn't the 'huge' sample size you're referring to - If each vote were a data point it would be huge (though I'm not clear where we would be misled by any tests but IANAS) but as it stands the sample is data from 366 voting areas.

Thyplo said...

Would it be possible to find if Benford's law applies to this situation by applying it to previous elections that are considered legitimate and seeing if they follow the pattern? If that were the case, it may help determine if this model is appropriate.

Bob X said...

@Thyplo: see Nate's thread on "Unlucky Sevens?" where he shows a moderate failure of Benford's Law in the Franken-Coleman election (briefly: since precincts in Minnesota are preferentially of a certain size, 1000 "or so", and the percentages for Franken are not wildly different from one part of Minnesota to another, lead digits 4/5/6 are over-represented; the graph doesn't look too far off from the Benford curve, yet at the large sample size the p-value still comes out an absurdly small number).
These conditions do not seem to be applicable to the Iranian case: the electoral districts from which we have the reported vote totals are not of nearly uniform size; nor are the percentages for each candidate expected to be uniform across regions (ethnic-bloc voting is very important in Iran); and it would not explain a spike at "7" with no boost for "6" and "8". However, it does point out that Benford can fail to work in election cases, and that there might be some other mechanism, not as yet figured out, that could account for the Iranian oddities in a non-sinister way.

Pythagoras said...

According to Mebane's article "Election Forensics: Vote Counts and Benford's Law" http://www-personal.umich.edu/~wmebane/, election data often deviates from Benford's Law in the leading digit for legitimate reasons.

Consider this example. Assume in one election that most precincts have a size close to 1000 and that a particular candidate receives a uniform 50% of the vote across all precincts. Then the leading digits 4, 5 and 6 would appear quite frequently.
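The example above is easy to simulate. This sketch assumes precinct sizes are roughly normal around 1000 with a standard deviation of 100 (my choice; the comment only says "close to 1000") and gives the candidate exactly 50% in every precinct.

```python
import random

def simulate_leading_digits(n_precincts=10_000, mean_size=1000, sd_size=100,
                            share=0.5, seed=0):
    """Leading-digit frequencies of a candidate's vote count across precincts.

    Precinct sizes are drawn from an assumed normal distribution; the candidate
    receives the same share of the vote everywhere.
    """
    rng = random.Random(seed)
    counts = {d: 0 for d in range(1, 10)}
    for _ in range(n_precincts):
        size = max(1, round(rng.gauss(mean_size, sd_size)))
        votes = max(1, round(size * share))
        counts[int(str(votes)[0])] += 1
    return {d: c / n_precincts for d, c in counts.items()}

if __name__ == "__main__":
    freqs = simulate_leading_digits()
    for d in range(1, 10):
        print(d, round(freqs[d], 3))
```

Under these assumptions nearly all the vote counts fall between 400 and 699, so the leading digit is almost always 4, 5, or 6 and the digit 1 essentially never appears, with no fraud involved.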

Mebane illustrated this by examining data from the 2004 returns in Miami-Dade County. He concluded that the second digit was a better metric for election data.

Mark said...

> First off, the whole first-digit-of-7 thing seems irrelevant to me.

Really? Do you have any idea what Benford's law really is or why it's valid? By that statement, I'd say you haven't the slightest clue.

Next time if you're going to refute the validity of a mathematical/statistical paper, come armed with your own academic credentials in the subject (or at least with a friend who has).

mhnatiuk said...

Mark: that, and claiming "Second, the sample size is huge, so a p-value of 0.007 isn't so impressive". 1. The analysis concerned a population, not a sample. 2. The p-value takes into account population size, so it doesn't make sense to claim that "it isn't impressive, because the sample was huge". Well, it's impressive enough, I would say. Enough with 99,993% probability that Roukema is right and 0,007% that you are. Wanna bet? Check out amazon.com for any book on basic statistics for rookies. Word up!

mhnatiuk said...

I was referring to the original post, not to Mark's comment, of course.
