Note: the below is fairly technical, but since the discussions of Strategic Vision's polling had become quite technical in the comments, I thought it was worth giving Michael Weissman, a retired physics professor at the University of Illinois and a frequent commenter at this website, a guest column in this space. Using a robust and fairly elegant statistical technique known as Fourier analysis, Weissman has found that Strategic Vision's polls indeed contain unusual statistical artifacts that are highly unlikely to have arisen by chance alone and which differ substantially from those of comparable pollsters. I have given Weissman's words a light, non-technical edit, with his permission, from the version he originally sent to me. --Nate Silver
____
Fourier visits Strategic Vision
by Michael Weissman
Three weeks ago, a polling association censured Strategic Vision LLC ("SV") as the only pollster that refused to answer repeated requests for routine information on its methodology; twenty other pollsters had complied. Nate Silver followed up by checking whether there was anything statistically odd about SV’s results. He found that the distribution of trailing digits in their reported percentages for the two major candidates showed much larger deviations from uniformity than would be expected from purely random draws from a uniform distribution. Some digits, such as 8, appeared much more often than others, such as 1. A closely matched comparison group of polls from Quinnipiac also showed larger-than-expected deviations from uniformity, but not nearly as extreme. A commenter on this site, "steve", sent in the results from a comparable collection of SurveyUSA polls, which showed no unusual non-uniformities at all. A discussion immediately ensued on this blog and others as to whether the strongly non-uniform SV results could easily arise from normal causes, or whether they constituted evidence that the results had not been obtained by normal polling methods.
One potential source of non-random non-uniformity, pointed out here and elsewhere, could be some rounding method that systematically favored evens or odds. It turns out, however, that evens and odds appeared with nearly the same frequency in the SV results. In addition, the sample sizes that SV typically uses are divisible by 100, making rounding errors unlikely.
A more challenging objection was as follows: there is no a priori reason to expect the distribution of trailing digits to be uniform, even in a large sample. We know that the full percentage poll results are not uniformly distributed from 0 to 100. Rather, polls are generally taken in races where the leading candidates each have some major chunk of the vote, and there are usually a few undecideds as well. So you might expect the distribution of ideal poll results to have a broad peak somewhere around 45, trailing off smoothly on either side. That’s not uniform. However, if the results were routinely spread throughout the 30s, 40s, 50s and 60s (as is the case with Strategic Vision’s polls), you would expect a pretty smooth distribution of trailing digits. That distribution should also be cyclic, in that 0 (as in 50) is just as ‘close’ to 9 (49) as it is to 1 (51).
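As a rough illustration of the smoothness argument, the sketch below folds a hypothetical bell-shaped distribution of poll results into trailing digits. This is our illustration, not part of the analysis. The center of 45 and the widths of 8 and 1.5 points are assumptions chosen only for illustration. A broad spread gives nearly uniform digits, while only an implausibly tight cluster leaves a visible bump. Folding with `v % 10` also makes the result cyclic: 49 and 50 fall in neighboring bins 9 and 0.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def trailing_digit_probs(mu, sigma):
    """Share of reported whole-number results ending in each digit 0..9,
    assuming (hypothetically) that true results follow a normal(mu, sigma)
    curve and are rounded to whole percentages."""
    probs = [0.0] * 10
    for v in range(0, 101):
        # Probability that a result rounds to the whole number v
        p = normal_cdf(v + 0.5, mu, sigma) - normal_cdf(v - 0.5, mu, sigma)
        probs[v % 10] += p  # fold onto the trailing digit
    total = sum(probs)
    return [p / total for p in probs]

broad = trailing_digit_probs(45.0, 8.0)    # hypothetical broad spread of results
narrow = trailing_digit_probs(45.0, 1.5)   # hypothetical, implausibly tight cluster
print("broad :", [round(p, 4) for p in broad])
print("narrow:", [round(p, 4) for p in narrow])
```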
The problem is that "pretty smooth" isn’t definitive enough to say whether the extra variance (the average of the squared differences from uniformity) should be considered alarming. Nate tentatively explored some other possible non-uniform distributions, but there were some justified objections that these were arbitrary. What’s needed is a way to remove the variability due to the non-uniform distribution without pretending to know exactly what that distribution is. Fortunately, we have a tool, Fourier analysis, that can solve what might sound like an intractably subjective problem.
First, regardless of the true underlying distribution of results, the actual poll results cannot show any major non-random variations between adjacent digits. The reason is that SV’s polls survey relatively small numbers of respondents (generally 600, 800 or 1,200). That leaves a random uncertainty in each result of roughly 1.5 to 2 percentage points. If, for example, there were some (wildly implausible) real tendency for the true values to cluster on even digits rather than odd ones, the poll results would barely show it, because the random errors would smear the pattern out.
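Here is a quick check on the size of that smearing. It assumes simple random sampling and a candidate share near 50 percent. Real polls with weighting would have somewhat larger errors. The function name is ours.

```python
import math

def sampling_sd_points(n, share=0.5):
    """Standard deviation, in percentage points, of a candidate's share in a
    simple random sample of n respondents (binomial approximation)."""
    return 100 * math.sqrt(share * (1 - share) / n)

for n in (600, 800, 1200):
    print(n, round(sampling_sd_points(n), 2))
```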
Second, Fourier analysis lets us describe the counts for our ten digits in terms of components. These components can be chosen so that any non-random non-uniformity is concentrated in a few of them, while the others remain purely random. That is a big advantage over the raw counts, in which the non-uniformity might be spread across all ten numbers.
One of the Fourier components is completely flat and just represents the average value. The other nine Fourier components are sinusoidal waves on our plots of occurrence rates for the ten digits, ranging from broad to narrow. The highest-frequency wave is the period-2 even-odd cycle, but I have chosen to set it aside because it might plausibly arise from rounding methods. The most slowly-moving wave has period 10. There are two such period-10 components, with different peak locations. These are the components where some plausible non-uniform distribution could show up, even after smoothing by the random sampling error. So we can remove them too, without arguing about what we think they should be.
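The bookkeeping can be sketched in Python. This is our illustration, not the author's BASIC. The sketch projects the ten mean-subtracted counts onto cosine and sine waves with k = 1 to 4 cycles per ten digits, plus the even/odd wave at k = 5. By Parseval's theorem, the nine pieces add up to the ordinary variance of the counts, so removing a component amounts to subtracting its piece. The counts are the Strategic Vision tallies d(0) through d(9) from the author's listing.

```python
import math

# Strategic Vision trailing-digit tallies d(0)..d(9), copied from the BASIC listing
SV_COUNTS = [562, 431, 472, 490, 526, 599, 533, 639, 676, 616]

def fourier_components(counts):
    """Split the population variance of ten digit counts into Fourier pieces.

    Returns (mean, parts), where parts maps a label to that component's share
    of the variance; the shares add up to the ordinary population variance.
    """
    n = len(counts)  # 10 digits
    mean = sum(counts) / n
    resid = [c - mean for c in counts]
    parts = {}
    for k in range(1, n // 2 + 1):  # k cycles per ten digits: periods 10, 5, 3.33, 2.5, 2
        c = sum(r * math.cos(2 * math.pi * k * i / n) for i, r in enumerate(resid))
        s = sum(r * math.sin(2 * math.pi * k * i / n) for i, r in enumerate(resid))
        if k < n // 2:
            # Sampled cosine and sine waves each have squared length n/2 = 5
            parts[f"period {n / k:g} cos"] = c * c / (n / 2) / n
            parts[f"period {n / k:g} sin"] = s * s / (n / 2) / n
        else:
            # k = 5 is the even/odd wave (+1, -1, ...), squared length n = 10;
            # its sine part is identically zero
            parts["period 2 (even/odd)"] = c * c / n / n
    return mean, parts

mean, parts = fourier_components(SV_COUNTS)
for label, share in parts.items():
    print(f"{label:>22}: {share:9.2f}")
print(f"{'total':>22}: {sum(parts.values()):9.2f}")
```

For these counts, the two period-10 pieces together account for about 4,058 of the total variance of 5,521.44. The even/odd piece accounts for only 0.36.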
Now comes the fortunate part: the smoothing of the distribution by random sampling error effectively wipes out any non-random content in the shorter-period components. This reduction can be calculated with great quantitative precision from the known width and Gaussian shape of the sampling error distribution. What remains in those components should therefore be essentially pure random sampling variation.
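Here is a sketch of that calculation. It assumes the sampling noise on each reported percentage is Gaussian with a single standard deviation sigma, such as the roughly 2 points quoted above. The function name is ours. Folding results onto their trailing digit keeps only frequencies of k cycles per 10 points. Convolving with a Gaussian multiplies each such component by the Gaussian's Fourier transform, exp(-2π²σ²(k/10)²).

```python
import math

def attenuation(k, sigma):
    """Factor by which Gaussian sampling noise with standard deviation sigma
    (percentage points) multiplies the digit-pattern component that has
    k cycles per 10 points, i.e. period 10/k in the trailing digit."""
    f = k / 10  # frequency in cycles per percentage point
    return math.exp(-2 * math.pi ** 2 * sigma ** 2 * f ** 2)

for k in range(1, 6):
    print(f"period {10 / k:g}: factor {attenuation(k, 2.0):.3g}")
```

With sigma = 2, the period-10 waves keep about 45 percent of their amplitude and the period-5 waves about 4 percent. The shortest periods are suppressed almost completely. A smaller sigma (from bigger samples) means less suppression, so the approximation is strongest for the smaller polls.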
Does this leave us enough statistics to work with? There were originally ten Fourier coefficients, and we’ve thrown out the irrelevant mean, the two that could reflect non-uniformity, and the one that could come from rounding. That leaves six with random amplitudes. It would be nice to have more than six, but that’s enough to catch extreme cases. We know how big the coefficients should be on the average because standard simple statistics tell us precisely how big the random variations are on average in our numbers.
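The expected size can be written out as follows. This reconstruction is ours. It assumes the N tallied digits behave like independent draws spread evenly over 0 through 9 (a multinomial null), with u_1, …, u_9 an orthonormal set of Fourier vectors orthogonal to the constant vector. It reproduces the 0.6 in line 120 and the p-value formula in line 140 of the BASIC listing.

$$\bar d = \frac{N}{10}, \qquad \operatorname{Cov}(d_i, d_j) = N\left(\frac{\delta_{ij}}{10} - \frac{1}{100}\right)$$

$$u \perp (1,\dots,1),\ \lVert u \rVert = 1 \;\Longrightarrow\; \operatorname{Var}(u \cdot d) = u^{\mathsf T}\,\Sigma\,u = \frac{N}{10} = \bar d$$

$$\mathrm{var} = \frac{1}{10}\sum_{i=0}^{9}(d_i - \bar d)^2 = \frac{1}{10}\sum_{m=1}^{9}(u_m \cdot d)^2, \qquad \mathbb{E}[\mathrm{var} - \mathrm{lowf}] = \frac{6\,\bar d}{10} = 0.6\,\bar d$$

$$6\,\mathrm{dev} = \frac{10\,(\mathrm{var} - \mathrm{lowf})}{\bar d} \approx \chi^2_6, \qquad p = \Pr\left(\chi^2_6 > 6\,\mathrm{dev}\right) = e^{-3\,\mathrm{dev}}\left(1 + 3\,\mathrm{dev} + 4.5\,\mathrm{dev}^2\right)$$

The chi-square step relies on the normal approximation to the multinomial, which is reasonable for thousands of tallied digits.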
Now we can ask: after all the suspect components have been filtered out, how big is the remaining variation compared with the statistical expectation? Remember, this filtering was done precisely in response to the serious objections to Nate’s original post, largely from defenders of SV. It removes the potentially innocent components that could have made SV’s statistics look suspect. Here, then, is the ratio of the filtered variance to the statistically expected filtered variance for each pollster:
SurveyUSA: 0.46
Quinnipiac: 0.30
Strategic Vision: 4.40
How unlikely are those results? The SUSA result is pretty typical. Quinnipiac actually has notably low variance, but random chance would produce variances that low or lower about 5 percent of the time. The Strategic Vision result, on the other hand, or something more extreme, would occur by chance with probability only 0.00019. That’s not as low a p-value as the results obtained without filtering out the non-uniform components, but it’s still very low: less than one chance in 5,000 of occurring by chance alone. For statistical sophisticates, this is a genuine, relevant p-value, testing a well-specified prior hypothesis. It is not the sort of misleading p-value obtained when one screens many data sets looking for anything unusual.
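For readers who would rather not install a Basic interpreter, here is a line-by-line Python transcription of the author's listing. The transcription and function name are ours. Two small liberties are taken. The angle step uses 2π/10 exactly instead of 0.6283, and the mean is computed rather than hard-coded as 554.4. As a result, the output agrees closely with the BASIC run, though not digit for digit.

```python
import math

def weissman_filtered_test(counts):
    """Python transcription (ours) of the BASIC listing's test.

    counts: ten trailing-digit tallies d(0)..d(9).
    Returns (ave, var, var_minus_lowf, dev, p), mirroring the BASIC prints.
    """
    assert len(counts) == 10
    ave = sum(counts) / 10
    var = sum(c * c for c in counts) / 10 - ave ** 2   # population variance (line 80)
    w = 2 * math.pi / 10                               # the BASIC uses 0.6283
    sumcos = sum((c - ave) * math.cos(w * i) for i, c in enumerate(counts))
    sumsin = sum((c - ave) * math.sin(w * i) for i, c in enumerate(counts))
    sumdif = sum(c * (-1) ** i for i, c in enumerate(counts))
    # Squared projections onto the two period-10 waves and the even/odd wave,
    # each divided by the wave's squared length (5, 5, 10) and by 10 bins
    lowf = 0.02 * (sumcos ** 2 + sumsin ** 2) + 0.01 * sumdif ** 2
    filtered = var - lowf
    dev = filtered / (0.6 * ave)  # observed / expected filtered variance
    # Upper tail of a chi-square with 6 degrees of freedom, evaluated at 6*dev
    p = math.exp(-3 * dev) * (1 + 3 * dev + 4.5 * dev ** 2)
    return ave, var, filtered, dev, p

SV = [562, 431, 472, 490, 526, 599, 533, 639, 676, 616]
ave, var, filtered, dev, p = weissman_filtered_test(SV)
print(f"ave={ave:.1f} var={var:.2f} filtered={filtered:.2f} dev={dev:.4f} p={p:.3g}")
```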
I’d like to thank Nate for getting this started and various commenters for helping keep the discussion lively, especially ecarlson, Mark Grebner, MarkinIL, steve, shma, and loner. Finally, since the core issue here is transparency, I’ve included the code by which the p-value was calculated. Anybody who writes real programs will get a kick out of this, since I used a baby language (Basic) to handle a fairly basic statistical problem.
>list
10 dim d(10)
15 d(0) = 562
16 d(1) = 431
17 d(2) = 472
18 d(3) = 490
19 d(4) = 526
20 d(5) = 599
21 d(6) = 533
22 d(7) = 639
23 d(8) = 676
24 d(9) = 616
30 for i = 0 to 9
40 sum = sum+d(i)
41 sumcos = sumcos+(d(i)-554.4)*cos(0.6283*i)
42 sumsin = sumsin+(d(i)-554.4)*sin(0.6283*i)
43 sumdif = sumdif+d(i)*(-1)^i
45 sumsq = sumsq+d(i)*d(i)
50 next i
60 ave = sum/10
70 print ave
80 var = sumsq/10-ave^2
90 print var
100 lowf = 0.02*(sumcos^2+sumsin^2)+0.01*sumdif^2
110 print var-lowf
120 dev = (var-lowf)/(0.6*ave)
130 print dev
140 p = exp(-3*dev)*(1+3*dev+4.5*dev^2)
150 print p
>run
554.4
5521.44
1463.071532
4.398363
1.882962E-04
Michael Weissman is a retired physics professor (University of Illinois) whose research has focused on using random noise to characterize disordered materials. He is a Fellow of the American Physical Society, was once nominated for the Nobel Peace Prize by Barbara Boxer, and was born and raised a St. Louis Cardinals fan.
10.05.2009
Seen Through Sharper Statistical Lens, Anomalies in Strategic Vision Polling Remain
by FiveThirtyEight.com @ 5:28 AM
[Image: Fourier waves of different frequencies combining to form another, seemingly complex wave.]


180 comments
A nice turn in the discussion. Thank you. Also notable is this article in the NYT published a few days ago.
Wow ... I just had a nerdgasm. :)
Seriously, speaking as a methods geek, that was an absolutely delicious read. Kudos!
ledowl. n. a miniature avian figurine made of a soft metal.
Love it. I think it was more likely than not from all the prior evidence, albeit less statistically rigorous than this, that SV was cooking the books. Their reactions and lack of transparency are as telling as any stats test, albeit not conclusive...to a lawyer.
...but more importantly the dear Professor is a Cardinal fan in a great year to be a Cardinal fan. GO CARDS!
*whoosh*
the sound of that going over my head! And I have a physics degree and have done Fourier Analysis in the past....shows how much I've forgotten :(
So basically: 4999/5000 that SV didn't get their numbers honestly.
Come on Peter Wolf, didn't it all start coming back as you read the post? If not, I find that Wikipedia is a good place to give myself a refresher course on all the math I've forgotten. ;-)
Great analysis Dr. Weissman! This is a pretty big nail in SV's coffin.
Thanks, Michael Weissman! This is a great analysis - and makes me feel a lot more confident about some of the legitimately-subjective criticism that was coming up.
I love this quote from the NYT article (cited above), about the fact that their business is 115 miles from the city they claimed it was in: "The difference is semantic."
Really? So SV is saying that factual accuracy isn't important and can be dismissed with a word and a wave of the hand?
The thing that amazes me most about this analysis is that people still seem to be using Basic.
A nice turn in the discussion indeed. This type of analysis is why I keep coming to fivethirtyeight.com.
Good stuff... but, as a mathematics Ph.D., albeit not a statistician (my specialty was numerical analysis), I'm a little suspicious about the applicability of Fourier analysis here. The signal being analyzed here is not a periodic signal; rather, it is a function on [0, 100] that is being looked at modulo 10 in order to focus on the ones digit.
Now, this modulo technique *can* be useful if the function in question really is periodic or almost periodic over the "unstacked" range. Is that supported by the original data (i.e. is the distribution of the ones digit for results in the 40's statistically similar to the distribution of the ones digit for results in the 30's and 50's)?
Also, like all mathematicians, I like to see how a methodology performs on some concrete problems with known results before passing judgment on the methodology as a whole. What's the result of this analysis if applied to, say, modulo-ized results from a Gaussian distribution with mean 50? What about a twin-peaked distribution with peaks at 45 and 55? 43 and 55?
I had read the cited NYT article and found the history of SV LLC's refusal to provide information about their methods striking. They never disclosed anything to anybody, but the reasons given for not doing so were all over the map, constantly changing, and in some cases (non-notification) easy to disprove. The pattern of behavior (including threats of legal action for calling out discrepancies) was the classic pattern of a scam, familiar to anyone who likes reading investigative journalism.
And the numerical analysis presented here is devastating. It demonstrates nicely how difficult it is to fake data in such a way that careful analysis cannot detect it.
I think Uncle Al likely makes good points that I do not fully appreciate. That said, his comment reminds me of the reason I left academia: it is much easier to criticize than to actually DO anything, the very definition of the average prof.
Uncle AL - and what mathematical test is appropriate? With a simple mathematical model and lots of good results to test it against (e.g. Gallup), why not develop the right model?
I bet all those pollsters out there fudging their numbers to steer public debate are crapping their underpants right about now thinking they are next on the chopping block.
Also a mathematician, also not a statistician, and I do agree that there is no a priori reason to think that this function is a periodic signal. However, the analysis does show that both SUSA and Quinn have last digit data that matches a reasonable periodic signal, where SV does not. I'd be much happier if SV had left us with some truly statistically insignificant data to work with (say a tenths place), but sadly they do not.
That being said, I find both Nate's original data and his analysis of the Oklahoma test scores very damning. Even without a statistical analysis, SV's least significant digits really don't look even vaguely right, and for Oklahoma, Nate showed that the easiest, laziest thing to do with the data matches SV's numbers very, very well.
@Bradford: It is possible that Nate or others who put together the different datasets can readily produce the data. If I understand the criticism, the question is whether stacking all the second digits together is a good idea. It certainly loses information.
That is, instead of stacking everything into 0-9, the data might be tested separately for, say, 20-29, 30-39, 40-49, 50-59 and 60-69, or over the whole series.
Does the same pattern hold when the waves are fit to, say, 20-69 as when they are fit to 0-9? Or to 20-29, 30-39, 40-49, etc.? Or is the stacked second-digit pattern just a "compositional effect" of very different distributions in the different deciles?
Even using Weissman's formula, suppose it were tested against several sets of the empirically derived second digit data for each pollster: what is the p-value when the test is run on second digits in the range 20-29 vs. 30-39 vs. 40-49, and so on. If the test is really robust, it should survive such disaggregation and not appear only when the second digits are stacked together.
(At least this is how I read the comment.)
Added: At the very least, if the test were done in a disaggregated way (perhaps as well by subject matter of the poll -- electoral preferences vs. other subject) we might have some more "detective" evidence of situations in which the data were more likely to have been fabricated.
If Nate or others produce the data, then Weissman's little BASIC program can be run by anyone.
Numbers are complicated.
GO CARDS!
The basic point is that the least significant digits should be more or less randomly distributed. Including more significant data (like some 10s places) isn't going to help. That's significant data: it's easy to produce reasonable answers for it (just look at some other polls), and it quite possibly *should* have a pattern! The insignificant data is the stuff that will show the most human non-randomness if it's made up and the least if it's not (if the data is real, the least significant digits should be mostly noise).
Another analysis that is easy to do: convert all of SV (and the selected Quin data and others) to various other bases (say between 7 and 16). The least significant data should still be distributed more or less randomly.
Shouldn't you also apply a post-hoc test to account for the fact that you're asking whether three distributions are statistically different from a random distribution?
In such a test you will get at least one p < 0.05 ~15% of the time. Now, there isn't any way in heck that the SV data will become insignificant if you apply a post-hoc test; I was just hoping to move Quinnipiac a little further away from the cutoff.
Todd Dugdale said...
The thing that amazes me most about this analysis is that people still seem to be using Basic.
At least it wasn't FORTRAN! (which would have been more apt for a physicist.)
http://en.wikiquote.org/wiki/Fortran
Uncle Al, I created 5000 sample data points drawn from a Gaussian with mean = 50 and standard deviation = 12. The corresponding values (using the code provided) were Ave = 500, Var = 717.8, Low-f = 213.7838, Dev = 1.680054 and p = 0.12131.
I then repeated the experiment with 5000 data points drawn from a Gaussian with mean = 45 and standard deviation = 12. The corresponding values were Ave = 500, Var = 681.4, Low-f = 176.131, Dev = 1.68423, and p = .120284.
Repeating each experiment again, I got p-values of 0.537496 and 0.21334.
So, your criticism is reasonable, but the data suggest that when numbers are generated randomly, then this approach (using Fourier analysis) fails to reject the null hypothesis of random values.
@Uncle Al- Hey, your point about periodicity was already addressed. Because the distribution is smoothed by sampling error first and then taken mod(10), zero and nine are just as close as any two other adjacent digits. Thus it's appropriate to treat the function as periodic. You can see that with the control groups there's no problem. It takes a few minutes to download Basic and put in any other function you want, just to test it.
@ Juris- The test is not intended to work on disaggregated data. The chance of getting 29 is very different from the chance of getting 20, but almost the same as the chance of getting 30. That's why aggregation mod(10), the form in which Nate presented the data, gives something close to uniform to begin with.
@Matt Ackerman
Your point was also already addressed. Since SV was singled out as suspect prior to the test, i.e. this was not random screening, post hoc stats aren't appropriate. A 2-tailed test is also probably not appropriate, since it would be more far-fetched to think that over the years SV tried to keep their digit count artificially uniform than to think that they ignored the issue.
I did some signal processing and statistical analysis some years back during my undergrad in Electronics and Communications engineering.
Fourier analysis...Ah! it's coming back.
Nate and Prof. Weissman.
Thanks for not letting this topic die. SV LLC needs to be held accountable.
Apparently my first comment was eaten.
Thanks for the shout out, mbs. This is some excellent work.
chris said...
Todd Dugdale said...
The thing that amazes me most about this analysis is that people still seem to be using Basic.
At least it wasn't FORTRAN! (which would have been more apt for a physicist.)
http://en.wikiquote.org/wiki/Fortran
As a recent physics major and soon-to-be grad student, I need to say that this characterization is apt. In fact, just the other day I wrote a program in Fortran. The language will forever live on for physicists.
If anyone cares - though Nate did a great job of discovering that Strategic Vision's offices are not where they say they are online - I actually had occasion to go through Blairsville this past weekend...so I thought I'd take a few shots of the town square area where their real offices are located.
In his original post, Nate located one picture of the SV office, which appeared to be at 22 Town Square, "although the image is not of a high enough resolution to say so definitively."
Well....it's now definitive. I did not ring the bell or knock - it being Sunday (although there was a car parked in front, as you can see), as I didn't want to risk any more charges of "harassment" for Nate or the 538.com crew. I just went through the parking lot and took these photos. You can read the address quite clearly, as well as see the large poster over the door advertising the GOP BBQ Nate referenced in his original post. I also found it interesting that the polling place was apparently right nearby, as well.
Can someone post a link (or post a few paragraphs) to how Fourier Analysis is used in statistics like this?
I'm familiar with probability and Fourier transforms, but cannot understand this blog post's methodology as well as the rationale for the methodology.
How does breaking a frequency distribution into periodic components yield a p-value?
I think I'm on to something here - even easier to draw direct conclusions from than Fourier Analysis. I would like to extend my analysis with the actual raw datasets if anyone can point me to them.
Please comment on the following:
http://valpeyblog.blogspot.com/
Nate, thanks for posting this. Unless I'm missing something, this is a rigorous demonstration that they're cooking the books.
Thanks for the shout out - my first reference in fivethirtyeight!
For the record, I'm also a (non-retired) physics professor.
WV - eggis - Eggis on SV's face at this point.
Peter Mork-
I guess a discussion of the mathematical limitations of Fourier might be reasonable if the data from other pollsters had failed the test of randomness used. Since they did not, go back to your academic towers and/or sit in them and produce a test YOU think is better.
Always easier to sit outside and piss into the tent, but...
Peter Mork and math geeks -
More specifically, your data that has a distribution around 45 or 50 does NOT comport with the reality in the datasets already presented here. So although you might have a mathematical argument, I see nothing that supports your position in the real world, you know, the one outside with the sun and stars.
Fourier analysis is used very successfully in digital audio (as all your mp3s will attest), and those aren't technically periodic signals either. There are windowing effects to be aware of (and I'm sure he is) but if there's a frequency component Fourier is usually a good tool.
Whoops, this sentence: "In addition, the sample sizes that SV typically uses are divisible by 100, making rounding errors unlikely." appears to have been added in editing. At first glance, I don't agree with it because, as ecarlson pointed out, it's precisely even multiples of 100 for which arbitrary rounding conventions are needed.
As a long time reader of 538 and a physics grad student at UIUC, I have to say, small world Prof. Weissman! And thanks for the interesting analysis.
Thank you Nate and Dr. Weissman for putting the "Science" back into "Political Science". I am a geneticist with a strong interest in politics, and this website is the only source I've found that is consistently filled with rigorous scientific analysis.
Great work. I hope Strategic Vision goes down in flames for their now indisputably dishonest practices.
@Bradford: I certainly accept that a Fourier analysis has limitations. I was merely responding to one specific criticism, namely that randomly generated values might exhibit similar "noise" that could be picked up by a Fourier analysis. At this point, I don't think there's any evidence that a Fourier analysis is keying off of such noise. I will leave it to the Fourier experts to identify more technical limitations of the analysis.
@Matt Ackerman- Whoops I misread your comment. For Quinnipiac, one should at least use a two-tailed distribution, and probably some sort of post hoc stats. One-tailed gave p= 0.061 for Q, so two-tailed would be 0.122. Even without a posthoc correction, there's no problem there. Really, even single-tailed, there's no problem with Q.
"That leaves six with random amplitudes. It would be nice to have more than six, but that’s enough to catch extreme cases."
What does this mean? Don't more coefficients make it less likely to catch even extreme cases, since with additional coefficients the Fourier analysis will yield a tighter fit of the observed data?
Please explain how this analysis works.
What does this mean? Don't more coefficients make it less likely to catch even extreme cases, since with additional coefficients the Fourier analysis will yield a tighter fit of the observed data?
Ergo fewer false positives for fraud. This is a good thing. If you went the other way, with more false positives, a positive would have less meaning. Remember, this isn't about finding something; it's about finding something with as high a certainty as feasible. The former would be a bullshit fishing expedition looking for something to smear with (a shout out to Karl Rove if you are reading this!); the latter is about credible, more objective analysis.
Jon-
6 is less than 10, thus you agree with the analysis, I think, at least insofar as the fewer variables make it more likely that test will miss real fakery. Unfortunately for SV, even with the fewer variables, they still seem to fail this test...
What am I missing?
This analysis, or this program, doesn't hold up to scrutiny. Here is my code in C:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
srand48(8);
int i;
double d[10], sum, sumcos, sumsin, sumdif, sumsq;
double ave, var, sign, lowf, dev, p;
double base = 431., range = 245.;
d[0] = 562;
d[1] = 431;
d[2] = 472;
d[3] = 490;
d[4] = 526;
d[5] = 599;
d[6] = 533;
d[7] = 639;
d[8] = 676;
d[9] = 616;
/*
d[0] = base + drand48()*range;
d[1] = base + drand48()*range;
d[2] = base + drand48()*range;
d[3] = base + drand48()*range;
d[4] = base + drand48()*range;
d[5] = base + drand48()*range;
d[6] = base + drand48()*range;
d[7] = base + drand48()*range;
d[8] = base + drand48()*range;
d[9] = base + drand48()*range;
*/
for (i=0; i<10; i++) printf("d[%1i]=%3.0f.\n", i, d[i]);
sum = sumcos = sumsin = sumdif = sumsq = 0.;
sign = 1.;
for (i=0; i<10; i++) sum += d[i];
ave = sum/10.;
for (i=0; i<10; i++)
{
sumcos += (d[i]-ave)*cos(0.6283*i);
sumsin += (d[i]-ave)*sin(0.6283*i);
sumdif += sign*d[i];
sumsq += d[i]*d[i];
sign = -sign; // alternate the sign to build the (-1)^i sum.
}
printf("Ave = %f.\n", ave);
var = sumsq/10. - (ave*ave);
printf("Var = %f.\n", var);
lowf = 0.02*(sumcos*sumcos+sumsin*sumsin)+0.01*sumdif*sumdif;
printf("var-lowf = %f.\n", var-lowf);
dev = (var-lowf)/(0.6*ave);
printf("dev = %f.\n", dev);
p = exp(-3.*dev)*(1+3*dev+4.5*dev*dev);
printf("p = %f.\n", p);
return(0);
}
Now, if I run it as written, I get the same result as Michael. If I let the commented-out code (between /* and */) run, I ALSO get the same verdict as Michael, a tiny p-value. For example, here are Michael's numbers:
d[0]=562.
d[1]=431.
d[2]=472.
d[3]=490.
d[4]=526.
d[5]=599.
d[6]=533.
d[7]=639.
d[8]=676.
d[9]=616.
Ave = 554.400000.
Var = 5521.440000.
var-lowf = 1463.071532.
dev = 4.398363.
p = 0.000188.
Here are randomly generated numbers, in a uniform distribution (by definition using drand48), with the same minimum value and same range:
d[0]=465.
d[1]=526.
d[2]=432.
d[3]=577.
d[4]=549.
d[5]=557.
d[6]=592.
d[7]=515.
d[8]=648.
d[9]=606.
Ave = 546.576519.
Var = 3820.920762.
var-lowf = 2560.896276.
dev = 7.808898.
p = 0.000000.
Every time; change the seed (srand48()) and get different numbers and they work out the same. As far as I can tell, this program cannot reliably identify a uniform distribution when one is given to it by design.
@Dwight
Yes, fewer coeffs make it more likely to report fakery according to my understanding, but the blog post suggests that using more than 6 coeffs would help catch even more cases, not just extreme ones. It's my understanding that the opposite is true.
Hey doc,
Great stuff. Parris Island shrunk my head too much to handle all the math, though.
To put it another way, using C64 basic, your article made my brain do a SYS 64738.
I don't like number fudging on either side, so keep at these stooges.
Semper Fi,
Terry
Also, this statement confuses me:
"The most slowly-moving wave has period 10. There are two such period-10 components, with different peak locations."
Doesn't a DFT have exactly one coefficient for each period? Why does this post say there are two period-10 components?
I'm not sure about the two period-10s but for this:
Jon said...
@Dwight
Yes, fewer coeffs make it more likely to report fakery according to my understanding, but the blog post suggests that using more than 6 coeffs would help catch even more cases, not just extreme ones. It's my understanding that the opposite is true.
I think he's assuming a given threshold of confidence. The more you have, the better confidence you have, so you can rightfully count what would otherwise be a borderline positive.
Think of it as a pair of eyes and the associated image-processing neurons. Better eyes not only have fewer false positives; they'll also notice more details that are really there.
Man, this takes me back to signal processing and digital circuit design classes...and not always in a good way. :)
Tony C.
Every time; change the seed (srand48()) and get different numbers and they work out the same. As far as I can tell, this program cannot reliably identify a uniform distribution when one is given to it by design.
But your numbers aren't random. They're 431 + a random number between 0 and 245. Wouldn't this affect your analysis?
One thing to keep in mind, though, is that sifting out the "signal" and identifying its source are two different things. As you turn the sensitivity up, you'd be more likely to pick up artifacts from other sources that could be benign. Ineptitude, or gross ineptitude, causing rounding errors is an example of that sort of thing (and why he chose to toss the "frequency" on which those are likely to occur).
I think he's assuming a given threshold of confidence. The more you have, the better confidence you have, so you can rightfully count what would otherwise be a borderline positive.
If he is assuming that, then that assumption should be stated, because since the data we are analyzing is fixed, confidence is not independent of how many coefficients are used, and it would most likely decrease considerably if more coefficients were used.
To me, this part of the program doesn't seem right. If I understand what the program does here, it should be calculating the variance of the Fourier function. I don't see how this line does that, though my stats are pretty rough.
100 lowf = 0.02*(sumcos^2+sumsin^2)+0.01*sumdif^2
@Tony C
Dude- where the hell did you get the variance you put in your random distribution? This isn't some arbitrary parameter. The counts for each digit are close to Poisson distributed if the null hypothesis holds. Their variance would be down a factor of 0.9 from the mean (the Poisson value) because of the finite number of bins.
You're a smart guy- get your head in the game!
@Adam
The code is confusing to me, as well.
Where in the code is the model being fit to the data? To me, it just looks like a set of a priori fixed coefficients are assumed.
What am I missing?
Adam- Those are just the normalization factors to convert the sin and cos and the +/-1 to orthonormal functions before subtracting the squares of the projections onto them.
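Those normalization factors can be checked numerically. This is our sketch, not part of the original exchange. Each period-10 wave sampled at the ten digits has squared length 5, and the alternating ±1 wave has squared length 10. The variance in the listing is also an average over ten bins, which gives 1/(5·10) = 0.02 and 1/(10·10) = 0.01.

```python
import math

N = 10
cos_wave = [math.cos(2 * math.pi * i / N) for i in range(N)]
sin_wave = [math.sin(2 * math.pi * i / N) for i in range(N)]
alt_wave = [(-1) ** i for i in range(N)]

def sq_norm(v):
    """Squared Euclidean length of a vector."""
    return sum(x * x for x in v)

# A squared projection onto a unit vector is (dot product)^2 / |v|^2; the
# BASIC variance is a mean over 10 bins, so divide by N as well.
cos_factor = 1 / (sq_norm(cos_wave) * N)   # 1 / (5 * 10), about 0.02
sin_factor = 1 / (sq_norm(sin_wave) * N)   # 1 / (5 * 10), about 0.02
alt_factor = 1 / (sq_norm(alt_wave) * N)   # 1 / (10 * 10) = 0.01
print(cos_factor, sin_factor, alt_factor)
```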
@Tony C: Thanks for another version of the code. However, your code doesn't sample from a uniform distribution. To do so, you want to generate a whole number between 1 and 100 with equal probability (e.g., floor(drand48() * 100) + 1) many times and then count the number of occurrences of each of the ones digits. When I run this experiment in Excel, I get the following frequencies (based on 5000 samples):
480, 531, 522, 504, 496, 497, 534, 451, 501, 484
Running these through the code produces a p-value of 0.222976. Once again, we cannot reject the null hypothesis of independent random events.
@Pan:
drand48() produces a uniform distribution between 0 and 1. The d[] values in Michael's analysis count how many samples end in 0, 1, etc. The idea is that these should be uniformly distributed, and thus these totals should be uniformly distributed.
The numbers I generate ARE uniformly distributed, between 431 and 676. If Michael cannot identify this uniform distribution, then all he is **really** testing is the variance, and the Fourier transform has nothing to do with anything.
The proper way to do this, if one has the time, is empirically: Generate 5544 uniform samples between 0 and 100, tally the final digits, and do that about ten thousand times, and you will get a pretty accurate distribution curve for the variance. THEN you could see how improbable the SV variance really is.
Unfortunately I am on deadline for a paper; so I don't have the time to do that much.
@Jon
"Doesn't a DFT have exactly one coefficient for each period? Why does this post say there are two period-10 components?"
It has one complex coefficient. The two coefficients for each frequency come from separating the real and imaginary parts of the complex coefficient.
This whole subject is so interesting, and there's so much to say. Much more fun than actually running my business or working for clients. Anyway... let's think about the legal situation for a moment.
If SV-LLC has never taken money for its polling services, I can't think of any theory that leads to criminal liability. Distributing fake polls isn't a crime. Even if there were malicious intent to mislead a candidate into a mistaken withdrawal from a race (for example) nothing comes to mind. If a polling firm took money from a client and explicitly promised to conduct a poll, but produced the data from a table of random digits, it's barely conceivable that criminal fraud could be proven; such situations are ordinarily rejected by prosecutors who tell the injured party to try the civil courts instead. The only exception would probably be where the injured parties included a unit of government, where fraud is taken more seriously and literally.
Shifting from the criminal to the civil environment, the only liability I can think of would be to a client who paid money for services not honestly rendered - if any such client exists. Unlike a regulated profession, polling does not hold itself out as having any legal duty to the public. Indeed, because of the political context, and the broad sweep of the First Amendment, it would be very difficult for a court to create such a duty. And without a duty, there's generally no civil liability - to anybody.
Imagine that a poller intentionally issued a fake poll with the intention to damage a particular candidate by convincing his supporters to switch their support. Even in that extreme case - which is far beyond anything demonstrated here - there isn't any obvious recourse.
In short, since the law doesn't provide any remedy in situations like the present, our only protection has to come from bodies like the AAPOR or "vigilantes" like Nate. I guess we're his posse.
I hope other lawyers will respond, especially those with prosecutorial or judicial backgrounds.
@Tony C
You are confusing which numbers are to be generated from a uniform distribution. Your code generates the observed _frequencies_ of the various digits as drawn from a uniform distribution. This makes no sense.
Those counts result from thousands of draws from a distribution over [0-9]. The idea is that the counts will approximate the true distribution from which they are drawn.
By using a uniform distribution for the counts, you are not modeling the distribution [0-9] as a uniform distribution, but rather as a fairly arbitrary and complicated distribution generated by the random counts.
A couple posters have suggested that by raising the stakes, we are eliminating the danger of fraudulent polls. That's not completely true, for two reasons.
First, almost any single poll result will pass appropriate statistical tests. One poll can be an "outlier". It's only when there are multiple polls that it's possible to detect and confirm statistical anomalies. SV-LLC wouldn't have come to anybody's attention unless they generated so much data.
Second, the tests first applied by Nate, and then so much improved by Prof. Weissman, will only catch fake results that are concocted naively. It wouldn't be hard to create false data that would pass these tests, provided a reasonable level of care and statistical sophistication. (Hint to would-be fraudsters: limit yourself to falsifying the distribution parameters, then generate the results to be published using Monte Carlo methods.)
Charles Dickens remarked that the greatest criminals of his day would have made more money doing less work if they'd been bankers. Plus ca change....
@Tony C- Right, all we're looking at is the variance. We know what it should be under the null hypothesis- random draws from a uniform distribution of digit likelihoods: 0.9*mean. But first, to be fair to SV, we're removing those 3 Fourier components of the variability which might arise from the genuine non-uniformity of the distribution or from a rounding algorithm. Then we look at the remaining 6 of the iid coefficients, whose contribution to the variance under the null hypothesis would be (6/9)*0.9*mean.
You're getting all your distributions scrambled up.
I modified Tom's program as follows to more correctly perform the test:
at the top:
struct timeval tv;
double totalPVal=0;
int testCount=100;
int testN;
gettimeofday(&tv,0);
srand48(tv.tv_sec); // I'm lazy...
for (testN=0; testN < testCount; testN++)
{
for (i=0; i < 10; ++i) d[i]=0;
for (i=0; i < 5544; ++i)
{
int poll=(int)rint(drand48()*100);
d[poll % 10]++;
}
...
At the bottom:
totalPVal += p;
} // for each test
printf("avg P=%.4f, N=%d\n",totalPVal/testCount,testCount);
For a test of N=10000 I got an average P of 0.5016 (each test consisted of 5544 random 'polls', with the possible result).
Oops, forgot to finish my post. The possible result was 0-100 inclusive (and I verified that all values from 0 to 100 were set for each test).
Imagine that a poller intentionally issued a fake poll with the intention to damage a particular candidate by convincing his supporters to switch their support. Even in that extreme case - which is far beyond anything demonstrated here - there isn't any obvious recourse.
Are you sure? Wouldn't that be knowingly writing an untrue statement about someone for malicious reasons, i.e., libel?
Correction: Tony's program (which is really Michael's), not Tom.
@Jon
Yes, but these two coefficients are not different.
Yes they are. Even to represent a function that is a pure single-frequency sinusoid, you need two components. You can break it up as amplitude and phase, or alternatively as cosine and sine.
These numbers are different for different functions, and in this case, different distributions. I don't see any reason why they would be the same, or even correlated.
@e23: yep, for Gaussian stationary noise they are iid: independent and identically distributed. Independent implies uncorrelated.
I just tried testing Michael's prediction that a p of 0.00018 occurs at roughly 1 per 5000. This seems to be accurate as well and can easily be verified by modifying my previous changes (simply keep track of number of tests that have a p less than or equal to 0.00019).
@Michael (mbw):
Alright, I stand corrected; the counts are NOT uniformly distributed. I generated 5544 numbers, counted the last digits of each, and the lowest p I get in about 100 trials is 0.1. Sorry about that...
@Anybody: This request may be premature, but can somebody please summarize what the discussion is about now in layman's language?
Is Weissman's finding ultimately in question, and if so on what basis?
Are alternative approaches failing to confirm his results? If so, how and why?
@(mbw):
I mean, 0.1 PERCENT. p=0.001.
It does seem the lowest 'p' doesn't match the trials. The above is 1 in 1000, not 1 in 100. If I do 5000 trials, I get a p=0.000021; which is actually 1 in 48,000 or so.
Code below for generating a trial:
for (i=0; i<10; i++) d[i]=0;
for (i=0; i<5544; i++)
{
p = floor(.5 + drand48()*100.);
x = ((int) p)%10;
d[x]++;
}
@Juris:
I fucked up a test of his results by coding an invalid short-cut, and I called them into question on that basis; but after further analysis I retract that. I made a bad assumption. I corrected my code to better simulate the actual situation and now his results look correct to me.
Apologies to MBW!
As an overall thing we are talking about Benford's Law, which you can look up on Wikipedia.
FYI:
I will note that you get skewed distributions like SV if you convert about 5% of one trailing digit (like 4) to a different trailing digit (like 5); but it has to be selective and create non-uniformity.
A post like this illustrates the other side of the coin of why fivethirtyeight.com has a dim future.
Posts like this give the reader an overdose of pedantic number-crunching that is ultimately uninteresting and often pointless (much ado about nothing) and hardly encourages increased traffic to this site.
On the other hand, the fact-less and ham-handed critique of John McCormack at the Weekly Standard:
http://www.fivethirtyeight.com/2009/10/weekly-standard-s-john-mccormack-is.html
exposes this site as nothing more than an outlet for partisan hackery and homerism, and the supercharged rhetoric in the comments section solidifies it as a far left echo chamber.
To succeed, this site has to find a balance, which it has shown flashes of in the past. Too much nerdiness or too many opinion pieces, and this blog is doomed to fail. Nobody (or a very select and negligible few) wants to hear about advanced statistical theory and its applications or redundant liberal "triumphantism" ad nauseam.
Good luck striking that balance. You're going to need it.
Thanks, Tony. We all make misteaks....
Can someone explain at a high level how this analysis works?
My understanding is that the goal is to fit the vector v = (562, 431, 472, 490, 526, 599, 533, 639, 676, 616) with a set of sinusoidal functions, and then use this to generate another 10-component vector v* that contains the "best fit" of v. Then, we normalize v* to a probability distribution d* and compute the probability that v is generated from d*, versus some other distribution (I'm not sure what).
Does anyone know what is right and wrong about this summary (and can you fill in the details, like what the p value means here)?
Tony, you need to do multiple trials. I ran it with a count of 1 million, and got a rate of one p of 0.00019 or less once per 5464.5 tests (with each test 5544 polls ranging from 0 to 100).
Here's the code I used in full:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main(int argc, char **argv)
{
struct timeval tv;
gettimeofday(&tv,0);
srand48(tv.tv_sec);
int i;
long testN;
double d[10], sum, sumcos, sumsin, sumdif, sumsq;
double ave, var, sign, lowf, dev, p;
double originalTotal=0;
double totalPVal=0;
long testCount=1000000l;
int tests[101];
double minP=1.0;
int nSmallP=0; // number of tests with p < 0.00019
long step=testCount/10; // keep track of progress
for (i=0; i < 101; ++i) tests[i]=0;
for (testN=0; testN < testCount; testN++)
{
for (i=0; i < 10; ++i) d[i]=0;
for (i=0; i < 5544; ++i)
{
int poll=(int)(drand48()*100+.5);
d[poll % 10]++;
tests[poll] = 1;
}
for (i=0; i < 10; ++i) originalTotal += d[i];
sum = sumcos = sumsin = sumdif = sumsq = 0.;
sign = 1.;
for (i=0; i<10; i++) sum += d[i];
ave = sum/10.;
for (i=0; i<10; i++)
{
sumcos += (d[i]-ave)*cos(0.6283*i);
sumsin += (d[i]-ave)*sin(0.6283*i);
sumdif += sign*d[i];
sumsq += d[i]*d[i];
sign = 0.-sign; // reverse sign.
}
var = sumsq/10. - (ave*ave);
lowf = 0.02*(sumcos*sumcos+sumsin*sumsin)+0.01*sumdif*sumdif;
dev = (var-lowf)/(0.6*ave);
p = exp(-3.*dev)*(1+3*dev+4.5*dev*dev);
totalPVal += p;
if (p < minP) minP=p;
if (p < 0.00019) ++nSmallP;
if (testN % step == 0) printf("%02ld%%\n",testN/step*10);
} // for each test
printf("avg P=%.4f, N=%ld, minP=%.6f, N p < 0.00019=%d, rate^-1=%.2f\n",totalPVal/testCount,testCount,minP,nSmallP,testCount/(1.0*nSmallP));
return(0);
}
Instead of waiting on someone to make a gratuitous snide remark lamenting my intrusion into the comments section, I'll get it out in the open for them:
"Well, we made it all the way to comment #78 before that damn idiot and troll, Mule Rider, came and destroyed the entire thread! Woe is me!!!"
One of the things the discussion here represents is something we saw myriad times last year: a peer review process. It's quite remarkable how the many geeks here respond when given a math/stats/coding challenge.
I'm still in awe of what happened way back in summer 2008 when Nate asked that combinatorics question as a "Homework Assignment".
I would like to make three comments to the author:
1. Power Spectral Density Analysis would add a lot here. Instead of just looking at the spectral variance, this also measures the intensity of that spectrum. That likely will get back to the p that Nate was seeing.
2. It would be useful to run the same Fourier analysis on their political polls separated from the non-political ones. That is, if they are skewing numbers or making them up, they are perhaps more likely to be doing so for political polls. I have to believe that for non-political polls that someone has paid for, they must be calling someone. So the non-political ones may be diluting the signal, and the artifacts may show up more strongly in the political polls alone.
3. Taking the same concept as 2, run the analysis on a single race that they polled multiple times (ideally the one they polled most often), and then compare with the same analysis on other pollsters for the same race. This would help to identify a couple of factors: if that race shows no abnormality, then you know that only some polls may be in question; if it does show an abnormality and all of the pollsters show a similar effect on that race, then the effect could be specific to that race.
@Juris: Basically, it's another statistical method to test whether the poll results appear invalid based on the number of polls ending with each possible digit (0-9).
According to this particular statistical method, p values for genuinely random data should average about 0.5, and very small values are correspondingly rare. Using this method, the p value for Strategic Vision is extremely small, and thus very unlikely to arise by chance.
Specifically, Michael asserted that this p value should only occur at a rate of roughly once per 5000.
Tony thought there was a mistake in Michael's method but later found this wasn't the case. In addition I found that this p value does, indeed, occur at roughly once per 5000 when seeded with random poll data.
So the question really is whether Michael's statistical method is applicable. I have no idea in that regard--I'm a programmer, not a mathematician. I took basic statistics in high school ages ago and that's about it as far as statistics is concerned.
@Persuter said... "Imagine that a poller intentionally issued a fake poll with the intention to damage a particular candidate by convincing his supporters to switch their support. Are you sure? Wouldn't that be knowingly writing an untrue statement about someone for malicious reasons, i.e., libel?"
I played with that idea before I wrote the post, and the answer is "probably not". First, libel has to be "defamatory", not merely "untrue". And the courts have held that if somebody claims you're dead - to take an extreme case - you haven't been libeled, since death isn't dishonorable. I would guess that being called a loser wouldn't be held to be defamatory either.
More powerfully, any candidate for public office would be required by Times v. Sullivan to meet a standard of proof so high that it's effectively unattainable. I don't know of a case right on point, but I bet the courts would mumble about the rough-and-tumble of our political system and the need to protect vigorous discourse and hand you back your complaint, while keeping the filing fee.
But your answer, that it might possibly be held libelous, is as good a theory as I can think of.
@Shaun:
Yeah, out of 100,000 separate trials I got 15 instances where p was below the MBW threshold; so about 1 in 6666 or so....
Enough of that for now.
I ran some tests similar to Weissman's, and in most of the variations the SV numbers looked much more unevenly distributed than randomly generated distributions, while the Quinnipiac numbers looked slightly less evenly distributed than typical randomly generated distributions.
One exception to this was the following: Generate random distributions according to a normal distribution centered at 45 with variance 2. In this case the random distributions are more similar to the SV numbers, while Quinnipiac looks like an outlier. (I also let the variance be 5 and 8, and in these cases SV looked unusual, but maybe 1 in 100 unusual rather than 1 in 5000 unusual.)
So my question for the rest of you who have working programs to run this sort of simulation: Have you studied the dependence of your results on the specific type of random distribution you use? More specifically, have you studied the effect of making the variance of the input numbers fairly small (e.g. 2)?
P.S. Of course it would be much better to compare with a variety of other pollsters, but I don't have those numbers.
P.P.S. Disclaimer: I'm a mathematician, but without any formal training in statistics.
Grebner?
Disclaimer - I am not an attorney, and have not taken any courses on the subject of law. What I say below is based on my observations and (what I think is) common sense:
One - Did SV LLC hire you to defend them?
Two - If a poll is taken, it has no influence unless it is released, be it to a client and/or the general public. If it is found that SV LLC has concocted its polls in the past, legitimate and responsible media will not publish the polls. If non-legitimate and/or irresponsible media publish the results of the "poll", that will just reinforce the non-legitimacy and/or irresponsibility of that outlet (Faux News? Drudge? We're looking at you).
Also, if any polling firm is sued, that firm will incur expense in defending itself in court, and/or in pursuing a complaint in court. The individual or other entity suing and/or being sued will almost certainly be able to get much legal advice at little to no cost, or costs covered by sympathetic other parties. The best defense for SV LLC would be to make public the information it currently is hiding (if it actually has such information), and thus make it much more difficult to be sued, or easier to sue for defamation anyone who says it is 'cooking the books'.
Finally, even if SV LLC does not charge for the polls, it makes some, most, or all of its money on the PR side of the firm. If the polls are discredited, then the PR side of the firm (which often uses polling to back up its work) loses credibility. So even if SV LLC were to win any court case on polling issues, the public relations hit could (and probably would) badly damage the PR side of the firm, and thus the firm would almost certainly be in deep trouble.
The solution? Follow standard practice of the polling organizations, and release cross-tabs of the information. People may question how you came up with, for example, the make-up of some of the samples (pre-interview to select your sample population, as I suspect some extreme partisan pollsters do?), or the wording of the questions, and/or the order and sequence of the questions, but at least you have a solid defense that the numbers in the poll were not pulled out of thin air (or some body part), but came from actual people who actually were polled.
Mike in Maryland
@Michael
Now that my above post is starting to get buried, can you please explain how Fourier analysis works at a high level for the purposes of statistics and computing a p-value?
My best guess is that you are somehow computing a 'smoothed' set of data and comparing the data set to this (though I don't see where this actually occurs in the code).
However, I can't see how multiplying a few cosines and sines with the data accomplishes this. Also, I can't see where a Fourier transform is actually computed, and this makes me suspect that something else is going on.
Can you please illuminate? I have a background in signal processing and econometrics but not statistics.
It took an awful lot of horsepower to come to this conclusion. In the nature of things that horsepower won’t always be available or motivated.
The answer of course is some kind of automation administered by AAPOR.
Mule thinks we’re all talking to ourselves and he could be partially right (heh heh). The general public isn’t paying much attention. As a media guy, I can almost guarantee the producers at CNN aren’t terribly interested.
What needs to happen is an “x” test where “x” is a reassuring name for a rigorous process and set of routine statistical proofs.
The CNN producer would be able to say, “The Strategic Vision LLC poll failed to pass the ‘x’ test.” Neither the producer nor the viewers/readers would actually have to understand the test. Control would by definition be in the hands of the scientists.
The test would include statistical proofs, a set of call center data and methodologies. Can a subset of this group design the “x” test?
Small correction to my earlier comment: The numbers (i.e. "1 in 100") in the parenthetical comment about variance 5 and 8 distributions are incorrect.
@k23- Your initial distribution (width of 1.4) is radically dissimilar to the actual distribution. My post (thanks to Nate's edit) describes the actual distribution of the SV results: spread out all over the 30's, 40's, 50's, and 60's. So a distribution ranging from 43 to 47 has nothing to do with the case we are discussing.
I guess there would be a simple test for an individual poll and a more useful one for a group of polls.
@jon- I'd be happy to provide a tutorial on Fourier analysis, but don't think that another long technical comment would be welcome. These are all things easily available in elementary texts or probably online.
@ Thatcher- If there were a well-known standard test, fakers could easily adjust to it. In fact, Mark Grebner just gave good advice on how to beat most standard tests.
A likely outcome of all this scrutiny is that SVLLC will in the future ensure that their polls don't so often represent an even multiple of 100, that their trailing digit frequencies are more uniform, etc. That is, there will likely be a sudden conformity to numerical expectation based on the complaints.
Of course, all that will mean is that those scrutinizing SVLLC will look to secondary characteristics and consequences to overt repairing that are unlikely to occur in naturally occurring data.
It will be fun to watch... I'm putting the popcorn in the microwave now.
@Michael
Is there really no hope?
I would like to ask Dr. Weissman a serious question.
What are the odds the Cards can go 14-16 since September 1 and still win the Series? Is there some physical principle that will cause them to revert to hitting in the post-season? Please reply immediately as a quick drive by me over the Nevada state line depends on your answer.
Go Cards!
A likely outcome of all this scrutiny is that SVLLC will in the future ensure that their polls don't so often represent an even multiple of 100, that their trailing digit frequencies are more uniform, etc. That is, there will likely be a sudden conformity to numerical expectation based on the complaints.
Of course, all that will mean is that those scrutinizing SVLLC will look to secondary characteristics and consequences to overt repairing that are unlikely to occur in naturally occurring data.
I don't know that there would be any need. If the statistics of their data change significantly in response to Nate's accusations, that would be the clearest evidence yet of fakery.
Certain people might have fraudulent misrepresentation claims against SV -- for example, a client who, in engaging SV, relied on SV's purported poll-taking experience.
@Kenneth: Considering they have to make it past the Dodgers and (hopefully) the Rockies, no chance at all.
However, should they make it to the World Series they'll have my 100% support (they're my favorite team after my home team). Until then, go Rockies!
@Kenneth Ranson:
Sorry. Some of the mysteries of the universe are beyond mortal comprehension. I'll be engaged in worship next Saturday at Busch Stadium.
@Michael
I can't find any tutorials on Fourier analysis for statistics. I searched pretty intensely, but maybe I'm not googling the right things. There is no mention of statistics on the Wikipedia page for DFTs, and I just can't think of how a DFT can be used to compute a p-value (nor do I know exactly what the p-value is referring to in this case).
Do you know of any links? What about a relatively non-technical explanation? What's the elevator pitch for this technique?
@Jon -
Here, roughly, is what's going on in the program (in the original post above).
Lines 10-24 input the SV last digit distribution.
Lines 30-50 calculate, in slightly disguised form, 4 of the 10 Fourier coefficients (FCs). "sumdif" is the unnormalized highest frequency FC. (Think of (-1)^i as the highest-frequency cosine wave.) "sum" is the unnormalized lowest (constant, zero frequency) FC. "sumcos" and "sumsin" are unnormalized lowest non-zero frequency FCs. (Note that the "-554.4" is unnecessary -- it does not affect the final answer.)
Line 80 computes the usual variance, which can be thought of as the norm-squared of the data after the constant FC has been subtracted out.
Lines 100 and 120 subtract out the other three FCs that were computed above. The "0.02" and "0.01" are here because in lines 30-50 the unnormalized, rather than normalized, FCs were computed.
@Jon, you are correct that the full Fourier transform was not computed. All that was computed was the part of the FT that we want to remove from the data.
Off-topic…
…but that’s never stopped me before. A curious incident has inflamed the VA governor’s race. Will this be McDonnell’s “Macaca” moment? Johnson apologized (sort of) but McDonnell’s campaign guru went on the attack against Deeds for Johnson’s gaffe.
Kinda typical of the GOP—when your candidate farts try to make it look like his opponent's fault.
@k23- Thanks. You're right, the -554.4 is superfluous, left over from a trouble-shooting step. I wondered if anyone would notice.
@ Paul K:
1. We have a total of 3 frequencies on which to calculate powers after dropping the suspect ones. We've already calculated the total power in the 3. If you want to analyze the other two parameters, I say (as someone who is tired after spending 4 decades doing Power Spectral Density Analysis, although without the caps), go for it.
As for getting back Nate's p-value, you won't get it that way. The reason is that much of the variance Nate used came from the period-10 component, for which the priors are arguable.
To get decent priors on that, maybe we could take a smoothed version of Nate's overall tabulation of SV results 0-100, then fold it mod 10, then Fourier analyze. Or we could look at a large collection of other pollsters (like Q) to see if they consistently have smaller period-10 components.
Points 2&3: no statistical power, I guess.
@k23
So the technique consists of the following steps?:
1. Run d through a low-pass filter to generate another, smoother vector d*
2. Normalize d* so that it represents a probability distribution over the ones digits.
3. Compute the probability that a random vector generated using the probabilities in d* has a greater error (whatever this means for a 10-component vector) than d has.
Is this correct?
@Jon- No. It's more equivalent to this:
1. Run the data through a high-pass filter. (Also knock out the even-odd variation.)
2. Look at how much power is left.
3. Compare with the power expected for random draws.
@Pragmatus:
Please don't veer off-topic. The posters have been great in ignoring one poster trying to drive the comments off the road. Don't play the same game from the other side of the partisan divide.
@Everyone else:
Keep up the commentary! Fascinating stuff, even when I don't understand it.
Way off topic, but needs to be said.
In America, we do not value the lives of girls. It's not just that girls are told (wrongly) not to bother pursuing a career in fields such as physics or chemistry, resulting in two scientific fields of endeavor that are almost exclusively male. Nor is it the fact that those who do excel are ignored; a mere ten women in all of history have received the Nobel Prize.
No, I say this because of Roman Polanski. Now, when a young boy is molested or photographed in a sexually suggestive way, we rightfully lose our collective shit. One reason why I flatly reject Christianity is the behavior for which Catholic priests have become renowned in recent years. However, when someone drugs and then violently rapes a 13-year-old girl, admits guilt, and then flees to Europe, only to later brag about it to a newspaper reporter, his arrest is seen as a travesty of justice.
Now, don't get me wrong. I'd rather watch The Pianist than attend mass. However, I do not see how great talent necessarily translates to the right to rape a child and get away with it. Had this been father Polanski instead of Roman Polanski, you bet your ass I'd be howling for blood. I don't see why the outrage is directed at the Swiss for detaining him and preparing his extradition, and not the rapist.
To be sure, the victim has forgiven Polanski. However, he has expressed anything but remorse for his deed. He should be called to account for his crime. Even if the victim successfully pleads on his behalf for his later release, or for light punishment, he should at least be made to answer publicly for his atrocity.
I find it horrific in the United States that even the Hollywood elite, as liberal as I, cannot see the crime of child-rape as being equal, no matter the gender of the child or the height of the perpetrator's profile.
Long live the glitterocracy.
One of the things I like about teh innernets is that every now and then I can find a discussion that is interesting, relevant, and challenging, even when some (or most) of the science involved is more than I understand. (I like to think that's uncommon. That's probably self-selection ... I visit ESPN more often than, say, exploratorium or MathWorld.)
This is great work and great discussion. Even if it doesn't change anything at all, enough of us will read it to be able to tell other people what we know, and sometimes that's enough. It doesn't have to catch all the fakers out there (whether or not SV is actually one of them; seems hard to argue otherwise, though). If it catches one, or more importantly, replaces one who uses questionable methods with another who uses proper methods, then it's a good thing.
Great analytic techniques deserve great data! It's too late now, but knowing where we've ended up, we can see a couple of safeguards that should have been taken on the front end, which would have made the analysis easier and more powerful. Of course, the whole purpose of Fourier analysis is to tease the real signal out of a confusing and noisy jumble of data, but the less noisy and jumbled the data, the better it works.
First, reducing the SV-LLC data to its rightmost digit, which Nate did in expectation of applying BL2, can be seen in retrospect to have been a mistake. Preserving the full numbers, over the range [0 100] would have allowed Fourier analysis on additional modes, and it would have avoided the "echo" problem created by the small aperture. Poll results of 37% and 41% are SIX points apart in Nate's dataset, and are mapped onto the same point as {41%,47%}, but not {41%,45%}. Fourier can deal with the distortion, but might have reached even clearer results with cleaner data.
Second, as I understand Nate's original posting, he collected both the highest and second highest candidate result from each poll. So if Jones leads Smith 52%-to-41%, both "2" and "1" were tallied. It seems likely that if somebody fakes a poll, their pattern might be different in fabricating numbers for the leading candidate, versus second place. It might be, for example, that all the tampering occurs in creating the leader's percentage, while the other number simply echoes that choice with an eye to creating a plausible "undecided" or "other" tally to report. Keeping the two tallies separate (and possibly keeping a tally of "undecided") would probably reduce the muddiness of the patterns.
Finally, I don't think we've seen enough interest devoted to the fact that SV-LLC apparently doesn't "round" their data (to the nearest whole percentage) so much as "stretch" it (to force the total for each question to exactly 100%). I can't think how that would create the patterns we see, but perhaps there was no need to ignore the mode corresponding to differences of exactly 2%. And - by the way, you practitioners of the FFT - what ARE the differences among the three data sets when examining the 2-mode?
Finally, I don't think we've seen enough interest devoted to the fact that SV-LLC apparently doesn't "round" their data (to the nearest whole percentage) so much as "stretch" it (to force the total for each question to exactly 100%).
The key word there is "apparently", because these analyses suggest that they don't stretch (nor round) anything--they just make up integer numbers that add up to 100. :-)
I think this is a great post and comment thread in the scheme of things, though aside from the code (I made a living as a FORTRAN programmer for a decade when I was much younger) and the baseball, I have only a very general idea of what's being discussed. Good call, Nate.
Thanks again, Michael (mbw).
@juris
misteaks :-D Love it.
@Mark Grebner
I've been trying to think of a legal angle, too (see previous post regarding payola in the music industry). I concur that unless it can be shown that funds for the polls were misused, you're only guilty of lying, which in and of itself isn't illegal, just tasteless. On the other hand, if any of this geekiness could prove that SV LLC had used someone else's data and modified it, that would be stealing, since it relies on other people's polling (and the money it took to conduct the poll). I think you could make a prosecutorial argument there.
Dug out my old media ethics law texts. Thought there might be an angle in there but there isn't a lot that sticks. Could only find cases involving campaign regulation.
plus c'est la même chose ("the more it stays the same")
Since the analysis is way over my head (my engineering degree is too old and too little used for me to recall even calculus)...I'll just throw in another possible line of investigation...now that we have some apparently irrefutable indications they are cheating, I want to know who SV is and why they are doing this.
Maybe SV is just a stand-alone firm trying to get over, but given their apparent partisan bent, they make me suspicious that they may be in league with other rats who have a political ecology that they feed from...help get certain politicians elected, certain politicians throw them work/money...and so on. Is SV related to other business/non-profit enterprises, and is there a way to figure that out?
@Mark Grebner
A histogram of the trailing digits split by the high and low values would be interesting. If they show the same distribution, then it seems to show that the high and low values were generated by the same underlying process. If the numbers are really "made up" it seems likely that there would be systematic differences between the high and low numbers, giving rise to different distributions for the two sets of numbers.
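A minimal sketch of that split tally in C, assuming each poll is stored as the two whole-number percentages for the major candidates, in either order. The function name `tally_split` is hypothetical, and nothing here reflects how Nate actually stored his data.

```c
#include <assert.h>

/* Tally trailing digits separately for the higher and the lower of the two
   major-candidate percentages in each poll. a[i] and b[i] are the two reported
   whole-number results of poll i, in either order; hist_high and hist_low
   must start zeroed. */
void tally_split(const int *a, const int *b, int n_polls,
                 int hist_high[10], int hist_low[10])
{
    for (int i = 0; i < n_polls; i++) {
        assert(a[i] >= 0 && a[i] <= 100 && b[i] >= 0 && b[i] <= 100);
        int high = a[i] >= b[i] ? a[i] : b[i];
        int low = a[i] >= b[i] ? b[i] : a[i];
        hist_high[high % 10]++;   /* e.g. 52 -> digit 2 */
        hist_low[low % 10]++;     /* e.g. 41 -> digit 1 */
    }
}
```

Each histogram would then be run through the same filter on its own; the two raw histograms aren't directly comparable bin by bin, since leaders and trailers naturally sit in different percentage ranges.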
@Jon:
Fourier analysis answers the question "is there some sort of undisclosed rounding that could account for the distribution of data?" If there is, it is assumed that the algorithm will apply in a *regular* fashion. Regular distributions will result in p values closer to 1. Irregular distributions will result in p values closer to 0. The very low p value for the SV distribution leads one to believe that the irregularities in the distribution are not attributable to some unknown (but regular!) rounding algorithm.
@Brad Buchsbaum:
It took me several readings to fully grok, but your suggestion is correct. We should clarify that you're referring to my complaint that Nate simply lumped together all the data both for the leading and trailing candidates. Separating the data into two series would allow an additional comparison.
One of the important points of Prof. Weissman's work is that the histograms can't be compared in a simple way; only certain statistical properties allow reliable comparisons. In particular, it's possible to imagine that an honest poller might find "typical" leading candidates generally score say 52-to-56%, while the person trailing might often be between 38% and 41%, so their gross patterns might vary. But your point is correct that if the distributions were separated, comparing their high-frequency components would yield an additional valid test.
Here's another suggestion at an analysis that might be able to shed some light on the situation...
Each result is a multinomial sample, right? With major choices A, B and "the rest" which adds up to 100% exactly. So, could we not take the entire dataset of N different multinomial samples and simulate another dataset using the observed SV data as the expectation? And then do that a few 10s of thousands of times and examine the trailing digit distribution for the entire data for each simulation. Then we could calculate some sort of deviation from the mean of all simulations (like maybe something similar to a Chi^2 statistic) for each simulation and the "observed" dataset. If SV is really reporting a true random deviate from an underlying multinomial sample, then the observed result shouldn't differ much from the simulated result. This could also be applied to Quinnipiac and other pollsters to see if this is a valid way of looking at things.
I've already coded this up in R and it seems to offer some insight, though I can't quite place it in a traditional Likelihood Ratio Test or Bayesian framework, off the top of my head.
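A minimal C sketch of this parametric bootstrap (the commenter's R version isn't shown). Assumptions, all mine: each record follows the "p1 p2 p3 N" layout proposed elsewhere in the thread, simulated shares are rounded half-up to whole percentages, the distance is a chi-square-like sum against the mean simulated histogram, and the random numbers come from a small xorshift64* generator so runs are reproducible. The names `Poll`, `bootstrap_pvalue` and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* One poll: reported shares (percent) for A, B and "the rest", plus sample size n.
   p3 is carried only to mirror the p1 p2 p3 N layout; "the rest" is whatever is left. */
typedef struct { double p1, p2, p3; int n; } Poll;

/* xorshift64* generator: small and deterministic, fine for a sketch. */
static unsigned long long rng_state = 88172645463325252ULL;

static double uniform01(void)
{
    rng_state ^= rng_state >> 12;
    rng_state ^= rng_state << 25;
    rng_state ^= rng_state >> 27;
    /* Keep the top 53 bits of the scrambled output and scale to [0, 1). */
    return (double)((rng_state * 0x2545F4914F6CDD1DULL) >> 11) / 9007199254740992.0;
}

/* Round a non-negative percentage half-up to a whole number; return its last digit. */
static int last_digit(double pct) { return (int)(pct + 0.5) % 10; }

/* Chi-square-like distance of a digit histogram from the mean simulated histogram. */
static double digit_distance(const int h[10], const double mean[10])
{
    double chi = 0.0;
    for (int d = 0; d < 10; d++) {
        if (mean[d] > 0.0) {
            double diff = h[d] - mean[d];
            chi += diff * diff / mean[d];
        }
    }
    return chi;
}

/* Treat each reported result as the true multinomial probabilities, redraw every
   poll n_sims times, and return the fraction of simulated data sets whose digit
   histogram is at least as far from the simulated mean as the reported one. */
double bootstrap_pvalue(const Poll *polls, int n_polls, int n_sims)
{
    assert(n_polls > 0 && n_sims > 0);
    int *hists = calloc((size_t)n_sims * 10, sizeof *hists);
    assert(hists != NULL);
    double mean[10] = {0};

    for (int s = 0; s < n_sims; s++) {
        int *h = hists + (size_t)s * 10;
        for (int i = 0; i < n_polls; i++) {
            double cut1 = polls[i].p1 / 100.0;
            double cut2 = (polls[i].p1 + polls[i].p2) / 100.0;
            int count_a = 0, count_b = 0;
            for (int r = 0; r < polls[i].n; r++) {   /* one respondent at a time */
                double u = uniform01();
                if (u < cut1) count_a++;
                else if (u < cut2) count_b++;        /* otherwise: "the rest" */
            }
            h[last_digit(100.0 * count_a / polls[i].n)]++;
            h[last_digit(100.0 * count_b / polls[i].n)]++;
        }
        for (int d = 0; d < 10; d++) mean[d] += h[d];
    }
    for (int d = 0; d < 10; d++) mean[d] /= n_sims;

    int observed[10] = {0};
    for (int i = 0; i < n_polls; i++) {
        observed[last_digit(polls[i].p1)]++;
        observed[last_digit(polls[i].p2)]++;
    }
    double obs_stat = digit_distance(observed, mean);

    int at_least_as_far = 0;
    for (int s = 0; s < n_sims; s++)
        if (digit_distance(hists + (size_t)s * 10, mean) >= obs_stat) at_least_as_far++;
    free(hists);
    return (double)at_least_as_far / n_sims;
}
```

As a sanity check, 40 identical hypothetical polls reported at 48/38/14 with N = 600 (every reported digit an 8) come back with a very small p-value, because honest sampling noise would smear those results across neighbouring digits.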
This kind of analysis has a lot of similarity to the "smooth tests" of goodness of fit.
E.g. see the book by Rayner and Best.
The fastest way to stop this examination of the SV LLC polls is for SV LLC to release the cross tabs of the polls. Of course, that is predicated on SV LLC having done legitimate polls, with the oddities being noticed just anomalies, however rare such anomalies may be.
The longer SV LLC goes without releasing the cross tabs, the more guilty they look, the more questions possible clients have about the company, and the longer their reputation has to erode, not only for their polls (if they actually are legitimate polls), but also, and maybe especially, as a good PR firm. After all, the LinkedIn profile of SV LLC states:
Strategic Vision, LLC is a nationally recognized full service public relations firm that provides public relations, marketing, advertising, graphic design, and web integration. The agency has three distinct divisions to serve our diverse clientele. Strategic Vision is headquartered in Atlanta, Georgia.
Whatever your specific need may be, Strategic Vision, LLC provides the personalized service to reach your targeted audience. We give our clients a competitive edge.
No mention of polling in the wording of that profile (which, by the way, is under the control of SV LLC).
I'm of two minds on this subject right now.
I want SV LLC to be taken down, and taken down NOW, based primarily on their political leanings.
On the other hand, the longer they fight the accusations with silence and/or threats of law suits, the more the entire company's reputation is hurt (including the PR side), and the more opportunity to find out who else might be involved.
Right now, the slow death of SV LLC argument wins, as that means it will be that much more difficult for them to make a comeback to pollute the polling waters again.
Mike in Maryland
The one thing I am unsure about is the chance that such improbable numbers would occur in intentionally faked data.
I know that this probability is impossible to calculate, but wouldn't you expect a polling company to know how to fake or generate real looking data?
Was this poll a scantron or something that could have accidentally screwed up?
@allie said... "I know that this probability is impossible to calculate, but wouldn't you expect a polling company to know how to fake or generate real looking data? "
Actually - no. There's a lot of research that suggests people who try to pick random numbers do a very poor job of it. Unless somebody is planning from the very beginning that their fakery will be detected and analyzed in a particular way, they'd be likely to create their results by looking at the results of other companies' polls, and then adjusting them a few points to suit whatever agenda is being advanced.
From Johnson's media contacts to date, I'd say he's not very skillful at lying - the answers he concocts often conflict with the public record or with his own previous statements - and that's when he's on guard. It's hard to imagine that he'd have done a better job concocting statistical data at a time before anyone suspected anything.
@Mike, "We've already calculated the total power in the 3. If you want to analyze the other two parameters, I say (as someone who is tired after spending 4 decades doing Power Spectral Density Analysis, although without the caps), go for it." - sorry I capitalized them, but I started with PSDA and then realized not everyone knows what that is. I will see if I can grab some time tomorrow to run it on your total results.
"As for getting back Nate's p-value, you won't get it that way. The reason is that much of the variance Nate used came from the period-10 component, for which the priors are arguable." - that is true. Folding %10 may be the easiest.
"Points 2&3: no statistical power, I guess." - not sure what you mean. My point is only that looking at signal distribution on a split set will isolate power. The problem is that your sample set is too large a window and likely attenuating the possibly manipulated signal (maybe even with white noise). Either a filter to catch the strongest of the irregular signal to see if it has a correlate in type of poll, or a split set (on a guessed correlate) to see if the FT yields the same answer. The filter could be an interesting technique if the poles are placed wide enough to not catch just local anomalies.
Is the polling data actually conveniently aggregated for crowd-sourced analysis? A simple tabular format with:
p1 p2 p3 N
Would make this pretty easy to run through my R code. It'd be cool to also check SUSA and Q.
@ Grebner et al
Is there not some sort of trading standards or advertising standards legislation that could work off a legal definition of a poll? This would be a bit like getting Al Capone for tax evasion, but could you go after SV every time they referred to a "poll" in corporate literature/sales pitches on the basis that making up numbers does not meet the standard definition of that word?
@Allie:
The most common way for people to fake numbers in accounting (or polls) is not to really fake them, but to purposely make "mistakes" they can plausibly claim were the result of inattention, rushing, tiredness, dyslexia, a brain glitch during transcription or just innocent oversight in some way.
Of these, the most common is changing numbers that look similar: 1's to 7's, and 3's to 8's. When recording your lunch receipt, you put down $18 instead of $13. If you get caught, claim you misread it. When you put in your hotel bill for $610, accidentally add in $670.
If you look at SV's numbers, you see exactly this pattern. The numbers under-represented by 10% or so are 1,2,3, and the numbers over-represented are 7,8,9.
A plausible explanation is that somebody is changing 1's and 2's to 7's and 3's to 8's, for about 15% or 20% of the results; and then changing other numbers at random to account for the difference. Such changes could add about 5% or 6% to certain candidates, which is about as much as one can get away with on an individual poll without being called a liar by other pollsters. They will just say your poll is an outlier.
The problem with engineering numbers that would stand up to Benford's Law or other sophisticated analyses is that, when caught, the results are too obviously engineered and there is overwhelming evidence of an intent to cheat. For most cheats it is more important to keep their intent to cheat clouded, even if they must feign incompetence to do so.
Of course a high probability of intent is revealed in aggregate analysis, both in accounting and polling, but on any individual poll it is hard to tell if they cheated, and nearly impossible if their methodology and interviewee selection criteria and such are not disclosed: any of these can account for a few points' swing in a poll.
The upshot is that a fudger's goal is usually to maintain plausible deniability, and like anybody, cheats stick to the old reliable methods they have been using all their lives.
BTW, if one wants a way to fool Benford's law, the easiest way is to never change the distribution of any digit: just exchange them with some other digit, and the counts are unchanged. For example, if you polled 43 / 57, you can change that to 47 / 53 to make the race tighter without affecting the distribution.
Of course that won't do for accounting cheats because all the numbers would add up the same! Plus it is less plausible you confused 3 and 7 twice, but for a polling operation the difference is what counts.
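A quick C check of the swap idea, using the 43/57 versus 47/53 example from the comment: the tens-digit and units-digit tallies come out identical while the margin shrinks from 14 points to 6. The function names are hypothetical, and the sketch only handles whole-number results from 0 to 99.

```c
#include <assert.h>

/* Tally the tens and units digits of whole-number percentages (0..99),
   position by position, as a digit-frequency test would.
   tens and units must start zeroed. */
void tally_digits(const int *pcts, int n, int tens[10], int units[10])
{
    for (int i = 0; i < n; i++) {
        assert(pcts[i] >= 0 && pcts[i] <= 99);
        tens[pcts[i] / 10]++;
        units[pcts[i] % 10]++;
    }
}

/* Lead of the second reported result over the first, in points. */
int margin(const int pair[2]) { return pair[1] - pair[0]; }
```

Because the swap only exchanges a 3 and a 7 between the two results, every digit count is preserved; only the difference between the candidates moves.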
@PaulK- You may well be right that splitting the numbers into various categories could give some sort of sharp signal, big enough to show up against the noisier statistics. I'm guessing not, but that's a pure guess.
@Mark Grebner, JJE,Efrique ...
Yes, there's always more info in the total record before compressing it mod(10). So a reasonable way to deal with this is to fit the whole result set to some smooth curve, e.g. a Gaussian times some low-order polynomial, and do a chi-sq test for goodness of fit. This is using implicit priors that the real curve after convolution with sampling spread is smooth. People would argue with the priors (although they shouldn't), so the answer would be again to run the same test on other poll report sets. Perhaps the result would give a little lower p-value.
Fraud is fraud -- promising to provide a good or service, receiving payment for said good or service, and then not providing it as agreed (say, by providing a counterfeit instead), is fraud.
While the "public" polls cannot be held to this -- only to ethical scrutiny -- any CLIENTS of SV who might suspect THEIR numbers were also cooked would certainly have more than adequate standing to sue.
The question is, if they're cooking THESE numbers, do they also cook numbers for paying clients?
Me wonders if at this very moment, a large client of theirs is running similar analysis on the "data" SV provided them over the years.
@DaveNY:
In all probability, if SV cooks numbers, they do so at the behest of clients that want to deceive the public. I can tell you from thirty years of business contracting that management can find "independent" analysts happy to report whatever management wants them to report, as long as they get paid. Although I was not in the business of studies (more implementations) I know it isn't unusual at all for management to provide some pretty explicit hints of what they want to hear; in politics I imagine this would be along the lines of
"We know this race is closer than is being reported; and those reports are casting a negative shadow on our client and our fund-raising. People don't want to contribute to a lost cause. So we need to let them know what we know, this race is still close, and we are looking for the right partner to help us prove that for our supporters."
No explicit request for fudging, so there is plausible deniability even if they happen to be taped, but the point is clear. These people aren't going to sue. They WANT fudged numbers that are made public, and SV is in the business of getting paid for providing them.
One place I saw this in action involved the purchase of a $5M computer system. The company didn't need it, the job it was intended for could be done for less than $1M. I couldn't believe the company doing the study was producing one "fact" after another supporting this system; but with a little research I found the reason: The system was being sold by a buddy of the CEO; the "study" was being done by one of the CEO's fraternity brothers for $250K, and voila! $4.25M worth of corporate money spent on next to nothing. This kind of thing happens all the time.
Oh, believe me, Tony C., I know what you're talkin about.
I was stating that more as a hypothetical than anything else -- SV certainly COULD be sued... but yes, it would be dependent on a client actually wanting genuine data (say, for market research purposes), and suspecting that SV fabricated the numbers.
Or, more interestingly, a client could have happily accepted what they knew were probably fake numbers that served their purpose... and also would have no qualms about suing the hell out of SV for monetary gain...
As to the legal questions raised, if SV LLC has released blatantly fraudulent polls they have created a variety of torts which could be pursued at law.
The people most obviously harmed are the candidates shown to be behind in any of their polling. They could claim monetary and reputational damage from the false polls. The defense that they were actually behind in the race as shown by legitimate polls should not excuse the blatantly fraudulent behavior of SV LLC.
Other potential litigants with torts are: any clients attracted to SV by such polls, any news organizations who published the false data and so suffered damage to their reputation, any competitors of SV who lost clients due to SV's false claims of having conducted polling, other polling organizations who suffered damage to the perceived value of their product, any voter who was persuaded or motivated by such polls one way or the other, and so on.
The problem with committing blatant fraud is that it is hard to defend yourself against any claim of tort, no matter how peripheral, since your behavior was so obviously illegal and outrageous. That is why the head of SV refuses to answer Nate and is hoping this whole thing goes away. If he ever has to admit such public and gratuitous fraud he will never get out of court and he knows it.
go cardinals!
I still do not see why Nate refuses to do 3 very basic things.
1) Prove that supposed "anomalies" are indicative of fraud. In other words, if traditional, highly trustworthy pollsters produce the same sorts of "anomalies", then there is no evidence of fraud.
2) Prove that Strategic Vision is indeed an anomaly. So far Nate has produced a control sample of N=2 and one of those data points is not even his. It is from the comments. If this is science, God help us.
3) Account for the fact that legitimate methodological differences among pollsters, such as:
-How hard you press "Undecided", "No Opinion", or "Don't Know" to respond or commit
-How close to the election you poll and with what frequency
-Weighting, rounding, etc.
...affect patterns in the trailing digits.
As far as we know Quinnipiac and SUSA are oddities for having such uniform distributions.
If someone were faking data using bounded data randomization techniques (as almost any clever con artist would), why should we expect a non-uniform distribution?
A nearly perfect uniform distribution is what a decent fraud attempt would look like.
Systematic biases caused by true methodological differences SHOULD cause NON-RANDOM patterns.
Nate still has not proved anything except...
Strategic Vision has a different trailing digit pattern than Quinnipiac or SUSA.
So what?
I have posted a link to data that shows that these "anomalies" are so frequent under relatively controlled conditions that to call them anomalies is outrageous.
Nate is masking several basic flaws in his case with technical gobbledy-gook.
N=2 is not a valid control sample and anyone who is convinced it is should be ashamed.
Yo, Mike, it's a good thing we're related or I'd make some snarky comment about how it's all Aramaic to me. Oh, wait, I can actually understand Aramaic. . . .
Neal
The professor's analysis suffers the same fate as Nate's.
He starts with a theoretical assumption. The distribution should be cyclic. Great.
I have no problem with theoretical assumption in the absence of observable data. We have to make unprovable assumptions sometimes for the sake of the broader experiment.
What I do have a problem with is the fact that this assumption is absolutely testable, and with an N that can range into the dozens.
Rather than show with actual analysis that the vast majority of pollsters fit this theoretical assumption, the good professor takes a pass.
He uses distributions which cannot be corroborated because Nate refuses to post his dataset and because the other distribution just showed up in the comments. There is absolutely ZERO ability to independently validate these distributions because of the refusal to post the raw data.
As a Phd, I would have expected the professor to use a test sample of polls, such as all 2008 primary polls, or all 2008 General Election polls to prove that periodicity is a valid theoretical assumption. He could easily do this with Nate's own datasets.
The absence of such an obvious piece of supporting analytics raises big questions.
A control sample of 2, based on data of unknown quality is an utter abomination to the practice of science.
What, are you people young earth creationists too?
This has about as much science in it as creationism, for the reason that it makes unnecessary theoretical assumptions when a whole body of public domain data means you do not need theory...there is recorded evidence to use.
The equivalent of buried fossils are there to be used on any number of sites, including this one.
Why use theory when actual data is available?
Well, because the data might inconveniently upend the necessary theory...
@MidPointMan said (Sept 29):
"...I have also extracted all 2008 election polls from RCP ... They are 893 polls. Strategic Vision is not an outlier in the trailing digits in this dataset either. I will look for a suitable place to host the file and post it tonight."
OK - can you please provide a link to the data you promised?
@Mark Grebner
Has Nate posted the data he analyzed? I'd love to run it through my code. But I think simple trailing digit frequencies isn't sufficient. I'd prefer the actual multinomial deviates
@MpM
I don't think you know what the word "periodicity" means here. There is absolutely no assumption that the initial 0-100 distribution has some periodicity, which would be quite weird.
It's just that the 9 to 0 gaps aren't particularly different from the 0 to 1 gaps. That's evident from even the crudest understanding of the distributions, and is fully consistent with all 3 data sets we have here. So the basic picture is not just "testable" on these sets, it's confirmed.
You've got the code, which runs for free on any computer. Supply any further complete data sets of similar type (a mixture of many races covering a broad range of percentage splits, with no selective editing) and we can all run it on them. Nate has already tested a much more complete set from many pollsters. This big set showed no peculiar digit stats even with the low Fourier component left in.
"As a Phd, I would have expected..." You are? In what? Not any mathematical science, or grammar-based field, I guess.
Ok, this is for nerds only. What was all that "periodic" or "cyclic" stuff about? Why does it play any role? In other words, why does it matter that there isn't some big systematic trend up or down from 0 to 9?
Let's say we had a full distribution that started at 0 at, say, 30% and continued linearly up to a peak at 69%, before dropping back to 0 at 70%. It would have many more trailing 9's than 0's. Sampling error would reduce that effect up near 70% and near 30%, but not in any of the intermediate ranges, so it would mostly survive. The Fourier power of that sloping pattern goes down only as the square of the Fourier index, so the components we looked at wouldn't have been filtered very much.
The actual patterns don't look at all like that. They're approximately cyclic. What show up as high Fourier components are actually rapid wiggles in the pattern, not artifacts of looking just at the trailing digit. Rapid wiggles are filtered out by random sampling error. Hence the method works.
OK, I grabbed another data set. This was Nate's "all Senate and Presidential polls -- more than 3,000 (!) of them -- in my 2008 database".
Here's the output:
>run
625.4 (mean)
1427.64 (variance)
440.446325 (filtered variance)
1.173772 (above/expectation)
0.316922 (p-value).
@Michael (mbw) - Can you provide a link to the dataset? Is it freely available?
Also, from your discussion, should we assume it was rightmost digit only?
@Mark Grebner- It was from Nate's 2nd SV post. I don't have any personal sources for these data.
Yes, rightmost only.
We're up too late.
@Michael (mbw) & Mark Grebner
So, if we wanted to actually get all of the results he compiled, we'd have to get Nate to post them or send them to us? (Other than the simple trailing digit frequency that is.)
I wish Nate would just post those data. Seems like a transparent thing to do. ;-)
@JJE: I bet that Nate only tabulated and kept the last digit. And I bet when he was working on it, he was thinking "Benford's Law - 2nd digit!". And I bet now he wishes he'd kept the data in full-form.
@MBW:
Thanks for a very enlightening post!
Would you mind telling us which statistical test you're using here? i.e., I would expect a Chi-square test to be appropriate in this case, but I don't recognize the formula:
p = exp(-3*dev)*(1+3*dev+4.5*dev^2)
Moreover, can you comment: is this the most powerful test in this case? (Is there a more powerful test of the hypothesis that these Fourier coeffs are uniform, iid distributed?)
Thanks!
A couple of articles in the Wall Street Journal:
Some See Numerical Oddity in Pollster's Election Surveys
http://online.wsj.com/article/SB125487188014969039.html
"The odds of that kind of discrepancy happening by chance alone, Mr. Silver wrote, was "millions to one against," a figure he later refined to 83 million to 1 when comparing the results with those of another pollster, Quinnipiac University Polling Institute, which had a more uniform distribution of digits.
This week, Mr. Silver brought in a physicist and commenter on his blog to calculate the probability, which shrank to 5,000 to 1 against, when removing what he said was an unproven assumption that each digit should appear equally often."
So now it's OK? :-)
Polling Controversy Raises Questions of Disclosure
http://blogs.wsj.com/numbersguy/polling-controversy-raises-questions-of-disclosure-805/
@Jonathan- The embarrassing thing is that I don't always know the name of things. I think this is chi-sq with 6 degrees of freedom. I calculated it from scratch.
And no, there is no reason whatsoever to think this is the most powerful test. As PaulK pointed out, I haven't even made use of the full power spectrum, only using the sum of the frequency 0.2, 0.3, and 0.4 terms.
@Rennie- I can't believe the WSJ got the description exactly right. I have nothing at all against their news department, it's just that the press never gets technical things just right, especially when they phrase them in their own pithy words.
@Jonathan- One more point. The key idea is not just that the higher FT coefficients should be iid (which I haven't even checked and could add some statistical power) but rather that we know how big they should be (the expected net variance) a priori.
@Jonathan- Yeah, it's chi-sq with 6 degrees of freedom, just re-expressed in terms where the expectation is 1 instead of 3. Remember, to get the p-value you have to integrate the function out to infinity, which may be why the expression looked unfamiliar.
BTW- Since some mathematicians (and others) were wondering about the 'cyclic' property, here's what you can do to check it's ok. Try projecting out a mean-zero triangle wave. This overlaps highly with the lowest frequency sine-wave, so you have to be careful to remove the low-f component after that. Since it also has a bit of overlap with the higher frequency sine waves, this has a statistical tendency to reduce them as well. In principle, that requires using a slightly more complicated chi-sq since the coefficients aren't iid. People should check, but on my first run through I got that it increased the power in the higher FT terms in the SV case. I'll recheck when I get time, but if that's right it closes one more (tiny) loophole.
Whoops- There was a typo in the order on that late-night run on Nate's big 2008 set. With the proper order:
625.4 (mean)
1427.64 (var)
304.514149 (filtered var)
0.811518 (dev)
0.560707 (p-value)
No surprise, scrambling random digits doesn't make any difference. It's still got no anomalies.
Hey, if anybody is still following, try checking this. Project out a mean-zero linear function before running the filtered variance program. The idea is to remove any non-periodic linear drift. Try this on the 4 data sets posted on this site. See if you see what I see. It's so peculiar for one of the pollsters I don't even want to say without triple checking. I'll send code later, but want others to write their own in case I somehow made a goof in this new procedure which only shows up on one data set.
@Jonathan: This may give a constructive answer to your question about tests with greater statistical power, which would be ironical since I was looking for ways to make SV look less weird.
@Michael
What do you mean by "project out"?
@MBW:
I am still reading, at least! I'm not sure how to do what you ask, however...
@I, Tony C.
"Project out" means:
1. start with the 10-vector D
2. Take c = D dot V, where V is the normalized 10-vector you're projecting on.
3. Subtract cV from D.
The result always has norm LTE the norm of D.
It's just like removing say the x-component from a 3-vector.
In this case V= (-9,-7,...,+9)/sqrt(330)
unless I've made a mistake!
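For reference, the steps written out for the ramp vector above. The squared length of (-9, ..., 9) is 330, so V has unit norm; the entries of V sum to zero, so subtracting cV leaves the mean count unchanged; and because D' is orthogonal to V, its norm can only shrink.

```latex
\begin{aligned}
\|(-9,-7,\dots,7,9)\|^2 &= 2\,(1^2+3^2+5^2+7^2+9^2) = 330,
  \qquad V = \tfrac{1}{\sqrt{330}}\,(-9,-7,\dots,7,9),\\
c &= D\cdot V, \qquad D' = D - cV, \qquad D'\cdot V = c - c\,(V\cdot V) = 0,\\
\|D'\|^2 &= \|D\|^2 - 2c\,(D\cdot V) + c^2\|V\|^2 = \|D\|^2 - c^2 \le \|D\|^2 .
\end{aligned}
```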
There could conceivably (though somewhat implausibly) be higher-frequency artifacts in the data that was under 10% or over 90%, since that data has a lower sampling error. So you should do something to get rid of that data. To avoid edge effects, you should do a "weighted phase in" of the data. You could start to phase in the weights at either 5% and 95% or at 10% and 90% - either way, the summed weight profile cancels out in the mod(10) space.
That said, this is simply covering your bases. There's no reason to think that it would make any difference.
@MBW:
I'll do that. How did you choose this vector V?
To be clear about my last comment. What I mean is, a poll result of "12" would count as only 2/10 of a poll, while a poll result of "92" would count as 8/10. Results between 20% and 90%, inclusive, would count fully. It only affects the initial tallies that you use for the analysis.
aak oops. I mean, an 82% result would count 8/10; the 92% result would not count at all.
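A sketch of the weighted tally in C, assuming weights that ramp linearly from 10% to 20% and back down from 80% to 90%, which matches the corrected examples (12 counts 0.2, 82 counts 0.8, 92 counts 0). The exact treatment of the boundary points and the function names are my assumptions.

```c
#include <assert.h>

/* Weight for a poll result of p percent (0..100): zero up to 10, ramps up over
   10..20, full from 20 through 80, ramps down over 80..90, zero from 90 on. */
double phase_in_weight(int p)
{
    if (p <= 10 || p >= 90) return 0.0;
    if (p < 20) return (p - 10) / 10.0;
    if (p > 80) return (90 - p) / 10.0;
    return 1.0;
}

/* Add each result to the trailing-digit tally with its weight instead of a full
   count. hist must start zeroed. */
void tally_weighted(const int *pcts, int n, double hist[10])
{
    for (int i = 0; i < n; i++) {
        assert(pcts[i] >= 0 && pcts[i] <= 100);
        hist[pcts[i] % 10] += phase_in_weight(pcts[i]);
    }
}
```

Feeding in one result at each whole percentage from 0 to 100 gives every digit a total weight of exactly 7: a digit d picks up d/10 on the way up and (10 - d)/10 on the way down, so the ramps by themselves add no digit preference, which is the "cancels out in the mod(10) space" property.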
@MBW:
My added code:
#include <math.h>
#include <stdio.h>
/* Remove from the 10 digit counts d[] their projection onto a mean-zero linear ramp. */
void project_out_linear(double d[10]) {
    double projV[10] = {-9,-7,-5,-3,-1,1,3,5,7,9};
    double dp;
    int i;
    dp=projV[0]*projV[0]; // Start dot product.
    for (i=1; i<10; i++) dp+= projV[i]*projV[i]; // Find sumsq.
    dp = 1./sqrt(dp); // Inverse of length.
    for (i=0; i<10; i++) projV[i] *= dp; // Find normalizing vec.
    dp = projV[0]*d[0]; // Start new dot product.
    for (i=1; i<10; i++) dp+= projV[i]*d[i]; // Compute dot product.
    printf("FYI, c=%f.\n", dp);
    for (i=0; i<10; i++) d[i] -= dp*projV[i]; // Subtract projection.
}
The dot product c computed for SV is 178.246. It changes the D vector to
D[0]=650 D[1]=500 D[2]=521 D[3]=519 D[4]=536 D[5]=589 D[6]=504 D[7]=590 D[8]=607 D[9]=528,
but this produces almost no change in the p value.
I will go look for the other vectors...
Whoops, another typo in my SUSA calculation. The actual "dev" was 0.5238. Embarrassing, but of course it doesn't change the bottom line: that's still right in the middle of the expected range (p=0.79).
@Tony C. Long story. The problem with Fourier analyzing data that are not truly periodic is that things can appear in peculiar bins. The genuine frequencies on 0-100 are 0.01, 0.02, ..., but we're only looking at 0.1, 0.2, .... Components at, say, 0.23 in the full record show up mostly at 0.2 and 0.3, no problem. However, there can in principle (unlikely here) be some very large low-f drifts, i.e. at say 0.01, which then show up in all the bins, with power falling off only as 1/f^2. In actual time-series data that is often dealt with successfully by "detrending", i.e. removing the projection onto a linear trend, the vector V here.
@Tony C.
Great, we got the same vector after taking out the projection. I seem to be getting that this drastically reduces p. That is, the projection (as it must) removes some of the norm, but it actually increases the part of the variance that appears in the particular FT coefficients we're looking at. Again, whenever you change code there can be some goof, but at least we agree on the first step.
@MBW:
The only other vector I saw on the site was the 3000 polls of senate races:
{560,563,619,620,672,673,644,642,652,609}
And that produced a p=0.925430.
The constant was 68.920,
The projected vector was
D[0]=594 D[1]=590 D[2]=638 D[3]=631 D[4]=676 D[5]=669 D[6]=633 D[7]=623 D[8]=625 D[9]=575.
Can you post the other ten-vectors you are testing with? I'll run them...
@Tony C
Good, I just typed our agreed-upon detrended numbers into the original program. I get p=2.0E-05, and that's without making the correction that detrending removes a bit of the expected variance from the higher FT terms. Making that correction (to lowest-order approximation, just by changing the expected variance, not redoing chi-sq for different variances of the different coefficients) gives p=4.3E-06. Bottom line: failure of the "cyclic" condition, i.e. ultra-low frequencies aliasing up into our range, is not the source of the SV anomaly. Trying to correct for that actually makes them look substantially weirder.
@Tony C.
Here's the Quinnipiac data, via Nate:
{546,551,608,593,574,563,551,509,535,505}
Here's the SUSA data from 'steve':
{401,407,384,375,383,402,396,370,392,385}
Thanks!
@MBW:
Quinnipiac run:
c=-61.268634. New D[i]
D[0]=516 D[1]=527 D[2]=591 D[3]=583 D[4]=571 D[5]=566 D[6]=561 D[7]=526 D[8]=559 D[9]=535
p = 0.622782.
SUSA run:
c=-13.046421. New D[i]
D[0]=395 D[1]=402 D[2]=380 D[3]=373 D[4]=382 D[5]=403 D[6]=398 D[7]=374 D[8]=397 D[9]=391
p = 0.848693.
Hope that helps. Still looks like SV is the only outlier...
@Tony C. Yeah, it looks like even giving them all the breaks (detrending, taking out low FT terms, slightly overestimating the variance expected under the null, and then picking and choosing among different procedures), we can't get their p-value over 2E-04. The other 3 sets we have just look routine, no problem.
I hope this didn't screw up your project deadline!
@MBW:
Nah, we are accepted for a conference (CS) but we are making paper mods we promised in response to peer review. I always kind of panic that this stuff must be done RIGHT NOW, but today my co-author is reviewing my mods (and rewriting them completely, as usual), so no real impact. We still have all next week.
Just to pass on an email that just came in (not sure if the source wants to be identified):
"Anyone notice what became of the Strategic Vision website?
Here is the homepage as of april 2008:
http://web.archive.org/web/20080402123023/http://strategicvision.biz/index.html
Here it is now:
http://strategicvision.biz/
(nsfw)
Does that mean they've given up? Or are they just pursuing a new corporate strategy?"
Wow!
Maybe they were a Russian pr0n outfit all along. ;-)
@MBW
Thanks for the clarification re: the statistical test. I think my comment about independence was off-base, actually: we don't have the original samples, just the frequency table. (And even if we did, there's no real reason to think about independence here; the whole argument is about frequencies.) So we really just want a distributional test that the Fourier coefficients have the expected marginal distribution. In that case, the chi-square test you performed is (I believe) exactly what's called for. (There are non-parametric tests of distribution, e.g. Kolmogorov-Smirnov, but those will generally be *less* powerful than the chi-square, and there's no reason to think chi-square is inappropriate here, where we have a well-sampled multinomial distribution.)
Thanks again for a thought-provoking analysis!
For any stragglers:
It's only fair to point out that a newspaper reporter just checked (and I confirmed) that strategicvision.biz pages other than the main one continue to offer their traditional services, not the new mix offered on the main page. He believes that the main page was hacked, not deliberately changed by the owners.
@MBW:
Maybe hacked, but it still says something about SV: I have worked for corporate clients since pre-Internet days, and I cannot imagine any of those real companies allowing an incident remotely this damaging to persist for more than a few hours.
I worked for a dozen companies in the 90's. Employees and customers visit the website pretty frequently; this makes me suspect they have no employees and no customers!
If SV ignores its press and its website, it kind of indicates they are small-time dabblers that have managed to get some "polls" published. That would also explain why they don't respond to Nate's requests.
> For any stragglers
On behalf of the stragglers of the world, thank you.
Just wanted to ensure that you knew that your kind thoughts were appreciated. We stragglers are not used to getting much consideration.
But I guess it's time to move on, and find somewhere else where I can straggle.
archival post-script.
0. Mark Grebner was the first to suggest using adjacent-digit analysis, the key starting idea. I either missed that or didn't register it consciously. Mark, I apologize.
1. On the cyclic property assumed for Fourier analysis. Some mathematicians questioned how appropriate that assumption was. I tried patching by subtracting linear trends. That all turns out to be a mistake: the cyclic property holds exactly. Non-cyclic artifacts arise from taking a finite slice out of a record that has components at frequencies other than (and especially lower than the lowest of) the discrete FT frequencies on the slice. Treating these as cyclic introduces artificial steps at the seam when the record is wrapped around. In this case we start with a complete record on 0-100. There is no problem treating it as periodic, because it does go to zero at the edges, according to your description. So its FT has no artifacts. Now, because we consider not one slice but ten slices added up, it turns out that any component for which f is not a multiple of 0.1 (e.g. 0.03) contributes exactly zero to our mod(10) record. That's because on the full 0-100 space all these FT components are orthogonal. So treating our records by FT introduces no artifacts at all.
2. On correlations between digits. For a fixed number of "others", e.g. d=5%, the two digits recorded are of course not independent, since their sum must be -d mod(10). For d even, that increases the net variance to the Poisson value M, instead of 0.9M. For d odd, it reduces Var to 0.8M. So on average it doesn't affect Var, unless the 'others' are clustered on even or odd. If this ever goes to court, that's easy for you to check. Now, even if the expected Var is unchanged, if d was clustered close to, say, 5, the variance of a few digits (2 and 7 for d=6, 3 and 8 for d=4) could be increased while the variance of the other digit rates decreases. That can reduce the effective number of degrees of freedom slightly. This really should be simulated, but it will have a very small tendency to increase the p-value. Very small.
3. On using full FT data and trying non-parametric analysis. Some suggested using the full FT spectrum, not just tossing the suspect lowest bin. Others suggested a non-parametric analysis. I've tried a first stab at combining these ideas. Even in a non-parametric analysis (no assumptions about the shape of the 'raw' distribution, other than that it's non-negative and that it goes to 0 at the edges), the filtering makes the FT components with f=0.3, 0.4, 0.5 zero for all practical purposes. Therefore they can be analyzed just as before. For those terms alone, p=0.067. Non-negativity by itself is not enough to make the filtered f=0.2 term virtually zero. However, since the f=0.1 component is very large (and its unfiltered version would have to be larger), total non-negativity does constrain the non-random f=0.2 term to be small. The p-value calculated from this term is about 0.0036. Since these p's were calculated from independent sets of terms, chosen a priori, we multiply them to get p around 0.00024, similar to the previous analysis. We made up for the loss of power from going non-parametric by the gain in using separate FT terms and in using the info in the large f=0.1 term.
pps
The first stab at the non-parametric analysis above was off a little. I used the Poisson Var, rather than 0.9 times it. That makes it an extremely conservative overestimate of p. It looks like with the proper best estimate of Var, p=0.002*0.044, or about 0.00009. This obviously should be checked.