One of the things I learned while exploring the statistical properties of the Iranian election, the results of which were probably forged, is that human beings are really bad at randomization. Tell a human to come up with a set of random numbers, and they will be surprisingly inept at trying to do so. Most humans, for instance, when asked to flip an imaginary coin and record the results, will succumb to the Gambler's Fallacy and be more likely to record a toss of 'tails' if the last couple of tosses had been heads, or vice versa. This feels right to most of us -- but it isn't. We're actually introducing patterns into what is supposed to be random noise.
Sometimes, as is the case with certain applications of Benford's Law, this characteristic can be used as a fraud-detection mechanism. If, for example, one of your less-trustworthy employees is submitting a series of receipts, and an unusually high number end with the trailing digit '7' ($27, $107, $297, etc.), there is a decent chance that he is falsifying his expenses. The IRS uses techniques like this to detect tax fraud.
Yesterday, I posed several pointed questions to David E. Johnson, the founder of Strategic Vision, LLC, an Atlanta-based PR firm which also occasionally releases political polls. One of the questions, in light of Strategic Vision LLC's repeated failure to disclose even basic details about its polling methodology, is whether the firm is in fact conducting polling at all, or rather, is creating fake but plausible-looking results in order to increase traffic and attention to its core business as a PR and literary firm.
I posed that question largely as a hypothetical yesterday. But today, I pose it much more literally. Certain statistical properties of the results reported by Strategic Vision, LLC suggest, perhaps strongly, the possibility of fraud, although they certainly do not prove it and further investigation will be required.
The specific evidence in question is as follows. I looked at all polling results reported by Strategic Vision LLC since the beginning of 2005; results from 2008 onward are available at their website; other polls were recovered through archive.org. This is a lot of data -- well over 100 polls, each of which asked an average of about 15-20 questions.
For each question, I recorded the trailing digit for each candidate or line item. For instance, if Strategic Vision had Barack Obama beating John McCain 48-43 in a particular state, I'd record a tally in the 8 column and another in the 3 column. Or if they had voters opposing a particular policy 50-45, I'd record a tally in the 0 column (for 50) and another in the 5 column (for 45). I did not include "non-response responses" like "other" or "undecided", nor did I include a tally for third-party candidates in races between the two major parties. I also excluded party primaries in which more than two candidates were listed, and approval questions for which more than two choices were provided.
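The tallying procedure just described can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet or script actually used: the function name `tally_trailing_digits` and the input layout (one tuple of whole-number percentages per question, with "other", "undecided" and third-party lines already dropped) are assumptions made for the example.

```python
from collections import Counter

def tally_trailing_digits(results):
    """Count the last digit of every reported percentage.

    `results` is a list of tuples, one per question, holding the
    whole-number percentages of the two main candidates or options.
    (Hypothetical input layout, chosen for this sketch.)
    """
    counts = Counter({d: 0 for d in range(10)})
    for question in results:
        for pct in question:
            counts[pct % 10] += 1  # trailing digit of e.g. 48 is 8
    return counts

# The two examples from the text: Obama 48 / McCain 43, and a 50-45 policy question.
example = [(48, 43), (50, 45)]
tally = tally_trailing_digits(example)
for digit in range(10):
    print(digit, tally[digit])
```

Run on the two examples, this records one tally each in the 8, 3, 0 and 5 columns and none anywhere else.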
We might expect, as a default, the distribution of these trailing digits to be approximately random. Here, for instance, is what I get if I run the numbers for all Senate and Presidential polls -- more than 3,000 (!) of them -- in my 2008 database:
This data is arguably not perfectly random. There is a little bit of bunching around the middle values like 4 and 5, perhaps because most of the polling comes from swing states or national polls in which a typical figure might be something like Obama 46, McCain 45 (with 9 percent undecided). But it is close to random, and could fairly easily have occurred through chance alone.
By contrast, here's what we get if we run the same tally for the Strategic Vision polls:
This data is not random at all. For instance, the trailing digit was '8' on 676 occasions, almost 60 percent more often than the 431 times that it was '1'. Over a sample of more than 5,000 data points, such an outcome occurring by chance alone would be an incredible fluke -- millions to one against. Bad luck can essentially be ruled out as an explanation.
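For a rough sense of scale, here is one way to check the '8' versus '1' comparison using only the two counts quoted above. It is a sketch under stated assumptions, not the calculation behind the odds quoted in the text: it assumes the two digits should be equally likely, uses a normal approximation to the binomial, and ignores both the dependence between the two numbers in a single question and the fact that this pair was picked out after looking at the data. The function name is hypothetical.

```python
import math

def two_digit_z_test(count_a, count_b):
    """Normal-approximation test that two digits are equally likely.

    Conditional on a digit being one of the two, the count of the first
    should be Binomial(n, 0.5) under the null hypothesis.
    """
    n = count_a + count_b
    expected = n / 2
    sd = math.sqrt(n * 0.25)
    z = (count_a - expected) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# 676 eights versus 431 ones, as reported above.
z, p = two_digit_z_test(676, 431)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

The z-score comes out a little above 7, which is far outside anything chance would plausibly produce if the two digits really were equally likely.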
One of two things seems to have happened, then.
One possibility is that there is some intrinsic, mathematical reason that certain trailing digits are more likely to come up than others. This is certainly possible -- and in fact, it would be somewhat likely if the polling data that we were looking at were homogeneous -- McCain versus Obama polls in Ohio, for instance.
But Strategic Vision's polls cover a wide array of topics: Presidential horse race numbers in any of a dozen or so states, senate and gubernatorial polling, primary polling, approval ratings of various kinds, polling on issues like the war in Iraq, and more abstract questions such as whether voters think that 'experience' or 'change' is the more important quality in a Presidential candidate. No one type of question, in no one state, represents more than a relatively small fraction of the sample. Under those circumstances, I can't think of any reason why the trailing digit wouldn't approach being random -- although there absolutely might be reasons that I haven't thought of.
But this data is not random. It's not close to random. It's not close to close. Which brings up the other possibility: Strategic Vision is cooking the books. And whoever is doing so is doing a pretty sloppy job. They'd seem to have a strong, unconscious preference for numbers ending in '7', for instance, as opposed to those ending in '6'. They tend to go with round numbers that end in '5' or '0' slightly too often. And they much prefer numbers with high trailing digits like 49 and 38 to those with low ones like 51 and 42.
I haven't really seen anyone approach polling data like this before, and I certainly haven't done so myself. So, we cannot rule out the possibility that there is some mathematical rationale for this that I haven't thought of. But it looks really, really bad. There is a substantial possibility -- far from a certainty -- that much of Strategic Vision's polling over the past several years has been forged.
I recognize the gravity of this claim. I've accused pollsters -- deservedly I think in most cases -- of all and sundry types of incompetence and bias. But that is all garden-variety stuff, as compared against the possibility that a prominent polling firm is making up numbers whole cloth.
I would emphasize, however, that at this stage, all of this represents circumstantial evidence. We are discussing a possibility. If we're keeping score, it's a possibility that I would never have thought to look into if Strategic Vision had been more professional about their disclosure standards. And if we're being frank, it's a possibility that might actually be a probability. But it's only that. A possibility. An hypothesis -- as yet unproven.
In the meantime, I have a couple of relatively specific messages for people.
Firstly, if you've been polled by Strategic Vision at any point in the past several years -- this probably means that you're in a state like Georgia, Florida, Ohio, Michigan, Washington, Wisconsin or New Jersey that they tend to "poll" frequently -- now would be a good time to tell me about your experience. So click on the 'contact' button at the top of the page and do so. If you do, please provide as many details about the experience as possible. Please also provide reliable contact information, so that I could verify your identity if need be. I will not publish your name unless you specifically give me permission to do so -- but I do need to be able to confirm that you're not a sock puppet.
To the folks at Strategic Vision, LLC, the opposite holds. I don't care if you contact me by e-mail, by phone, by attorney, or by carrier pigeon. Whatever you tell me, whether the communication is solicited or not, whether you decide to play naughty or nice, it's on the record, and will almost certainly be revealed in full to the readers of this blog.

277 comments
very nice, you are the best on the numbers.
Let's hope that Trademark case against them gets filed soon too...
I always took them as a dummy polling firm created to post pro-Republican polls. Their polls were always way out of line showing great Republican results.
They are Rasmussen on steroids; I didn't think anyone took them seriously.
Could we see comparisons with other individual polling firms (i.e. those with publicly disclosed, sound methodologies)? That would make this more convincing. I agree that there is a pattern here and there shouldn't be one, but there are many things about polls that aren't supposed to be random: questions are usually asked about controversial issues, for example, or about close electoral races, polls might not be publicized if the results aren't seen as interesting, etc. This isn't purely a mathematical exercise. Does this data really stand out so far from other companies as to suggest years of fraud?
Go get 'em, Nate!
I agree with Mark--it'd be nice to see some trailing charts for some of the more reputable pollsters. Visual differences seem to leap out more than statistical ones.
Mark, the comparison to other companies isn't what's problematic, rather the comparison to a random distribution.
To put it another way, if every pollster in the business produced numbers similar to Strategic Vision numbers, every pollster would be producing highly suspicious results.
I'd like to echo Mark's comments above. Your results are quite intriguing, but seeing the distribution of trailing digits from other polling firms would lend an additional level of legitimacy to your analysis.
This is why this is the top site on the internet right now. Nobody can bullshit numbers past Nate.
Nate - why did you not run the appropriate hypothesis tests?
Senate races: Chi^2 = 22.83, d=9, for a p-value of 0.0065
Strategic Vision: Chi^2 = 141.69, d=9, for a p-value of 4.79*10^-26
The standard p=0.01 hypothesis test does not hold even for the senate races, but Strategic Vision's numbers smell fishy indeed.
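For anyone who wants to reproduce the p-values from the two chi-square statistics quoted above, here is a small sketch that needs nothing beyond the standard library. It uses the closed-form upper tail for a chi-square variable with an odd number of degrees of freedom (the form given in Abramowitz and Stegun, 26.4.4); the function name is made up for the example, and a statistics package would normally be the more convenient route.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) for chi-square with odd df."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form only covers odd degrees of freedom")
    chi = math.sqrt(x)
    normal_tail = math.erfc(chi / math.sqrt(2))          # two-sided normal tail
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at chi
    series = 0.0
    denominator = 1.0
    for r in range(1, (df - 1) // 2 + 1):
        denominator *= 2 * r - 1                         # 1, 1*3, 1*3*5, ...
        series += chi ** (2 * r - 1) / denominator
    return normal_tail + 2 * density * series

# Statistics quoted in the comment above, each with 9 degrees of freedom.
p_senate = chi2_sf_odd_df(22.83, 9)
p_sv = chi2_sf_odd_df(141.69, 9)
print(f"2008 Senate/Presidential polls: p = {p_senate:.4f}")
print(f"Strategic Vision: p = {p_sv:.2e}")
```

The first value lands between 0.006 and 0.007 and the second is on the order of 10^-26, in line with the figures quoted.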
Nate, you have to run the test against other polling companies. If you do that, you instantly answer the most important question: Is this a function of polling in general?
Maybe issue polls have a different type of distribution than Presidential/senate polls.
Nate,
This is all quite interesting. You didn't mention the non-independence of the two trailing digits in a single poll (e.g. if exactly 5% of the populace were consistently undecided, then a ones digit of 1 for Obama would perfectly predict a 4 for McCain). Of course the undecideds/others won't be always the same number, but there's no reason to expect a uniform distribution of ones digits for this group.
This dependence may wreck assumptions based on sample size, and also (maybe?) create curious patterns in the distribution of ones digits.
I'd like to recommend / request that you run a different simulation: for each poll, you _randomly_ select _one_ of the ones digits (e.g. in a poll of 50-45 you would randomly select 0 or 5 with probability 50% for each) to get rid of any non-independence in the two numbers. If you run it for both your control dataset and for Strategic Visions, the inference would probably be the same, but the case would be a little stronger.
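A minimal sketch of the resampling idea in the paragraph above, assuming each question is stored as a tuple of two whole-number percentages (a layout chosen for this example, as are the function name and the fixed seed):

```python
import random
from collections import Counter

def one_digit_per_question(results, rng):
    """Tally a single, randomly chosen trailing digit from each question.

    Taking only one of the two numbers per question removes the
    dependence between them, at the cost of halving the sample.
    """
    counts = Counter({d: 0 for d in range(10)})
    for question in results:
        pct = rng.choice(question)  # each number picked with probability 1/2
        counts[pct % 10] += 1
    return counts

rng = random.Random(0)  # fixed seed so the draw is repeatable
polls = [(50, 45), (48, 43), (51, 44)]
sampled = one_digit_per_question(polls, rng)
print(sampled)
```

Because the choice is random, it would make sense to repeat the draw many times with different seeds and check that the conclusion does not depend on any one draw.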
Keep up the good work - I love the analyses you all do.
This is some news. I haven't seen anything about this anywhere else but Political Wire.
Really, really unsettling stuff. You do great work here. Thanks.
I agree with Mark and others; try this with pollsters who have a comparable range of questions.
While I agree this looks suspicious, particularly in combination with their lack of disclosure, it's not quite as suspicious as a preference for, say, odd numbers. If you put the 0 at the bottom instead of the top, we see a distribution that basically clusters around 8 and falls off pretty smoothly around that.
What troubles me more than a fake pollster are media outlets that used these polls (and others?) without any kind of validation or supporting data to back up what they publish. What does that say about the rest of those media outlets' published content?
Every once in a while I'll joke that "7 is the most random number" during a relevant conversation, and it's not uncommon for people to accept that assertion unchallenged. Sigh.
I agree that it would be nice to see the same graph presented for other pollsters. Clearly there's some non-random noise (i.e. variation that's not damning, but also doesn't fall into a random distribution pattern) introduced in the ones digit process, as evidenced in the first graph. In theory, I could be reasonably convinced that a single pollster will have wider variation / non-random noise than many pollsters. And, it seems almost too far-fetched for SV to *actually* be making up data. So I think a great next step would be to see this chart for another single (but more reputable) pollster - that would be informative.
Why not use Pearson's chi-square test? The first set of Pres poll numbers is not random with a confidence of p<0.01, and the second set has a chi-square statistic of 100, which you can very, very confidently say is not random.
By the "trailing digit" are we talking about two digit numbers? If that's the case, the digit will probably not be random. I'm not a statistician, but my understanding is that the digits should most likely be distributed according to Benford's law - http://www.mathpages.com/HOME/kmath302/kmath302.htm and http://www.journalofaccountancy.com/Issues/1999/May/nigrini.htm
In both of these cases, your results are the opposite of what one would expect from a Benford-type distribution.
As far as a natural explanation, I think people are more likely to hire pollsters (and release the results) in cases where they expect the results to be close. If you know you're way ahead or behind, you're less likely to spring for a poll, perhaps? Unless the actual numbers in their sample are uniformly distributed from 0 to 100, you shouldn't expect a random 2nd-digit distribution. You should look at the distribution of the entire number, histogram it, and calculate what the expected 2nd-digit distribution is from that data. Then compare that to the actual 2nd-digit distribution and see if they differ with any statistical significance.
Great work! But as Allan says above, you need to correct for the non-independence.
For significance, you could do Allan's correction and *then* run a chi-squared test with 9 degrees of freedom, as carlosscheidegger suggested. That's tying at least one hand behind your back, though, since at least half the extra 7's, for instance, turn into extra 3's.
You might look just at two-alternative polls where the numbers add to 100. Do Allan's correction, and then a chi-squared test with 5 degrees of freedom, since then the outcomes are the sets {0}, {1,9}, {2,8}, {3,7}, {4,6}, and {5}.
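The folding into six outcome sets can be sketched as follows. This assumes a uniform trailing digit as the null hypothesis, so {0} and {5} each get probability 0.1 and each paired set gets 0.2; the function names and the input (a mapping from digit to count, taken after keeping one digit per question) are hypothetical.

```python
def fold_digit(d):
    """Map a trailing digit to its class: 0, {1,9}, {2,8}, {3,7}, {4,6}, 5."""
    return min(d, (10 - d) % 10)

def folded_chi_square(digit_counts):
    """Chi-square statistic over the six folded classes (5 degrees of freedom)."""
    folded = [0] * 6
    for digit, count in digit_counts.items():
        folded[fold_digit(digit)] += count
    total = sum(folded)
    null_probs = [0.1, 0.2, 0.2, 0.2, 0.2, 0.1]  # uniform-digit null hypothesis
    stat = sum((obs - total * p) ** 2 / (total * p)
               for obs, p in zip(folded, null_probs))
    return folded, stat

# Made-up, perfectly even counts, just to show the mechanics.
even_counts = {d: 10 for d in range(10)}
folded, stat = folded_chi_square(even_counts)
print(folded, stat)
```

With perfectly even counts the folded classes come out as 10, 20, 20, 20, 20, 10 and the statistic is zero; real data would be compared against a chi-square distribution with 5 degrees of freedom.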
You should also subject IBD to the same analysis... or possibly compare their results to the results of other polls asking the exact same question and test whether there's non-random deviations.
Apologies; I really need to read the whole post rather than just looking at the results. I do think that my proposed method of testing this distribution is probably the right one, though. I definitely wouldn't expect a random distribution.
Hey Nate, you're way too smart and knowledgeable to be contributing to the conflation of "random" with "random with a uniform distribution". The actual meaning of the word "random" is a very useful concept to have as part of the public vocabulary; borrowing the word to mean something else degrades the distinction in people's minds.
Hey Nate, regarding bad polls, how about some analysis on the recent NYT/CBS poll that had Obama at an approval rating of 56%?
The party split in the sample has Republicans at 22%, Democrats at 37%, and independents at 33%. That would make sense — if Barack Obama had won the presidential election by 20 points last November.
Since Obama won by seven points, with strong support from independents and some crossover Republicans, the notion of a 15-point gap in party affiliation was ludicrous then, and is even more ludicrous now.
Their July sample had a 14-point gap, which means the pollsters must feel that Democrats have gained ground over the last two months.
Isn't this crappy too???
Wow Nate turns investigative journalist!! Very interesting analysis. It would be interesting to get a bit of background as to why Nate began this investigation. (I suppose their reluctance to disclose details of polls). This will be an interesting one to watch!
Walker-
Go to pollster.com and look at the Party ID numbers; until recently a double-digit Dem party ID advantage was the norm (for real pollsters; of course Rasmussen has different numbers).
It's compelling, but it'd be more convincing if it were compared against other companies like Strategic Vision and to polls on the same topics and populations. Any unintentional effects to skew these data might cancel out when looked at across a diverse set of polls and pollsters. If you match on key characteristics of the polls and pollster it is a more direct comparison.
It's fishy because polls tend to hover near the 50% mark, and there's clearly a relationship between the yes % and the no % on any given poll (in particular, at the very least, they must sum to <= 100).
Why not restrict to two-position polls and look at the distribution of the gap? So, for example 50-43 becomes a 7, 63-30 is a 33, etc. Then compare the distribution from Strategic Vision against that of other pollsters using, e.g., a two-sample Kolmogorov-Smirnov test.
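A rough sketch of the gap comparison, using only the standard library. It computes the two-sample Kolmogorov-Smirnov statistic (the largest difference between the two empirical distribution functions) and stops there; turning that into a p-value is left out, and with whole-number gaps there will be many ties, so any standard p-value would be approximate anyway. Function names and the second toy dataset are invented for the example.

```python
from bisect import bisect_right

def gaps(results):
    """Absolute margin for each two-position question, e.g. 50-43 -> 7."""
    return [abs(a - b) for a, b in results]

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# The two examples from the comment, plus a made-up comparison set.
firm_gaps = gaps([(50, 43), (63, 30)])
other_gaps = gaps([(49, 46), (55, 40)])
print(firm_gaps, other_gaps, ks_statistic(firm_gaps, other_gaps))
```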
I'd like to see an analysis as to how Rasmussen can justify showing such high negatives on not only Obama, but Democrats in general. And up until a few weeks ago, +10 percentage points above ALL other pollsters on the negative metrics. Only in the last few weeks have the rest of the pollster numbers on the negative side come within 10 points of Rasmussen.
Is Rasmussen purposely trying to drive up averages? If you watch his numbers closely, particularly the negative ones regarding Dems and the President, they go in opposite directions of other polls coming out that day.
Could he be fudging his numbers?
Nah, of course not, right? I mean, why would a guy like him want to be associated with fraud reporting outfits like Fox whose only intention is to drive the news story against this administration.
He better watch out. He's going to become another joke like Fox News has.
@Nate: some good suggestions here in the comments. I suggest that you offer to provide the "independent" samples to anyone here who wants to play with the numbers.
And I suggest that you do this for the data in each of your charts. That would satisfy those also who want a "control."
Of course, you may need to provide multiple random samples of "independent" numbers from each dataset.
If someone writes to you, you could send them the numbers fairly easily, I would guess. I imagine Gelman or one of his students might like to play with this too.
Rory- if Strategic Vision is really putting out fake numbers that are closely tracking pollster averages, wouldn't that distort a Smirnov test?
I would like to see how the election results of those polls were distributed, if available (of course we don't have that for issues polls, but maybe restrict it to elections and then compare it with the real results)
Nate can get pretty flamboyant and cocky at times when he's "calling people out."
It's already bitten him in the ass a couple of times before, but not too badly. I'm waiting for the day he does it and gets thoroughly embarrassed. By "thoroughly," I mean career-ending embarrassment. This probably isn't such an occasion and his argument isn't without merit, but he's shown a willingness to overstep his bounds, so it's just a matter of time before his day of humiliation comes.
You can bet I'll be watching with glee.
I'm not entirely convinced. As others have said, the Strategic Vision data is very certainly not drawn from a uniform distribution, but neither is the other dataset, at least according to Pearson's chi-square test (or a likelihood ratio test, which is also appropriate).
I think you might have the wrong null hypothesis; i.e. I think it just may not be fair to expect the trailing digit to be uniformly distributed. Perhaps this firm polls on questions that, for whatever reason, tend to have true values of 47,48,or 49. Perhaps the inter-question dependence mentioned above has something to do with it.
To put it another way--we know that the poll numbers themselves are not uniformly distributed over [0,100]. We probably wouldn't expect them to be uniformly distributed if we took them mod 50 either. Is it fair to expect them to be uniformly distributed mod 10? I'm not sure. How about mod 2--i.e. should we expect the poll numbers to be equally likely to be even as odd? Pearson's chi-square test gives us no reason to reject this last hypothesis on either dataset, with high p-values (.936 for the strategic vision, .613 for the 08 pres/senate data).
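The even/odd check mentioned above is a chi-square test with one degree of freedom, which has a simple closed-form p-value. The sketch below shows the mechanics only: the counts are invented, since the underlying tallies are not given here, and the function name is hypothetical.

```python
import math

def even_odd_test(even_count, odd_count):
    """Pearson chi-square test (1 df) that even and odd digits are equally likely."""
    total = even_count + odd_count
    expected = total / 2
    stat = ((even_count - expected) ** 2 + (odd_count - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail for 1 df
    return stat, p_value

# Made-up counts for illustration: 60 even trailing digits, 40 odd.
stat, p_value = even_odd_test(60, 40)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

A high p-value, like the ones reported above for both datasets, means this particular test gives no reason to doubt an even/odd balance; it says nothing about the finer digit-by-digit pattern.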
I'm not necessarily defending this pollster; Nate has given many good reasons to be suspicious of them. I'm just not sure this is the right test to be doing to condemn them.
I'm curious if this could potentially be the result of a buggy rounding algorithm?
John - The mod2 test would probably hold the best for most expected distributions of poll numbers since the effect of a tendency towards high or low numbers or numbers towards the middle of the range on the even/odd distribution should be small. It would also pick out the most commonly repeated theory about human error in generating "random" numbers -- that people avoid zeros and overuse sevens.
Mule, are you the same one that during the election period would threaten violence at anyone who disagreed with you?
I'm guessing you must have had a police visit because you've become much more civil. Either that, or you've begun taking medication again.
Just curious, which was it? A police visit or medications?
Rasmussen is notoriously biased in favor of the GOP position. I believe it was something like 2 points or so in general election polling that their house effect leaned in that direction. (Again, opinion polling is less scientific because there's no way to prove definitively the level of support for X or Y program.)
However, Ras does explain their work. I believe that in their cross-tabs, they tend to have relatively higher R party ID than normal. And since it's also known that they use fairly aggressive LV models for everything (which favors the Republican position), there you go.
I've already stated my feelings on Rasmussen as a polling firm, so I'm not going to go there again.
That said, there is a lot of circumstantial evidence against SV, LLC. It's not just that their numbers show some semblance of a pattern - it's also the fact that they obstinately refuse to release their methodology unless forced to, the fact that they don't really have an office, the fact that they're primarily a public relations firm, so on and so forth*.
And I think a lot of commenters are missing the point - the numbers shouldn't dramatically cluster period given Bradford's Law. That's the kicker - there shouldn't really be that strong of a preference, period, and with the set of all polling data, there really isn't.
tl;dr - Yeah, what the nerdy guy who (as Wonkette so charmingly put it) taught numbers how to fuck said.
*You'll notice that I conspicuously left out David Johnson's personal politics. Though his Twitter is a glorious bastion of teabaggery, that really means nothing.
...and I just...realized that John Blatz is even more right given the cursory glance.
Also, I can't spell. That'll teach me to comment when I just wake up. Replace "Bradford" with "Benford." Feel free to mock me ruthlessly.
Anyway. Since American poll questions tend to split fairly evenly in most cases, the results form a narrow-band distribution rather than a wide-band one (where Benford's Law most often applies), so the expected digit distributions can break down.
On the other hand...hm.
If I'm not mistaken, the typical test for "human intervention" as a cause of nonrandom polling or election counting looks at the first digit past the decimal point.
If that were done in this case -- don't round 46.9 and 53.1 to 47 and 53, but use 9 and 1 as the final digits -- wouldn't it largely eliminate the effect of the closeness of the election on the reported distributions of last digits?
I downloaded the 2008 House polling data set from electoral-vote.com - http://www.electoral-vote.com/evp2009/House/house_polls.html - and checked out the 2nd digit distribution (throwing out any 100% or 0% polls). They appear to be randomly distributed even though the distribution of numbers is rather bimodal, for what that's worth. Obviously, the average congressional race is not particularly close, close races are polled more than others, etc... not really a good stand-in for a random set of polls.
@Justin - the third digit should be expected to follow a nearly random distribution regardless of the underlying distribution the number's pulled from; however, polling firms don't report the third digit because they underestimate a 3% (or so) margin of error. It would overstate the certainty of their result (not that they don't overstate it anyway).
Hey Nate:
Yesterday, on the previous thread, I had posted this opinion piece:
seattlepi.com/opinion/206360_gov04.html
Today I post a little more insight on this subject...
> > "She is the future of the Republican Party."
Take a wild guess who the "she" is and who wrote that.
Silly as it Seems . . .
Dog_Knows
Nice work Zach. I think it was rather a stretch for us to assume the digit shouldn't approach the uniform distribution, but it's great that you verified this.
On another note: for those who point out that Presidential polls may not follow Benford's Law, that view has some merit. However, Nate has aggregated several different types of polls, and in a combination of several different probability distributions, Benford's Law should reassert itself.
Just curious, which was it? A police visit or medications?
Neither. My only run-ins with law enforcement in my entire life have been related to a couple of minor traffic violations and required only a cursory exchange with whatever state highway patrolman was on duty.
And I'm not on nor have I ever been on medication, save two or three occasions I was recovering from illness or to alleviate pain after getting my wisdom teeth removed. The only "medication" I'm on now might be a couple of beers after work on Friday with the very occasional mixed drink or shot of whiskey/tequila thrown in for good measure.
The difference between now and then is that I treated this site like a mud pit before and didn't think the blog authors or any of the commenters were serious. While I still think Nate and his cohorts put out some garbage - and they still put plenty of spin in the good ones - and the comments section is still filled with too much name-calling and devolves into far too many pissing contests, there are enough people here - between authors and commenters - to challenge my intellectual acumen that it is worth my while to stay tuned to people who view the world from a slightly different ideological angle and see what they have to say.
@Neef - My post above was a little ambiguous. The *2nd digits* are uniformly distributed (more accurately, you can't say with any confidence that they aren't). The 2-digit poll numbers are definitely not. I don't have the 2nd digit benford distribution handy and don't want to do the Smirnov test or whatever would be appropriate, but I suspect you can't reject that they're Benford-distributed either.
Error bars! Error bars!
My kingdom for some error bars!
@Zach: I think in a large majority of cases, pollsters actually report figures to one digit past the decimal point. However, they may also round them in summaries, and in over-time comparisons. And usually newscasters and reporters also round the numbers, as do polling aggregators. I'd be interested in whether this pollster always rounds to the nearest whole percent. If the more detailed numbers are available, those should be examined.
Strategic Vision has been viewed suspiciously for ages by many in the polling community and elsewhere, regardless of ideology. This isn't a new revelation. Its polls during election season were often way off the mark, and it was obvious at the time.
But for those trying to similarly impugn Rasmussen, you should remember that Nate is a big fan of Rasmussen because they employ sound methodology and have a history of above-average accuracy.
Empirically, Rasmussen has been on the leading edge of identifying the decline in Obama ratings recently despite the caterwauling from those who don't like the numbers.
Honest polls are intended to reflect public sentiment, not move it. They can do little to form it. A firm that doesn't accurately reflect a true picture will not be in business for long, partisan leanings or not. Otherwise, they are useless as a forecasting tool. That is Strategic Vision's plight, and why they are only a minor player.
Nate would get much bigger mileage reputation-wise by going after the obvious methodology disconnects at places like CBS, which have broader exposure in the mass media.
We already know from long experience that election forecasting is only accurate from polls that use registered or likely voters and an accurate mix of participants. Should those who deliberately skew the sample be immune from similar dissection just because it might mean taking on some sacred cows?
Nate,
I appreciate your passion for holding these "polling firms" accountable.
I do think you need to run the same test on the other pollsters though, even if they responded to the recent questionnaire regarding polling methodology. I think you should run it on Zogby for sure, as well as some that you find generally lean a bit more to the left - this of course in the interest of seeming non-partisan but instead interested in the integrity of polling information.
Mule Rider,
I can't help but notice - with your insistence upon forgetting the subject and instead typing meaningless anti-Nate gibberish - that you're kind of a whiny douche.
How did you get to be this way?
I've never commented here before, but looking through this data, I think it's possible one thing was overlooked - the number of undecideds. It's possible this could have some effect.
For example, let's say that a lot of the results are close (maybe polling companies prefer to poll these areas than others.) If we assume that there are usually around 5 percent undecideds, you're going to get a lot more 48-47 or 49-46 results than you are going to get 51-44 results. This might skew the numbers a bit.
I would be much more concerned if their data wasn't so evenly distributed (peaks at 8, drops at 1, but is fairly continuous in-between.) That seems to suggest more of a "there's an overlooked trend in the last digit" than anything.
P ~= 0.3 for House polling and the 2nd-digit Benford distribution by a KS comparison test. So, in conclusion, 2nd digits in 2008 House polling results on Electoral-Vote.com could be uniformly distributed or distributed according to Benford's law.
Given that the average Presidential and Senate poll is probably closer to 50/50 than your average House poll, perhaps there's something about a distribution of results centered on 50 that results in a non-uniform 2nd-digit distribution.
@Christopher:
That is completely true if you're seeing 5% undecided. However there could also be 8% undecided with results that should come out like 51-41% or something of that sort. The undecided number should also be random leading to a continuation of random trailing digits.
I think people are over-stating the non-random hypothesis of the last digit. As Nate stated maybe one would expect some bunching in the middle because a lot of polls are of close races. But SV apparently does polling on a wide variety of issues not just political races where the distribution would not necessarily be even. Add into this distribution the essential +/- 3% randomness from sampling then I think it should definitely be close to random. Benford's Law will not apply in this case and even so is stated as the first digit of data not being random. If this was examined I'm pretty sure we would see 4 being the most common digit in the data.
...that you're kind of a whiny douche.
How did you get to be this way?
Just the natural reaction to all of the whiny douches that infect this site like a bunch of parasites.
@Pinkybum
It's quite probable that the *results* of SV's polling aren't uniform, but I see no credible cause for the *last digit* to deviate so wildly.
You'll note that in Nate's original post AND in Zach's subsequent work, the case for a uniformly-distributed last digit holds up rather well.
The difference between now and then is that I treated this site like a mud pit before and didn't think the blog authors or any of the commenters were serious. ... the comments section is still filled with too much name-calling and devolves into far too many pissing contests
:rolleyes: Your first post in this thread was "I'm waiting for the day [Nate] does it and gets thoroughly embarrassed. By "thoroughly," I mean career-ending embarrassment." You are an obvious troll, plain and simple - your first post is nearly always an irrelevant insult.
@Zach: "Given that the average Presidential and Senate poll is probably closer to 50/50 than your average House poll, perhaps there's something about a distribution of results centered on 50 that results in a non-uniform 2nd-digit distribution."
If that were true, we could expect 50/50 to be very probable, 51/49 somewhat less likely, 52/48 a bit less, etc. That would give us a spike at "0", but the 9-1 and 8-2 pairs would still be equiprobable. What we're seeing is that 9 occurs much more often than 1, 8 more often than 2, etc.
It's possible that undecideds do indeed change the picture. In the SV data the most common digit pair is 8-7, such as one would see in a 48/47 poll result. This would jibe with close polling results involving 2-4% undecided. However, this would also give us 9-6, 0-5 and 1-4 as fairly matched pairs. The 1-4 pair, in particular, seems wildly out of sync even in this best-case scenario.
You are an obvious troll, plain and simple - your first post is nearly always an irrelevant insult.
In the broadest definition of "troll," you can make a pretty good case that pretty much everybody is one.
Either way, nobody's holding a gun to your head making you read my posts. If they're so irrelevant and you get nothing out of reading them, then skip over them...or at least don't waste keystrokes on a response.
I mean, if I'm really a "troll," you're not supposed to feed me. Somebody should be charged with regularly posting that (and other) rule(s) for dealing with "us." Kinda like the rules for dealing with Mogwai/Gremlins.
Nice to see a geek infestation for a change.
If you missed the conversation from yesterday, a little investigating was done where we determined that most of the 'offices' of Strategic Visions, LLC appear to be mailbox storefronts.
Also, the legal division of Strategic Visions, INC. has been made aware of the issue. It will be interesting to see if anything comes from that.
Finally, Mule Rider said,
the comments section is still filled with too much name-calling and devolves into far too many pissing contests, there are enough people here - between authors and commenters - to challenge my intellectual acumen that it is worth my while to stay tuned to people who view the world from a slightly different ideological angle and see what they have to say.
Not sure but I think we just got a troll compliment. Wonders never cease.
Besides, I'm every bit as much of a legitimate critic of Nate and his work as he is of the unscientific bias and subterfuge of others.
Nothing "irrelevant" about that. And there's nothing wrong with rooting for him to be embarrassed by overstepping his bounds, because he also roots for others to be embarrassed or outed as frauds and charlatans and he has overstepped his bounds in the past.
I called him out for his posting of a "climate challenge" and subsequent updates where he displayed machismo because no one dared take him on. With a little research (on the Alexa traffic website) and some logical deductions, it was quite clear that the number of eligible participants in his "challenge" who were likely "tuned in" to his website during the short window he offered it was quite likely zero, thus negating the need for any of the big talk he was engaging in and making him look like an utter fool.
See for yourself.
http://www.fivethirtyeight.com/2009/07/climate-challenge-update.html
Wow, it must be Friday, Rudy's posting bullshit.
Raspublican inexplicably changed polling methods only weeks into Obama's presidency. Why? Because by design, his polling on Obama's job approval would be less favorable compared to every other pollster, with the obvious exception of Zogby, not to mention FReeptards and FOX nation nuts would link to his site. It's no coincidence that both Drudge and Hannity regard Raspublican and Zogby their favorite pollsters. For example, when Obama was 65% in half a dozen reputable pollsters, those two propagandists had Obama in the low to mid fifties. Sure as shit, by February, they were invited on FOX to declare Obama's honeymoon period was over and he was sinking. Dick Morris in particular was pushing the Raspublican/Hannity meme that Jim Tedisco would win NY-20 because Obama was unpopular.
Bottom line: the moment Raspublican changed his methods from what he used under Bush and started up with the push polls favorable to The Quitter, and unfavorable to Obama's policies, he lost whatever credibility he had left. The only people who cite Raspublican anymore are FReeptards.
@Neef - I was arguing for random variation in the last digit. I guess I wasn't clear enough.
One might tease out the correlation between high and low numbers by calculating the distribution of just the HIGHER percentage, and then calculating the distribution of the LOWER percentage. Do the same thing for the aggregate non-SV data too.
I agree that it would be useful to see the distribution of results for other individual polling firms, so we can see just how wildly the distribution fluctuates: just because the aggregate is flat doesn't mean that a non-flat distribution is unusual or suspicious.
The hypothesis that SV is publishing fraudulent data is supported, at least in a circumstantial way, by the fact that, however low the probability of the distribution of their final digit, that probability is lowered by the fact that they won't disclose any details about their methodology.
A. Probability of not disclosing methodology (given a professional outfit): Low
B. Probability of getting this distribution of final digits: Way Low
A * B = lower than either A or B
When in doubt, test. Here are the results of a random simulation of 10000 histograms such as those above.
Assumption 1: Candidate A gets 50% of the decided vote, with a normal distribution sigma=5% around that.
Assumption 2: Undecideds are 6% +/- 1% of the total vote.
Assumption 3: Candidate B gets all of the decided vote that "A" does not get.
Assumption 4: There are 2600 polls in each simulation, resulting in 5200 numbers for a candidate.
Assumption 5: Results are presented as percentages rounded to the nearest integer.
Result 1: There is a departure from uniformity: 7 is 5% more likely than 2 as the trailing digit (it really is the most random digit).
Result 2: A difference of 113 or greater (ie, the first histogram, 5 vs 0) between largest number/digit and smallest number/digit happens 5% of the time.
Result 3: 99% of results have that separation at 132 or less, where the SV numbers come in at 245, almost twice that.
Note that 5% spread around 50% is very conservative. Poll noise can do that, let alone real differences between candidates. So that is small even for close races. As that gets larger, the distribution gets very uniform. Larger undecideds or uncounted, minor 3rd parties will also push toward uniformity.
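For anyone who wants to rerun a simulation along these lines, here is a minimal Python sketch of the five assumptions listed above. It is not the commenter's actual code: the function names are made up for illustration, "6% +/- 1%" is read as a normal distribution with standard deviation 1, rounding is done half-up, and only 200 batches are drawn to keep the run short, so the percentiles it prints need not match the figures quoted above.

```python
import random
from collections import Counter


def simulate_digit_counts(rng, n_polls=2600, sd_share=5.0,
                          mean_undecided=6.0, sd_undecided=1.0):
    """Trailing-digit counts for one simulated batch of two-candidate polls."""
    counts = Counter()
    for _ in range(n_polls):
        # Assumption: "6% +/- 1%" is read as a normal with standard deviation 1.
        undecided = max(0.0, rng.gauss(mean_undecided, sd_undecided))
        # Candidate A's share of the decided vote, centered on 50%.
        share_a = min(100.0, max(0.0, rng.gauss(50.0, sd_share)))
        decided = 100.0 - undecided
        pct_a = decided * share_a / 100.0
        pct_b = decided - pct_a  # B gets the rest of the decided vote
        for pct in (pct_a, pct_b):
            counts[int(pct + 0.5) % 10] += 1  # round half up, keep last digit
    return [counts[d] for d in range(10)]


def spread(counts):
    """Difference between the most and least common trailing digit."""
    return max(counts) - min(counts)


if __name__ == "__main__":
    rng = random.Random(538)  # fixed seed so reruns are repeatable
    spreads = sorted(spread(simulate_digit_counts(rng)) for _ in range(200))
    print("median spread:", spreads[len(spreads) // 2])
    print("95th percentile:", spreads[int(0.95 * len(spreads))])
    print("largest spread seen:", spreads[-1])
```

Widening `sd_share` or `sd_undecided` is the quickest way to check the claim that a larger spread pushes the trailing digits toward uniformity.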
Fifi…
It is truly amazing that you have so little shame that you would keep coming back here again and again trying to justify your tantrums. (Not to mention throwing some new ones.)
I can only guess that you don’t have any fallback resources—such as friends—so you hang out where you’re not particularly wanted.
Pitiful.
Your “slightly different ideological angle”—you’re referring to how the world looks from your high chair?
track: One of Sarah P's kids, also the class she kept failing in hs.
The Strategic visions polling information does look odd, for sure. But, as an FYI on Benford's Law, it may not apply to polling data. The reasons are twofold:
1) the responses are not independent in the sense that when a person chooses one option the other options are not available. Therefore, whatever percentage ends up choosing Obama (for instance) necessarily sets a maximum for what might be picked for McCain. So, the distribution for polls really should be based on randomly selecting only one value of the several options from each poll.
2) The range of values for polls is necessarily constrained and rarely in the range of values that fits Benford's law. Benford's law only applies to distributions of data that are free to vary over wide ranges of values, at least over a couple of orders of magnitude (tens through hundreds, etc.). I would never expect the hourly temperatures of my living room over the course of a year to conform to Benford's law. But the lengths of rivers definitely conform. Polling numbers in many of the cases you are examining rarely dip into ranges that are amenable to such analysis.
With that said, I'm sure you are onto something, once you get the right method for comparison nailed down. It will likely show a similar pattern to what you are examining, just you'll be able to make assertions with more confidence.
Ah, I see we've been privileged to another rant from PorridgeGun, a paragon of intellect and wisdom.
His posts remind me of the dulcet sounds you would hear....
....wait, who am I kidding? What we get from PG is about the same as the moaning and screeching you would hear passing by a mental ward.
wv: facmismo - displaying manliness with a fax machine?
Has anyone seen a Fifi post on this board? I'm afraid Pragmatus has mistakenly addressed someone who isn't commenting in this section. Anyway, I'm just trying to help him find this person, so if you come across him/her, let Prag know...
Wow. How long did it take to pull all those polls together?
The other question I'd like to have answered is: are these the only fake pollsters out there, or just the only ones too lazy to make up a methodology and fake demographics?
The way this worked, assuming it is a scam, reminds me of Bernie Madoff fabricating quarterly earnings statements and detailed investor reports without ever actually engaging in a trade. I have to wonder if a similar non-random pattern could have been found by looking at those numbers, and if so, whether or not other frauds could be uncovered using the same methodology you've employed here.
Congratulations on making some news rather than just analyzing it. Today's post was like a real-life version of Mathnet, but for grown-ups!
Interesting, looks like SV LLC is going the legal route rather than refuting the facts:
http://blogs.tampabay.com/buzz/2009/09/nate-silver-pollster-may-be-fraud.html
I would recommend an edit to the post: remind readers that there are TWO different Strategic Vision pollsters (as you did in the previous post), and that you are interested in LLC, not Inc. Both to avoid accidentally turning people off from a good pollster, and to avoid getting responses from people who were interviewed by Inc. Of course, I suspect most people aren't going to know which of the two contacted them, so you're likely to get a bunch of confusion either way.
From the link Jonathan provided:
http://blogs.tampabay.com/buzz/2009/09/nate-silver-pollster-may-be-fraud.html
Secondly in regards to Nate Silver's statements, we categorically deny them and will refute them. We have a call into our attorney on this and fully intend to take action that will vindicate us. I wish Nate had contacted me directly yesterday when he began this tirade, I could have answered his questions fully to his satisfaction prior to damage being done to our reputation. Now that he has made these accusations and posted them online, I must and will defend our company's reputation through all legal avenues available. The reason that we are going the legal route is he has attempted to do severe damage to our reputation and what is he going to do when we disprove him just say I am sorry. That isn't enough at this point.
Maybe I will get to see that utter humiliation I've been waiting for...
Fifi.
Also known as “Gut you like a fish!!!”
Quote of the day—
“It ain’t what they call you, it’s what you answer to.”
—W. C. Fields
Nate didn't run hypothesis tests because he didn't feel comfortable assuming the null hypothesis (all last digits equally likely). Duh. Thus, they might do more harm than good when they are interpreted.
Fifi…
“Maybe I will get to see that utter humiliation I've been waiting for...”
No chance. If you don’t realize how low you have dragged yourself, there is just no getting through to you.
Are you guys just upset that Obama's fallen to 50% in today's gallup tracking poll? It's unusual for Gallup to show a lower approval rating than Rasmussen (I believe that's the second time this year).
Gallup will have this headline soon enough: "Obama falls below 50%, in as quick a time period as any president other than Bill Clinton since Eisenhower."
From the Tampa Bay.com article, David E. Johnson said,
in regards to Nate Silver's statements, we categorically deny them and will refute them. We have a call into our attorney on this and fully intend to take action that will vindicate us. I wish Nate had contacted me directly yesterday when he began this tirade, I could have answered his questions fully to his satisfaction prior to damage being done to our reputation. Now that he has made these accusations and posted them online, I must and will defend our company's reputation through all legal avenues available. The reason that we are going the legal route is he has attempted to do severe damage to our reputation and what is he going to do when we disprove him just say I am sorry. That isn't enough at this point.
Aww Yeeeah, Bring it on, bring it on! What do you suppose the probability of winning a numbers argument with the Silver Professor is? GEEK FIGHT! This is way better than wrestling.
Nicely done, Phoebe.
GEEK FIGHT indeed. I happen to be a scientist, so when someone goes immediately to the legal route rather than just responding to another's criticism, that is a big red flag for me. My money's on Nate, forum trolls notwithstanding...
Nate has the data. This is numbers forensics. You cannot massage numbers under the pretense of impartiality and expect to get away with it. I know plenty of academics who will back you up, Professor. Where is the legal fund?
@Neef - given that most polls, and I assume most SV LLC polls, have large undecided numbers, I don't think the 0/0, 1/9, 2/8, etc expectation would hold. Unless the undecided fraction has an equal probability of being 0-9%, though, one number will be dependent on the other in a 2-choice poll.
Honestly, if they are actually gathering real polling data, the legal route is the right response to take. It doesn't make sense to respond to a criticism like this by publicly releasing all of your polling data when you can just show it to potential clients and tell them that Nate's full of shit... if that's actually the case. And given that Nate's comments could genuinely hurt their business and that Nate profits from the same business, there's a solid legal case to be made if Nate's not in the right here.
@Zach: I agree the specific 0/0, 1/9, 2/8, etc expectation is only valid for certain values of undecideds (0%, 10%, 20%, etc.), but I tried to address that (perhaps poorly) in the last paragraph of my post.
I think we agree that one of the poll answers is dependent on the other. As a corollary we'd expect the second digits to occur in pairs, which should reinforce uniformity, not undermine it.
Zach nailed it.
Nate would have been better off being a bit more inconspicuous with his criticism. He may very well wind up right in the end, but as I said above, he could also have bitten off more than he could chew and now leave himself exposed to a humiliating embarrassment.
I don't really have a dog in this fight, but I did do the numbers for the first half of 2008 on Survey USA's approval polling. Here is the trailing digit distribution:
0 - 64
1 - 64
2 - 60
3 - 54
4 - 48
5 - 70
6 - 57
7 - 51
8 - 57
9 - 63
Should one conclude from this that Survey USA is forging their numbers, and that they have an aversion to the number 4 while preferring the number 5 and numbers close to zero?
This sample size is of course smaller than the Strategic Vision data above, but I concur with others who have requested comparable data from at least one reputable pollster. The Survey USA data is readily available.
@Zach
Nate does make a living from polling. I think he has every right to call out someone who is, in effect, engaging in insider trading or trying to unduly influence the marketplace with fraudulent data. Don't you?
If SV LLC is cooking the books (and being taken seriously), then they are having an effect on the credibility of 538. Nate's results are only as good as the data he crunches. If they're muddying the waters with tainted numbers, then they have it coming.
GOP got smacked down in Massachusetts.
Regarding threats from Strategic Vision to Nate, when they say “We have a call into our attorney on this…” it doesn’t mean the same thing as “we are going to take legal action against you.”
@steve
By my count, you've got a total of 588 data points for SurveyUSA, compared to 4,935 for Nate's analysis of SV. When your sample size is 1/8th as big, it certainly seems possible that you would see greater variance between the digits like you did. But I'm not a stats geek.
@steve:"Should one conclude from this that Survey USA is forging their numbers, and that they have an aversion to the number 4 while preferring the number 5 and numbers close to zero?"
The data you gave does not, as far as I can see, reject the hypothesis of uniformity. A distribution like that could very easily be the result of chance. There's something like a 64% chance that you could get that distribution randomly.
The SV distribution probability is in the trillion trillionths of a percent.
Steve-
The surveyUSA distribution looks much closer to expected than the Strategic Vision distribution.
Also, if you were faking numbers, a very good maneuver would be to sue to try to keep up the ruse as long as possible, just sayin'. What is your only other choice? Ignore it and hope it goes away? Unlikely to go away in this case...
Oh, PG, you're just jealous that Rasmussen was so far ahead of the pack and was way more accurate on the way down. He is rarely wrong on the big sample size stuff. Outside of consensus does not constitute wrong.
That's why Nate respects him and is constantly cautioning the leftist loons to stop drinking self-reinforcing polling Kool-Aid. He did so on the climate bill and he's doing it now on health care.
So, now you're going to criticize Rasmussen methodology just because it adjusts with the times in the quest for accuracy? Recalibration is a regular event with all of them, and it's the accuracy of such that separates the good from the bad.
Based on results, Rasmussen was right to make the honeymoon over call early. He properly caught the negative sentiment to the porkulus bill and the big left turn on the Obama agenda.
As usual, just more insubstantive spew from you. Nate wouldn't say any of the slanderous things you're saying.
It will also be interesting to see if they can get past summary judgment in a case against Nate, he has posted facts and interpreted them, after the most well known polling org sanctioned them...
I think Nate's case is pretty good...but you never know.
@Steve:
I'm not disagreeing with you (or anyone else) who would like similar stats done on other pollsters - but sample size aside, that is *one* poll, repeated over a long time frame, which introduces all kinds of internal dependencies.
The most innocuous explanation of Nate's result would be something similar in SV's polling - if that data includes a lot of redundant polls done again and again, with a fixed "true" value of X8%, you'd see a big cluster between X6% and (X+1)0%.
@Nate:
Ooh, this is fun.
Another sign of made-up numbers - including the polls with exactly three categories, how often do the results add up to 99 or 101, due to rounding? *That* value should be pretty damn fixed (assuming reasonable sample sizes.)
Of course, it might also be acceptable practice to fudge those numbers? Pity.
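The rounding check suggested here is easy to try out. Below is a minimal Python sketch, assuming a three-category poll (candidate A, candidate B, undecided) whose true shares add up to exactly 100 before each is rounded half-up to a whole percent. The ranges for A and for the undecideds are hypothetical and only there to produce varied fractional parts. Under these assumptions the totals should split roughly one-eighth, three-quarters, one-eighth between 99, 100 and 101. Real polls with small samples or a different rounding convention could shift those shares somewhat.

```python
import random
from collections import Counter


def round_half_up(x):
    """Round a non-negative percentage to the nearest integer, halves up."""
    return int(x + 0.5)


def rounded_total_shares(n_polls=100_000, seed=0):
    """Share of simulated polls whose rounded percentages sum to each total."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_polls):
        a = rng.uniform(35.0, 55.0)          # hypothetical range for candidate A
        undecided = rng.uniform(2.0, 12.0)   # hypothetical range for undecideds
        b = 100.0 - a - undecided            # candidate B gets the remainder
        totals[sum(round_half_up(v) for v in (a, b, undecided))] += 1
    return {total: n / n_polls for total, n in sorted(totals.items())}


if __name__ == "__main__":
    for total, share in rounded_total_shares().items():
        print(f"rounded results sum to {total}: {share:.1%} of polls")
```

A pollster's published three-way results could be tallied the same way and compared against these shares.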
7 is 5% more likely than 2 as the trailing digit (it really is the most random digit)
Most random: I just got the joke.
As several people have mentioned, you could easily get a deviation from a uniform distribution in the last digit, depending on the distribution of the outcome of the polls - e.g. if Strategic Vision has a bias in favor of polling very tight races that would expect to result in an outcome like 47-48.
One way to correct for this is to look at the histogram of the poll results, fit that data with some sufficiently smooth parametric or nonparametric distribution, and then check whether the last digits are still significantly different from what you would expect from this background distribution.
You may also want to check if the overabundance of 0's and 5's could be due to inconsistent rounding of the published results. i.e. some polls may simply be rounded to a resolution of 5% instead of 1%. Sloppy reporting, but not necessarily fraudulent. In that case, you would expect both sides of the poll to be rounded to the same resolution though.
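The background-distribution idea above can be sketched without any simulation. The Python below assumes the poll results follow a normal curve and works out what last-digit frequencies that curve implies once results are rounded to whole percentages. The mean and standard deviation are placeholders, not values fitted to Strategic Vision's data, and a normal curve is only one possible choice of smooth distribution.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal(mu, sigma) variable at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_last_digit_probs(mu, sigma):
    """Last-digit probabilities for results rounded to integers in 0..100."""
    probs = [0.0] * 10
    for value in range(0, 101):
        # Probability mass that rounds to this whole percentage.
        mass = normal_cdf(value + 0.5, mu, sigma) - normal_cdf(value - 0.5, mu, sigma)
        probs[value % 10] += mass
    total = sum(probs)  # renormalize for the small mass outside 0..100
    return [p / total for p in probs]


if __name__ == "__main__":
    # Hypothetical backgrounds: tightly bunched results vs. widely spread ones.
    for mu, sigma in ((47.5, 3.0), (47.5, 15.0)):
        probs = expected_last_digit_probs(mu, sigma)
        print(mu, sigma, [round(p, 3) for p in probs])
```

The observed digit counts would then be tested against these expected frequencies rather than against a flat 10% each.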
I'd like to restate my response to steve, because I think his data is important. I plugged his numbers into Excel's CHITEST function, and I got a p value of 0.64. This implies to me that the SUSA polling is *quite* uniform as regards the last digit, but I will wait for some of our more analytical commenters to weigh in.
If anything, what steve posted makes SV's numbers look worse.
Not only is it the “most random” but also the “most unique”.
I do think some of these posts raise a good methodology -- Nate, perhaps run this analysis on a few other pollsters you assume are "good" that share similar types of wide-ranging questions. Your hypothesis should be that "SV LLC differs significantly from other pollsters" rather than from an assumed distribution of what those numbers should be. There are a lot of potential alternative hypotheses for those numbers that could be put to rest if you do a meta-analysis.
I'm not convinced that the trailing digit is the correct test here.
When you're looking at election results, you're dealing with three, four, and five digit numbers, so the last digit truly is statistical noise, and ought to behave as such.
But in polling, you're looking at two digit numbers, so as Nate noted, the distribution WON'T be random. The last digit will be biased toward the most common results.
Until we know what the most common results are, we have no way of testing whether SV's results are questionable.
It's plausible, for example, that the SV results have a strong bias toward the range 41-50. (Which would make sense if they're asking a lot of yes/no questions.)
Take Nate's SV graph and move the "0" to the bottom row. Assume for argument's sake that you're looking at the distribution of 41-50 in SV polls, not trailing digits.
You end up with a pretty close approximation of a bell curve, with lots of results ending up in the 47-49 range. (The low number of 6's would still be an outlier, but not a huge one when you look at the curve.)
In other words, Nate assumes we're looking for an even distribution in final digits, but he hasn't established that accurate polling would have that even distribution. If accurate polling would result in a curve distribution, SV's numbers look plausible.
A point Nate made that seems to have been forgotten is that he specifically used all of their polls across a wide cross section of question type. This was to remove outcome weighting from a specific poll (e.g. a Senate race that is not changing much). By including polls on non-political questions and/or not on races, you would expect far more random distribution of the last digit.
My guess is not that these guys were making up the results, but skewing them. Cheating most often comes by taking real data and then fudging them by a few points. Studies of fraud (e.g. bank fraud) and people asked to write random sequences show that people tend to use 7, 8, 9 too often as they "feel" more random. The common technique with numbers fraud is to move small numbers to the other side of 5 (e.g. $1.01 becomes $1.07). So, you can imagine that these guys were doing the same - moving outcomes by 2 points each (so 4 points total shift) or so.
I think this kind of non-uniform distribution is expected for close races with relatively few undecided or third category responses. I ran simulations that used undecided percentages between 4% and 9%. Percentage differences between the two major choices was between 0% and 8%. The shape of the distribution is pretty similar to the Strategic Visions distribution.
Just a quick question:
Could there be a multiple testing problem? i.e. Did you just test SV or did you test multiple companies and find that SV stood out?
Well, I went ahead and did the Survey USA approval polling numbers for the rest of 2008 as well. Note that Survey USA did not release any surveys for June & July, so this represents 10 months of Survey USA approval polling data (including President, Senate, and Governor approval polls).
0 - 126
1 - 115
2 - 109
3 - 111
4 - 94
5 - 127
6 - 117
7 - 101
8 - 118
9 - 110
The dearth of trailing-digit 4s is still readily apparent. The lag in 7s is also becoming more apparent. A clearer preference for 5s and 0s seems to be emerging.
FWIW, I have also done a couple months of election matchups from late 2008 and the strangely lower incidence of 4s is even more glaring. I'm not sure what to think of it.
Quite honestly, when I began working these numbers I expected to find a uniform distribution that would essentially confirm Nate's analysis. The sample size is still too small to be altogether comparable. I may work out some more data on the matchup polling as well.
FWIW, the legal 'threat' that David E Johnson made comes across more as a warning off threat than anything else. Nate didn't say that SV LLC were making numbers up, just that the evidence suggests they may have been. I think that it would be very difficult to make a legal case based on that. All that suggests to me that Johnson is acting more guilty than he needs to if he has done nothing wrong.
@Steve
a 25% difference between min/max is much more likely than a 40% difference with a much larger sample size.
Via GraphPad.com's chi^2 calculator:
Chi squared equals 8.365 with 9 degrees of freedom.
The two-tailed P value equals 0.4978
By conventional criteria, this difference is considered to be not statistically significant.
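For readers without Excel or GraphPad handy, the same test can be sketched in plain Python. The counts are the two SurveyUSA tallies steve posted in this thread. The function names are illustrative, and the p-value routine uses the textbook closed form for a chi-square tail with a whole number of degrees of freedom instead of a statistics library. It should land on the chi-square of 8.365 and p of about 0.50 reported above for the ten-month tally, and on a p of roughly 0.65 for the first-half tally.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per digit."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf(x, df):
    """Upper-tail p-value of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite sum of odd powers of sqrt(x)
    chi = math.sqrt(x)
    term = chi
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


# Trailing-digit counts for digits 0..9, as posted in the thread.
susa_first_half = [64, 64, 60, 54, 48, 70, 57, 51, 57, 63]
susa_ten_months = [126, 115, 109, 111, 94, 127, 117, 101, 118, 110]

for label, counts in (("first half of 2008", susa_first_half),
                      ("ten months of 2008", susa_ten_months)):
    stat = chi_square_uniform(counts)
    p_value = chi2_sf(stat, df=len(counts) - 1)
    print(f"{label}: chi-square = {stat:.3f}, p = {p_value:.4f}")
```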
@steve: thanks for your work in pulling that data together. It's easy to just sit here opining, but people like you give us hard data to play with.
Running your latest set through Excel's chi-square test, I get a p value of 0.49. You would need a p value of 0.05 or less to reject the hypothesis that it's random. This implies that statistically speaking, you can't call that data non-uniform, even though you may see patterns in the data.
In contrast, the p value of the SV data is something like 0.0000000000000000000000000479
Not sure if anyone else has already made this point, and obviously having the straight-up data from the SurveyUSA poll itself is useful, but Nate did include a control group, namely the set of all Presidential and Senate polls from 2008, which did come out looking rather uniformly distributed, and certainly shared almost nothing of the pattern exhibited by SV.
@Neef, the SV data is on the order of 10^-26, so I don't think you have enough 0s there... Or maybe all those 0s are getting me cross-eyed.
Though, Neef, p-values are heavily dependent on sample size, and thus not comparable except to say one dataset rejects the null and the other does not.
Is there a measure of effect size for Chi-Square? I can't remember. This would be a more fair comparison.
In addition, confidence intervals around the effect size would also be helpful.
@Jeff: I had that exact thought when I posted it, but the number comes direct from Excel.
I think the font size is misleading.
I also have to agree with Mark. It's a bit of an apples to oranges comparison when the SV data is compared to aggregate data of several polling firms. For instance, how do we know that the noise of other suspect distributions wasn't washed out in the aggregate. So let's see how Research 2000's or Gallup's data compares before we reach for the pitchforks.
@mark:"p-values are heavily dependent on sample size, and thus not comparable except to say one dataset rejects the null and the other does not."
You're right, of course. The comparison was unnecessary, and potentially misleading. Good call.
For those of you with the data ready to go in Excel spreadsheets etc., the effect size for chi-square is Cramér's phi (according to Wikipedia...)
Quoting Wikipedia "Cramér's phi is computed by taking the square root of the chi-square statistic divided by the sample size and the length of the minimum dimension (k is the smaller of the number of rows r or columns c)."
or
square root of (X^2 / (N(k-1)))
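As a worked example of that formula, here is a small Python sketch applied to the ten-month SurveyUSA tally posted earlier in the thread. One assumption needs flagging: the Wikipedia wording defines k for a two-way table, so for a single row of ten digit counts this sketch takes k to be the number of digit categories (10). That is a judgment call, not something the quote settles. The function name is illustrative.

```python
import math


def cramers_phi_one_way(counts):
    """sqrt(chi2 / (N * (k - 1))) for a single row of category counts.

    Assumption: k is the number of categories, and the expected counts
    are uniform across categories.
    """
    n = sum(counts)
    k = len(counts)
    expected = n / k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return math.sqrt(chi2 / (n * (k - 1)))


# Trailing-digit counts for digits 0..9 (ten months of SurveyUSA approval polls).
susa_ten_months = [126, 115, 109, 111, 94, 127, 117, 101, 118, 110]
print(f"Cramer's phi: {cramers_phi_one_way(susa_ten_months):.4f}")
```

Because the statistic is divided by the sample size, the result can be compared across datasets of different sizes, which is the point raised above about p-values.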
A couple of notes.
First, anybody can talk about suing, but it's extremely dangerous for somebody to sue unless he's blameless. As soon as a suit is filed, SV, LLC will have to release not only everything AAPOR and Nate have asked for, but also the owner's shoe size and the names of his goldfish. Tax returns. Names of disgruntled former employees and clients. It's what we lawyers call a "bright-line test": liars threaten to sue for defamation, but they don't sue. If SV, LLC has something to hide, you can be certain they won't be filing anything.
Second, Benford's law is useful as a starting point, but in a specialized situation like this, it's always possible to craft more precise statistical tests that take advantage of the vagaries of the situation. My eyes are drawn to the sharp differences between adjacent digits - which don't appear in either Nate's or Steve's tallies. It's possible to imagine statistical processes which generate - say - more 7 - 8 - 9 results, and fewer 1 - 2 - 3. But gaussian blurring makes it very hard for me to believe there could be lots of 0 but very few 1 results.
...but it's extremely dangerous for somebody to sue unless he's blameless.
Yeah, and it's equally dangerous to levy accusations and make incriminating innuendon public without having all of the facts.
Nate is picking a fight, and he should damn well be prepared to face the consequences if he's wrong.
innuendo
Does defamation over the Internet count as slander, or libel? Because if it's libel, that would seem to tie into the NYT v. Sullivan case, which established that for a public official to win a libel suit they have to prove that the defendant knew what they were saying was false. So if writing something on a blog counts as libel rather than slander, and if SV LLC counts as a "public figure," which I would imagine it does, then even if Nate is factually incorrect and they aren't cooking the books, he is not liable for libel.
This isn't libel, this is how science/math works. Nate was pretty straightforward with his arguments, and left the door WIDE OPEN for SV LLC to respond to his analysis. If you think Nate is libel, you need to go revisit the 1st amendment, and how all scientific progress is made ("PEER REVIEW" would be a good place to start). If you read published responses to controversial scientific theories, you'd know that Nate's arguments above are pretty tame compared to a lot of GEEK FIGHTS that are out there.
@hurricanexyz: Libel. It's written, so by definition it's libel. Slander is spoken - so if Nate had made a YouTube video, slander would be more relevant.
IANAL, but that's the distinction.
Not that what Nate's said is libel - the case would probably be decided in Nate's favor even if the claim is false because David Johnson is a public figure, and Nate is not saying something he reasonably knows to be false in order to defame Johnson (from all indications). Again, IANAL, but Johnson probably has a snowball's chance in Hell of winning in court.
Rudy said...
Oh, PG, you're just jealous that Rasmussen was so far ahead of the pack and was way more accurate on the way down.
~~~~~~~~~~
Jealous of a pollster, hmm, interesting concept ;) not an interesting conversation to be sure, but ...
My emotions tend to be more involved in final results, not ad nauseam predictions ie 2006/2008 elections, the bottom line.
No, nothing to be jealous about there as the Dems annihilated the Reps. Oh joy, oh rapture!
Just the facts, ma'am.
OK, deflection over, carry on.
Mule Rider said...
Yeah, and it's equally dangerous to levy accusations and make incriminating innuendon public without having all of the facts.
In your opinion as a successful and highly respected attorney [and economist] ... in his mid-20s ... is that what you feel Nate has done here?
LOL, you dufus.
With real data, if the underlying distribution is not perfectly uniform then the counts should be roughly continuous (meaning similar counts for adjacent digits). If you have a lot of 5's, for instance, then you should also have a lot of 4's and 6's, because every poll number that came up 5 could easily have been a 4 or 6. You can see that in the Presidential General Election polls, where the largest gap between adjacent digits is only 56 (619-563). With the Strategic Vision polls, the largest gap is a much larger 131 (562-431), and there's another gap of 106 (639-533). It sure looks suspicious.
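The continuity argument above can be tried out with a rough simulation. This is only a sketch under stated assumptions: the "true" support levels (anywhere from 20% to 60%), the sample size of 600, the seed, and the function names are all made up for illustration, and nothing here uses actual poll data. It simulates honest polls with ordinary sampling noise, tallies the trailing digit of each rounded percentage, and reports the largest gap between neighbouring digits.

```python
import random

def trailing_digit_tally(num_polls, sample_size, rng):
    """Tally trailing digits of rounded percentages from simulated polls."""
    tally = [0] * 10
    for _ in range(num_polls):
        # Hypothetical "true" support, anywhere between 20% and 60%.
        true_share = rng.uniform(0.20, 0.60)
        # Each respondent independently says yes with probability true_share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        # Integer percentage, rounding halves up.
        percent = (200 * yes + sample_size) // (2 * sample_size)
        tally[percent % 10] += 1
    return tally

def largest_adjacent_gap(counts):
    """Largest absolute difference between neighbouring digits (9 wraps to 0)."""
    return max(abs(counts[d] - counts[(d + 1) % 10]) for d in range(10))

rng = random.Random(538)  # fixed seed so the run is repeatable
tally = trailing_digit_tally(num_polls=2000, sample_size=600, rng=rng)
print("trailing-digit tally:", tally)
print("largest adjacent gap:", largest_adjacent_gap(tally))
```

Comparing the printed gap to the typical count per digit gives a feel for how large a gap sampling noise alone tends to produce; it is a sanity check, not a formal test.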
in case people were looking for the link like i was:
Tampabay.com David Johnson reply
Rudy:
The verdict on Rasmussen is that it's good if and only if the results can be confirmed. For elections, it gives very good results. For issue polls and approval rates, it doesn't - its Obama approval rates are consistently lower than those of other pollsters, while its Bush approval rates were consistently higher.
The only reason Rasmussen seems prescient is that when a curve slopes down, moving it down is the same as moving it right. So by publishing lower approval numbers, Rasmussen is sure to report today the approval rate Obama will get in two weeks - as long as Obama's approval rate keeps declining. But this method has no predictive value - if Obama's approval rate starts climbing, Rasmussen will start lagging.
OK, I have added the Survey USA election matchup & issue handling polls from July-Sept of 2008 to my data set. Here are the trailing-digit totals combined with the Survey USA approval numbers for all of 2008.
0 - 226
1 - 221
2 - 224
3 - 235
4 - 210
5 - 226
6 - 221
7 - 190
8 - 216
9 - 223
Now, that seems more like it! The trailing-digit 7s are still lagging somewhat - 19% below the 3s, which have the highest count. However, the 4s are finally catching up while the 5s and 0s have fallen back into the pack.
I am now reasonably persuaded by Nate's line of argument. Not totally, but more so than not.
I may add more Survey USA data tomorrow to see if the 7s catch up too, but I've spent enough time on this today. Thanks for humoring me while I worked this much out.
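For anyone who wants to check the Survey USA totals listed above against a uniform distribution, here is a minimal sketch of a chi-square goodness-of-fit test. The counts are copied from the list above; the critical value 16.919 is the standard 5% cutoff for 9 degrees of freedom, hard-coded so no statistics library is needed. The variable names are just for illustration.

```python
# Trailing-digit totals for digits 0 through 9, copied from the comment above.
counts = [226, 221, 224, 235, 210, 226, 221, 190, 216, 223]

total = sum(counts)
expected = total / 10  # uniform digits: each one expected equally often

# Pearson chi-square statistic against the uniform expectation.
chi_square = sum((c - expected) ** 2 / expected for c in counts)

# 5% critical value for a chi-square distribution with 9 degrees of freedom.
CRITICAL_05_DF9 = 16.919

print(f"n = {total}, chi-square = {chi_square:.2f}")
print("reject uniformity at the 5% level:", chi_square > CRITICAL_05_DF9)
```

With these totals the statistic comes out at about 6.08, well under the cutoff, so this particular test gives no reason to doubt that the digits are uniform.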
steve, from one stats nerd to another, you should be wary of continuously adding numbers and analyzing over and over again, because you might end up with, randomly, a "good" result that confirms Nate's analysis. If you've worked out a methodology, perhaps you can pick a different pollster, and do the analysis *once* (e.g. fix the methodology, collect the data, report it), based on the "exploratory" data analysis you did with Survey USA?
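The worry about analyzing over and over can be illustrated with a small simulation. This is a sketch under assumptions of my own choosing (10 batches of 200 digits, 1000 repetitions, a chi-square test at the 5% level, a fixed seed), not anyone's actual procedure. The digits are generated uniformly, so every "flag" is a false alarm; the point is only that looking after every batch gives more chances for one than testing once at the end. The same logic applies in reverse to stopping as soon as the data happen to look good.

```python
import random

CRITICAL_05_DF9 = 16.919  # chi-square 5% critical value, 9 degrees of freedom

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def run_once(rng, batches=10, batch_size=200):
    """Add truly uniform digits in batches, testing after every batch."""
    counts = [0] * 10
    flagged_at_any_look = False
    for _ in range(batches):
        for _ in range(batch_size):
            counts[rng.randrange(10)] += 1
        if chi_square_uniform(counts) > CRITICAL_05_DF9:
            flagged_at_any_look = True
    # The single pre-planned test on the complete data set.
    flagged_at_final_look = chi_square_uniform(counts) > CRITICAL_05_DF9
    return flagged_at_any_look, flagged_at_final_look

rng = random.Random(2009)  # fixed seed so the run is repeatable
results = [run_once(rng) for _ in range(1000)]
any_rate = sum(a for a, _ in results) / len(results)
final_rate = sum(f for _, f in results) / len(results)
print(f"flagged at some look:        {any_rate:.3f}")
print(f"flagged at the final look:   {final_rate:.3f}")
```

Fixing the methodology first and running the test once corresponds to the second number; repeated looks correspond to the first.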
Hi Jonathan,
Ideally, what I want is the number of Survey USA data points matching the number of Strategic Vision data points that Nate used in his analysis. The rest of the election matchup & issue handling polls from 2008 should reach about the same number.
Unfortunately, the way I'm doing this is very time-consuming so I have to stop for now. I will add, however, that what I find persuasive is not the static data set where I'm at right now, but rather that the numbers are converging toward uniformity.
That said, I am likely to finish out the 2008 Survey USA numbers tomorrow. I'll post my final result back here, and anyone who's interested is welcome to evaluate what, if anything, that signifies about Nate's analysis.
It will be very interesting to see if they actually file suit. If they do, they open themselves up to discovery, and then we will know the truth, one way or the other. Of course Nate (unless independently wealthy) will need support for his legal defense.
Ah, how about giving a link to a graph of the sample size vs. some measure of uniformity? I know, I know, this is just for fun. Still, since a lot of folks here have at least a passing stats nerd interest, it might be good to show a living, relevant example of the "law of large numbers". Keep up the good work!
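In place of a graph, here is a rough sketch of sample size versus one possible measure of uniformity. Everything in it is an assumption for illustration: the digits are simulated uniformly rather than taken from any pollster, the measure is simply the largest relative deviation of any digit's count from an even split, and the checkpoints and seed are arbitrary.

```python
import random

def max_relative_deviation(counts):
    """Largest deviation of any count from an even split, as a fraction of it."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) for c in counts) / expected

rng = random.Random(42)  # fixed seed so the run is repeatable
counts = [0] * 10
drawn = 0
deviations = {}

# Keep adding simulated trailing digits and measure at each checkpoint.
for target in (100, 1000, 10000, 100000):
    while drawn < target:
        counts[rng.randrange(10)] += 1
        drawn += 1
    deviations[target] = max_relative_deviation(counts)
    print(f"{target:>7} digits: largest deviation from even split = {deviations[target]:.1%}")
```

If the digits really are uniform, the deviation should generally shrink as the sample grows, which is the law of large numbers at work; a data set that stays lumpy at large sample sizes is the interesting case.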
LOL, you dufus.
Whatever, dipshit. One of these days, and I'm guessing pretty soon, the Emperor/Messiah (Obama) and the Wizard/Professor (Silver) are going to be unequivocally humbled. They've already side-stepped plenty of close calls. Obama on just about every other major policy change, and Silver has had to clam up and back into his cave here at 538 after some dastardly attacks he's made proved to be less than substantive. It's just a matter of time before they wade into one big pile of shit and get entrenched in a stink they can't wash out.
In any event, that day is coming, and it's going to make the rampant and unfettered homerism several of you display for the two of them look incredibly childish and foolish. I can't believe so many of you of the purported progressive intelligentsia can be so blindly obsequious to a snake-oil salesman President and a smart-ass jerkoff stats geek.
Folks:
http://www.washingtonwatch.com/blog/2009/09/07/do-not-feed-the-trolls/
Okay, I can't read all ten million posts to see if anyone else has said this so here goes.
To those of you who want to see, "comparison to other pollsters," that information is in the graph labeled "...all Senate and Presidential polls, 2008." The comparison is made more accurate by aggregating the information rather than breaking it down company by company. If other individual companies vary from the overall pattern it would suggest that they are cooking the books too, not that Strategic Vision is not.
A note on defamation law. Generally, modern courts don't preserve distinctions between "slander" and "libel", although that varies from state to state. The actual words used are less important than the meaning they carry. In this case, if it were eventually proven that Nate's concerns are completely wrong, he might be in trouble, regardless of his expressions of doubt.
The standard SV-LLC would have to meet is hard to call. SV-LLC issues press releases, takes public positions on issues, and invites controversy. But can a corporation be a "public figure"? They certainly could not meet the standards of proof set by Times v. Sullivan, but they might possibly meet the standard used in typical business disparagement cases.
Nate's real protection is that he appears to be correct. Each hour that passes without clear response by SV-LLC makes me more confident of that.
@Mule Rider
The big question is when are you going to register embarrassment after an episode of having your idiotic statements, your wildly inaccurate prophesying, and your outright lies shoved back down your mealy little maw?
I guess you are getting pretty desperate to pin your hopes on empty, boilerplate "I'll sue you" bluster. But you're still too thick to STFU. You have no pride, place no value in your own words.
And it shows. People understand that, so they call you a "troll". But really you are just a pitiful man wallowing in his own weakness and impotence.
Hi Kenneth,
In my view, you cannot reach a definitive conclusion by comparing the Strategic Vision data to the aggregate data of all the pollsters.
The most obvious reason is that given pollsters tend to repeatedly poll the same campaigns or issues, which may very well skew their overall data. Another obvious reason is that some pollsters will poll a single question at a time while others will poll multiple questions together. The results on multiple question polls will segue together more because the partisan breakdown will be identical across multiple questions.
When you aggregate all the pollsters, you essentially eliminate these types of sampling biases. On the other hand, if you have a comparable data set from another pollster then you can reliably evaluate whether or not similar statistical anomalies are present.
Suing won't do any good. Nate's main point was they don't show their methodology and their numbers look suspicious. They could sue but they aren't going to disprove those points. Suing would just mean showing their methodology and proving that they are really doing it.
Nate never said they weren't. He just said that they should prove they were.
Go ahead and do that.
BTW, "we are contacting our lawyers and will use every legal avenue" is a pretty generic statement for this sort of thing.
Every legal avenue tends to mean we'll whine about it but not actually do anything.
The big question is when are you going to register embarrassment...
Register embarrassment? For something...anything I've ever said anonymously on a political blog? Are you kidding? What are you, a kindergarten teacher? Get a clue.
...after having your idiotic statements
Purely an opinion of yours. I can and do say the same about most everything you say. You saying it carries no more weight than me and is just another bland attempt to shout me down because I'm in the "minority" in this echo chamber.
...your wildly inaccurate prophesying
Got a link or proof to back this up? You keep saying it but you never mention anything exactly that I've "prophesied" on and been "wildly inaccurate."
and your outright lies are shoved back down your mealy little maw?
Again, got any proof? Link to something I said that was unequivocally a lie and not just a differing opinion, and I'll tip my cap to you for pointing it out and apologize. But otherwise, you're just one ranting fool who dislikes another ranting fool anonymously on the world wide web. Ho-hum. That really deflates me, I'm here to tell ya. /sarcasm
I guess you are getting pretty desperate to pin your hopes on empty, boilerplate "I'll sue you" bluster.
I'm not "pinning my hopes" on anything. I even said above that Nate's critique is not without merit. My broader point is that he's been quick at spouting off way too much, and eventually it'll bite him in the ass. Maybe this time, maybe not. Doesn't matter. He's come too close to getting burned for it to not backfire at some point. I really don't have a dog in this fight as I'm no defender of SV LLC and had barely heard of them before Nate brought them up. It's funny you try and call me out on that, though, as you're too thick-skulled to see that you pin your hopes on every piece of useless "I've got numbers on my side!" bluster that Nate throws out. What a joke.
But you're still too thick to STFU. You have no pride, place no value in your own words.
And it shows.
Again, your attack rings hollow, and again, I could say the same about you. You're a self-loathing dick, and it shows.
Nate,
I regret to tell you that I have indeed been polled by strategic visions. In fact they polled me all night and never called the next day.
I also regret to tell you that I am also a sock puppet.
Kindest regards;
Mr Binky.
@Jonathan
Okay, okay. I will once again commit to not responding to the trolls. But whack-a-troll is like bubble wrap. Just...so...hard...to...resist.
I will vouch for the validity of Benford's law. Years ago I used Benford's law to create a purchasing/inventory analysis system that relied upon Benford's law to identify inventory numbers from floor personnel that were 'invented.' As a result we were able to improve accuracy and conduct more reliable inventories, cutting costs.
Not bad for a kid who at the time was only 24 and a registered sex offender, eh?
Nate, you may find it interesting that Stanford has shown that a coin flip is not actually random. Even working with perfectly weighted coins, the side that is face up when flipped is slightly more likely to land face up.
Stanford just recently released a journal article on this. It's a little heavy on the biomechanics, but a great read.
MuleRider,
"Nate can get pretty flamboyant and cocky at times when he's "calling people out."
It's already bitten him in the ass a couple of times before, but not too badly."
What exactly are you referring to?
There are articles he has written that seem to have some oversights (I really didn't like his driving/flying STL-NYC analogy), but I think that he's actually pretty careful about "calling people out" and he doesn't seem to do it unless he has really thought through the consequences. He knows that he is putting their careers in danger and that they have the power to put his own in danger if he is wrong.
I think that causes a little bit of extra caution.
I'm really curious about how this turns out. It seems to me he is asking very legitimate questions. Perhaps SV really does have answers to the questions. I guess we'll see in time.
I've mentioned this before but I think it still bears pointing out that polling methodology is becoming more problematic with the decreased use of land lines. Absent a nationwide database of cell phone numbers or e-mail addresses, the word random begins to lose meaning. Sure, 'random sample of all remaining land-line users' maybe. But that becomes a stratification issue, doesn't it? Can you accurately extrapolate across the entire population based on that? Some pollsters claim to take this into account but I'm left wondering how. I haven't had a land-line for almost seven years. It apparently didn't affect Nate in '08 but I'm wondering about next time.
Got a link or proof to back this up?
Of course, and it's been shoved into your face over and over. The idiocy, the lies. It's all in the history of this blog (and backtype.com). But you aren't going to look because you couldn't stand to look square in the face of what you do.
Instead you just delude yourself that everyone else around is as much a powerless, lying buffoon. You tell yourself it doesn't matter, that words don't matter, that nobody reads Nate Silver's blog, much less these comments. That it doesn't matter if you lie and make a fool of yourself. You can just run away [like you project onto others].
Because the truth is way too painful to even acknowledge, isn't it?
G'night Mule Rider, go drown your secret shame of yourself in beer/whiskey chasers ....
While reading the comments on this thread, I couldn't help but notice that some people are running the numbers, using chi-square tests and p-values on the trailing digits, whether they support Nate's argument or not. These people are discussing the sample size and method and offering suggestions for the current model, and their posts are very interesting.
Some others offered no math, no model to support their argument, and I cannot learn anything from their posts.
Mule Rider, please support your argument with some stats. Sample size? Method? Why is p <<<< 0.05 (95% confidence)?
Could there really be an innocent explanation?
I've been working on some original research which will only make sense to serious stat-geeks - the problem of how sharply SV-LLC's polls could possibly distinguish between, say, 50% and 51%, given the problem of Gaussian blur.
In the course of setting up a Monte Carlo simulation I looked at SV-LLC's website and discovered their polls generally claim sample sizes of exactly 600, 800, or 1200. If that's literally true, it seems to me the only results they would ever report would be of the form ROUND(n/{600,800,1200}) where n is an integer. Depending on the rounding method, that might result in some freakish distributions of final digits.
In particular, the three radixes would "beat" causing certain digits to be common or rare, simply because a larger or smaller number of vote totals would round to an integer percentage that ends with them.
Did anybody follow me?
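The rounding idea above is easy to enumerate directly. A minimal sketch, with loud assumptions: every raw count n from 0 to the full sample is treated as equally likely (real results are not), percentages are rounded half up, there is no weighting, and the function name is invented for illustration. It counts how many values of n round to a percentage ending in each digit, for the three sample sizes mentioned.

```python
from collections import Counter

def trailing_digit_counts(sample_size):
    """For each trailing digit, count the raw totals n that round to it."""
    counts = Counter()
    for n in range(sample_size + 1):
        # Integer percentage of n out of sample_size, rounding halves up.
        percent = (200 * n + sample_size) // (2 * sample_size)
        counts[percent % 10] += 1
    return [counts[d] for d in range(10)]

for size in (600, 800, 1200):
    print(size, trailing_digit_counts(size))
```

Under those assumptions the split comes out almost perfectly even: each of the digits 1 through 9 gets 60, 80, or 120 raw totals, and 0 gets one more, because 600, 800, and 1200 are all multiples of 100. So in this simplified setup rounding alone does not produce lumpy digits. Weighting, or percentages computed on subsamples whose sizes are not multiples of 100, could behave differently, and this sketch does not cover those cases.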
@clarkejeffrey
You know that conversation we had about trolls yesterday? Mule Rider is our most notorious troll.
Thought I should give you a heads up.
Alon:
Not so that Rasmussen would lag on the upside while leading on the downside. You would think that would be the case if he were biased toward one side, but your claim does not hold water. Partisan blinders off, please.
Again, empirical evidence: Rasmussen's numbers on the Obama approval rating bottomed on 9/1 at 45%, while none of the polls that were behind the curve bottomed that soon. Some are still drifting lower. Gallup, for example, was at 52% approval at the time Rasmussen bottomed, but is presently at its lows at 50%. Meanwhile Rasmussen has upticked six points from the low to 51%.
So, if Rasmussen is, in fact, leading the pack, you should expect Obama approval to uptick in the coming weeks in the other polls. The absolute numbers between polls are not directly comparable because of mix differences. They are best calibrated at election time, which is when Rasmussen has historically shined in comparison to the others.
The thing about people actually writing to the blog and saying that they were polled is a tad unrealistic.
I've been polled several times and I don't remember at all who the polling company was.
The fact that there is another company with basically the same name makes it even harder.
Legitimate polling companies must have extremely high phone bills. How about if the company just released their bill?
I think that would be a piece of real evidence. They could get a phone company representative to write an affidavit saying the bill was legitimate. I know the information is confidential but I would think SV would want to waive confidentiality in this case and I think that the phone company could write the affidavit if confidentiality was waived.
Davy said...
Absent a nationwide database of cell phone numbers or e-mail addresses, the word random begins to lose meaning.
Something occurred to me: do pollsters have to adhere to the Don't Call database, or are they exempt from that? Having a somewhat cynical outlook on this, I'd be a bit surprised if Congress didn't include random political and policy polling as an exemption for the Don't Call list.
I guess I could go try to track down the answer, but does anyone here know the answer offhand?
Davy,
I know. I was here during the election. He is better than he was then.
The thing is that I don't think everybody that disagrees with me is a troll. I don't want to shut down contrary opinions, just the senseless name calling.
As long as he is willing to maintain civility, I'm willing to talk to him.
Pollsters are exempt from Do Not Call rules
@Dwight
I think the rule is you can't be fined if you aren't soliciting or selling something. But a poll is soliciting information so I couldn't be 100% certain.
I remember a conservative guy in Vegas called the cops on me during the primaries (in truth, the Obama machine did get a little overzealous in the final days). He claimed my knocking on his door was illegal. Cops said it wasn't.
As long as he is willing to maintain civility, I'm willing to talk to him.
Best of luck with that. ;-)
What exactly are you referring to?
I, for one, called him out on his "calling out" with his "climate challenge" back in July and showed just how hollow it was.
http://www.fivethirtyeight.com/2009/07/climate-challenge-update.html
I know he's made some bold statements against prominent conservative intellects - George Will and Greg Mankiw to name a couple - where he didn't come off looking as smart as he thought he did starting out.
Like I said, he hasn't gotten burned completely yet, but he keeps playing with fire, so it's bound to happen.
Mule Rider, please support your argument with some stats. Sample size? Method? Why is p <<<< 0.05 (95% confidence)?
I wish you SOBs would actually read my posts if you're going to respond to me. I never said that Nate doesn't have a statistically sound argument, nor did I say SV LLC has a solid defense of what they do. My point is that Nate is quick to spout off and "call people out" and one of these days it will come back to bite him in the ass (by embarrassing the hell out of him when he's sorely mistaken).
This is maybe one of those times. Maybe it isn't. My overall point is that Nate is a pompous ass who's due for a humiliation, not that this specific debate is clearly against him.
"Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force:
'There are three kinds of lies: lies, damned lies, and statistics.'"
- Mark Twain's Own Autobiography: The Chapters from the North American Review
Of course, and it's been shoved into your face over and over. The idiocy, the lies. It's all in the history of this blog (and backtype.com). But you aren't going to look because you couldn't stand to look square in the face of what you do.
It's funny that with soooooo many examples that have been "shoved" into my face over and over again, you still can't take 2 minutes to link to just one instance - either here or on the backtype log - of me saying something blatantly inaccurate. Nice try, though, in articulating the same tired rant against me and trying to pass it off again as truth based on sheer assertion.
Instead you just delude yourself that everyone else around is as much a powerless, lying buffoon. You tell yourself it doesn't matter, that words don't matter, that nobody reads Nate Silver's blog, much less these comments. That it doesn't matter if you lie and make a fool of yourself. You can just run away [like you project onto others].
Because the truth is way too painful to even acknowledge, isn't it?
Holy crap, that was hilarious. I could literally see the blood vessel stretching out of your temple as you typed that up. Wow! I mean, WOW!!!
G'night Mule Rider, go drown your secret shame of yourself in beer/whiskey chasers ....
Secret shame? WTF? You are so pathetically sad but funny nonetheless. I will go have a drink or two....nothing wrong with that...cheers!
The plot thickens:
http://www.pollster.com/blogs/strategic_vision_a_bigger_stor.php
The latest word from SV LLC still doesn't answer a single question:
http://www.politico.com/blogs/bensmith/0909/Embattled_pollster_defends_methods.html
Hey, if you don't like Mule Rider, then don't respond to him. The guy didn't see fit to attack Nate's thesis (probably because he could only do so by using statistical techniques that he doesn't understand, and I don't either), yet you guys jumped for the bait and the next thing I know I have to skip past a dozen or so of his responses.
If you don't want somebody polluting this site with right-wing twaddle, IGNORE IT. He'll go away soon enough.
I'm contacting my attorneys
The two things I like most about this site are that Nate does call people out from time to time and that he hosts a free-for-all here except, iirc, for a couple of periods of brief duration when one of the contrarian's obsessions seriously got the better of him.
I'm still hoping for a complete rundown of all your aliases someday Mule though I know that, far more often than not, someday never comes.
It would be nice to see the original data (the numbers for each poll) so that a more robust analysis can be done.
"I know he's made some bold statements against prominent conservative intellects - George Will and Greg Mankiw to name a couple - where he didn't come off looking as smart as he thought he did starting out."
I'm not sure what you're referring to. I went back and looked at the Will and Mankiw articles.
Mankiw's argument was basically:
Sotomayor spends more money than my grandmother did during the great depression.
Nate's response:
That's a silly standard to apply to Supreme Court justices.
Will said a bunch of things about climate change that were basically blatantly wrong. Nate called him on it.
I didn't see any point where Will or Mankiw said anything that proved Nate wrong in any way.
I mean you could argue that saving is really important or that climate change isn't, but that's obviously a matter of opinion.
Your basic complaint about the climate change was a) He didn't give enough time to respond. b) By limiting it to other blogs and only posting it on his own blogs, nobody knew the challenge existed. Basically none of the 1000 people that were eligible knew that he existed.
a) You have a small point, but logistically he did need to set some ground rules. Certainly, a conservative blogger could have responded a week later with "I didn't see this on time but I'm willing to take the bet for next month". I think the ground rules were open to some negotiation.
b) No point at all. I hate to break it to you but bloggers read other blogs. Mankiw and Strategic Vision certainly knew about his criticisms. This isn't like he put a challenge to ESPN readers on Martha Stewart. The blogging community is relatively small but what happens in it gets spread quickly. I fail to buy the argument that none of the conservative bloggers even knew the challenge existed. They didn't take the challenge because they knew they would lose.
It had nothing to do with start dates or lack of knowledge.
But if you really believe that, then I guess nothing Nate writes should bug you because nobody ever reads him....
My overall point is that Nate is a pompous ass who's due for a humiliation, not that this specific debate is clearly against him.
I think this is sort of like hating a sports team because they win a lot of games. Let's be honest. Nobody hates the Clippers or thinks they are a bunch of pompous asses.
You have sort of admitted that he doesn't often get caught being embarrassingly, sorely mistaken. I guess your argument boils down to "he knows his shit but I really wish he didn't or at least didn't say it out loud."
It's an interesting way to look at the world and I can see why he must frustrate you.
Check it out.
Strategic Vision has dropped the addresses from its site and disabled the chat feature. I still have the addresses if anybody is interested.
The heat is on, apparently.
BTW, I don't think Nate walks on water. Read my comments about the article on how Americans love to drive. I think he got some things wrong and made a few bad analogies.
But "calling someone out" the way he did with Strategic Vision is obviously extreme and I doubt he'd do it without cause.
I am sort of curious how this plays out. Maybe SV will release all their methodology and phone bills and this will turn out to be a false alarm. Still there is definitely smoke here and smoke means you must investigate the possibility of fire.
"We are sorry, due to misuse, our chat feature has been temporarily disabled."
mis·use (ms-ys)
n.
Tons of people asking us whether or not we're full of shit.
@Mark Grebner: I don't follow you in detail, but I think I get the idea.
In my own research experience with surveys abroad, in which researchers sometimes used a kind of quota sample approach, it was much more common than in American surveys to end up with exact even numbers, e.g., 850, 900, 1200, 1500 respondents. (Note that in the U.S., as I recall, many of last year's state-level Rasmussen surveys in the pre-election period had such very even numbers, typically 500 or 600.)
In the surveys abroad, which were carried out in face-to-face interviews, if they ended up meeting their quotas in specific categories (by age, gender, etc.) they would sometimes end up with "extra" interviews. Instead of applying weights, to correct the results, however, they actually threw away completed cases (!) to achieve the size and distribution of the respondents they were seeking. This gives the researchers the possibility of 'selecting' from the completed cases and makes me very uncomfortable.
Now Rasmussen's state-level polls were always automatic (no-interviewer) telephone surveys. They didn't throw cases away, I am sure, to get their exact 500 or 600. They just programmed their computers to stop calling when they achieved the exact target number of interviews.
It seems to me that if SV, LLC, is using either the foreign survey approach (some form of quota sample with fixed target) or the Rasmussen approach, they might be able to get an exact number of surveys (e.g., 800).
But the "magic" in many of these surveys is what kind of post-survey weights are applied, e.g., whether they weight by party ID, and what kind of likely voter adjustment they make. This is what AAPOR was after in requesting information from SV-LLC and other polling organizations. And this magic is what SV-LLC is keeping secret.
Sorry, forgot to paste the link.
http://www.strategicvision.biz/contact_us.html
Looks like they must have something to hide.
Davy said...
Check it out.
Strategic Vision has dropped the addresses from its site and disabled the chat feature.
~~~~~~~~~~
Next they'll be moving to an undisclosed location, like darth when the going gets tough wingers leave town, er disappear as have all the Reps ;) at 538 lol
@Davy: just for the record, please post the (former) addresses here.
Thanks clarkejeffrey.
@Davy
What are the latest numbers on landlineless households? Got a link?
Something else that doesn't seem to be accounted for by all pollsters is how many phone numbers a household has. I've been polled and asked that question (by a university study to do with health and fitness) but often not. If you don't account for that you could get some heavy sampling towards the more affluent, larger family, end of households.
P.S. If I'd been here for the last election and gone door to door for Obama I'd be happy if people phoned the police rather than just shot me and left me on the front walk. :) It's not exactly Obama Friendly territory out here.
Fifi…
Please shut up.
The only reason you would hang out in a forum where you have been insulted, exposed as a fraud, and ridiculed is that you have no alternative place to go, such as the house of a friend, or to the movies with a friend, or just to hang out with a friend.
This suggests, inescapably, that you have no friends, and we don’t need to go into trailing digits or gaussian blur or a Monte Carlo simulation to come to that conclusion.
You are a poodle, a lonely unloved poodle.
You going to break out the murder threats again? Now would be a good time, sort of put the icing on the cake of your foam-flecked, witless fury.
@Juris
It occurs to me that since Strategic Vision took them down I might be causing Nate some troubles by making them available on his website.
Here is the one address they did leave up:
Atlanta, GA (Headquarters)
2451 Cumberland Parkway
Suite 3607
Atlanta, GA 30339
P 877-556-0004 (toll free)
F 877-556-9822 (toll free)
If you really need them I can post my junk e-mail address here.
A lot of the addresses were actually put up on the original post about this, so if you want to find them, go there.
If I recall, the Atlanta, Madison, and Tallahassee locations were UPS stores. The Seattle location was a Mail Boxes, Etc. store. Which is fine, but it just seems weird for there to be no physical addresses.
@Dwight
You'd think as a grad student working on a thesis that I would be in the habit of notating those types of things. I can't remember where I saw that tidbit of info though. I'll poke around and see if I can't find something.
I guess since it was previously a matter of public record it wouldn't hurt.
Atlanta, GA (Headquarters)
2451 Cumberland Parkway
Suite 3607
Atlanta, GA 30339
P 877-556-0004 (toll free)
F 877-556-9822 (toll free)
Madison, WI
1360 Regent Street
Suite 152
Madison, WI 53715
P 877-287-2141 (toll free)
F 877-287-2399 (toll free)
Seattle, WA
800 5th Avenue
Suite 101-387
Seattle, WA 98104
P 877-245-0098 (toll free)
F 877-245-4699 (toll free)
Tallahassee, FL
2892 Park Avenue
Suite 5
Tallahassee, FL 32301
P 877-878-2680 (toll free)
F 877-878-2681 (toll free)
AT&T has been losing landline subscribers each quarter at an accelerated rate since 2006. It dropped 7.4 percent in 2007. Analysts said the economic downturn could also have an effect on the landline business. They say consumers looking to cut expenses will drop their landline – which can cost up to $60 a month – before they drop their wireless phone plan.
From Dvorak Uncensored
The other article, if I remember correctly, was a link off of MSNBC to an article titled (paraphrasing) '40 things that are disappearing'.
Speaking as a blogger with a journo background and an aversion to being sued for cause, I'd have to say that Nate was extraordinarily careful in avoiding any actionable statement.
Of course, the obvious response to Nate's questions would be to provide the methodology. (This would amount to a high-value consultation, for free.)
Threatening legal action as a first response - well, it's the sort of thing that gets reporters assigned to stories, to see if there might be something to them.
Given the obvious financial opportunities for generating apparent data that confirms what people wish to believe, and given other indicators - such as the PO box "offices" of the firm in question, as well as their core business, which their polling seems to be intended to support - there are a lot of circumstantial reasons to treat their results skeptically.
BTW - although I have no data to support this, it's my observation that the trolls were better at trolling when I was younger.
Perhaps there's a dilution effect at work, coupled with a lack of troll-on-troll competition.
I also note the decline of the internet flame as an artform; perhaps there is some linkage.
I chose to view it in light of an old joke:
Masochist: Beat Me! Beat Me!
Sadist: No.
I'm ignoring the troll. Sadistically.
Got it.
http://www.computerworld.com/s/article/9136629/Obsolete_tech_40_things_on_their_way_out?source=CTWNLE_nlt_dailyam_2009-08-19
wv: palinje. Martial art form for former Alaskan Governors
"The difference between now and then is that I treated this site like a mud pit before and didn't think the blog authors or any of the commenters were serious. While I still think Nate and his cohorts still put out some garbage - and they still put plenty of spin in the good ones - and the comments section is still filled with too much name-colling and devolves into far too many pissing contests, there are enough people here - between authors and commenters - to challenge my intellectual acumen that it is worth my while to stay tuned to people who view the world from a slightly different ideological angle and see what they have to say."
If only your actual criticism was as coherent and well thought out as your reason for said criticism.
Rudy:
You're essentially making my point for me when you say, "The absolute numbers between polls are not directly comparable because of mix differences. They are best calibrated at election time, which is when Rasmussen has historically shined in comparison to the others."
Polling on issues is different from polling on actions. Actions can be measured; pollsters can measure what they did right and what they did wrong and improve, and poll readers can measure which polls are better and which are worse.
Issues are different: they depend on question wording and question order, and are arguably more demographically sensitive than voting. Rasmussen has used push-poll wording before in its issue polls, and has a weird likely voter screen for the first year of a Presidential term; this suggests its issue polling is in general less credible than its election polling.
To add to the addresses, the address in Atlanta has changed twice since 2003; none of the others have changed. (Tallahassee has been there since at least 2003; Madison and Seattle were added in 2006.)
In 2003 the address in Atlanta was
Atlanta Office (Main)
235 Peachtree Street
Peachtree Center - North Tower
Suite 400
Atlanta, GA 30303
ph: 678-556-0053
fax: 404-287-2397
In 2004 it changed to
Atlanta Office (Main)
260 Peachtree Street
Suite 503
Atlanta, GA 30303
ph: 404-880-0098
fax: 404-880-0084
And then in 2006 it changed to the one already posted.
So Nate, you should recalculate a number of the election prediction posts you put up before November 2008 (using their polls) and recompute predictions of win:lose margins and see if removing them leaves your numbers more or less accurate and report on whether their results might have given your predictions (and those of others) a liberal or conservative bias.
@Mule Rider
It's funny that with soooooo many examples that have been "shoved" into my face over and over again, you still can't take 2 minutes to link to just one instance - either here or on the backtype log - of me saying something blatantly inaccurate.
Well there is me right in this thread mocking you for your past claim about having a successful career as a respected economist. :)
But sure, I'll do a quick Google (you can limit it to the 538 site by adding the text site:http://www.fivethirtyeight.com to the search parameters). Read through this thread for just one example.
Maybe you could add that link to your browser favorites list as "Oh yeah, I am a bullshit artist and everyone knows it"? That way when you go back into denial, in about 30 seconds, it'll make it easier for all of us if you can just go reference that from your Favorites.
Googling the addresses is kind of fun.
Here's a partial list of the businesses at that Atlanta address:
Abortion Clinics OnLine
Amazing Maids
At Your Services Atlanta
Dirty Work Pooper Scooper Service - 2 reviews
Friendly Systems, Inc.
John Nicholson Web Design
Keystone Interior Design, LLC - 8 reviews
Mail Boxes Etc
Movie Gallery - 2 reviews
Organic & Environmental Systems Inc - 1 review
Publix Super Market - 1 review
Simply Flowers, Inc. - 2 reviews
Southeast Computer Forensics & Security
Subway - 1 review
Sun Trust Bank - 1 review
UPS Store - 3 reviews
I myself would like to believe that the pollsters are also the same guys who run Abortions OnLine (WTF? D&C-over-DSL?) and the poop-scooping service.
Adam
Instead of worrying about the obscure and little-known Strategic Vision (who may or may not be on the level) why not focus on the obvious bias that we typically see in the CBS/NYT poll?
As Jay Cost helpfully notes,
Poll after poll shows the public has real concerns about the health care proposals working their way through the Congress, as well as the President's handling of the issue. Even the latest CBS News/New York Times poll - whose 22/37 Republican/Democrat split has probably not been seen in an actual election since 1936 - shows a confused and divided public.
If Nate, Numbers Guru Extraordinaire, you are so concerned with cooking the books why not direct your fire on this polling outfit? Having questions about SV is one thing but here is a perfect example of completely bogus polling.
22R to 37D? What the hell is that? When has the electorate EVER looked like that? And I know that some on here are going to spew nonsense about self-identification for Republicans being down and the like. But it's utter horseshit. On Election Day, the identification of the electorate NEVER looks like the CBS/NYT polls. Never.
In 2008 the party ID was DEM 39, GOP 32. So in the year where the Republicans had the lowest share of the vote since 1996, there was only a 7-point disparity.
In 2006, it was DEM 38, GOP 36.
In 2004, it was GOP 37, DEM 37.
Never 15 points. The CBS/NYT poll deliberately and always skews to the left in such a way that it affects media coverage favorably for Democrats. Why more attention isn't paid to this is beyond me.
For those of you who are still interested in the statistical argument I did some extra analyses that I think help illuminate potential differences.
Using the data from the original post and Steve's latest Survey USA numbers I computed all three Chi-Square tests with the assumption that the numbers, if truly random, would be evenly distributed.
Going beyond the normal null hypothesis test I also computed phi, a measure of effect size. For those of you who don't do math for fun, the effect size is better for comparisons of different sample sizes, because (as I mentioned earlier) the p-value is heavily influenced by sample size. In fact, with a big enough sample, even a tiny departure from the expected distribution will make a Chi-Square test significant.
Senate & General
Chi-square = 22.83
p-value = .007
Phi = .04
Strategic Vision
Chi-square = 99.59
p-value < .0000001
Phi = .13
Survey USA
Chi-square = 6.08
p-value = .73
Phi = .05
So, while the Senate and General comparisons do have a significant Chi-Square test, their effect size is very similar to the Survey USA aggregate. Both of these are much smaller than the Strategic Vision effect size.
All of this suggests that Strategic Vision is much less random than the other two polling samples.
I'm not entirely sure how to compute the confidence interval of the effect size, so that is a caveat to keep in mind. Also, as others have discussed, this analysis may be using incorrect assumptions about the distribution of numbers.
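A rough sketch of the calculation described above, for anyone who wants to try it on their own digit counts. The assumptions are mine: the null hypothesis is that all ten trailing digits are equally likely (9 degrees of freedom), and "phi" is taken to be sqrt(chi-square / N), which for a goodness-of-fit test is often called Cohen's w. The digit counts below are invented for illustration and are not the poll data discussed here.

```python
import math


def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against a uniform null."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def effect_size(chi2, n):
    """sqrt(chi2 / N): assumed to be the 'phi' reported above (Cohen's w)."""
    return math.sqrt(chi2 / n)


def p_value_odd_df(chi2, df):
    """Upper-tail chi-square probability; closed form valid for odd df only."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    if chi2 <= 0:
        return 1.0
    root = math.sqrt(chi2)
    total = 0.0
    term = root  # first series term: x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * r + 1)  # next term: x^((2r+1)/2) / (1*3*...*(2r+1))
    tail = math.erfc(root / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-chi2 / 2) * total


# Invented trailing-digit counts (digits 0-9), NOT real poll data.
small = [95, 110, 102, 98, 105, 90, 100, 108, 97, 95]
large = [c * 10 for c in small]  # same proportions, ten times the sample

for name, counts in (("small", small), ("large", large)):
    chi2 = chi_square_uniform(counts)
    n = sum(counts)
    print(name, round(chi2, 2), p_value_odd_df(chi2, 9), round(effect_size(chi2, n), 3))
```

The two invented samples have identical proportions, so the effect size is the same for both, while the chi-square statistic grows with the sample and the p-value shrinks. That is the sample-size sensitivity the comment describes.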
Happy GOPer
Are you defining party ID on election day by who gets the votes?
While the distribution of the CBS/NYT polls might be off, it just seems that election day voting patterns wouldn't be the best comparison.
For example, on polls and surveys I will show up as an independent, but rarely vote for 3rd parties. I can't imagine I'm the only person.
However if I'm wrong about how you are defining election day party ID then my argument is useless. Enjoy.
Happy GOPer…
I think what you’re seeing is the utter rejection of the GOP by the public.
If Democrats are suffering a decline in popularity, the GOP are suffering an unprecedented plunge. For every point the Democrats fall, the GOP falls two.
The truth is that most pollsters adjust their numbers to be favorable to the GOP out of fear of being accused of anti-GOP bias. Your post demonstrates that this apprehension is well founded.
Mark,
No. I am defining it by how voters self-identified in exit polling.
The CBS/NYT poll typically will survey "adults" and skew way too much to the Democrats, only to slowly and gradually adjust their sample to more closely resemble real life as Election Day approaches. It's deliberate and intended to shape public opinion rather than report it.
@Davy
I didn't see in that article where they talked about the switch? But here, I've found a couple references. The first is 2 years old.
http://www.msnbc.msn.com/id/18659835/
The next is a little over a year old, and it references where the number comes from (Mediamark Research, Inc.).
http://www.mainstreet.com/article/smart-spending/budgeting/cell-phones-vs-landlines-surprising-truths
Those are pretty big numbers, I didn't realize it was quite that big. Now over a decade ago I went to cell-only. But I was newly-single back then, and still in my twenties.
These days it'd be pretty tough for me to [convince the wife to] get rid of the landline because of what George Carlin would refer to as "needing more shit to keep all my shit from everyone else". House alarm with dial-in. *shrug*
Pragmatus,
That addresses exactly zero of the substance of my argument. If on Election Day voters NEVER self-identify as 15 points more Democratic, then how is it that a mainstream media poll that consistently shows such a huge Democratic party edge has any credibility?
AHG - NYT/CBS were hardly the only outfits reporting a large post-election partisan ID gap. Gallup had them too. Note the term post-election in there. Everybody loves a winner, so people are more likely to identify as D immediately after; everybody hates a loser, which is the most likely reason for the large PID gaps reported after the election.
Choice of LV model will play some role as well, but I think there's a very real post-election depression effect that should be considered - and which usually is NOT at play when people actually go to the polls.
TJ Hairball,
Sounds like a reasonable argument. The problem is that there was no rallying effect for the GOP in 2001. CBS/NYT still showed large Democratic edges in their polling.
Yes it is true that other pollsters show inflated D identification. That's what happens. For some reason when you poll a bunch of schlubs that never vote (polling "adults" instead of RV or LV) they always seem to skew Democrat.
But CBS/NYT is *consistently* the worst offender, *consistently* skewing left and always playing these games with sampling - until they are forced to adjust the numbers and increase Republicans in their polling as Election Day draws near.