11.23.2008

Projection: Franken to Win Recount by 27 Votes

As we wrote yesterday evening, the ever-increasing number of challenged ballots in Minnesota is making it more and more difficult to determine the extent to which Al Franken is in fact gaining ground in the state's recount. An analysis of the precinct-by-precinct returns available on the Secretary of State's website, however, suggests that Franken's position is somewhat stronger than it appears, and that he may in fact be the favorite to prevail.

Consider the following. In precincts where no challenges have been issued (the only precincts in which the results of the recount can, in some sense, be considered final and "official"), Franken has gained a total of 34 votes and Coleman a total of 6 votes, for a net gain by Franken of 28 votes. Moreover, in precincts where exactly 1 challenge has been issued, Franken has gained a net of 31 votes on Coleman, and in precincts where exactly 2 challenges have been issued, he has gained a net of 32 votes.

By contrast, in precincts where the two campaigns have combined to challenge 5 or more ballots, Coleman has gained a net of 57 votes on Franken.

In other words, the fewer ballots challenged in a precinct, the better Franken is doing, and the more ballots challenged, the worse he is doing. The relationship is in fact quite strong.

Precinct-Level Returns Analysis
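Challenges   Precincts   Franken   Coleman   Net
0            2233        +34       +6        Franken +28
1            419         -94       -125      Franken +31
2            154         -90       -122      Franken +32
3-4          133         -157      -171      Franken +14
5-9          59          -158      -116      Coleman +42
10+          26          -156      -141      Coleman +15
(Franken and Coleman columns: change in each candidate's total relative to the initial count, summed over the precincts in each row.)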
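
The Net column is simply Franken's change minus Coleman's change in each row. As a quick check, this short Python sketch recomputes it from the table's figures and confirms the 57-vote Coleman gain in precincts with 5 or more challenges. The BUCKETS layout and the helper name are illustrative choices, not part of the original analysis.

```python
# Recount changes by number of challenges in the precinct, copied from the table above:
# bucket -> (precincts, Franken change, Coleman change)
BUCKETS = {
    "0": (2233, 34, 6),
    "1": (419, -94, -125),
    "2": (154, -90, -122),
    "3-4": (133, -157, -171),
    "5-9": (59, -158, -116),
    "10+": (26, -156, -141),
}


def net_for_franken(franken_change: int, coleman_change: int) -> int:
    """Positive = net gain for Franken; negative = net gain for Coleman."""
    return franken_change - coleman_change


for bucket, (n, f, c) in BUCKETS.items():
    net = net_for_franken(f, c)
    leader = "Franken" if net > 0 else "Coleman"
    print(f"{bucket:>4} challenges, {n:>4} precincts: {leader} +{abs(net)}")

# Precincts with 5 or more challenges, combined.
heavy = sum(net_for_franken(f, c) for b, (_, f, c) in BUCKETS.items() if b in ("5-9", "10+"))
print("5+ challenges, net for Franken:", heavy)  # -57, i.e. Coleman +57
```
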

It is not an accident, then, that as the number of challenges has increased with each day of the recount, Franken's momentum appears to have stalled. Very probably, a majority of the challenges are coming from Franken's pile. This is somewhat independent of which campaign actually initiates the challenge since, as we suggested yesterday, a potential Franken undervote could be challenged by either campaign, depending on the initial ruling of the local election judge.

We can address this phenomenon more systematically with a regression analysis. In the regression, we are attempting to predict a variable I've defined as franken_net: the net gain by Franken per 10,000 ballots cast in a given precinct. The independent variables are as follows:

t: the proportion of the two-way vote received by Franken in the precinct in the initial count (i.e., excluding votes for third-party candidates)

c_f: the number of challenges initiated by the Franken campaign per 10,000 ballots counted in that precinct

c_c: the number of challenges initiated by the Coleman campaign per 10,000 ballots counted in that precinct

In addition, the regression includes an interaction term for each pair of variables, as well as a three-way interaction term among all three variables; all of these terms are statistically significant. Each precinct is weighted by the square root of the number of ballots cast there.
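
For readers curious about the mechanics, here is a minimal pure-Python sketch of the kind of model described above: a weighted least-squares fit of franken_net on t, c_f and c_c with every two-way interaction plus the three-way interaction. It is not the code behind this post. It assumes that "weighted by the square root of the number of ballots" means each precinct's squared residual is multiplied by sqrt(ballots); the helper names are hypothetical, and the toy data are generated from made-up coefficients only to check that the fit recovers them.

```python
import math


def design_row(t, c_f, c_c):
    """Regressors for one precinct: main effects, pairwise and three-way interactions, constant."""
    return [t, c_f, c_c, t * c_f, t * c_c, c_f * c_c, t * c_f * c_c, 1.0]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_wls(precincts):
    """Weighted least squares over (t, c_f, c_c, ballots, franken_net) tuples.

    Assumption: each precinct's squared residual gets weight sqrt(ballots),
    so we solve the weighted normal equations (X'WX) beta = X'Wy.
    Returns coefficients in design_row order.
    """
    p = 8
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for t, c_f, c_c, ballots, y in precincts:
        w = math.sqrt(ballots)
        row = design_row(t, c_f, c_c)
        for i in range(p):
            xtwy[i] += w * row[i] * y
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)


# Toy check with made-up coefficients (NOT the Minnesota data or the post's estimates):
# noiseless data on a 3 x 3 x 3 grid should be fitted exactly.
true_beta = [9.0, -0.3, -0.9, -0.7, 0.6, -0.01, 0.01, -3.6]
grid = [(t, c_f, c_c) for t in (0.3, 0.5, 0.7) for c_f in (0, 2, 5) for c_c in (0, 3, 6)]
toy = [
    (t, c_f, c_c, 100 + 50 * i, sum(b * x for b, x in zip(true_beta, design_row(t, c_f, c_c))))
    for i, (t, c_f, c_c) in enumerate(grid)
]
estimated = fit_wls(toy)
print([round(b, 4) for b in estimated])
```

Solving the normal equations directly is adequate for eight regressors; a real analysis would more likely use a statistics package, which would also report the standard errors and t-statistics shown in the table.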

The results of the regression are as follows:
Dependent variable: franken_net
Variable        Coef.     t-stat   P>|t|
t               8.922      2.89    0.004
c_f            -0.280     -3.99    0.000
c_c            -0.926     -9.82    0.000
t * c_f        -0.703     -8.59    0.000
t * c_c        +0.565      2.89    0.004
c_f * c_c      -0.013     -4.29    0.000
t * c_f * c_c  +0.012      2.81    0.005
constant       -3.622     -2.36    0.019

This regression is a bit difficult to interpret, particularly with all the interaction terms, but the key intuition is as follows. Suppose that the number of challenges is zero, as it will be once the state canvassing board finishes considering all of them in December. In that case every term involving c_f or c_c drops out, leaving only the constant and the term in t, Franken's share of the two-way vote in the precinct. We are thus left with the following:

franken_net = t * 8.922 - 3.622
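
To make that reduction concrete, the sketch below encodes the full fitted equation using the coefficients from the table (rounded as published) and checks that setting c_f = c_c = 0 leaves 8.922 * t - 3.622. The dictionary keys and function name are illustrative, not taken from the original analysis. It also computes the two-way share at which a zero-challenge precinct is predicted to break even, about 0.406.

```python
# Coefficients from the regression table (as published, rounded to three decimals).
# Dependent variable: franken_net = net Franken gain per 10,000 ballots in a precinct.
COEF = {
    "t": 8.922,
    "c_f": -0.280,
    "c_c": -0.926,
    "t*c_f": -0.703,
    "t*c_c": 0.565,
    "c_f*c_c": -0.013,
    "t*c_f*c_c": 0.012,
    "const": -3.622,
}


def predict_franken_net(t: float, c_f: float, c_c: float) -> float:
    """Fitted franken_net for a precinct.

    t   -- Franken's two-way share in the initial count (0 to 1)
    c_f -- Franken-initiated challenges per 10,000 ballots
    c_c -- Coleman-initiated challenges per 10,000 ballots
    """
    return (
        COEF["t"] * t
        + COEF["c_f"] * c_f
        + COEF["c_c"] * c_c
        + COEF["t*c_f"] * t * c_f
        + COEF["t*c_c"] * t * c_c
        + COEF["c_f*c_c"] * c_f * c_c
        + COEF["t*c_f*c_c"] * t * c_f * c_c
        + COEF["const"]
    )


# With no challenges, every term containing c_f or c_c vanishes,
# leaving franken_net = 8.922 * t - 3.622.
zero_challenge = predict_franken_net(0.5, 0.0, 0.0)

# Two-way share at which a zero-challenge precinct is predicted to break even.
break_even_t = -COEF["const"] / COEF["t"]
print(f"{zero_challenge:.3f}", f"{break_even_t:.3f}")  # 0.839 0.406
```
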

Now we can attempt to solve this equation at the statewide level. When we plug in a t of .499956 (Franken received just slightly less than half of the two-way vote in the initial count), we get a value for franken_net of .837. That is, Franken will gain a net of .837 votes for every 10,000 ballots cast. With a total of 2,885,555 ballots recorded in the initial count, this works out to a projected gain of 242 votes for Franken statewide. Since Norm Coleman led by 215 votes in the initial count, this suggests that Franken will win by 27 votes once the recount process is complete, including the adjudication of all challenged ballots.
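
The arithmetic in the paragraph above can be reproduced in a few lines. Using the rounded coefficients from the table gives about 0.839 per 10,000 rather than the .837 quoted, a gap that presumably comes from rounding the published coefficients; either value scales to roughly 242 votes and a 27-vote Franken margin.

```python
T_STATEWIDE = 0.499956       # Franken's two-way share in the initial count
BALLOTS = 2_885_555          # ballots recorded in the initial count
COLEMAN_INITIAL_LEAD = 215   # Coleman's margin after the initial count

# Zero-challenge equation: franken_net = 8.922 * t - 3.622 (votes per 10,000 ballots)
per_10k = 8.922 * T_STATEWIDE - 3.622

# Scale the per-10,000 rate up to every ballot in the state.
franken_gain = per_10k * BALLOTS / 10_000

# Positive margin means Franken finishes ahead.
margin = franken_gain - COLEMAN_INITIAL_LEAD

print(f"franken_net: {per_10k:.3f} per 10,000")       # 0.839
print(f"projected Franken gain: {franken_gain:.0f}")  # 242
print(f"projected margin: Franken {margin:+.0f}")     # +27
```
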

The error bars on this regression analysis are fairly wide, so even if you buy my analysis, you should regard Franken as no more than a very slight favorite. Nevertheless, there is good reason to believe that the high rate of ballot challenges is hurting Franken disproportionately, and that once those challenges are resolved, Franken stands to gain ground, perhaps enough to overtake Coleman.
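
The post does not report a standard error, so the following is only an illustration of why wide error bars make a 27-vote projection a slight edge rather than a safe one. It treats the final margin as normally distributed around the projected 27 votes, so the chance that Franken finishes ahead is the standard normal CDF evaluated at 27 / SE. Both the normal approximation and the standard errors in the loop are assumptions chosen for illustration.

```python
from statistics import NormalDist


def win_probability(projected_margin: float, standard_error: float) -> float:
    """Chance the final margin is positive, treating it as Normal(projected_margin, standard_error)."""
    return 1.0 - NormalDist(mu=projected_margin, sigma=standard_error).cdf(0.0)


# Hypothetical standard errors (in votes); the post does not report one.
for se in (50, 100, 250, 500):
    print(f"SE {se:>3} votes -> P(Franken wins) = {win_probability(27, se):.2f}")
```

Under these assumptions a 50-vote standard error would make Franken roughly a 70 percent favorite; it takes a standard error of several hundred votes for a 27-vote projection to shrink to something like 52-48.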

(Note: it is also possible to build a multivariate regression model that attempts to solve for each candidate's gain or loss in absolute terms, rather than for Franken's gain relative to Coleman. This multivariate model produces a slightly more optimistic result for Franken, suggesting that he will gain 254 votes statewide and Coleman will lose 12, producing a net swing of 268 votes toward Franken.)

206 comments

Max said...

FIRST!?! If you are right about this you should put all others out of business.

banditapu said...

First! And very bold Nate, way to keep ahead of the competition.

twister823 said...

I thought I was good at math until I came to this site. Damn you, Nate.

Senator Al Franken. Only in America.

obsessed said...

Does anyone have a timetable for the rest of the counting?

Alex S. said...

I LOVE the details of the calculation. Part of me actually wishes the margin were even smaller, just one vote if possible. It would be a fitting end to this never-ending Senate race. But 27 is good enough.

reelgeist said...

I don't buy your analysis, primarily because you are making assumptions you cannot make about the intent of the campaigns here. My theory is that you may be right as to their intent, but even knowing this, there is no way you can predict the composition, because you don't know the nature of the challenges made. If you do, can someone please point me to where I can find that information, i.e., undervote versus no vote? I am going to take the view that we won't know for a while one way or the other. I say 50/50 is right just because we have no idea, and I suspect Coleman is gaming the system as you describe while the Franken campaign seems to be applying a different strategy.

Josh said...

Could you possibly provide a graph with the CI for the mean and the prediction intervals?

Ben said...

if you are right, Nate, you should be given an award of some sort.

The Arse said...

Confidence Intervals?

STepper said...

Very probably, a majority of the challenges are coming from Franken's pile.

I love ya Nate, but this is a TOTAL GUESS.

Do you want to bet $10K that Franken wins? If so, let me know here and I'll send you my e-mail and you can pick up an easy $10K.

Vinny said...

*Head explodes*

Vinny said...

Also, did anyone notice the polls were way off on this one? They all showed either a respectable Coleman lead or a respectable Franken lead; none of the individual polls showed a tie or a race within the MoE.

zwrite said...

Mr. Silver:

I think you need to put out two reports -- one for geniuses and one for the rest of us.

Through the first several paragraphs and first chart, I completely understood your analysis and was prepared to log off a happy person.

Then, I read the words "regression analysis" and struggled for the rest of the report. Now, I'm going to log off thinking I'm dumb (and I had a nearly perfect score on the Math part of the SAT).

OK, I'm just trying to rationalize that I'm really not dumb.

I'm not sure whether I'm kidding or serious when I advocate two reports.

Shalom,
ZWrite

Loralee said...

Dude, you are blowing my mind right now.

nikip5555 said...

I don't think Nate's math makes any assumptions about anyone's intent. He is simply crunching the numbers we have so far for % of Franken vote and # of challenges from each side, then projecting that trend forward to the point where all challenges are resolved. Of course, there's no guarantee that the numbers so far can perfectly predict the final result ("past performance is not an indication of future results"). But the math itself is pretty generic and does not involve any subjectivity.

Nick Catalano said...

If it gets that close somebody is going to sue.

Oh, and I'm voting for Lizard People next time around.

Joe said...

Wait a sec -- but Franken was *not* picked on just under half the ballots during the initial count. He was picked on something like 42%, no? Wasn't the split something like 42-42-15 (Coleman-Franken-Barkley)?

Would that affect the t value, which is currently listed at .499XXX?


WV: Speric: I wish the people who keep doing word verification definitions would sperically go speric themselves right in the speric.

zwrite said...

obsessed said...
Does anyone have a timetable for the rest of the counting?

The Minneapolis Star-Tribune has a timetable. I don't recall it off the top of my head, but you're going to be obsessed for a while.

There will be counting in December and then it's another week or two before a 5-person board begins looking at challenged ballots.

Shalom,
ZWrite

Mark said...

Well, Nate, I don't understand this, but I hope you're right. I'm not willing to celebrate yet, but we'll see.

STepper said...

Nate's analysis breaks down because the basis for the challenges appears to vary wildly by district and candidate representative. Thus, no generalized statement concerning challenges is valid.

The election is still a "you pick 'em" with a very slight advantage to the candidate who is currently leading (Coleman) -- unfortunately.

wv - reepicks (someone has a sense of humor)

Bill Newsome said...

I have full faith in your projection Nate!

The Obama team should be looking to hire you (and pay you big bucks) if your projection is right!

Vinny said...

Good job, Barkley. Way to pull a Nader and force us into a recount that the Republicans will probably cheat in.

Duluth said...

A. There's a danger in getting too "mathed out" in a process that's so riddled with human error.

B. I think the math here will be thrown WAY off once the campaigns (or the canvass board) start throwing out challenges (thereby counting the votes) in huge blocs as they rule on large trends in challenges.

C. The timetable has all hand recounts wrapping up by Dec 6. The canvass board meets shortly thereafter.

Ron said...

Do you want to bet $10K that Franken wins? If so, let me know here and I'll send you my e-mail and you can pick up an easy $10K.

Nate himself said Franken was a very slight favorite here. He may project a 27 vote win, but there are probably a lot better bets he could make than betting either way on the outcome of this one.

reelgeist said...

The subjectivity lies in assuming that what he sees in the numbers he has will be what the rest of the sample looks like. That is a big assumption to make, unless one is also making a point about gaming the system. I believe Nate to be right. That does not change the fact that it's subjective, no matter how much math he brings to the generic point about what the stuff he covers actually says about the numbers he can know.

Joe said...

...as a follow-up to my previous comment, I'm assuming Nate means that Franken was chosen on slightly less than half the ballots **which chose Coleman or Franken**, which makes sense. And would jibe with the calculations being attempted here.

It seems we can ignore other votes in this scenario, because they aren't relevant to this particular analysis.

Makes sense now, and I apologize for questioning the wisdom of Nate Tha Great :)

-- Joe

wv: hoargaur -- Those who hoargaur whine about hoargaur others posting "first" are incredibly annoying hoargaur. Hoargaur, hoargaur.

ahoff48 said...

I would love to believe this analysis, but they are both losing votes, presumably because of challenges. So I don't see how Nate can draw the conclusion that a majority of the challenges are coming from the Franken pile. Certain challenges are frivolous, e.g., the McCain-Franken challenge is frivolous, because the standard is intent as expressed on the ballot, not implicit intent. But we really don't know the full extent of the challenges.

reelgeist said...

By the way, the intent does not have to be a specific intent to game the system. It could just be that Franken's people want to make sure Franken does get votes, and Coleman's people want to do the reverse: make sure Franken does not get additional votes. This actually makes a kind of psychological sense given the positions of the two candidates. His numbers tend to reinforce that, but it's still subjective.

Steve said...

52-48, 53-47%

The regression forecast probably translates into about a 52-48% edge or thereabouts for Franken. That is what the high standard errors would seem to indicate.

That said, I would put the probability at least somewhat higher than the results of the regression alone. While there is a speculative element, the numbers pretty much fit what has been going on.

What has been going on is an escalating challenge war from which intent can be derived, though not with certainty. Each campaign started off appearing to take their best shots. I think as the Franken campaign realized the Coleman campaign was being more aggressive, Franken adjusted to be as aggressive.

Once that happened it spiraled, and both sides realized the recount itself would not be sufficient to really clarify much of anything. It would take the canvassing board to do that. Each side's aggressiveness spiraled. In short, the later challenges are almost certainly less meritorious than the earlier ones....which is exactly what the regression indicates.

So whether or not the numbers are exactly right, and it doesn't take much to tip the balance, I think the regression has a high degree of trustworthiness in illuminating the validity of the challenges.

andyhardt said...

I have been reading this site for months and this is the first time I have commented because this is the first time I have had a problem with Nate's methodology. I am not very knowledgeable at statistics but I think that using precincts with no challenges to predict precincts with many challenges is a bad idea. These precincts have a fundamental difference: whether or not ballots were challenged. It could be that Franken will gain votes in these precincts at the same rate as he did in the precincts with no challenges, but just as easily, Coleman could gain votes. I think it is foolish to use precincts with no challenges to predict the behavior of precincts with many challenges because challenges play such a large role in recounts, even if most are frivolous. I think that the best assumption to make at this point is the neutral one: that the precincts with many challenges are as likely to shift towards one candidate as the other, but that they are likely to swing in the same direction, whichever that may be, just as those with no challenges have tended to swing in one direction (towards Franken).

reelgeist said...

Ahoff

It doesn't matter if they are both doing it. What matters is the relative degree to which each side is so inclined. For Franken it's about gaining votes; for Coleman it's about preventing Franken from gaining votes. That's just their relative positions. I agree that one can't say what the final composition is like, and that the numbers here are being pushed too hard. But that does not mean that you can assume equal composition either. I agree it's better to wait, because it's not possible to really know. Degree matters here, but we don't know the degree.

As I wrote yesterday, by the time all this is over, the degree in all likelihood only has to be 5 percent. That is, only about 5 percent of the challenges need to be resolved in Franken's favor to produce an outcome that results in a Franken win. The large pile of challenges already makes this number possible. This is why the recount is becoming a non-factor.

Howie said...

They should have to share the seat. They each get half a vote and share an office.

jt said...

We need to add a bet on InTrade...an over/under on Nate's margin of error. I would take even odds that the final recount is within 10 votes of Franken+27.

jt

Kari Leigh said...

what in the world does Silver eat for breakfast? some kind of super sex nerd energy bars.....

you make my head explode.

in a good way.

Chris said...

"They should have to share the seat. They each get half a vote and share an office."

That sounds like an intriguing sitcom.

humanist said...

e.g. - exempli gratia - "for the sake of the example"
i.e. - id est - "that is"

Substantially: the question remains whether there may not be an independent reason why more Franken-friendly recount positions could be correlated with low incidence of challenges?

JC said...

Dude, go watch some football. You've got too much time on your hands. ;-)

fred said...

This is why we love Nate.

60 Senate seats!

livemild said...

i agree with nate
except my counts give it to franken by 22.

this figure is of course before the rejected absentee decision comes down to allow them.

then i project franken to increase his lead to almost a hundred votes.

in st louis there are 63 more challenges for coleman. that seems like way off the charts. something is going down in the coleman camp.

sherifffruitfly said...

twister823 said

"I thought I was good at math until I came to this site."

You can't minimize the 2-norm of Ax-b?

Oy.

Nick said...

:-D

My heart explodes when politics and Ubernerdism collide.

Are you all familiar with the term "man crush"? Like, when an NFL fan talks about Peyton Manning: he isn't actually _attracted_ to him, but at the same time, there's no doubt that it's love, or at least overwhelming infatuation.

Such is my love for 538. There is nothing better. The only way to more deeply indulge your nerdiness is to go on a thinkgeek/newegg shopping spree.

Flatblade said...

Another factor to consider is that all of the precincts using the somewhat less accurate Eagle voting machines, which produce more undervotes where there clearly was intent, have already been counted. This should affect the recount numbers in two ways, both bad for Franken.

1) Using the St. Louis County "Eagle Precincts" provides a bad baseline for the number of undervotes and the number going to Franken.

2) Since there are no more "Eagle Precincts," there will be a smaller number of fertile Franken undervotes in the last counties counted. I don't think either of these factors would account for many votes, but we are talking about <20 votes overall.

sherifffruitfly said...

Assumption challenge:

Why should the regression be *linear*?

For example, maybe it's more plausible to think that a candidate would get more desperate later in the game, and challenge more. Or something of that general sort.

pdb said...

Damn. I am glad you are a democrat, Nate.

Andy JS said...

If the result is Franken by 27, it's obviously going to end up in court. Coleman won't just sit back and accept it.

joel said...

If Franken wins by 27 votes, this thing will be tied up in the courts for months. The Republicans won't sit still losing a seat by a few dozen votes. I still think the Senate should refuse to seat anyone and force MN to have another election.

Ari said...

Great analysis. I second the call for confidence intervals.

Another Mike said...

Franken should be senator in odd-numbered years and Coleman in even-numbered years. I'm only half joking.

John K said...

I think the tactic here for Coleman is to come out on top at the end of the recount, no matter what. To this end, his camp is challenging perfectly understandable ballots. Not because they really think they're going to win these challenges, but simply to be ahead when they get to the challenging phase. This is to buttress "popular opinion" that a) he is ahead; b) if he loses, the election was 'stolen from him'; and c) to put Franken in a defensive posture. As a lifelong Democrat, I have to say this is where the R's beat us hands down. They just want it more and fight harder and dirtier. If we're going to win this one, we're going to have to jump into the pigslop with them. They ain't giving this one up until it hits the nine folks in black robes in Washington.

Andy JS said...

I don't know if Minnesota is one of those states where the judiciary are not party based. I hope so.

Tim said...

Oh, this math is lovely. Let's see if I can boil it down into simpler English for the people who are having trouble.

What Nate is doing here is using the no-challenge precincts to estimate the "clean" recount change - that is to say, what the recount change would look like if there were no challenges. Alternately, we can look at that as the projected result for what happens after all challenges have been properly resolved. It doesn't matter which campaign makes what challenges, because everything will be resolved consistently in the end. We've gotten caught up in thinking about challenges when we need to keep thinking of it as a recount - the question Nate addresses here is whether Franken is going to gain 215+ votes in the recount, not in the challenges.

Another way of looking at things is that it looks like both candidates are losing votes at approximately the same rate, because we're comparing to the original results. If it's true that Franken gets the typical significant Democratic recount boost, we have to add several hundred votes to his count before analyzing what he's lost due to challenges - plus he's also likely gaining some uncontested votes, including the ones from the old machines in St. Louis County. So I agree with Nate that it looks likely that Franken has lost quite a few more votes - maybe even hundreds - to frivolous challenges than Coleman.

It looks pretty good for Al right now, to me.

Jim said...

Marvelous job, Nate. Perhaps you might have said it a little better if you had something resembling the distribution of votes (very much like what you have for the number of Senate seats). That would give some notion of the dispersion. I haven't done the calculations myself, but I would put the standard error around your 27 somewhere in the neighborhood of 50, which probably means that 52-48 is close to a good probability.

Hudson Valley Gooner said...

As someone who does grok the statistics, let me caution everyone that while this is thought-provoking, there are enough assumptions and methodological choices (interacting everything with everything and leaving it all in the model) that it's nothing more than a statistical best guess. And when the best guess turns out to be someone winning an election by 27 votes that tells me that when you factor in the things this model can't account for (correlated interpretations of challenged ballots is the big one), it's basically a toss-up.

The first two paragraphs are where the real insight is at. It bodes well for Franken. But I'm pretty sure that even Nate would caution against reading this as Franken being anything better than 50-50.

fred said...

This site does make you happy the smart guys are on OUR side!

Buckley was the last smart conservative; Will and Kristol do not hold a candle to Buckley.

Smart people are Dems; this bodes very well.

Did you see that even Brit Hume had nothing bad to say about Obama? Chris Wallace is becoming a fan. Obama's moderation will lead to the possibility of a huge Dem party going forward. We need to expand the tent, not do what Bush did and limit the appeal to one side. Win the moderates, run the world!

Michael said...

Nate- This looks like a serious analysis. Two questions

1. Have you tried excising the few "Eagle" precincts in St. Louis Co., which are clearly anomalies, and treating them separately? They could screw up the regression coefficients.

2. Have you tried applying this analysis to the 206 hand-audited precincts (204 non-Eagle)? We know their results to high accuracy. Does your regression come close?
/mbw

fred said...

Gooner-

Yes, I think we all know this is a toss up, but that is better than it was two weeks back, and lots of fun to think about. The fun of this post is not the final number, but the detailed attempt to model it.

Coleman does not deserve to be a Senator, Franken maybe does not either, but, heck, Franken for Senate!

Zach said...

Nate Silver for the Fields Medal!!!

Pavlov's 2nd Dog said...

Nate,

Thanks for the Powerball numbers last week. Your check is in the mail.

Tim said...

A question occurs to me: could we reduce the error here substantially by doing the same analysis on a precinct level? Do we have that data?

M said...

I'm no statistical whiz, but my reading of this post is that Franken is ABSOLUTELY GUARANTEED to win by EXACTLY 27 votes.

Woohoo! What's that you say? Can't hear you! Lalalalalala lalalalalalala can't hear you!

Pete S said...

Very interesting analysis. But as a fellow nerd I must point out:
1) Linear probability models won't fit very well in those lopsided districts. Why not test Logit? Why not LOR (Log of the Odds Ratio)?
2) Try it with a dummy for 1 or 2 challenges, a dummy for 3 or 4 challenges, etc., to test the linearity assumption on the response to challenges. If non-linear, you might think about including a squared term for the challenges.
3) Any testing for heteroskedasticity? I'd like to see how the residuals are distributed.

aaron said...

One thing to consider: although you can use standard weighted OLS as a first approximation, you really should be fitting a generalized linear model with a Poisson distribution and log link.

ABowers said...

LOVE this analysis. Makes a lot of sense to me (even though I only dimly grasp the regression analysis) because Nate has a great track record.
Remember, he was calling Alaska for Begich by 1,900 votes while all the media were engrossed with "what ifs" about Stevens winning and what Palin would do. Yada, yada, yada.
And Begich won by a landslide of over 3,000 votes. Just like he said.

J. A. Wheeler said...

Daily Kos has some interesting footage on Coleman challenges:

http://www.dailykos.com/storyonly/2008/11/23/11419/809/708/665429

This supports Nate's theory that the recount process encourages spurious challenges. Let's hope that Franken's challenges are more legitimate. Maybe it won't be anywhere near 27.

Scott said...

Great work, Nate, and I hope you are on the money...but I have a few questions (first, ditto on publishing the CIs):
I'm wondering whether the Obama Effect (reddish vs. bluish) predictor should also have been added to the regression equation? It could be that liberals are screwier with their ballots (or less screwy), and that the true nature of the effect might have more to do with political stripe than raw vote challenges. In any event, I am not sure I fully understand your intuitive rationale linking fewer challenges as a Franken benefit - if that were really true (and assuming that they bought the logic of your model), come to think of it they shouldn't be challenging at all! But my biggest puzzlement is that your assumptions are based on zero (post hoc) regression terms. Assuming the model is still correct, it is a limited projection, as we still don't know the full trajectory, course and size of the challenges - there are many votes to be counted and every indication of more, not fewer challenges being mounted as you yourself have already indicated: "the number of challenges has increased with each day". Maybe the projected increase itself should be a predictor!
Scott
scottsvine@comcast.net

by thesea said...

This kind of analysis is fun and it helps pass the time while we wait for the recount to finish. The chance of being ballparky right depends on the validity of Nate's intuition. The choice of how to model the problem is art, not science. I for one am hoping that Nate's intuition is right.

KIC said...

Well, I think they should just stick them in thongs and make 'em mud wrestle for it.

cynicismsyndicate said...

I think Tim has a good explanation of what is going on here, and as much as I love Nate's normal analysis, I think the basis for his extrapolation is extremely questionable. I get the caveats, but I would not feel confident making any predictions from this model, confidence intervals or not.

Here's the problem: using the precincts without challenges to estimate, with any sort of confidence, the precincts with challenges ignores (rather brazenly) the differences between districts with and without challenges. We cannot assume the challenge rate in any district to be independent of its count unless we can show otherwise, and we certainly can't make predictions on the outcome of challenged votes based on the outcomes of unchallenged votes if we acknowledge that they are being judged differently. This model has no way of predicting how challenges will be resolved without assuming that they will be no different from the unchallenged votes. This may be true, but there is no data, and very little reason other than convenience, to assume they're the same. I'm not saying you can't predict; I just don't think reducing all other coefficients to zero and plugging in values is a method that holds water. Once the final votes are tallied, run the same regression with the new vote data. I'd be shocked if those coefficients were not significantly different from zero. Taken in combination with high error, I'd take a pass on making the prediction. But then again, you run an amazing and informative website, and perhaps know more than I do.

njbunker said...

This is great. Just learned about interaction terms in my stats class. Nate, can you do my intermediate micro homework for me? Public goods are a bitch.

icebergslim said...

Nate,

If you end up being RIGHT, the other pollsters should go out of BUSINESS.

ice

Holy Stokes! said...

I've been coming to this site for quite a while, and Nate's latest post is exactly what I both love and hate about his analysis. Obviously, this is a lot more effort than most others are willing to go through to analyze elections. Still, quite a bit of the analysis is arbitrary and important aspects like error are significantly underreported. Why is a linear regression used? What is the chi-square value for this regression? What exactly is the error value for this analysis? (All that was said is that it's "very high"-- that's not good enough for me.) Using the regression and error values, what is the most important number-- the probability that Franken or Coleman wins? What are some factors that may explain the supposed correlation between the number of challenges and the number of votes Franken picks up? (Identifying such factors might lead to a more accurate and detailed analysis.)

I'm quite impressed by the effort, but I think it's fairly evident that this race is essentially a coin flip, or perhaps leans slightly in Coleman's favor. Nate's greatest weakness is not knowing when to acknowledge that very little further information can be gleaned from a thorough statistical analysis.

Nick said...

This is GREAT news... for John McCain!

OldFatGuy said...

Well, I just went through the Star Tribune site county by county and looked at it a bit differently, because I think we can make a fairly safe assumption about the vast majority (about 2/3 of them). The clues are right there in the table.

For example, in Benton County, Franken's total is 24 lower than the machine total, and Coleman has 25 challenges. Conversely, Coleman's total is 25 lower than the machine total from election day, and Franken has 31 challenges. Carlton is another good example. There, Coleman's total is 10 fewer than originally, while Franken's challenges number 11, and Franken's total is 19 fewer than originally and Coleman issued 23 challenges. See the connection??

We can infer from good evidence (the original machine count plus how well the number of challenged votes by the opponent matches up with the number of fewer votes received in the recount) that most of these challenges are one campaign challenging the validity of the other's vote received.

The assumption I'm making is the machines do a pretty good job (they do) and the explanation for the lost votes is Coleman challenging Franken votes and Franken challenging Coleman votes.

So, from there, I went county by county and put these challenges BACK IN, so to speak. In doing this county by county, I would revert back to the machine count as best I could using the number of challenges. So, if a county was showing a 10 vote lower total for Franken and Coleman had made 12 challenges, I gave Franken those 10 votes back, and left Coleman with 2 challenges. If a county had a Franken recount total of 10 votes fewer and Coleman only challenged 7, then I put back in those 7, leaving Coleman with 0 challenges, and leaving Franken with -3 net for that county. I of course repeated this for Coleman votes and Franken challenges.

The result is disappointing to say the least, but I really don't think the assumptions I'm making are that bad. When I did this, going county by county (with a paper and pencil so a math mistake was possible), the end result was Coleman losing 4 votes and Franken losing 7 votes, a net gain of 3 for Coleman.

As for challenges left after this method, this would leave Coleman with 355 challenges and Franken with 291.

So, if most of the challenges are as I described above, the bottom line is that for Franken to win, his challenges would need to be exponentially better than Coleman's to make up 200 votes.

Now I'll calmly and gladly wait for someone to tell me why I'm so wrong. (That's not sarcasm, I'm always looking to learn something.)
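A minimal Python sketch of the per-county "put the challenges back in" procedure OldFatGuy describes above. It assumes, as he does, that every vote a candidate lost relative to the machine count is explained by the opponent's challenges, up to the number of challenges filed. The `CountyTally` fields and the helper names are hypothetical. A county where a candidate gained votes is left unchanged, because the comment does not say how those were handled. The Benton and Carlton figures are the ones quoted in the comment.

```python
from dataclasses import dataclass


@dataclass
class CountyTally:
    """One county's recount snapshot (hypothetical field names)."""
    name: str
    franken_change: int      # recount total minus machine total; negative = lost votes
    coleman_change: int
    franken_challenges: int  # challenges filed by the Franken campaign
    coleman_challenges: int  # challenges filed by the Coleman campaign


def restore(change: int, opponent_challenges: int) -> tuple[int, int]:
    """Give back lost votes, up to the number of opponent challenges.

    Returns (adjusted change, opponent challenges left over).
    """
    if change >= 0:
        # Assumption: no lost votes to explain, so all opponent challenges stay open.
        return change, opponent_challenges
    restored = min(-change, opponent_challenges)
    return change + restored, opponent_challenges - restored


def reconcile(counties: list[CountyTally]) -> dict[str, int]:
    totals = {"franken_net": 0, "coleman_net": 0,
              "franken_left": 0, "coleman_left": 0}
    for c in counties:
        # Coleman challenges are assumed to explain lost Franken votes...
        f_adj, coleman_left = restore(c.franken_change, c.coleman_challenges)
        # ...and Franken challenges to explain lost Coleman votes.
        c_adj, franken_left = restore(c.coleman_change, c.franken_challenges)
        totals["franken_net"] += f_adj
        totals["coleman_net"] += c_adj
        totals["franken_left"] += franken_left
        totals["coleman_left"] += coleman_left
    return totals


counties = [
    CountyTally("Benton", franken_change=-24, coleman_change=-25,
                franken_challenges=31, coleman_challenges=25),
    CountyTally("Carlton", franken_change=-19, coleman_change=-10,
                franken_challenges=11, coleman_challenges=23),
]
result = reconcile(counties)
print(result)
# -> {'franken_net': 0, 'coleman_net': 0, 'franken_left': 7, 'coleman_left': 5}
```

With just these two counties every lost vote is "explained," which is exactly the pattern the comment points to; the leftover challenges are the ones this method cannot account for.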

Constantine Gustafesto said...

I hope you are right and that Franken wins, and I have enjoyed your analysis and reading your site throughout this election season.

Still, this particular post brings to mind Mark Twain's reaction to the news that the Mississippi was shortening its course by 1 1/3 miles a year.

He observed that, by extension, that meant that the Mississippi must have been 1.3 million miles long a million years ago, sticking out over the Gulf of Mexico like a fishing rod, and concluded, "There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact."

Mylegacy said...

I've read Nate's article and I've read the foregoing responses of the "nerds" to Nate's postulation.

I've come to two conclusions: 1) My brain hurts. 2) I'll bet Nate several zillion dollars (US, Canadian or Aussie - his choice), and give Nate 1,000 to 1 odds, that the Great Comedian does NOT win by exactly 27 votes.

And thirdly, ya, I know I'd only come to two conclusions, but hey, I changed my mind: I now know FOR SURE why I stopped taking math-like stuff after I'd finished my Algebra, Trig and Geometry.

And fourthly, my brain still hurts. I need a drink - I'll make it a double.

Tim said...

The correct answer for AL is 42 - Deep Thought

oct said...

Where are Pete Kent and Mule Rider to blog on the imminent Fall of the 3rd Reich?

Russ Martocci said...

Well, this is a fascinating examination of election result likelihoods for the Franken / Coleman race, and what a terrific job Nate does at this blog, in general. Hopefully I'm (we're) getting some education in understanding statistical analysis.

I like to recall that Nate had Franken a slight favorite on election day. I think it very likely that the predicted result is the sort of outcome we will eventually see in this race (if they do count all the votes): a narrow Franken victory which reflects the pre-election polling.

If the Republican party had a trick up their shirt cuff for Minnesota, it was probably exactly what they tell everyone that they do: Voter Suppression. Republicans win when people can't vote, so they try to suppress the Democrat vote.

My guess is that they disqualified votes wherever they could. Just like they always do. Just like they say they do. Based on the Republican track record, there will probably turn out to be more Franken votes in the disqualified pile than there will be Coleman votes.

So, what about Georgia? How's the early voting going in Georgia? What does the polling say about Georgia? With all this time to vote early, shouldn't the Democrats manage to get ALL their voters to return to the polls?

C'mon Nate! What's up with Georgia?

Tom said...

Ok, I admit my eyes glazed over a bit on the regression analysis (MA in History, not Math of any kind!) but I have to say you've put your prognosticating neck on the block as boldly as I've ever seen.

If you're right, I feel we can do away with elections and just let your models pick the winners from here on out!

Mark Ludwick said...

OldFatGuy -
I was also looking at the correlation between the number of votes a candidate has "lost" and the number of challenges made by the other candidate. The data certainly suggest that a higher percentage of Franken's challenges are causing Coleman's vote count to drop than vice versa. If this suggestion is true, it is terrible news for Franken.

Mike said...

I think the real issue here--and something that people don't want to accept--is any polling methodology still has error bars around it. In this case, it's discrepancies in optical scanning. We're not even talking about ballots that scanned fine that were not the intent of the voter.

But the issue afoot in Minnesota may be simply that the difference in votes between the candidates is undetectable.

walt526 said...

Mark Ludwick wrote: "I was also looking at the correlation between the number of votes a candidate has 'lost' and the number of challenges made by the other candidate. The data certainly suggest that a higher percentage of Franken's challenges are causing Coleman's vote count to drop than vice versa. If this suggestion is true, it is terrible news for Franken."

FWIW, that's the conclusion that I came to as well when I looked at the tables last night. Obviously Nate has reasons for making his assumptions for the model that he described, but intentionally or not they seem to have the effect of biasing his analysis in favor of Franken.

tmess2 said...

For those folks using the MST county numbers for your analysis:

you really really need to go to the SOS site and pull up the breakdown by precinct within the counties.

Challenges are done at the precinct level. Both sides have challenges in precincts in which the other side actually gained votes. Both sides have precincts in which they lost votes without any challenges from the other side.

As to Nate's estimates, I am not sure that in a recount the population of precincts is large enough to assume that a sample of precincts allows you to model all precincts from that sample, especially when the choice of precincts is not random and there is no way to tell if it is representative. That seems to be the major flaw in the estimate.

Greg said...

So 2 Eagle precincts had no challenges? How many Eagle precincts are there in all? What happens to the results when you take them all out of the analysis and treat them separately?

And someone up there had the excellent idea of looking at the residuals. Back in grad school I used to play with mapping the residuals of regressions of Canadian election results. There would be clear, sometimes counter-intuitive, patterns to the maps of the residuals. The maps would point me at variables I hadn't thought to look at. I'd love to see the residuals of this analysis mapped.

AxmxZ said...

If Franken really does win by 27, you'll be burned for a witch, you know.

wv: coilike - distinctly fishy

hill.tops said...

Colbert XMas Special TONIGHT on Comedy Central at 10:00 PM Eastern

Phoneranger said...

amazing how your post moved the odds on Intrade towards Franken...

should I sell my C and F on the bailout or hold 'em?

kellysirkus said...

Hard to argue or doubt you Nate -- if you pull this one off, you are King!

hill.tops said...

Nate, can we just put this in God's hands, like Palin says?

Math and science make me nervous.

Bone said...

I WANT TO BELIEVE!!!

coolstar said...

This is a perfect example of what is known in the trade as "over-interpreting the data". Plus, the phrase "the error bars on this regression analysis are fairly high" is completely meaningless UNLESS THOSE ERROR BARS ARE QUOTED.
(my guess: between 50 and 150 votes, minimum) Might as well cast chicken bones or throw the I Ching rather than ginning up an analysis to make it appear "scientific". Total informational content of Silver's last post: "it's a tossup". Gee, I've known that since Nov. 5.
Sometimes the honorable thing to do is just to say "there's not enough data".

David said...

Can you post a 95% confidence interval for the predicted value? That would give a much better picture of what the likely outcomes are.

KAP said...

Brilliant as usual. I'll sleep better tonight.

William Land said...

I'm following the recount closely.
Thanks for all of these comments!
Does anyone know when the 3 remaining
House elections will be called?
CA, VA, and OH.

Gordon Sewer said...

Nate = Captain Tangibles

Malcolm Boon said...

Hi guys,

I'm trying to start a somewhat constructive and intelligent political/policy debate forum. I've been coming here for months (not an avid poster, though), so I know you guys tend to be more reasonable and even-handed than most of the people posting all over the internet. If you're interested, the site is http://policy.getofftheinternet.net

Greg said...

Coolstar, this is what is called "having fun with the data."

ralph058 said...

Why should multivariate regression analysis frighten you? People do the same thing mentally all the time. Just like the guys that do Sudoku do multivariate simultaneous equations.

Anuradha said...

I hope Nate is correct; however, as I see it, the REAL decider is going to be the decision regarding the rejected absentees that have not been counted (for technical reasons). I believe there are several thousand of those, largely from Franken-leaning counties, and even a 10% acceptance rate on review would dwarf the current deficit in Franken votes.

apeescape said...

As commented by others in this thread, the residuals of the model need to be reported if one wants to make inferences. All the covariates of the model are significant, but significance of this kind can easily be achieved even with noisy data. For example, if you were to rerun the regression on the same data duplicated over and over, you would get an even more significant result without any new information. Since this looks like a simple linear regression, metrics like R^2 can easily be calculated to justify (or unjustify) your model. I'm also worried about correlations between the covariates, since all these interaction terms could result in over-fitting.

Otherwise, good work Nate. I always enjoy your efforts.
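To illustrate apeescape's point about repeated data, here is a small Python sketch that fits an ordinary least-squares line to a handful of toy points (invented for illustration, not the Minnesota returns) and then to the same points copied four times. Under these assumptions the slope does not move, but its standard error shrinks and the t-statistic grows, even though no new information was added.

```python
import math


def slope_and_se(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and its standard error for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom for a two-parameter fit.
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se


# Toy numbers for illustration only (not the Minnesota data).
xs = [0, 1, 2, 3, 4, 5]
ys = [3.0, 1.0, 2.0, -1.0, 0.0, -4.0]

slope1, se1 = slope_and_se(xs, ys)
slope4, se4 = slope_and_se(xs * 4, ys * 4)  # the same six rows copied four times

print(f"original:   slope={slope1:.3f}  t={slope1 / se1:.2f}")
print(f"duplicated: slope={slope4:.3f}  t={slope4 / se4:.2f}")
```

Copying rows multiplies both the residual sum of squares and the spread of x by the same factor, but the degrees of freedom grow faster, which is why the standard error falls.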

Cugel said...

I think people probably just blipped over what Gooner said because they didn't understand it, but it's important:

"And when the best guess turns out to be someone winning an election by 27 votes that tells me that when you factor in the things this model can't account for (correlated interpretations of challenged ballots is the big one), it's basically a toss-up."

Did you catch the key phrase, "correlated interpretations of challenged ballots?"

That means that most of the challenged ballots will fall into a few categories, and all similar ballots will be counted the same way. They'll ALL be either included or excluded. Rather than making an INDIVIDUAL determination about each ballot, the judges will develop rules, and all ballots that fall under a certain rule will be handled a certain way:


Ex: Some voters don't fill in the oval completely, or they use a red pen or something. It's faint, so the machine can't read it. All such ballots will probably be handled the same way. So, if there are 100 such ballots, they ALL count, or NONE count. They don't get wishy-washy and say "this oval is filled in slightly more than this one, so we'll count one and not the other," arbitrarily.

They say "incomplete ballot: we've decided to count all of these" (or NOT count all of these).

The initial rulings the ultimate canvass judges make about how to handle certain types of ballot challenges will go a long way towards determining who wins.

weinerdog said...

Some more 'disputed' ballots. My favorite is #3.
You decide

MNLatteLiberal said...

@OldFatGuy et al,
As someone who observed the ballot recount for several days, I can point to one gaping error in your analysis: missing ballots.

I already posted a couple of days ago about just two precincts in the otherwise totally anally retentive Woodbury missing 27 ballots. "Missing" is a misnomer because the machine count is not set in stone, and machine error happens. But regardless of the reason, there you have two instances in a very affluent and elite area where a significant number of ballots go missing and are never found.

There are similar occurrences in other counties that I have only read about. Add to that the mis-sorted duplicates and the Eagle machine problems, and you pretty much eclipse the 206 pre-hand-count delta. Yes, I am excluding the 9 from the random hand count of the 206 precincts.

I am sure there are other reasons why your method falls short, but my brain hurts after all the Nathematics in this thread.

~Latte

wv impesson: the act of conversion into the Lizard/Fish People

TP said...

Nate:

In precincts that have challenges we can't observe the true franken_net (the dependent variable), so how can we regress against it?

John said...

I am confused. I thought the hope for Franken was the ballots that showed undervotes -- and the hope that they would break 52%-48% for Franken. In the discussion I hear a lot about ballots that were already counted by machine as actual votes, but not much about the undervotes.

I guess what I am saying is that I would expect the machine counted ballots to not change much in the hand count, but for there to be more votes counted in the end because some of the ballots that the machines showed as undervotes would have a voter intent indicated.

Lisa said...

FYI, MN law does not contemplate runoffs. So, while a loser (Coleman?) would most assuredly sue, a runoff or re-run cannot be mandated under MN law. If you win by one vote ... you win.

OldFatGuy said...

Hi guys, this time I did my "analysis" using precinct-level data rather than countywide data (from the Tribune site), and while the results are MUCH MUCH MUCH better for Franken, I still see him falling short in this thing unless his challenges are somehow WAY WAY WAY better than Coleman's.

I used the same assumptions I posted about above, but at the precinct level this time. And BTW, the correlation between Candidate A's lower recount total and Candidate B's challenges became even clearer. To me, this confirms my original assumption.

These votes were counted by the machine, they were counted and ruled on AGAIN by a person, and for whatever reason one or the other candidate challenges it. I believe these will be counted as originally counted probably 99.999% of the time. So, I went ahead and put these "challenged" ballots "back in" the running total.

At the end of the process this time it looks better than above, with Coleman losing a net of 24 votes and Franken gaining a net of 34, for a total net gain of 58 for Franken. The challenges left over after this (a majority of which would almost certainly be ballots originally registered as "undervotes," since I'm assuming most of the lost-vote challenges won't stand up anyway) now total Coleman 302 and Franken 295.

So, with over half the vote counted, and assuming challenges to twice-counted votes fail, we're left with Franken gaining 58 votes and both campaigns with roughly 300 challenges each. I'm sorry, I don't see how Franken gets there. Man, am I sorry. I really, really, really can't stand that fake SOB Coleman.

BTW, I'm guessing I'll make this my last post. For some reason, EVERY TIME I click on 538.com, click on Post a Comment, and try to sign in as a blogger, it never lets me. I ALWAYS have to go through the reset password process. Every single time. Don't know what that's about, but the good news is no more long-winded blowhards from me.

Hope Martin creams the other asshole in December, but I think that's MUCH less likely than a Franken win.

David V said...

Hey, Nate... was wondering how this projection would look if done using Random Forest regression... which does not overfit.

If you're not familiar with it, google for Leo Breiman and Random Forest for a description... there's a package for randomForest in the free stat software R, too... Or if you want to send me the raw data, I can do it in about 5 minutes. Fun! dlvanbrunt-at-gmail-dot-com

eve said...

This is my favorite kind of post on this site. Nate gives us stats and analysis that is way beyond what I learned in my measly stat classes. And then people who do understand it (or not) debate his reasoning and methodology.
It's all really interesting PLUS we get a prediction on this uber long recount. yay

My math prediction is that if either of these guys wins by 27 votes, the other will take it to court.

moondancer said...

Why does this smell like something that's about to get very judicial, and very ugly...

coolstar said...

In reply to Greg:
I agree (and I've done many similar things myself). The danger lies in the fact that the innumerate treat Nate Silver like a god and will take an analysis like this seriously.

George In Florida said...

Nate:

You gotta chill out, man. Take a break!!!

You've been doing this election analysis too long. So, take a break, get a good bottle of scotch, a woman (oh, I forgot, us math nerds don't get women) and forget about number crunching for a while.

You're trying to overanalyse this.

WV: frain. You should frain from doing any more math.

Charles said...

Dude, Nate... if you get this right, and Franken wins. OMG you will be THE POLLSTER of RECORD of the USA. Stay cool, don't let the success get to your head. All the best man...

sdf said...

John K said:
"I think the tactic here for Coleman is to come out on top at the end of the recount, no matter what. To this end, his camp is challenging perfectly understandable ballots. Not because they really think they're going to win these challenges, but simply to be ahead when they get to the challenging phase. This is to buttress 'popular opinion' that a) he is ahead; b) if he loses the election was 'stolen from him'; and c) to put Franken in a defensive posture."


I think there is absolutely no doubt that this is exactly what is going on. What we are about to see, as of the end of the recount, at which point Coleman will probably be in the same general ballpark of a lead as he is now, is a growing chorus during the settling of challenged ballots from the Coleman side. Ritchie will undoubtedly be slimed and slimed again if, as Nate predicts, more of the challenges are resolved in favor of Franken. We may not have the wonderful pleasure of a Brooks Brothers riot, but we will have clear accusations that the challenge process has been gamed for Franken to win. That is something that I need have no knowledge of statistics to say with absolute certainty.

Pragmatus said...

I still say--Franken wins by under 200 votes.

Ryan said...

Al Franken in the United States Senate would be the best thing ever.

Henry said...

FACTUAL PROBLEM - I think it is likelier that I don't understand than that Nate has made a mistake, BUT if you go to the Star Trib breakdown I DON'T SEE ANY COUNTIES WITH ZERO CHALLENGES IN WHICH EITHER CANDIDATE HAS A NET GAIN - Almost all the 0 challenge counties have not yet counted the votes. I hope I'm missing something. HELP!!! I want to believe. Nate starts out by talking about counties with no challenges and how well Franken does - what am I missing?

KWRegan said...

Ad coolstar and Greg and others caveating the methodology: It may be possible to buttress the conclusion by carrying further the analysis of challenges which OldFatGuy did here (after he and I discussed the same idea at county level).

Actually, in my view OldFatGuy's posts already lend support, provided he did not already include the gains noted by Nate from precincts with 0 (or 1 or 2) challenges. His posts corroborate each other insofar as one gets Coleman +3 but making 64 more challenges than Franken, the other Franken +58 but near-equal challenge numbers. He termed his net disappointing, but if it's largely independent of Nate's net of +28 (0-challenge) plus some good portion of 31+32 (1,2-challenge precincts), then that would show about 120. Then it's easier to envision that precincts missed by this analysis plus the 32% of the vote still to come would push the net into the 200s.

To attempt to carry this further, one could take Nate's conclusion above as a new a-priori hypothesis, and use this to calculate estimates of how many of OldFatGuy's leftover 302 Coleman challenges are on ballots initially ruled for Franken, and how many of Franken's 295 remaining challenges would be on ballots for Coleman. Then go back and see how these estimates can be squared back with the data in precincts these leftovers come from. This might revise the prior in a way that is favorable---or unfavorable---for Franken, thus supporting the optimistic conclusion---or not. (Can someone better versed in Bayesian methods tell if this makes sense?)

Regardless of the stat arguments, Nate's eagle-eyed observation that votes in counties with few challenges break nicely for Franken is important in itself. That datum could after all have broken for Coleman---and unlike other behavioral caveats noted by commenters, I don't see much human contingency there. Also regardless, these considerations will derive predictions of "frivolous" challenge numbers, and nailing those figures would be even more a feather in his cap. Also for generating arguments over stat methods (which kind-of professionally justifies time spent here:), this is great stuff.

Jamie R said...

Say what????? Talk about angles on pin heads. Let's wait until reality catches up.

TheOpinionGuy said...

Nate, you should try a Negative Binomial model instead. By considering each vote as the outcome in an arrival process that is non-negative in nature (which it is), you should get a better estimate.

What does NB tell you?

reelgeist said...

SDF

No one should be confused about the politics here.

Coleman's game plan was set the minute they announced a mandatory recount: delegitimize the democratic process. A legitimized process meant that Coleman risked a loss. It's part of the basic modern GOP playbook. It's why they needed to be defeated as a party.

I look forward one day to an ethical somewhat centrist party. But that's not the Republican party nationally.

You could see it both in Coleman's own actions (asking Franken to give up) and in those of right-wing surrogates. You could see it in claims that even having a recount at all is "stealing" the election.

Franken has mostly played it smart by saying what Al Gore didn't in 2000: let all votes count. Don't even get into the game of legitimization.

In GOP leadership's world view, up is down, down is up. Free speech is free speech if it is without critique. If you criticize them, you are squashing their speech (something Sarah Palin said about reporters criticizing her for her negative campaigning).

In the gay marriage protests after the vote on Prop 8, "mob rule" meant the protesters, not what the founders meant by it: a bare majority controlling the rights of the minority. Equal protection is turned on its head to mean whatever the majority wants.

Up is down. Black is white.

It's the GOP playbook.

Not sure why anyone here would be surprised that Coleman is using that playbook. I am not sure what kind of Senator Franken will be, but MN voters stepped in it by even making this close.

The fact is, however lackluster Franken may be to some, he still mostly plays by the rulebook of fairness and legitimate democratic processes even as he tries to game it in his favor.

I have not seen Franken's camp engage in the efforts to delegitimize the process as a whole that have come out of the Coleman camp. That again should tell you something.

It's sad that this happens again and again, and we are shocked, but again, the playbook is pretty old. This is not surprising to anyone who has followed politics.

oct said...

"Ritchie will undoubtedly be slimed and slimed again if, as Nate predicts, more of the challenges are resolved in favor of Franken."

All Ritchie needs to do is publish photocopies of all the challenged ballots to show what an asshole the Coleman campaign was. Then he can counter-smear the dumbasses. I honestly don't know why Coleman is trying to use this strategy; it is like Enron bookkeeping. You lose in the end.

UtahGamer said...

I wish Nate's analysis was valid. Alas, I worry it isn't. The regression equation is developed based on precinct-level data, and I have no problem with it (I am assuming that the residuals were examined for heteroscedasticity, collinearity was checked, etc.).

My key problem is that the final (zero contested ballot) equation should not be applied to the state-level numbers. The proper way to apply that equation is to use it for each precinct to see how many votes shift at each precinct. Then, the changes from the precincts are summed to yield the statewide result.

Another intriguing approach would be to use a hierarchical analysis, taking into account county level variables. Political parties are organized from the national, to state, to county to precinct level (with some intermediate levels occasionally found). The county party officials may well have created a common approach within that county for determining contested ballots which could vary from the statewide guidelines. Therefore, precincts within counties are probably acting more similar to each other compared to precincts in other counties, and not necessarily due to similarity of politics.

A Hierarchical Linear Model approach might yield superior results.
http://en.wikipedia.org/wiki/Hierarchical_linear_model

heliotrope said...

I voted for Coleman even though I'm not a fan of Republicans and he is probably a prick. It was just a spur-of-the-moment decision in the booth, brought on by an aversion to Al Franken. As I did it, I thought, "sure, now it'll come down to one vote."

My vote.

I'm sorry. I never thought it could matter.

Jersey said...

UtahGamer: do your calculations, then let us know how your numbers come out. Still Franken favored to win?

bobnsj said...

Nate's prediction of Franken by 27 has made the top of HuffPo

http://www.huffingtonpost.com/

DirkGently said...

This theory, it seems, would preclude the possibility that the Coleman challengers are informed actors when it comes to choosing the districts in which to focus their challenging activity.

It seems evident that Franken intends to raise fewer challenges - therefore the degree to which their actions are informed is less important.

The deviation between this projection and the actual outcome may be partly attributable to useful information possessed by the Coleman camp: specifically, in which districts confused voters are more likely to be Franken voters. By disqualifying a few more votes in Franken districts, Coleman could manufacture a lead.

Is it the case that every challenged vote will be resolved? Surely in some cases, a ballot is sufficiently spoiled as to be uncountable. Coleman has to be trying to increase the number of spoiled ballots in Franken districts.

Bob X said...

"For some reason, EVERYTIME I click on 538.com and click on Post a Comment and try to sign in as a blogger, it never lets me. I ALWAYS have to go through the reset password process. Every single time. Don't know what that's about"
I commonly get told I have the wrong password (I have never changed the password) and sent to the original sign-up, where I enter my e-mail, twice, my password, twice, my signin name (all exactly the same, every time), click the terms of service and enter the word-verify --- and this sends me back to the original sign-up screen, except that it is only my password that I have to enter again, twice, and a new word-verify --- then I can post, except when it wants me to do it a third time.

I stick to my prediction based on the crudest of all projections from the very earliest of the data, Franken -60 before the challenges are resolved but +250 afterwards. Do I get some kind of prize if my guess is better than Nate's?

Bob X said...

Henry said "FACTUAL PROBLEM ... if you go to the Star Trib breakdown I DON'T SEE ANY COUNTIES WITH ZERO CHALLENGES IN WHICH EITHER CANDIDATE HAS A NET GAIN "
I believe Nate is looking at PRECINCT BY PRECINCT data: there are more individual precincts where zero challenges happened than entire counties.

BeanoCook said...

LOL!!!

Nate still has not explained exactly why it is assumed Franken voters either undervoted or overvoted. I just want to hear him admit Democrats are typically less educated and in general dumber than Republicans.

Chuck said...

Math is not required here. If there are precincts where the Coleman challenger is just being a belligerent jerk then the number of challenges goes up and Franken doesn't get very many votes in this phase of the process.

This is part of a very deliberate strategy to politicize the final process. By challenging every Franken vote now, the Coleman camp ensures that a fair final judgement phase will result in a lop-sided surge favoring Franken.

The Coleman camp can then claim that the final phase by a board of public figures was unfair and they will then challenge in court, etc.

This only occurred after the Coleman campaign saw how the undervote was breaking. The decision was made to challenge everything now and drive the weekend message in the media that Franken was losing, and then when it turns around later they will claim fraud.

Nate - This isn't math, it's Rovian politics.

shiloh said...

Hey Beano, you predicted Notre Dame quarterback Ron Powlus would win (3) Heismans ;)

btw, trust me, there are just as many stupid dems as reps, Lizard People notwithstanding, which is why I'm an independent! :)

Sorry, the set up was perfect and had to kick the ball through lol

take care, blessings

David W. said...

Let's remember that both sides are playing hardball with the ballot challenges. There were a couple of ballot challenges Franken's side made based upon the fact the voter left identifying information. For example, there's a thumbprint on one ballot, and on another, the voter signed the ballot.

Both ballots appear to vote for Coleman, but there's a state law that says a ballot is invalid if it has personal identifying information. But, if you don't challenge a ballot, there's no possibility that it will be thrown out.

I like Nate's analysis, not because I am a Democrat, but because it was an excellent analysis. However, he should never have drawn a definite conclusion like Franken by 27 votes. The error bars are just too big to make any sense at this point.

At this point, I'd say the election is an absolute tie, and if anyone wins, it is due to luck rather than a mandate by the voters.

Robert Green said...

is this a david rees* parody of a 538 post or is it serious? it's getting harder to tell lately.

either way, this is bad news for the democrats.

*www.mnftiu.cc

Kevin said...

anyone who has taken elementary statistics can understand what nate is doing. it's nothing advanced. not trying to sound like a smart ass.

i'm just merely reinforcing that his methodology is sound.

- kl

David said...

That sounds like a very dubious conclusion to me. Face it, there are lots of ways you could interpret the data, and the people issuing challenges are not doing so at random. You came up with the conclusion you were hoping for (consciously or unconsciously) and you have exactly no historical analogues to base your analysis on.

Michael said...

I still think that it makes sense to test any model against the 206 precincts where we essentially know the answer. That's much safer, if statistically noisy, than assuming linearity over precincts with wildly varying challenge rates.

On the question of how random the actual votes themselves are: yes, they are already much more random than any errors in the MN counting. What are the odds that a voter doesn't show up because of a flat tire, an illness in the family, an unforeseen work crisis, etc? I'll bet significantly above 1%. Out of 3M voters, that means a random term of about +/- 200 or more. It looks like the MN process will be more accurate at counting than that.
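Michael's ±200 figure can be sanity-checked with a quick binomial calculation. This is a minimal sketch under his rough assumptions, not a model of Minnesota turnout: 3 million voters, each independently staying home with probability 1%, and every absent voter moving the two-candidate margin by exactly one vote. The function name `margin_noise_sd` is hypothetical.

```python
import math


def margin_noise_sd(total_voters: int, p_absent: float) -> float:
    """One standard deviation of the margin shift caused by random no-shows.

    Assumes each voter stays home independently with probability p_absent and
    that an absent voter shifts the two-candidate margin by +1 or -1. Because
    each shift is always +/-1, the variance is total_voters * p * (1 - p)
    regardless of how the electorate splits (the split only moves the mean).
    """
    return math.sqrt(total_voters * p_absent * (1 - p_absent))


if __name__ == "__main__":
    sd = margin_noise_sd(3_000_000, 0.01)
    print(f"one standard deviation: about {sd:.0f} votes")
```

One standard deviation comes out to roughly 172 votes, so "about +/- 200 or more" is the right order of magnitude under those assumptions.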

Bottom line: Even if the outcome here is a disaster, we've seen a really excellent voting system in action, one that could be used elsewhere. Of course, it won't work as well in FL but still it's nice to see a counting system that's less random than the input.
/mbw

WV: 'terse', but perhaps I wasn't

Clark said...

Ironically, independent verification of your prediction obtains from an error in your presentation, to wit: "The independent variables considered in the regression are as follows:
t: the proportion of the two-way vote received by Franken in the initial count (e.g. excluding votes for third parties)" You meant to use "i.e." not "e.g." I.e. converts to 95 in the simple (English) alphabetic conversion to numerics (i->9, e->5; 9+5=14; factors of 14 are 2 and 7 (27) [27->BG, By God]. Well done!

susan said...

Reelgeist: tells it like it is.

Howie: I have a new proposed WV:
buttock - one gets the but and the other gets the tock (maybe the butt and ock?)

Agree about Fields medal, just for fun.

I think blogger requires a second entry the first time one comes on; at least it always does for me.

The statistics are all way over my head, but fascinating, and great if Franken wins. The site appears to be seriously weeding out all the people like me who find it too much.

susan said...

this time I only had to verify once - so it's not the rule, anh!

sean said...

Go Franken!

http://www.HopeWon.com

d.K. said...

You need to add a "donate to 538" button on this site. If this pans out, you'll be indispensable, and lots of us would be willing to chuck in $25 or so...

I'm dead serious.

cher said...

The comment section is great, but it seems that this is still not decided by the entire team, just Nate. I saw it here first, then went to Huff Po, and they had put it on their front page. Tomorrow it will be a buzz: "Nate Silver says..." However, the comment section here doesn't seem to agree. I always learn a great deal and am going to look up "heteroscedasticity" and see how I can work it into my next conversation.

Mark Grebner said...

I hope Nate's right, but I find myself pondering a contrarian view: in 2233 precincts where nobody screwed around with challenges, only 40 votes moved (a net of 28 to Franken). If that represents the reality we'd find if we removed all the nonsense of challenges, it would be very bad news for Franken.

Is there evidence that challenges are more common in precincts where one or more ballots was misread by the scanner (and therefore the recount will change the tally)? If the number of challenges is not strongly correlated with the number of miscounted ballots, a very different (and bleak) understanding arises.

To wit: there's a huge battle between two equally matched armies. On one end of the battlefield, Franken is consistently making slight, slow progress. On the other end, there's a huge cloud of smoke and dust obscuring the battle, but neither side is really moving forward or retreating very much. And since Franken needs to WIN the battle, a near stalemate will amount to a loss.

The problem with Nate's analysis is simply that we don't know the causal relations among machine miscounting and the challenge strategies of each campaign. Without knowing that, assuming statistical independence is likely to be a mistake.
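The check Mark asks for is a correlation across precincts. A minimal sketch, assuming one has (hypothetically) assembled two per-precinct arrays: the number of challenges, and the number of ballots whose tally changed in the hand recount (the "mis-read" ballots). The function name is illustrative; NumPy's `corrcoef` does the work.

```python
import numpy as np


def challenge_miscount_correlation(challenges, recount_changes) -> float:
    """Pearson correlation between per-precinct challenge counts and the
    number of ballots whose tally changed in the recount.

    Both inputs are hypothetical per-precinct arrays of equal length. A value
    near 0 would support the worry that challenges track campaign strategy
    rather than genuine machine miscounts.
    """
    c = np.asarray(challenges, dtype=float)
    m = np.asarray(recount_changes, dtype=float)
    return float(np.corrcoef(c, m)[0, 1])
```

Challenge counts are heavily skewed (most precincts have none), so a rank correlation or a simple scatter plot would be a sensible companion to this single number.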

apeescape said...

I second the hierarchical analysis here. In addition to providing more informed predictions, it will show which precinct is more important in contesting the ballots. Looks like OldFatGuy has already done this with meeeh results for Franken. Hopefully he can pull it off.

BTW, Andrew Gelman is the God at this kind of stuff.

BassoProfundo said...

It all seems quite logical. I had no idea that Franken was making bigger gains in precincts with no challenges (or one challenge). I guess the excessive challenging is serving its purpose - to muddy the waters, and insinuate that the election is not aboveboard.

I think it would behoove the canvassing board to post every single challenged ballot, along with their ultimate decision, especially if Franken winds up with a narrow victory.

Voice of Reason said...

Go over to The Hedgehog Report and read all the comments from the wingnuts about Nate's analysis on this. Nothing funnier than these losers bashing the guy who was right on election day while they were all predicting a "close McCain victory!"

I was the one who posted the $700,000 comment on Nate's book deal. I love rubbing that into the face of this Polaris guy who posts there and whom they all worship.

Here's a link:

http://www.hedgehogreport.com/?p=8874

interstices said...

My initial reaction when reading this was that the precincts with little or no challenge may not be representative of the state as a whole, and certainly not a random sample.

But I also find interesting and perhaps persuasive that the challenges escalated day-by-day. So rather than a Franken or Coleman observer squeezing off a shot or two at a questionable ballot, like a sniper, we are now in a Scarface phase. And if you assume that both sides are doing about the same thing in their greatly increased challenges, then those should cancel out and the underlying trends we can see with those 206 precincts will emerge.

A political and spatial profile of which precincts and number of ballot challenges might make for interesting examination...

piranha said...

@BeanoCook

democratic voters aren't any more stupid than republican voters, but they might easily be less educated, less experienced with voting, less fluent in english, elderly, or disabled -- suffering from vision/neurological problems. the democratic party attracts more underprivileged and marginalized people than the republican party, because those people feel that at least the democrats care a little bit about them.

and each of their votes should be counted, just as the votes of those 23% of texans who think obama is a muslim while at the same time believing his christian pastor made him hate america should be counted.

intelligence and knowledge are no requirement for voting, and shouldn't be, because such people are just as affected by political decisions.

Lauri said...

Nate,

You make checking the polls fun.

Thank you.

Lauri

Down to the Wire Designs said...

It seems likely to me that the Coleman team increased the number of vote challenges as a way of delegitimizing a possible Franken victory.

As Franken gained votes in what were largely Republican areas, and the situation began to look dire, the Coleman campaign began to issue more challenges against the kinds of votes that they had earlier let pass.

This would have the effect of making Franken's gains diminish as the voting went on, and it would look like Coleman had begun to hold his own or even pick up votes. While many of these challenged votes will likely be awarded to Franken in the end, the strategy is to stop the media from reporting daily gains for Franken and finish the recount with the lead. Then, if most of the challenges go Franken's way and he wins the election, they can claim the result is unfair or rigged.

Dan said...

yeah where are the confidence intervals? i'm inferring from your analysis that your finding isn't significant (overlaps 0).

loya said...

I don't understand, but I have a feeling that this is BS

bm said...

Hi Nate.
I agree: a nice and controversial, but quite long, shot. Please provide the confidence intervals on this, and please tell us why you're entering all the interactions.

More substantively: you pointed out earlier that the undervote/no-vote relation might be a function of voting equipment. I would love to see a variable included that allows us to distinguish districts based on the technology. Maybe challenges correlate with equipment...

either way: real fun to read! thx!

Clarke Bustard said...

@Pete S: Heteroskedasticity is my new favorite word. Don't tell me what it means. I'd rather imagine that it's the place that straight geeks call home.

Keith said...

I'd love a confidence interval as well.

There is still a chance that we will enter the real "Al Franken decade"

zappa24 said...

Clarke, you might want to skip reading this post if you want to keep heteroskedasticity a mystery...











Heteroskedasticity breaks one of the assumptions of Ordinary Least Squares regression, namely that the error variance is constant. It has no effect on the estimate itself (in this case the 27). Instead, the usual standard error is biased, typically underestimated. The end result of heteroskedasticity is often to add spurious statistical significance to a variable. This could lead to putting a variable into a model that, according to the statistical measures, doesn't belong (in other words, OLS is still LUE, but it may not be BLUE if there is heteroskedasticity).

Clarke, if you are still reading this, I opened up some new mysterious things for you. I admit that LUE and BLUE are not quite as fun as heteroskedasticity.
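To make zappa24's point concrete: the OLS coefficients are computed the same way either way, and heteroskedasticity changes which standard-error formula can be trusted. Below is a minimal NumPy sketch comparing the classical standard error with White's heteroskedasticity-consistent (HC0) "sandwich" estimator. The function name `ols_with_se` and the simulated data are illustrative assumptions, not anything taken from Nate's model.

```python
import numpy as np


def ols_with_se(X, y):
    """Return (coefficients, classical SEs, White HC0 robust SEs).

    X must already contain a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical SE: one common error variance shared by every observation.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))

    # HC0 sandwich: each observation contributes its own squared residual.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_hc0 = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_classical, se_hc0


if __name__ == "__main__":
    # Simulated data whose noise grows with x (heteroskedastic by construction).
    rng = np.random.default_rng(0)
    x = np.linspace(1, 10, 200)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)
    X = np.column_stack([np.ones_like(x), x])
    beta, se_c, se_r = ols_with_se(X, y)
    print("coefficients:", beta)
    print("classical SE:", se_c)
    print("HC0 SE:      ", se_r)
```

The coefficient vector is identical under both formulas; only the two standard-error columns differ.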

Herunar said...

Nate's math isn't really that impressive (university level where I live). It's his clear analysis and temperament that make this blog great. As for this prediction, as Nate said, don't take it too seriously. While Franken does have a higher chance than most pundits out there recognize, he could still very well lose, given the large number of assumptions and estimates and the small difference in the number of votes.

adrian mckinty said...

Hmmmm

I want to believe too, but I think your approach is a bit like applying a regression analysis to an apple pie baking contest. We just don't know what the judges of the disputed ballots are going to do; it's going to come down to a lot of discrete human decisions, and that's never that predictable.

Mrs B said...

I have no idea about the maths on this. Most of the terminology is way above my level of competence. I stopped maths after we did standard deviation, and even that was too much for me.
I really admire all the people posting here who are looking at the maths and suggesting other ways of doing it.
BUT I think maybe you can't see the wood for the trees. Nobody KNOWS enough about what's going on. It's one of those GIGO situations.

It's going to come down to whether absentee ballots are admitted and to what happens to the challenges. I suspect that the fact that the people making the decisions are mainly judges will make the process fair, but the relative strengths of Franken and Coleman's legal teams will be very important.
As for the GOP attempting to delegitimise the count by alleging bias in the judges, I agree with the post that said the SoS should publish all the challenged ballots and the decisions. That should be enough proof for most people. The wingnuts wouldn't accept anything, of course, but who cares?

And finally, Nate had Franken at 52% on 4th November. I see nothing yet to make that look wrong.

Zapunar Mckintard said...

Earth to Nate: can you apply your math skills to something a bit more fucking constructive than whether Al the Clown gets to join the Washington Circus, please??!!

MrsB: you sound waaay hot!

Natalie Rosen said...

The only "regression analysis" I ever heard of was on my analyst's couch! Only kidding. I don't have an analyst!

I did NOT get a perfect score on my math SATs. As a matter of fact, I was probably the only one who got a 50 on the math SATs, and the lowest one can allegedly get is 200. Only kidding there too, but there's no doubt about it: as much as I wish Nate Silver's genes were a part of my DNA repertoire, they are unfortunately not. I do NOT get the whole Franken/Coleman mess!

Mrs B said...

zapunar mcK

NOT that sort of deviation!!!!

Zapunar Mckintard said...

B: Zap's a straight little bar chart- the only deviant thing about me is my stupid name- I think I'll change to Mr Zzzzzz (after reading the latest Minnesota recount post).

andrewxc said...

Decently done, but this is, of course, assuming that the regions vote uniformly, as you had defined it in the first couple of paragraphs.
As a scientist, I am curious about the outcome.

dcgenerals said...

It's a good thing the democrats "found" those 32 votes, I guess!

Stephen C. Rose said...

M O N D A Y

Will Arnold Schwarzenegger Be Barack's Green Giant?
http://stephencrosehome.blogspot.com/2008/11/will-arnold-schwartzenegger-be-baracks.html

UPDATES ALL DAY: Can Barack Stop The Panic? + Our Job vis-a-vis The Obama Administration

YESTERDAY

Behind Barack's Steely Resolve, the Idea of Unity, the Capacity to Grow
http://stephencrosehome.blogspot.com/2008/11/behind-baracks-steely-resolve-idea-of.html

To receive this daily one-page "magazine" the next morning in your inbox, with all updates, go here http://stephencrosehome.blogspot.com and subscribe.

Huffington Post Page -- http://www.huffingtonpost.com/stephen-c-rose

Nick said...

Nate,

I love your math -- until the very very end. 254 + 12 = 266. But you are still the man.

Jacob said...

Can we get a Hotelling confidence band on this at 90, 95, and 99% confidence, please? I'll settle for Scheffé's procedure.

I'm assuming some of the intervals would include zero or negative votes and that in fact the regression isn't predicting anything at all within these levels, no?
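Jacob's question (does the interval include zero?) can be approached with a simpler, swapped-in technique than the Hotelling or Scheffé bands he names: a percentile bootstrap over precincts. This is a minimal sketch, assuming hypothetical per-precinct arrays of Franken's net recount gain and ballots cast; it resamples precincts with replacement, recomputes the net gain per vote, and scales that rate to the statewide total. It is not Nate's regression, and the function name is illustrative.

```python
import numpy as np


def bootstrap_projection_ci(gain, votes, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for a statewide net-gain projection.

    gain, votes: hypothetical per-precinct arrays (net Franken gain, ballots).
    Returns (low, high) bounds for the projected statewide net gain.
    """
    gain = np.asarray(gain, dtype=float)
    votes = np.asarray(votes, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(gain)
    total = votes.sum()
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample precincts with replacement
        rate = gain[idx].sum() / votes[idx].sum()  # net gain per ballot
        estimates[b] = rate * total
    alpha = (1 - level) / 2
    low, high = np.quantile(estimates, [alpha, 1 - alpha])
    return float(low), float(high)
```

If the returned interval straddles zero, the projected net gain is not distinguishable from no gain at that confidence level, which is the outcome Jacob suspects.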

David said...

i feel dumber now than i did when i woke up this morning.

Carl said...

WOOHOO!

Nah nah nah nah
Nah nah nah nah
Hey Hey Nahm

GOODBYYYYYYYYYYYYYYYYYE!

C said...

dcgenerals: can you "find" something that was never "lost"?

Because those absentee ballots in Minneapolis were never lost, or indeed unsecured.

I'm sure you don't care though

Chris in Asia said...

If a margin this narrow doesn't convince people that every vote counts, I don't know what will.

Oldnovice said...

Everything in the world is a regression analysis waiting to happen in Nate's eyes.

:-)

KWRegan said...

"Lizard People" voter comes forward

"Bemidji 25-year-old Lucas Davenport sounds convincing — he and a bud were joking around about the lousiness of the candidates, thought about writing "Revolution," and decided "Lizard People" was funnier."

As David Brauer continues, "Lucas, you were right, but your Democrat buddies will be mad you also filled in the Franken oval and screwed him out of a vote."

I would say the joke goes best with Franken, but this doesn't impute any intent. So as Chris says "every vote counts", but I don't think this one wanted to be counted.

WV "adampl": I hope this doesn't put a dampl on Franken's chances.

JohnJay60 said...

Nate, I run an analytics practice for a huge consulting firm. I like the math but I am puzzled that, if I read this correctly, you've decoupled your study from the underlying facts of the area the votes are being counted from.

That is, would not a better approach be to estimate what % of the votes in flux would tilt Franken, Coleman, or other compared to the composition of the precinct the votes came from? This assumes the characteristics of voters who cast votes in flux are the same overall as those whose votes are not in doubt, and this assumption may be wrong.

Tying this to number of challenges seems to be the wrong approach, since a challenge doesn't change the underlying vote. But maybe I'm missing something.
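JohnJay60's alternative reduces to a one-line expected-value calculation. A minimal sketch, assuming (as he does, and as he notes may be wrong) that ballots in flux split between the two candidates the same way the precinct's undisputed ballots did; third-party votes are ignored. The inputs and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def expected_flux_net(precincts: Iterable[Tuple[int, float]]) -> float:
    """Expected net Franken votes from disputed ballots.

    precincts: pairs of (ballots_in_flux, franken_two_way_share), where the
    share is Franken / (Franken + Coleman) among undisputed ballots.
    n ballots split at share s give Franken n*s and Coleman n*(1 - s),
    so the expected net is n * (2s - 1).
    """
    return sum(n * (2 * share - 1) for n, share in precincts)
```

For example, 100 disputed ballots in a precinct where Franken took 55% of the two-way vote contribute an expected net of about +10 for Franken.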

The rest of your site is awesome - I'm a huge fan and look forward to your credibility being recognized in 2010 and beyond in future elections.

susan said...

GIGO - had to look it up. Wonderful, had forgotten this useful phrase.

Tending off-topic, since I'm statistically challenged, this just in from Devon:

"We were in Morocco when the results came through and this lovely friendly French/Moroccan lady greeted us at our Riad with 'Oh que c'est bien! Un homme noir au Maison Blanc.'" (Oh, how wonderful! A black man in the White House.)

(sounds better in French, punches up black at white)

Hope someone enjoys this distraction.

just_looking said...

If Nate is correct, it will further cement his reputation.

However, I have my doubts. According to Nate's model, Franken loses votes every time he challenges (that makes no sense), and moreover it's about the same number he loses when Coleman challenges (just plug in the numbers and try it yourself). Of course, if Franken loses votes on every challenge, the model will predict he makes big gains with no challenges.

In the precincts in which there were 4 or fewer challenges, Franken picked up 105 votes, while Franken lost 57 when there were 5 or greater challenges (see Nate's table). And in the 4-or-fewer precincts, the average challenge rate was about 4 per 10,000 votes for both sides. In the 5-or-greater precincts, it was over 30 per 10,000 for both sides.

Thus it is easy to see why the model concludes that all challenges hurt Franken: when both sides challenge in big numbers, Franken loses.

But, the model missed these data: in the 4-or-fewer precincts, Franken had 34 more challenges. In the 5 or greater, Coleman had 37 more challenges. And it's the differential number of challenges between Coleman and Franken that reduce Franken votes, not the total number of challenges.

In short, the model is incorrectly looking at the total number of challenges on both sides, and not the differential. Wish I had time to build an alternate model based on (c_c - c_f), but I don't
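just_looking's proposed alternative is straightforward to sketch. A minimal version, assuming hypothetical per-precinct arrays of Franken's net recount gain, Coleman challenges (c_c), Franken challenges (c_f), and ballots cast. Everything is put on a per-10,000-votes scale, as in Nate's model, and the fit is unweighted ordinary least squares via NumPy's `lstsq`; weighting by precinct size would be a reasonable variant. A negative slope would mean each extra Coleman challenge, net of Franken's, costs Franken votes.

```python
import numpy as np


def fit_differential_model(gain, coleman_ch, franken_ch, votes):
    """Regress Franken's net gain per 10,000 votes on the challenge
    differential (c_c - c_f) per 10,000 votes.

    All arguments are hypothetical per-precinct arrays.
    Returns (intercept, slope).
    """
    votes = np.asarray(votes, dtype=float)
    per_10k = 10_000 / votes  # converts raw counts to per-10,000-vote rates
    y = np.asarray(gain, dtype=float) * per_10k
    diff = (np.asarray(coleman_ch, dtype=float)
            - np.asarray(franken_ch, dtype=float)) * per_10k
    X = np.column_stack([np.ones_like(diff), diff])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, slope = coef
    return float(intercept), float(slope)
```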

Voice of the Midwest said...

Nate is right on target on the scientific side.

Now is where political science must be considered...

Once the recount comes to the point where Franken is within double digits and there is a pile of challenges that appear to lean Franken, the Coleman forces will unleash a campaign of confusion, lawsuits, and distortion not seen since the Miami-Dade County Courthouse in December, 2000.

What hurts them is the clarity in the Minnesota rules for recount and the above board nature they have had since Day 1.

The Coleman call for Franken to ask for the recount to be called off is a Republican play akin to a simple halfback over tackle play in football. Odds are, it won't hurt you.

Unless you are running in a state with good educational demographics...like Minnesota!

Jeff in CA said...

Nate, about the photo over on the left of the girl in the t-shirt and bikini bottom, who has been hanging out there for a while: It's distracting me from devoting my complete, undivided attention to your analysis. I'm starting to visit this site looking for the girl, not the math. And she wants to shower together. It's all so, so ... illogical. Aargh!

Juris said...

Has anybody else come out with a statistically based prediction of this outcome?

I think Nate's brave to make a point estimate. If he's close but Coleman wins, Nate doesn't look good. Suppose Coleman wins by 1 vote? Nate would look like a terrible prognosticator. But if Franken wins by any amount, 1 to 300 votes, say, Nate will look pretty good for sticking his neck out.

At least he won't be able to change his prognostication after the counting is over, as Sam Wang has done on his EV prediction.

wc: ovent

Pat & Flo said...

Here is a sobering analysis of this "projection" by Princeton professor Sam Wang, who explains how silly it is to present things that way...

http://election.princeton.edu/2008/11/24/statistical-malpractice

Juris said...

Sam Wang? Take a really good look at what he's doing. On November 4, he posted his final prediction from his model: 352 EV for Obama. This is the model he keeps hyping. It's a good one.

But what number does he show at the top of his webpage: "Predicted EV 364."

Well where did that number come from? Sam says it's from his 'gut,' not his model. But you wouldn't know that if you happened to log onto his website, since he makes it seem that this 364 was the outcome of his model. Very deceptive.

Cugel said...

"Old Fat Guy Said:

We can infer from good evidence (the original machine count plus how well the number of challenged votes by the opponent matches up with the number of fewer votes received in the recount) that most of these challenges are one campaign challenging the validity of the other's vote received.

The assumption I'm making is the machines do a pretty good job (they do) and the explanation for the lost votes is Coleman challenging Franken votes and Franken challenging Coleman votes."


I'm sorry that your entire analysis is flawed from the beginning OFG, because your assumption is just unsupportable.

You simply can't look at the number of challenges matching the number of lost votes and conclude they were challenges by the opponent to perfectly legitimate votes! It might be the OPPOSITE! The machine may simply have counted a stray mark as a vote and the human judge determined it was a deliberate undervote -- and that results in the challenge.

We don't KNOW what sort of challenges are being made in what ratios.

Ex: The voter thinks about voting for Franken or Coleman, but doesn't like either one. She taps her pen on the oval which leaves a mark, enough for the card-reader to count as a vote, then decides "screw it" and leaves them both blank.

But, when the ballot is hand-counted, the election judges decide this falls into the category of a deliberate UNDERVOTE.

They take one vote away from Coleman. He challenges.

This shows up as a lost vote on his total, but when the ballots are ultimately considered by the Canvass Board, he loses the challenge and the vote isn't counted, so he doesn't gain any ground.

In short, the judges may simply be knocking out a lot of ballots that the machines wrongly counted (for both sides).

We simply DON'T KNOW what percentage of challenges are "lost vote" challenges to the decision of the judges NOT to count a vote, and "added vote" challenges to the decision of the judges to count a vote for your opponent that wasn't there before.

The ONLY evidence I've seen, and it's highly inferential at best, comes from a comment by Joe Mansky, the election supervisor of Ramsey County, who keeps getting quoted in the Star Tribune. He said that he thought "the word had come down" from the campaigns "to save all our votes," i.e. to challenge all the judges' decisions to knock out their votes, regardless of merit.

His comments suggested that MOST of the challenges were "lost vote" challenges.

I COULD BE WRONG ABOUT THAT! Or, he might only be talking about HIS county. But, that's the only piece of evidence I've seen that attempts to quantify whether most of the challenges are lost vote or added vote challenges.

Obviously BOTH campaigns do not publicize the "lost vote" challenges because that makes them look bad! They only publicize the egregious attempts by the other guy to knock their votes out and accuse the other campaign of fraud!

Cugel said...

It occurred to me that a big reason for the high number of challenges is simply: THE LAWYERS!

Lawyers are trained that if you DON'T make an immediate objection at trial to any ruling by the judge to admit or exclude evidence you WON'T be allowed to complain about it afterward. The appeals court will say "you didn't complain about it at trial so you lost your chance."

Same thing here. The Canvass Board may reject your challenge out of hand 99.9% of the time, but if you don't make one, you're 100% guaranteed to lose -- so CHALLENGE AWAY!

Zapunar Mckintard said...

Yes, that's a good example of GIGO, Susan.

Russ Martocci said...

A. There are no coincidences, but there are degrees of relevance, ranging from almost none to lots.

It matters.

B. There is this line the Republicans are trying to sell these days: that Bush planted the seeds of democracy in Iraq by illegally invading that country. Someday, when peace comes to that land, as it always must, they plan to credit their criminality with fostering it. That is and will continue to be a lie.

The seeds of democracy are everywhere now. The seeds of democracy are everywhere people are. People yearn for freedom from tyranny like a flower yearns for the sun.

An illegal war brought the seeds of death and corruption and that's how they've sprouted.

Bring new seeds to ravaged land, wherever the land is ravaged. Till it, tend it, and the new seeds will grow. Rarely with much ease, frequently with some pleasure.

This has been a message from Chance Gardener.

Juris said...

Cugel: you got it. It's about the lawyers.

The strategy is straightforward by the Coleman team: (1) at all costs stay ahead after the initial recount; (2) if we're ahead, loudly proclaim victory and go to court and try to stop the final review of challenged ballots; (3) if we're behind, claim the recount was flawed, and that even the resolution of challenges will be inadequate because of inconsistent procedures in what was allowed to be challenged.

I have confidence that Minnesota's SoS will get this right whoever wins. Coleman does not. I also think the lawyer strategy is a losing one in this case, again no matter who wins the recount.

livemild said...

while i don't think that franken is going to win, i do find it interesting to look at the possibility of a win. politics is entertaining, let's face it.
this site is entertaining.

the sunshine and lollipops of other sites like Huff end up like bubblegum. you enjoy it for a few minutes and then spit it out.

Juris said...

I should have said in my last post that I think Coleman DOES think the SoS of Minnesota will get the count right in the end. But he wants to prevent that correct count (via legal action) if it shows that Franken is ahead.

Dave said...

Nate,

By including interaction terms you are assuming that the predictor variables are collinear (correlated). In this case, linear regression analysis may be a bad predictor, since the process is inherently nonlinear.

There is a fairly standard test you can apply to the predictor variables to find out if in fact they are collinear. A good book on this subject is: Belsley, David A., Edwin Kuh and Roy E. Welsch. 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons.

Blind application of linear regression analysis to nonlinear problems can lead to erroneous results. This may or may not be the case here, but Belsley has outlined a simple way to test for what he terms "harmful collinearity." It would make me feel much safer about your results.
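The Belsley, Kuh and Welsch diagnostic Dave mentions starts from condition indexes of the column-scaled design matrix. A minimal NumPy sketch, assuming X already includes the intercept column; the full procedure also examines variance-decomposition proportions, which this sketch omits. A commonly cited rule of thumb from that literature treats indexes above roughly 30 as a sign of potentially harmful collinearity. The function name is illustrative.

```python
import numpy as np


def condition_indexes(X):
    """Condition indexes of a design matrix, in the Belsley-Kuh-Welsch style.

    Each column (including the intercept) is scaled to unit length without
    centering; the index for singular value s_j is max(s) / s_j.
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)  # unit-length columns
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s
```

Orthogonal columns give indexes of 1; two nearly duplicate columns push the largest index far above 30.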

-Dave

Craig said...

Why so many challenges?

Maybe it isn't the insidious motive of trying to be in the lead when the canvassing board meets.

I think the answer is pretty clear: the people at the top don't want the challengers to miss anything. In a race that comes down to a few votes, do you want your hundreds of lightly trained challengers to have discretion to miss something? If I were in charge, I would want the election lawyers making the discretionary decisions, not retirees, factory workers, etc...

I think the campaigns want a huge stack of anything questionable to go to their lawyers, and then the lawyers will filter out most of the frivolous ones and drop the challenge before they get to the canvassing board.

Juris said...

@Dave. I checked and that's a damn expensive book, whether in that edition or a later one. Amazing how much technical books cost these days. Way too much for the interested consumer.

In the case of Nate's model, it would be interesting to know what he got without the interaction terms (and/or with adjustment for nonlinearity). I suspect that whether consciously or not, he's picking his model just a bit for strategic reasons -- what's the smallest pro-Franken margin that he can reasonably get? At this point, since Coleman's ahead in the overall count, any Franken win would make Nate's prediction look respectable.

wv: dusticel. Where Ted Stevens may spend his remaining days.

Tom said...

The Star Tribune lists a timetable for the recount and also which party issued each challenge, which might help with any analysis.

http://ww2.startribune.com/news/metro/elections/returns/2008/recount/msenco.html

Mark Ludwick said...

I tried to recreate this regression without the initial vote % and the confusing interaction terms, making it dumber but somewhat easier to interpret...

Regressing across all precincts, Franken's gain per 10,000 votes in each precinct is modeled to be:

2.697 + [0.3586 * F] - [0.9502 * C],


where F is the number of Franken challenges per 10,000 votes and C is the number of Coleman challenges per 10,000 votes.

[Notice that this predicts a gain of 2.697 votes for Franken for every 10,000 votes in precincts where there are no challenges, while the average gain in precincts with no challenges has so far been 0.02 per 10,000 votes.]

While challenges exist, all else being equal:
- one more Franken challenge in a precinct increases Franken's modeled gain per 10,000 votes in that precinct by 3586/P, where P is the number of voters in that precinct.
- one more Coleman challenge in a precinct decreases Franken's modeled gain per 10,000 votes in that precinct by 9502/P, where P is the number of voters in that precinct.

Do these two points suggest that a higher percentage of Coleman challenges are directly challenging votes that had been counted for Franken?
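The per-challenge arithmetic can be checked directly. Under the model above, one more challenge in a precinct with P voters raises the challenge rate by 10,000/P, so the modeled gain per 10,000 votes changes by coefficient × 10,000/P (3586/P for a Franken challenge). Converting that rate back to raw votes multiplies by P/10,000, which leaves just the coefficient: about 0.36 votes per extra Franken challenge and about 0.95 per extra Coleman challenge, whatever the precinct size. The helper below is a hypothetical illustration.

```python
def per_challenge_effect(coef: float, precinct_votes: int) -> tuple:
    """Effect of one extra challenge under a per-10,000-votes linear model.

    coef: model slope (0.3586 for Franken challenges, -0.9502 for Coleman
    challenges in the comment above).
    Returns (change in gain per 10,000 votes, change in raw votes).
    """
    rate_change = coef * 10_000 / precinct_votes  # e.g. 3586 / P
    vote_change = rate_change * precinct_votes / 10_000  # back to raw votes
    return rate_change, vote_change
```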


So... removing all challenges and expanding this to the whole state... 2.697 votes gained per 10,000, multiplied by 2.8 million votes, gives us a gain by Franken of 778 votes.

Franken by 563!

:P
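The statewide extrapolation is easy to reproduce. A minimal sketch: the intercept is scaled from "per 10,000 votes" to a raw count, then Coleman's lead in the initial count is subtracted. With exactly 2.8 million votes the product is about 755, not 778; the comment's 778 implies a total of roughly 2.885 million, so 2.8 million was presumably a loose rounding. The 563 figure equals 778 minus 215, Coleman's lead in the initial count. The function name is hypothetical.

```python
def statewide_projection(intercept_per_10k: float, total_votes: float,
                         initial_margin: float) -> tuple:
    """Franken's projected raw gain and final margin with all challenges removed.

    intercept_per_10k: modeled Franken gain per 10,000 votes at zero challenges.
    initial_margin: Coleman's lead in the initial count (positive = Coleman ahead).
    Returns (raw gain, projected Franken margin).
    """
    gain = intercept_per_10k * total_votes / 10_000
    return gain, gain - initial_margin
```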

kev said...

Swami Nate,

Could you please break down the challenged-votes mess for me? I don't understand: is Coleman intentionally challenging the votes in hot Dem counties just to keep the race looking close? If those are silly challenges, do those votes swing towards Al in the long run? It's so confusing.

Sloanasaurus said...

Coleman will win the recount by 115 votes.

David said...

Nate, care to respond to those of us who think your analysis is "having fun with the data" (as Greg described it) with little or no relationship to fact?