I had previously been defining a "Must-Win State" (see the Swing State Analysis graph along the right-hand side of the page) as a state which is won the highest percentage of the time by the candidate winning the election. The problem with this definition is that it was producing some skew toward Obama-friendly states, since we now have Obama winning the election a strong majority of the time.
The new definition is that a Must-Win State is a state won the highest percentage of the time by the winning candidate when the election is close. By "close", I mean an election in which the popular vote is within 4 percentage points. If the popular vote is outside that 4 percentage point range, it is pretty much mathematically impossible for the trailing candidate to win in the Electoral College, meaning that the electoral math becomes irrelevant. So these are the states that the candidate needs to win when winning individual states matters.
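A minimal sketch of the new definition, assuming each simulation run is stored as a dictionary with the national winner, the popular-vote margin in percentage points, and the winner of each state. The record layout and the name `must_win_shares` are hypothetical, chosen only for illustration; this is not the site's actual code.

```python
from collections import Counter

def must_win_shares(simulations, close_margin=4.0):
    """Share of close simulations in which each state went to the national winner.

    `simulations` is a hypothetical list of dicts, one per simulated election:
      {"winner": "Obama", "pv_margin": 1.8, "states": {"OH": "Obama", ...}}
    where pv_margin is the popular-vote gap in percentage points.
    """
    # Keep only runs where the popular vote is within the cutoff.
    close = [s for s in simulations if abs(s["pv_margin"]) <= close_margin]
    if not close:
        return {}
    hits = Counter()
    for sim in close:
        for state, state_winner in sim["states"].items():
            if state_winner == sim["winner"]:
                hits[state] += 1
    states = {st for sim in close for st in sim["states"]}
    return {st: hits[st] / len(close) for st in states}

# Toy data, invented purely for illustration.
sims = [
    {"winner": "Obama",  "pv_margin": 1.5, "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "McCain", "pv_margin": 2.0, "states": {"OH": "McCain", "CA": "Obama"}},
    {"winner": "Obama",  "pv_margin": 9.0, "states": {"OH": "Obama",  "CA": "Obama"}},  # not close, ignored
]
shares = must_win_shares(sims)
print(shares)
```

In the toy data, Ohio sides with the winner in both close runs while California does so in only one, which is the kind of separation the close-election filter is meant to produce.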
6.17.2008
Defining a "Must Win" (Technical)
by Nate Silver @ 8:29 PM...see also site, swing states

96 comments
That gradient really doesn't work for me. What about a dark red/dark blue alternating color pattern?
Tipping point for both, must win for both? It would be interesting to see tipping point McCain, tipping point Obama. Must win McCain, must win Obama. Hope I made sense
edit: I see the color gradient now. Got it. Agree with falsehood though
Why not go whole-hog, and define a must-win state as one which 1) went for the winning candidate, and 2) would have altered the result of the election if it had gone the other way?
After all, if condition 2 doesn't hold, it isn't really a must-win state. It's a nice-to-have-to-run-up-my-vote-total state.
Does this mean that Ohio is a must win state 75% of all runs, or that it's a must win 75% of the time that the election is within 4%.
You could more appropriately choose to include a state as a must win when there is a single state that would cause the winner to lose if that one state flipped, then include all states which fit that criterion in that run.
Anon,
Because if you define things that way, then California would likely always be a must win, since you can't very easily subtract 55 electoral votes and expect to win the election.
Nate,
Re: Your 7:58 comment, isn't that only true for Obama? I'd imagine there are a lot of situations in which McCain wins the election but loses California. Perhaps it makes the most sense to have separate lists of "must-win" states for each candidate.
Which is why you only include states that are within 5% on each simulation run.
This would give a much more useful list, and prevent "Must-Win" from being a misnomer.
Somebody at Daily Kos: "A bonus from this poll: DavidNYC has multiplied out the crosstabs from this poll, and they indicate that McCain leads Obama in Alaska by just two points, 43% to 41%."
Isn't that what a "must win" state is? I mean, Obama doesn't win the election very many ways if he's losing California so it is a "must win" state, though I'm guessing you mean something slightly different.
By your (English) definition though I imagine California is won very close to 100% of the time by Obama when he wins the election.
Not sure what I'm missing.
Following up on my previous post, I think that 7:54 anonymous probably has the more intuitive definition of "must-win" states.
What you seem to be listing are "telltale" states.
Could someone help me understand how it works out that it's nearly impossible to win the EV while losing the popular vote by more than 4%? So many of the states are effectively gerrymandered. I mean ... it's not unlikely that the Republican would win Utah with 70% and the Dem would win DC with 80%, so all those extra popular votes are "wasted".
Wouldn't purple be the appropriate color here?
Obsessed,
See this.
Because if you define things that way, then California would likely always be a must win, since you can't very easily subtract 55 electoral votes and expect to win the election.
What if you threw out states where one candidate wins by more than x%? Then you'd have "must win swing states".
Modeler - thanks for the link!
pensblog jeff: I looked at DavidNYC's Daily Kos diaries and comments, as well as his posts at Swing State Project, and found nothing about Alaska. Got a link?
A petty complaint - the tipping point states percents must logically add up to 100%, since there is always one that tips it over.
There should be an entry at the bottom for 'other', I suggest.
Anon 8:25 -
It's on the front page under the AK-Sen story.
Could someone give a similar description of what the tipping point list is measuring?
Because if you define things that way, then California would likely always be a must win, since you can't very easily subtract 55 electoral votes and expect to win the election
Actually, eyeballing the numbers on your graph, it looks like California isn't necessary for the 30% of simulations that result in McCain victory, or the ~35%? that result in a 325+ vote Obama win. So California is only a must-win state 35% of the time, which is probably much less than, say, Ohio, which is likely critical to a large portion of close McCain and Obama wins. I still maintain this is a better definition than the arbitrary cutoff of 4%.
However, this could be refined a bit further, by also looking at the ratio of times state X is necessary in simulations that McCain wins to the number of times it is necessary in simulations that Obama wins. For California, this ratio is probably nearly 0 (if McCain wins Cal, he probably wins so much else he doesn't need it.) For a state like Ohio, it probably approaches 1. The closer the ratio is to 1, the more the state can be said to be a true swing, must-win state.
I love all the ways you keep improving the site!
One alternative here would be to define
"must-win" percentage for state X as the average of:
Prob(Obama wins X given Obama wins the election) and
Prob(McCain wins X given McCain wins the election).
That might eliminate the bias toward "Obama-friendly states" without throwing out any of the runs.
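One way to read this suggestion as code, assuming the same kind of per-run records (a national winner plus a winner for each state). The function name `balanced_must_win` and the toy data are hypothetical.

```python
def balanced_must_win(sims, state, a="Obama", b="McCain"):
    """Average of P(a wins state | a wins election) and P(b wins state | b wins election)."""
    def p_state_given_election(cand):
        won = [s for s in sims if s["winner"] == cand]
        if not won:
            return None  # candidate never wins: conditional probability undefined
        return sum(s["states"][state] == cand for s in won) / len(won)

    pa, pb = p_state_given_election(a), p_state_given_election(b)
    if pa is None or pb is None:
        return None
    return (pa + pb) / 2

# Toy data, invented purely for illustration.
toy = [
    {"winner": "Obama",  "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "Obama",  "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "Obama",  "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "McCain", "states": {"OH": "McCain", "CA": "Obama"}},
    {"winner": "McCain", "states": {"OH": "McCain", "CA": "McCain"}},
]
oh_score = balanced_must_win(toy, "OH")
ca_score = balanced_must_win(toy, "CA")
print(oh_score, ca_score)
```

Because each candidate's conditional probability gets equal weight, the score does not drift toward one candidate's safe states just because that candidate wins more of the runs.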
regarding PV and EV:
Okay, let's say something dramatic and polarizing happens. For example, let's say McCain goes into an angry rage in front of the cameras and starts screaming bloody murder about how much he hates abortion and how he favors the death penalty for abortion doctors. While this might cause his PV in states like Kansas to go up his PV in states like CA and NY might well plummet by more than 5% (especially in light of the recent poll that revealed that 49% of women think McCain is pro-choice!).
But McCain would have lost CA and NY anyway, so in this case a dramatic PV swing could well have minimal effect on the EV count. If it had been Bush in 2004, this could easily have given Kerry the PV by more than 4% without losing the EV for Bush.
Thus, depending on the reason for the 4% swing in PV it might have relatively little effect on the EV.
Conversely, what if it were revealed that Obama is black? This could cost him huge amounts of votes in WV, KY and AL while having minimal effect elsewhere.
8:40 anonymous:
See the FAQ.
Nate,
After more state polls are released, consider making a post about the efficacy of your new regression model. I was doubtful at first, but it seems to be holding up quite well. The new regression numbers have matched the post-primary numbers from the following states: AR, KY, MA, NC, OK, AK (based on the projection from Rass's polling of the Senate race)
Could it be defined for "must win swing state"? Of course California is must win for Democrats, but it is safely blue. But if it is a swing state, who is it more of a must win for? Based on established base states, which states are the most important for each candidate? And which states (tipping point, I assume) are key to leading to that candidate's victory among the swing states?
Obsessed,
If it turns out that Obama is black, Nate can just study historical voting patterns for Warren G. Harding. ;-)
I understand your discomfort with the 4% cutoff. However the situations you describe are unlikely. It's probably not worth worrying about unless Nate starts seeing a reasonable probability of someone winning the PV by 4% but losing the EV in his simulations.
I think what Nate is really trying to find are telltale states: those for which a reasonable person can say "As goes state _______, so goes the nation."
For this definition, Nate's original approach makes sense, but so does his concern. For example, in 1984, as went 49 states so went the nation.
Perhaps we can say that a state is a "telltale state" if there is a high probability that Obama wins the election if Obama wins X AND McCain wins the election if McCain wins X. The telltale index could then be defined as the product of:
Probability(Obama wins election given that Obama wins X)
and
Probability(McCain wins election given that McCain wins X)
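A short sketch of this telltale index, again assuming per-run records with a national winner and per-state winners. The name `telltale_index` and the toy data are hypothetical.

```python
def telltale_index(sims, state, a="Obama", b="McCain"):
    """P(a wins election | a wins state) * P(b wins election | b wins state)."""
    def p_election_given_state(cand):
        carried = [s for s in sims if s["states"][state] == cand]
        if not carried:
            return None  # candidate never carries the state: undefined
        return sum(s["winner"] == cand for s in carried) / len(carried)

    pa, pb = p_election_given_state(a), p_election_given_state(b)
    if pa is None or pb is None:
        return None
    return pa * pb

# Toy data, invented purely for illustration.
toy = [
    {"winner": "Obama",  "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "Obama",  "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "Obama",  "states": {"OH": "Obama",  "CA": "Obama"}},
    {"winner": "McCain", "states": {"OH": "McCain", "CA": "Obama"}},
    {"winner": "McCain", "states": {"OH": "McCain", "CA": "McCain"}},
]
oh_index = telltale_index(toy, "OH")
ca_index = telltale_index(toy, "CA")
print(oh_index, ca_index)
```

Note that the index is undefined for a state one candidate never carries in any run, which is likely for the safest states late in the campaign.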
Maybe this has been pointed out before, or maybe I'm just blind, but it seems something is missing from the site, and the scenario list.
What are the odds Obama will actually win the election?
The scenario list has a bunch of "X happens" things, but none of them is "Obama wins". Is the expected Win Percentage on the left hand side the same thing as the odds that Obama will win? I don't get it.
Here's a simple way to get at the "must-win" notion.
Say that you run some number of simulations and get the following results for OH and the electoral college as a whole:
                  US: BO    US: JM
OH: BO             70%        1%
OH: JM              2%       27%
----------------------------------
Total              72%       28%
Scale the two US columns up and down so that BO and JM have equal chances of winning the national election:
                  US: BO    US: JM
OH: BO            48.6%      1.8%
OH: JM             1.4%     48.2%
----------------------------------
Total             50.0%     50.0%
Now you can read off that, in an even election, OH would vote for the winner 48.6%+48.2% = 96.8% of the time.
By contrast, a state like CA would probably end up with a grid like
                  US: BO    US: JM
CA: BO            49.9%     47.5%
CA: JM             0.1%      2.5%
----------------------------------
Total             50.0%     50.0%
So CA would only be a 52.4% must-win state.
In fact, it might make sense to apply the transformation x-->2x-1 to this result, in order to get a number between 0 and 100%.
(This method will get less reliable results if one candidate wins too few simulations.)
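The column scaling above can be checked with a few lines of Python. This is a sketch under the assumption that rows are the state winner and columns are the national winner, as in the OH grid; the helper names are hypothetical.

```python
def scale_columns(grid):
    """Rescale each column of a 2x2 grid so it sums to 0.5.

    Rows are the state winner (BO, JM); columns are the national winner (BO, JM).
    Assumes each candidate wins the national election in at least one run.
    """
    col_sums = [grid[0][j] + grid[1][j] for j in range(2)]
    return [[0.5 * grid[i][j] / col_sums[j] for j in range(2)] for i in range(2)]

def agreement(grid):
    """Probability that the state votes for the national winner (the diagonal)."""
    return grid[0][0] + grid[1][1]

oh = [[0.70, 0.01],
      [0.02, 0.27]]
x = agreement(scale_columns(oh))
score = 2 * x - 1  # the x --> 2x-1 transformation
print(round(x, 3), round(score, 3))
```

For the OH grid this gives an agreement of about 96.8%, matching the figure in the comment, and a transformed score of about 0.94.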
pallidbust, you're looking for the second pie-chart on the upper-left.
Pallidbust,
Look at the chart on the top-left of the home page.
Nate:
It seems to me that a "tipping point state" *IS* a "must-win state" and vice-versa. I would say that the definition of "must-win" is how often the state tips the balance of the election.
The proof is in the pudding. Under your old definition, the result was meaningless because solid dem states became "must-win" as Obama's win percentage went up. Under your new definition, the two lists (tipping point and must-win) are virtually identical.
I would just ditch "must-win" and keep tipping-point.
Benjamin,
Your method is reasonable as well, but why scale the columns and not the rows? Aren't we interested in the probability that a candidate wins the election given that the candidate won Ohio, as opposed to the probability that a candidate won Ohio given that the candidate won the election?
Benjamin, one more question: is your method mathematically equivalent to Tristram's?
modeler:
To your second question, yes it is. I totally overlooked Tristram's comment.
To your first question, I think you have to go back to the interesting high-level question, which is "What states are important if the election is interesting, i.e., tight?" To answer this question, scaling the national results to 50/50 is a decent thing to do. (But shifting the national results of each simulation, as is done for the Tipping Point measure is even better.)
Also, as a practical matter, as we approach election day, there won't be enough simulations where, say, Obama wins UT to be able to scale to 50/50 UT. (Frankly, I don't think that Nate gets the far-out states like UT and VT close enough to 0% and 100%, because of the way he double-counts the now-until-election-day volatility of popular opinion, but that's quite another technical subject.)
There's a second high-level question that I think is even more interesting, which neither of these measures gets at, called "What states are important as the election stands now?" Right now, the answer to that is roughly PA/OH/MI, but if Obama continues to pull away, the answer to that question may become "No state is very important, since the race is a blow-out."
kronius:
I also agree that "must-win" doesn't add much interesting to "tipping-point."
Nate, "must win" state seems intuitively to only be states that the candidate must win to win the election. If someone wins by 10 EVs, then CO, IA, NH, etc. are not must win states for that particular election because the candidate would have won without them. I saw your comment about California being a must win state under this definition in most Obama wins, even though it's not what most people would consider a swing state since he's likely to win it by a large margin. As others have noted, you can easily avoid this problem by limiting "must win" states to those in which the election is within some arbitrary number, such as 5% (a number a good campaign may be able to make up). I really think this list of states would be extremely interesting and a better indicator of where campaigns should devote their limited resources. As always, love the site and appreciate your obvious strenuous efforts to make it the best on the web for election analysis.
The one glaring difference between the two lists is Florida -- it's a Tipping Point state, but not a Must-Win state.
Intuitively, I'm guessing that Florida is a tipping point for scenarios in which McCain wins, but not for Obama wins... is that correct?
I suppose you could count this as a request to break out those lists for each candidate.
(Also, have you considered putting a link to the relevant FAQ section on each of the graphs? It would make it easier for newcomers to figure out what they're looking at.)
I'll leave the mathematical nuances of modeling equations to the statistical experts. But as a newcomer to this site something just jumped out at me; that I had not absorbed until tonight:
Looking at the POLL DETAIL for all the states you can see that a large majority of states have one or more or even many polls within the last month or so; i.e.:
The data is relatively or even very current; and in many cases has both breadth and depth.
But a few states have effectively been ''stubbed off'' as much as 4 months ago:
7 states have no polling data after 2-27; and 2 more have no data after 3-29...
NOW; before somebody jumps on me:
I fully realize that in most of these ''only old data'' cases, it's kind of a case of why bother; they're not gonna change; i.e.:
DE, HI, IL, MD, RI, and VT are totally hard over for BO; they're not gonna change.
Same with ID for JM; he's got a lock.
BUT: Again speaking as an ex-ND guy:
Note last large sample poll in ND was nearly 4 months ago; and only 2 polls total are listed, with last one on 3-29. SD was only one poll better.
NOW: Keep in mind that Romney was pretty popular in ND; and then compare with UT where Romney was of course hugely popular:
JM was only up in UT on 2-27 by +11; but in 5-16 UT poll after (R) primary fights were long done and over JM is up by a whopping 35.
Let me hasten to add I'm not trying to suggest that if we get another ND poll it will show a jump for JM anything close to JM's huge leap ahead in UT between Feb and May. But I would say that it's not at all unreasonable to expect that JM would gain considerably if ND was polled today; due to what we might call a ''Romney and etcetera effect''; i.e.: Feb and even March was a LONG time ago in this campaign.
Perhaps the statistical massaging and regression / projection factors take this into account more than I perceive off the top, but when the data is this sparse and this old like for ND, I have to wonder; and would think the validity of the current BO-34 versus JM-66 for ND is tenuous at best.
For a somewhat similar example look at NE: JM up by only 3 on 2-27; but up by an average of 16 in NE for the 3 polls in May. My gut feel guess as a ND native is that JM is up by 10 points or more in ND... of course I have no hard data to substantiate that; and if this site is about anything it is real data. Just remember what I said if any reputable agency ever takes another ND poll.
I think Pallid's query is another argument for changing "Win Percentage" to "Win Likelihood."
Benjamin,
I think I understand your rationale for scaling to 50%, but scaling the rows seems more intuitive to me. The reason is that we tend to think of winning the US as conditional on winning a given state, rather than the other way around. Or, in terms of your model, you would scale up to say that each has an even chance of winning Ohio (so it is a true swing state), and then say what is the probability that the winner of Ohio wins the election.
Thanks for telling me it's the 2nd pie chart. When I saw "Win Percentage", I assumed that meant what percentage of the electoral vote Obama was expected to win, especially since it's juxtaposed with popular vote and electoral vote.
The word "percentage" isn't usually associated with predictions unless the word "likelihood" or "chance" is included, so I think it's a little confusing.
This site is such a joke. It was obviously created to make Obama look better than Clinton against McCain, which http://www.electoral-vote.com/ was showing Clinton crushing Obama against McCain until Clinton dropped out. You of course had manipulated the numbers to show Obama doing better than Clinton against McCain which was a lie.
Now that she's out you have changed your methodology multiple times to the point where you are reporting numbers identical to http://www.electoral-vote.com/
If Obama slips in the polls are you going to manipulate your methodology again to show him winning? Of course you are.
You're just another manipulator of the truth to get their candidate elected. Go away and take your bs with you.
Benjamin,
To follow up, consider the following matrix:
50 40
0 10
In this case, OH is an absolute must-win for the second candidate; he never wins the US without it, and always loses when he loses it.
On the other hand, it's weakly a must-win for the first candidate. I would say that the "must win" score should be above 50.
On the other hand, consider this matrix:
25 25
25 25
The US outcome is completely uncorrelated to the outcome of this state. I would say this is less of a must-win state than the first situation. If you scale up the columns, it becomes more of a must-win state.
Actually, now that I think about it, maybe covariance is the answer here. If X is a binary {-1,1} variable indicating which candidate won the state, and Y is a variable indicating which candidate won the US, we want to measure:
E(X*Y)
(.60-.40) = .2
And in the second matrix, it would be 0.
Sorry, the "(.60-.40)" example above is for the first matrix.
You know, the sharp 4% margin definition doesn't fit with many of your site's other definitions, such as the gradual fade-out of old polls. I think it would be better if the effect of must-wins simply diminishes with the margin of win.
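One possible way to soften the cutoff, sketched under a loud assumption: the exponential weight and its `scale` parameter below are illustrative choices of mine, not anything described on the site. Each run counts in proportion to how close its popular vote was, instead of being kept or dropped at exactly 4 points.

```python
import math

def weighted_must_win(sims, state, scale=4.0):
    """Must-win share where each run is weighted by exp(-|pv_margin| / scale).

    The exponential form and the default scale are assumptions for illustration.
    """
    num = den = 0.0
    for s in sims:
        w = math.exp(-abs(s["pv_margin"]) / scale)
        den += w
        if s["states"][state] == s["winner"]:
            num += w
    return num / den if den else None

# Toy data, invented purely for illustration.
toy = [
    {"winner": "Obama", "pv_margin": 0.0, "states": {"OH": "Obama"}},
    {"winner": "Obama", "pv_margin": 8.0, "states": {"OH": "McCain"}},
]
oh_weighted = weighted_must_win(toy, "OH")
print(round(oh_weighted, 3))
```

In the toy data an unweighted share would be 0.5, while the weighted version leans heavily on the tied run, so a landslide run in which the state went the other way barely moves the score.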
Anon@11:18pm-
Actually, I very clearly recall this site showing Clinton with the upper hand vs. McCain for at least the last month of the primary race between the two. She was consistently winning 60-65% of the time, while Obama was winning only 48-52% of the time, as I recall.
Sigh... So covariance is essentially what Nate was doing before, and it has the same problem he's trying to fix.
So let's try this instead: Covariance of the scaled matrix. I prefer scaled rows, but Benjamin/Tristram's solution of scaling columns would be reasonable as well. You would get the following results for the following matrices:
50 40
0 10
Scale Rows: .56
Scale columns: .2
25 25
25 25
Scale Rows: 0
Scale columns: 0
99 0
0 1
Scale Rows: 1
Scale columns: 1
50 0
15 35
Scale Rows: .70
Scale columns: .77
Finally, I should point out that the covariance approach is the same as Benjamin's (2x-1) adjustment. OK, I'm all caught up now. :-)
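The figures quoted in this comment can be reproduced with a short script. It is a sketch that assumes X and Y are coded as -1/+1, so that E(X*Y) is simply P(state and nation agree) minus P(they disagree); the helper names are hypothetical.

```python
def scale(grid, axis):
    """Rescale a 2x2 grid so each row (axis="rows") or column (axis="columns") sums to 0.5."""
    g = [[float(v) for v in row] for row in grid]
    if axis == "rows":
        return [[0.5 * v / sum(row) for v in row] for row in g]
    col = [g[0][j] + g[1][j] for j in range(2)]
    return [[0.5 * g[i][j] / col[j] for j in range(2)] for i in range(2)]

def e_xy(grid):
    """E(X*Y) for X, Y in {-1, +1}: P(agree) - P(disagree)."""
    total = sum(map(sum, grid))
    return (grid[0][0] + grid[1][1] - grid[0][1] - grid[1][0]) / total

examples = [
    [[50, 40], [0, 10]],
    [[25, 25], [25, 25]],
    [[99, 0], [0, 1]],
    [[50, 0], [15, 35]],
]
results = [(round(e_xy(scale(g, "rows")), 2), round(e_xy(scale(g, "columns")), 2))
           for g in examples]
for g, (by_rows, by_cols) in zip(examples, results):
    print(g, "scale rows:", by_rows, "scale columns:", by_cols)
```

After scaling, one of the two variables has mean zero, so E(X*Y) and the covariance coincide, which is why the scaled version avoids the skew of the unscaled one.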
Anonymous at June 17, 2008 11:18 PM:
As has been pointed out, Clinton was rated as more likely to beat McCain by a significant margin over Obama for about a month at the end there. The other thing to note, too, is that the new methodology will be run retroactively over previous data to show what it would've said were it developed back then, so if indeed Clinton was more likely to win then it will be shown on this site soon. I hope you see this as an improvement, as you seem invested solely in the prospect of Clinton winning and not what is likely to happen, and this new method gives more of a chance that Clinton was a safer bet at some point before it was clear that Obama was going to be the nominee.
Anon @ 11:18:
Do you remember the charts? Clinton was shown as much stronger, a strength I believe wouldn't hold; 538 showed it nonetheless.
Do you have any real criticism of the methods, or are you solely annoyed at the results?
Anon @ 11:18:
Try to be at least a little coherent.
1. Clinton was dominating McCain on 538 near the end. So unless you want to point me to somewhere to prove this false, stop making stuff up.
2. How is he stealing numbers from that site? Do you even understand how this site works?
3. You admit that Obama is doing well in the polls, but are somehow mad that he is doing well and 'winning' on this simulation? Isn't that supposed to occur? If you have a problem with the methodology then please say so; I'm sure Nate will be happy to listen to any improvements to his methodology you could prescribe.
4. Manipulator of the truth? Um... I assume that you support John McCain or w/e, but if there are polls that show Obama winning in places like VA, close in NC, up big in OH, doing much better in AR and KY, solid leads in traditional dem leaning swing states like WA,OR, WI, MN. Then why shouldn't he be winning?
5. You obviously have some beef with the new methodology. I agree and think that it fluctuates a bit too much for my taste, but there is still a lot of valuable information in national polling. Why can't you let him tinker with it to perfect it instead of jumping down his throat? Might I ask what benefit Nate gets from manipulating the 'truth'?
6. You clearly have a fundamental misunderstanding of statistical analysis if you think that Nate is trying to find the 'truth' when in reality he is trying to find the best analysis, projection, prediction.
7. Nobody is forcing you to look at this website, maybe you should take some of your own advice and leave.
I'm new to the site... what's the difference in definition between "must win" and "tipping point"? And what does the "Electoral vote distribution" graph show?
to anonymous@1:15:
As I now understand the Electoral vote distribution (others will correct me if I'm wrong):
1) a computer-simulated election is run 10,000 times (or some such number)
2) the vertical lines represent the number of simulated elections in which Obama got that many EVs. (They're all Obama - my error was thinking the red ones were McCain. In fact, red simply means that Obama got 268 or less and lost that simulated election)
3) Thus, the left half of the graph will always be red. If you're rooting for Obama, you want a lot of blue real estate. The ideal result would be a very tall skinny blue spike right at 538 and everything else flatlining along the bottom.
4) The Win Percentage in the left margin (currently 67.2%) corresponds directly - there should be 67.2% blue in the EV Distribution chart.
Hi Nate, Can you use your stats savviness to make Obama a bit more popular than he is so that I don't have to worry about the final result and I can get a life and stop following this tedious election! (politics and maths - geewhizz what fun!)
The Clinton/Obama graphic archive is here:
http://www.fivethirtyeight.com/2008/06/clinton-mccain-archive.html
And I think anon at 23:15 is another vote for tooltips or explanatory links on the graphics.
Modeler -
Scaling rows rather than columns would be a mistake. By X being a "must-win state", we mean that "Obama/McCain must win X to win the election", or equivalently "If Obama/McCain wins the election, then he wins X", not the other way round. With Nate's current results, scaling rows would label California a huge must-win state, for example.
Interesting point Methow Ken - though brevity is the soul of wit!
Are you basically saying that in the Dakotas and Montana you would expect to see a McCain bounce in the same way that Obama has bounced in Arkansas, West Virginia, Kentucky?
Sounds plausible.
New Quinnipiac polls! Florida finally gets polled!
FL 47-43 Obama!
OH 48-42 Obama!
PA 52-40 Obama!
http://www.dailykos.com/story/2008/6/18/61624/6607/303/537708
I now take this opportunity to attack Nate as being totally in the tank for McCain. Why didn't you stick to your guns when your model showed Florida ahead, Nate? It's because you secretly hate black people, isn't it, Nate?
OMG !
OBAMA has not even been campaigning in FL
If things go on like this November will be a HUGE
Wow, Nate's fast: the map is updated!
How many more polls coming out today ?
anybody know ?
Yes, modeler, you're right that my "2x-1" suggestion is the same as the covariance of the scaled matrix.
It also represents the "beta" of the national results to the state's results. That is, the increase in the percent likelihood of a national win, given a 1% increase in the likelihood of a state win.
Another useful number might be the correlation between the state and the nation (in the unscaled square):
(determinant of square) / sqrt(toprow*bottomrow*leftcol*rightcol).
These two numbers are related by:
beta = correl * sd(nation)/sd(state),
where sd(nation) and sd(state) are the standard deviations of your two binary variables.
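Both numbers can be computed directly from the unscaled square. The sketch below uses the OH grid from the earlier comment and assumes rows are the state winner, columns are the national winner, and the four entries are probabilities summing to 1. The function name is hypothetical.

```python
import math

def phi_and_beta(grid):
    """Correlation (phi) and beta for a 2x2 grid of probabilities.

    Rows are the state winner, columns the national winner.
    Undefined when an entire row or column is empty.
    """
    (a, b), (c, d) = grid
    top, bottom = a + b, c + d    # state marginals
    left, right = a + c, b + d    # national marginals
    det = a * d - b * c           # determinant of the square
    phi = det / math.sqrt(top * bottom * left * right)
    sd_state = math.sqrt(top * bottom)
    sd_nation = math.sqrt(left * right)
    beta = phi * sd_nation / sd_state
    return phi, beta

oh = [[0.70, 0.01],
      [0.02, 0.27]]
phi, beta = phi_and_beta(oh)
print(round(phi, 3), round(beta, 3))
```

Computed this way, beta works out to P(candidate wins nation | wins state) minus P(candidate wins nation | loses state), which gives a quick sanity check on the formula.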
Holy cow, every single "purple state" from 2004 (see left) is now blue!
"Red states" are also going purple
Nate, I'm curious about that growing probability in the distribution of Obama winning all 538. Could you post that probability (I'm sure it's still <1%) under the scenario analysis?
Virginia and Indiana! Woot!
7:29: That's not all 538: see that little line afterward, and some empty space? Also, at the beginning, that's not a spike at 0, it's at 3. DC is currently 100.00% Obama.
Obama also has a 0.2% shot at Utah, BTW.
Utah has 5 electoral votes
lol
Benjamin,
I'm not sure if your definition for correlation is correct; for example, it's infinite if any of the elements in the matrix are 0. I think the correlation in this case should be identical to the covariance.
Tristram, I understand where you are coming from, but considering that Obama must win California to win the election it makes sense that the "must win" score for CA should be at least 0.5. A score of 0.56, as shown in my example, would still make it less of a "must-win" state than a state like Ohio, but it would indicate that it really is "must-win" for one of the candidates.
In above John H said:
''Are you basically saying that in the Dakotas and Montana you would expect to see a McCain bounce in the same way that Obama has bounced in Arkansas, West Virginia, Kentucky?''
Yes (should qualify as brief...).
what about variability?
seems to me a "swing state" is one where the result is least predictable. states where the polls show the greatest variability - ones that really could swing either way - defines the swing state. where is the battle being waged? probably in those areas where the polls have yet to stabilize.
a "swinger state" is a whole other matter.
"The new definition is that a Must-Win State is a state won the highest percentage of the time by the winning candidate when the election is close."
I think you need to go back to the drawing board for a better operational definition of what you're looking to capture.
Perhaps the problem starts with the phrase "must win". And it certainly extends to the "political fact" that "must win" states for Obama are not the same as must win states for "McCain".
In the political sense, I think "must win" means those states that "better be safe or the candidate is up an excrement tributary with no means of locomotion".
Although statistical analysis is ostensibly "objective" it must take into account a knowledge of the underlying subject matter.
There are some states that, if Obama doesn't win, the race is certainly over. For example, if Obama doesn't win DC and IL, then we really don't have to look at the rest of the country because we know it's over. I don't have to know the statistics, this is just a political fact. Same with McCain. If McCain can't win AZ, AL, or UT, then he's toast.
So from an understanding of the underlying subject, we'd properly say that DC and IL are "must win" states for Obama and AZ, AL, UT are "must win" states for McCain.
This is how I think the phrase "must win" is used in the political context, and how I would have used the term nearly 25 years ago when I worked for the DNC.
Once Gore lost his home state of TN, no reasonable statistician nor old school political analyst should have expected him to actually have a chance at winning.
But your use of the phrase "must-win" is trying to peel off the DCs and ALs and get to the really "must-wins", i.e. not the must-wins that foretell a Goldwater, McGovern, Mondale, but the must-wins that would have predicted that Ford or Carter or Dukakis was probably in trouble.
You've used the national popular vote as the operational definition but it seems to me that you should use an electoral vote spread as a better parameter.
I got some time to simulate 100,000 elections where every state is a coin flip (the "no information" assumption). The bulk of coin flip simulations are going to fall within Obama(or McCain) getting ~180 to ~360 EV. (There are some odd ball things about this, e.g. it's a lot more unlikely for a candidate in an every state is a flip to get 183 EV than 181 or 182)
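A coin-flip simulation along these lines is easy to sketch. The version below assumes the 2008 electoral vote allocation, treats every state plus DC as winner-take-all (ignoring the district splits in ME and NE), and uses a fixed seed so it is repeatable. It is an illustration, not the commenter's actual code.

```python
import random

# 2008 electoral votes for the 50 states plus DC, alphabetical by state name.
EV_2008 = [9, 3, 10, 6, 55, 9, 7, 3, 3, 27, 15, 4, 4, 21, 11, 7, 6, 8, 9, 4,
           10, 12, 17, 10, 6, 11, 3, 5, 5, 4, 15, 5, 31, 15, 3, 20, 7, 7, 21,
           4, 8, 3, 11, 34, 5, 3, 13, 11, 5, 10, 3]

def coin_flip_totals(n_sims=100_000, seed=0):
    """Electoral vote totals for one candidate when every state is a fair coin flip."""
    rng = random.Random(seed)
    return [sum(ev for ev in EV_2008 if rng.random() < 0.5) for _ in range(n_sims)]

# A smaller run keeps the example quick; raise n_sims to 100_000 to match the comment.
totals = coin_flip_totals(n_sims=20_000)
share_in_range = sum(180 <= t <= 360 for t in totals) / len(totals)
print(f"mean EV: {sum(totals) / len(totals):.1f}")
print(f"share of runs between 180 and 360 EV: {share_in_range:.1%}")
```

The totals centre on 269, half of 538, and the large majority of runs fall in the 180 to 360 band.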
So what I would do to operationalize "must win" is order the states worst to best for Obama and count (add EVs) up to 180 from one end. These are the "must wins" in the traditional sense for McCain. From the other end, add up EVs up to 180. These are the "must-wins" for Obama.
By this definition (realizing that the numbers keep changing, so mine are a few days old):
So for McCain, his "must wins":
AL, ID, KY, OK, UT, AR, TN, WY, NE, MS, AZ, KS, TX, GA, LA, SC, WV, AK, SD, and MT. If M is in trouble in any of these he is really in trouble.
For Obama, his "must wins":
ME, DE, OR, MN, MD, RI, MA, CA, NY, IL, WA, VT, DC, and HI. If O is in trouble in any of these he is really in trouble.
That leaves the BATTLEGROUND: FL, NC, ND, IN, MO, VA, NH, MI, NV, OH, NM, CO, PA, WI, IA, NJ, and CT.
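The 180-EV split can be reproduced mechanically. The sketch below assumes the order in which the states are listed in this comment is the worst-to-best ranking for Obama (the comment does not say so explicitly) and uses 2008 electoral vote counts. The function name is hypothetical.

```python
# (state, 2008 electoral votes), assumed ordered from worst to best for Obama.
ORDERED = [
    ("AL", 9), ("ID", 4), ("KY", 8), ("OK", 7), ("UT", 5), ("AR", 6), ("TN", 11),
    ("WY", 3), ("NE", 5), ("MS", 6), ("AZ", 10), ("KS", 6), ("TX", 34), ("GA", 15),
    ("LA", 9), ("SC", 8), ("WV", 5), ("AK", 3), ("SD", 3), ("MT", 3),
    ("FL", 27), ("NC", 15), ("ND", 3), ("IN", 11), ("MO", 11), ("VA", 13), ("NH", 4),
    ("MI", 17), ("NV", 5), ("OH", 20), ("NM", 5), ("CO", 9), ("PA", 21), ("WI", 10),
    ("IA", 7), ("NJ", 15), ("CT", 7),
    ("ME", 4), ("DE", 3), ("OR", 7), ("MN", 10), ("MD", 10), ("RI", 4), ("MA", 12),
    ("CA", 55), ("NY", 31), ("IL", 21), ("WA", 11), ("VT", 3), ("DC", 3), ("HI", 4),
]

def split_by_threshold(ordered, threshold=180):
    """Return (mccain_must_wins, battleground, obama_must_wins)."""
    def take(seq):
        # Add states from one end until the next one would pass the threshold.
        picked, total = [], 0
        for state, ev in seq:
            if total + ev > threshold:
                break
            picked.append(state)
            total += ev
        return picked

    mccain = take(ordered)
    obama = take(reversed(ordered))
    claimed = set(mccain) | set(obama)
    battleground = [s for s, _ in ordered if s not in claimed]
    return mccain, battleground, obama

mccain, battleground, obama = split_by_threshold(ORDERED)
print(len(mccain), len(battleground), len(obama))
```

With these inputs the split comes out to 20 McCain must-wins (160 EV), 14 Obama must-wins (178 EV), and 17 battleground states (200 EV), the same three groups listed in the comment.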
In one sense, you might say that these are the only states that really directly matter. Using demographics or your evolving regression model, are these battle grounds more like Obama "must-wins" or McCain "must-wins" as I have defined them? That to me, is how you predict the outcome. It should also help you fine tune the regression model.
In addition, I think that your Electoral Vote Distribution chart, reveals a problem with your model. The presence of the way too fat tails near 0 and near 538, cannot be correct or are revealing something which needs explanation.
Finally, I want to be delicate here. But there is betting going on regarding this election. Sites like this one must affect the odds. Big time bookies are also doing their own number-crunching. I think for transparency's sake, Nate needs to disclose whether or not he's betting (directly or indirectly) and/or involved as a consultant for or contractually involved with bookies in LV or internationally.
PS what does 538 Regression Model say about VIGO COUNTY, IN?
For the swing state analysis to be valuable, I think we need new definitions.
Define a swing state for each simulation run as a state won in that run by less than 4% (this number is arbitrary, and might need to be adjusted).
Define a must-win state for each run as a state without which the winning candidate would not have reached 270.
Define a must-win swing state for each simulation run as a state that the winning candidate wins by less than 4% and without which he would not have 270 electoral votes.
Thus if in a particular run McCain had 276 electoral votes and won Florida and New Hampshire by less than 4% but South Carolina by 17%:
1) New Hampshire would be a swing state, but would not be a must-win state because losing it would not result in a loss of the election.
2) South Carolina would be a must-win state because without it McCain would have had only 268 electoral votes, but would not be a swing state because it was solidly McCain.
3) Florida would be a must-win swing state.
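A minimal sketch of these three definitions, applied to the example run above. The function name `classify` is made up for illustration, and the exact Florida and New Hampshire margins (2 and 3 points) are assumed values; the comment only says they are under 4%. The EV counts are the 2008 allocations (FL 27, NH 4, SC 8).

```python
WIN_THRESHOLD = 270
SWING_MARGIN = 4.0  # percentage points; the comment itself calls this number arbitrary

def classify(winner_ev, state_ev, margin):
    """Label one state the winner carried in a single simulation run."""
    swing = margin < SWING_MARGIN
    must_win = winner_ev - state_ev < WIN_THRESHOLD  # losing it drops the winner below 270
    if swing and must_win:
        return "must-win swing"
    if must_win:
        return "must-win"
    if swing:
        return "swing"
    return "neither"

mccain_ev = 276
labels = {
    "NH": classify(mccain_ev, 4, 3.0),    # assumed margin, under 4%
    "SC": classify(mccain_ev, 8, 17.0),
    "FL": classify(mccain_ev, 27, 2.0),   # assumed margin, under 4%
}
```

Tallying these labels per state across all runs would give the proposed lists.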
The swing state analysis should then be replaced with a list of states most likely to be swing states and lists of each candidate's most likely must-win swing states. I think this best reflects what everyone wants to know when considering which states to focus on in the election: how likely is each state to determine the outcome of the election, to be the new "Florida Florida Florida."
I agree with the last two comments. I think the problem is in the terminology. "Must-win state" and "Swing-state" are just not the same thing.
There are separate lists of "must-win" states for each candidate. You seem to be trying to identify the states that most overlap. In that case, I go back to my previous suggestion of using a product (or geometric mean).
"Must-win swing" score =
SQRT((Must-win score for Obama) * (Must-win score for McCain))
The geometric mean will put the focus on the swing states.
There are a lot of ways to define the "must win score" for a given candidate, but you can simply define it by the covariance between winning the state and winning the election.
Never mind about the covariance in the above comment; I keep on forgetting why we don't like it. But the point is that a geometric mean might be the best way to combine the "must-win" scores for the different candidates.
OK, thinking about this a little more, I agree with Tristram and Charles that scaling the columns is a better idea.
Obsessed@1:41am,
The “Electoral Vote Distribution” graph has always confused me.
I kept trying to apply the “red=McCain” logic to the graph too, and it never quite made sense. I understood the gist of it, and the gist was good, but I could never explain it in clear and simple terms.
Your explanation was perfect! I finally, really, get it! Thanks!
(No offense Nate, but your explanation left me scratching my head, much like physics class!)
It'd be useful to also see the number of simulations where there weren't any "must-win" states. That way, we'd have a handle on the relative 'buffer' that McCain or Obama has on the likely scenarios.
One other, broader thought: what about modeling variability in the polling data, so that you can identify 'best' and 'worst' case scenarios? What if Obama were polling +2 everywhere, or -2 everywhere? The relative stability of the map may be more effectively judged through such additional simulations.
IMO the most useful definitions are:
- a must-win state is a state in which winning is strongly correlated with winning the general election (i.e. I agree with Modeler), and
- a swing state is a state in which an improvement for either candidate would most strongly affect the probability of winning overall.
Hi Nate
Somebody posted the following on another thread:
//After your "We know less than we think" adjustments increased the variance of the results, perhaps you should increase the simulation size to smooth the distribution out a bit more. It won't really change the EV/win/vote numbers, but it will make the distribution itself a bit more aesthetically appealing.//
I fully agree with him.
Furthermore, there are some scenarios that only happen 0, 3 or 5 times in your simulation. Those percentages are a bit meaningless until you run your simulation many more times, e.g. 50,000.
I hope your computer is able to handle that...
Modeler: The correlation formula I gave is correct (I think), and is definitely not infinite just because one of the elements of the 2x2 grid is 0. If the grid is
a b
c d
then the correlation is:
(ad-bc)/sqrt((a+b)*(c+d)*(a+c)*(b+d)).
The only way for this to be infinite is if a whole row/column is 0. In the case where one row/column is 0 and the other two numbers are nonzero, such as
a b
0 0
one should treat the correlation as 0, since lim_{c,d->0} correl(a,b,c,d) = 0.
Also, correlation != covariance in this case, since the standard deviations of the two random variables (nation and state) are not necessarily 1.
Benjamin,
Sorry, I misread your post. I read the four terms in the denominator as the four elements in the matrix; I didn't understand that they represented sums. And you are right that the correlation is the same as the covariance only when the expectation value of the random variables is 0; my mistake was in assuming that there was a 50% chance that a candidate won a state or won the nation.
In that case, what Nate was doing before was not actually equivalent to calculating the covariance. It seems like the correlation, or square root of correlation, might be a good metric for "must-win" states.
For example, if we use the square root of the correlation, we get the following results:
0.5 0.4
0.0 0.1
Score: 0.58
0.5 0.0
0.4 0.1
Score: 0.58
0.9 0.0
0.0 0.1
Score: 1.00
0.4 0.1
0.4 0.1
Score: 0.00
0.3 0.2
0.2 0.3
Score: 0.45
0.4 0.1
0.1 0.4
Score: 0.77
Those all look reasonable to me.
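For anyone who wants to check these numbers, here is a short sketch of the calculation. It uses the 2x2 correlation formula given earlier in the thread, with a zero row or column treated as correlation 0 as suggested there. Clamping negative correlations to 0 before taking the square root is my own assumption, since the thread does not say how to handle that case; the function names are made up.

```python
import math

def correlation(a, b, c, d):
    """Correlation for the 2x2 grid [[a, b], [c, d]] of outcome frequencies."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # a whole row or column is 0: treat the correlation as 0
    return (a * d - b * c) / denom

def must_win_score(a, b, c, d):
    """Square root of the correlation (negative values clamped to 0, an assumption)."""
    return math.sqrt(max(correlation(a, b, c, d), 0.0))

grids = [
    (0.5, 0.4, 0.0, 0.1),
    (0.5, 0.0, 0.4, 0.1),
    (0.9, 0.0, 0.0, 0.1),
    (0.4, 0.1, 0.4, 0.1),
    (0.3, 0.2, 0.2, 0.3),
    (0.4, 0.1, 0.1, 0.4),
]
scores = [must_win_score(*g) for g in grids]
```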
Some might not like the fact that the scores in the first and second matrices above are the same, as the second matrix clearly is more of a swing state.
However, the situation in the first matrix is that the state is 100% must-win for candidate A, and slightly must-win for candidate B, who 40% of the time loses the state and wins the US anyway.
In the second matrix, the state is 100% must-win for candidate B, and slightly must-win for candidate A, who 40% of the time loses the state and wins the US anyway.
So it seems reasonable that they should have the same score.
You make a decent argument for correlation as a measure of "must-win" (and I used to use correlation as my main measure of swinginess over at my website). But I don't think correlation of state and nat'l results quite captures the notion of "must-win."
Consider the following case. The night before the 2000 election, you run a bunch of simulations, and find that:
1) Gore is certain to win CA, CT, DE, DC, HI, IL, IA, ME, MD, MA, MI, MN, NJ, NM, NY, OR, PA, RI, VT, WA, WI; that is, all the states that ultimately did have their votes assigned to him. This would have been 262 electoral votes.
2) Bush is certain to win the rest of the states except NH and FL; that is, all the states that ultimately did have their votes assigned to him, minus FL. This would have been 242 electoral votes.
3) Both NH (4 votes) and FL (25 votes) have popular votes split perfectly 50/50.
Clearly, in this scenario, FL is the only state that matters, since it determines the result of the election completely. And sure enough, the correlation between FL and US results will be 100%.
However, if you believe (as I do) that changes in popular opinion in different parts of the country are correlated, then NH's grid will be something like
0.4 0.1
0.1 0.4
which gives a 60% correlation. This is a counterintuitive result, since NH is totally irrelevant.
I've found that this happens a lot with the correlation measure. It exaggerates the importance of small states that are unlikely to make a difference in the national outcome.
Oops, the Gore states in the above example should sum to 267 electoral votes.
Which completely screws up the scenario -- it's way too early in the morning. What I should have written was:
1) Gore is certain to win CA, CT, DE, DC, HI, IL, IA, ME, MD, MA, MI, MN, NJ, NY, OR, PA, RI, VT, WA, WI; that is, all the states that ultimately did have their votes assigned to him. This would have been 262 electoral votes.
2) Bush is certain to win the rest of the states except NM and FL; that is, all the states that ultimately did have their votes assigned to him, minus FL. This would have been 246 electoral votes.
3) Both NM (5 votes) and FL (25 votes) have popular votes split perfectly 50/50.
Everything else holds, ceteris paribus.
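The corrected scenario can be simulated roughly as follows. This is a sketch under assumptions that are not in the comment: a shared national swing with standard deviation 2 plus independent state noise with standard deviation 1 (chosen so that the NM grid comes out in the neighborhood of the one above), a fixed random seed, and a made-up helper `phi` for the 2x2 correlation.

```python
import math
import random

def phi(pairs):
    """Correlation between two yes/no outcomes, from a list of (x, y) booleans."""
    a = sum(1 for x, y in pairs if x and y)
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if not x and y)
    d = sum(1 for x, y in pairs if not x and not y)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

rng = random.Random(0)  # fixed seed so the run is repeatable
fl_vs_nation, nm_vs_nation = [], []
for _ in range(10_000):
    national = rng.gauss(0, 2)                 # assumed shared swing
    gore_fl = national + rng.gauss(0, 1) > 0   # FL is 50/50 on average
    gore_nm = national + rng.gauss(0, 1) > 0   # NM is 50/50 on average
    gore_ev = 262 + (25 if gore_fl else 0) + (5 if gore_nm else 0)
    gore_wins = gore_ev >= 270                 # true exactly when Gore wins FL
    fl_vs_nation.append((gore_fl, gore_wins))
    nm_vs_nation.append((gore_nm, gore_wins))

phi_fl = phi(fl_vs_nation)
phi_nm = phi(nm_vs_nation)
```

FL comes out perfectly correlated with the national result, while NM still shows a substantial positive correlation even though its 5 votes can never change who reaches 270.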
Benjamin,
I understand your concern, but I don't think any scoring mechanism has been discussed in which the "NH Matrix",
0.4 0.1
0.1 0.4
would not result in a high "must-win" score. It could be argued that if this is truly the matrix of simulation results, then winning NH, or equivalently the demographic mix that makes up NH, is imperative.
Based on your argument, you seem to be looking more for what Nate calls the "tipping point" states.
Perhaps another way to address your concern is to focus not on "must-win" states, but "must-win" demographics. What is the correlation between over-/under- performance in demographic groups with national victory?
Nate could include this in his model if he based his projections on the variance of his individual coefficients rather than average variance. The other advantage to this approach would be that it would become possible to identify demographic swing states. For example, as he raised in a post, if Obama overperforms in the south, how likely is it that Georgia will be a tipping point state as opposed to, say, NC?
"Make or Break State" Probability
The goal is to find an intuitive metric that isolates the Floridas/Ohios/Michigans (big and competitive) but not the Californias (bigger but uncompetitive) or Nevadas (competitive but very small).
The effect a state can have on a selected election is broken into three cases:
* EV Margin outside range = P(No Effect)
* State causes candidate favored to win in that state to win national Election = P(Fav_Win)
* State causes candidate not favored to win in that state to win national Election = P(NotFav_win)
P(No Effect)+P(Fav_Win)+P(NotFav_Win)=1.0
I propose that the most interesting metric is P("Make or Break") for state = 2 * P(NotFav_Win). It is multiplied by 2 since the state both makes and breaks, and this gives some interesting properties:
* In a fictional election, in which only one state is perfectly 50/50 competitive (and the other states are evenly split) P("Make or Break") will equal 1.0.
* If the state is 100% uncompetitive, or the national election is 100% uncompetitive, then P("Make or Break") equals zero.
* If you sum up the P("Make or Break") for all states you get a value less than or equal to 1.0 (I believe).
Implementation:
- For each Monte Carlo run (out of 10000), if the underdog wins the state, and the state's EV was within the margin of victory, increase P(NotFav_Win) for that state by 0.0001.
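A rough sketch of that implementation step, on a toy map. Everything named here is hypothetical: the three "states", their EVs and favorites, and the four runs are placeholders. I am also reading "within the margin of victory" as "the national winner carried the state and would fall below 270 without it", which is an interpretation rather than something spelled out above.

```python
WIN_THRESHOLD = 270

def make_or_break(states, runs):
    """states: name -> (ev, favorite); runs: list of dicts mapping name -> state winner."""
    counts = {name: 0 for name in states}
    for run in runs:
        totals = {}
        for name, winner in run.items():
            totals[winner] = totals.get(winner, 0) + states[name][0]
        national_winner = max(totals, key=totals.get)
        for name, (ev, favorite) in states.items():
            underdog_won = run[name] != favorite
            decisive = (run[name] == national_winner
                        and totals[national_winner] - ev < WIN_THRESHOLD)
            if underdog_won and decisive:
                counts[name] += 1  # one more run where the underdog's win decided it
    # P("Make or Break") = 2 * P(NotFav_Win)
    return {name: 2 * counts[name] / len(runs) for name in states}

# Hypothetical toy map: two safe blocs and one competitive state (538 EVs total).
states = {"BaseA": (260, "A"), "BaseB": (250, "B"), "Swing": (28, "A")}
runs = [
    {"BaseA": "A", "BaseB": "B", "Swing": "A"},
    {"BaseA": "A", "BaseB": "B", "Swing": "A"},
    {"BaseA": "A", "BaseB": "B", "Swing": "A"},
    {"BaseA": "A", "BaseB": "B", "Swing": "B"},  # underdog B takes Swing and the election
]
result = make_or_break(states, runs)
```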
I think "terms" and operational definitions still need to be sorted out.
I note that you've started using Safe DEM, Likely DEM, Lean DEM, Toss Up, etc. Which sort of adds another layer of term confusion.
A year ago we would have said that IN is a safe Republican state. That it is now a "projected toss-up", has got to mean that McCain is in trouble.
But a year ago we would have also said that MD is a safe Democratic state. That your new terms call it only a "likely" Democratic state, is odd. Unlike IN it doesn't reveal that Obama is in trouble; it reveals that the terms safe, likely, lean, etc. are being used arbitrarily or by a method that is unusual.
Are you using standard deviations, natural breaks, eye-ball, or what to classify the strength of the projection?
So, I'm going to reiterate my proposed definition for "must win" -
A state which, if not won by a candidate, signals disaster, operationalized as I suggested.
The "Battleground" are those states which are "must-wins" for neither candidate, which leaves us depending upon the numbers:
FL, NC, ND, IN, MO, VA, NH, MI, NV, OH, NM, CO, PA, WI, IA, NJ, and CT.
Ordered in strength from McCain to Obama.
Whoever picks up 90 EVs from these, wins.
Now that Hawaii shows up on the list of "must-wins", I am certain that neither the terminology being employed nor the operational definition being used makes much sense. Nor do the "tipping point states" seem like useful information as defined.
HI apparently is in the winner's column 54% of the time in "close" elections. Surely, almost 100% of those 54% will be for Obama, because a close election won by McCain in which he also wins HI is the technical definition of "snowball's chance in hell."
Back to the drawing board.
You have:
"'Tipping Point States' are those states that tip the election of the outcome from one candidate to the other. ..."
It seems to me that what we are after is a metric that helps us predict what state(s) will likely be the 2008 analogy to "Florida, FLA, FLA" in 2000 (although, I saw it was TN) and Ohio in 2004. And we want to know the "battleground" states. I provided earlier a definition which relies upon comparison to a "no information solution."
There may also be a desire for an election night returns prediction. As the polls close, the probability will collapse to 100% for that state. So if IN comes out for Obama then all the simulations in which O doesn't win IN are irrelevant. Same kind of thing if McCain wins. Also the exit polls in IN will alter the regression model which will alter the predictions in other states.
Lastly, like a broken record, what is with the fat McCain romp tail? Does CLT or Chebyshev's inequality tell us anything about why that fat red tail is messed up?
I'm posting here because I haven't seen a direct comment on the new "Tipping States" graphic and the removal of the "must-wins".
I like the map idea. There still seems to be some confusion about what information you are trying to show.
I get how you measure "tipping states" (i.e. order the states from best to worst for the winner, and count 1 for the state that puts the winner over). But obviously the tipping states are different for Obama than they would be for McCain.
So I really think two maps are in order one for Obama and one for McCain.
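For reference, the tipping-state count for a single run might look like the sketch below. The state names, EVs and margins are hypothetical placeholders, and the sign convention (positive margin means the first candidate carried the state) is my own assumption for the example.

```python
def tipping_point(states, needed=270):
    """states: list of (name, ev, margin); margin > 0 means the first candidate won it."""
    first_ev = sum(ev for _, ev, margin in states if margin > 0)
    sign = 1 if first_ev >= needed else -1  # look at margins from the winner's side
    won = [(name, ev, sign * margin) for name, ev, margin in states if sign * margin > 0]
    won.sort(key=lambda s: s[2], reverse=True)  # winner's best state first
    running = 0
    for name, ev, _ in won:
        running += ev
        if running >= needed:
            return name  # the state that puts the winner over the top
    return None  # nobody reaches the needed total (e.g. a 269-269 tie)

# Hypothetical run: placeholder states with EVs summing to 538.
run_one = [("A", 200, 20.0), ("B", 60, 5.0), ("C", 20, 1.0), ("D", 258, -10.0)]
run_two = [("A", 200, 20.0), ("B", 60, -5.0), ("C", 20, 1.0), ("D", 258, -10.0)]
tip_one = tipping_point(run_one)  # first candidate wins with 280 EVs
tip_two = tipping_point(run_two)  # second candidate wins with 318 EVs
```

Keeping a separate tally of the returned state for runs won by each candidate would give the two maps suggested here.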
Also, the gradient needs to be worked out. Does it really make sense to list the 1% states, or to put color on (I guess) the 0.5% ones like NJ, CT, DE, MD, TX, ND, SD, and ME?
Eyeballing or ranking is really not a good way to do it, which is why I prefer some actual method for distinguishing between noise (common cause variation) and something really worth looking at (special cause variation).
There are methods:
1. From Shewhart/Deming only list and color those which are more than 3 stds from the mean tipper likelihood.
2. Use "natural breaks method" for coloring (favored by information cartographers) with 2 groups.
3. Comparison vs. a complete no-information situation. Roughly speaking (not quite, because of differing EVs), if every state is a coin flip, each state has a 1/51 chance of being the tipper. For 10,000 simulations, I think we could say any tipper % above ~2.377% is meaningful: 1/51 + 3 * SQRT(1/51*(1-1/51)/N).
But Obama wins 68.8% of the time today. So maybe his N should only be 6880 and McCain's N should be 3120?
That puts the wheat-from-chaff line
for Obama at ~2.462%, and
for McCain at ~2.705%
Suggestion 3 seems the easiest method.
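The thresholds in suggestion 3 follow directly from the formula 1/51 + 3 * SQRT(1/51*(1-1/51)/N). A small sketch, with a made-up function name, that reproduces the three figures:

```python
import math

def tipper_threshold(n_runs, n_states=51, sigmas=3):
    """No-information tipper rate plus `sigmas` standard errors, as a fraction."""
    p = 1 / n_states  # each state equally likely to be the tipper
    return p + sigmas * math.sqrt(p * (1 - p) / n_runs)

all_runs = 100 * tipper_threshold(10_000)   # all simulations
obama_runs = 100 * tipper_threshold(6_880)  # runs Obama wins (68.8% of 10,000)
mccain_runs = 100 * tipper_threshold(3_120) # runs McCain wins
```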