Particularly before the Newsweek poll came out last night, which showed an oversized, 15-point post-primary bounce for Barack Obama, there had been an apparent discrepancy in the size of the bounce that Obama had received in state and national polling.
First, let's update our chart of Obama's numbers in state polling. What follows is a comparison of the Obama-minus-McCain margin in every state that has been polled since the primaries concluded and that was also surveyed by the same agency at some point in May:
State / Agency May June Change
AR Rasmussen -24 -9 +15
KY SurveyUSA -24 -12 +12
OH Quinnipiac -4 +6 +10
KS Rasmussen -19 -10 +9
ME Rasmussen +13 +22 +9
GA Insider Adv -10 -1 +9
FL Quinnipiac -4 +4 +8
NY Siena +11 +18 +7
WA Rasmussen +11 +18 +7
NH Rasmussen +5 +11 +6
PA Quinnipiac +6 +12 +6
WI Rasmussen -4 +2 +6
AK Rasmussen -9 -4 +5
IA Rasmussen +2 +7 +5
CA SurveyUSA +8 +12 +4
VA Rasmussen -3 +1 +4
MI Rasmussen -1 +3 +4
WA SurveyUSA +14.0* +17 +3
NV Rasmussen -6 -3 +3
WI SurveyUSA +6 +9 +3
FL Rasmussen -10 -8 +2
NC Rasmussen -3 -2 +1
NC Civitas -5 -4 +1
OH Rasmussen -1 -1 0
MN Rasmussen +15 +13 -2
MN SurveyUSA +5 +1 -4
CO Rasmussen +6 +2 -4
IA SurveyUSA +9 +4 -5
OR Rasmussen +14 +8 -6
==========================================
AVERAGE -0.1 +4.0 +4.1
* Average of all May surveys.
Across 29 state polls, Obama's bounce is 4.1 points, down slightly from our estimate earlier in the week. The bounce appears to be roughly normally distributed; if you drew a histogram of the bounce in individual states, it would resemble a bell curve.
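As a sanity check on the state-level summary, here is a minimal Python sketch that recomputes the average bounce from the 29 rows of the table and prints a crude text histogram for eyeballing the bell shape. The 4-point bin width is an arbitrary choice made here, not part of the original analysis.

```python
from statistics import mean, stdev

# (state / agency, change in Obama-minus-McCain margin from May to June),
# copied from the state table.
STATE_CHANGES = [
    ("AR Rasmussen", 15), ("KY SurveyUSA", 12), ("OH Quinnipiac", 10),
    ("KS Rasmussen", 9), ("ME Rasmussen", 9), ("GA Insider Adv", 9),
    ("FL Quinnipiac", 8), ("NY Siena", 7), ("WA Rasmussen", 7),
    ("NH Rasmussen", 6), ("PA Quinnipiac", 6), ("WI Rasmussen", 6),
    ("AK Rasmussen", 5), ("IA Rasmussen", 5), ("CA SurveyUSA", 4),
    ("VA Rasmussen", 4), ("MI Rasmussen", 4), ("WA SurveyUSA", 3),
    ("NV Rasmussen", 3), ("WI SurveyUSA", 3), ("FL Rasmussen", 2),
    ("NC Rasmussen", 1), ("NC Civitas", 1), ("OH Rasmussen", 0),
    ("MN Rasmussen", -2), ("MN SurveyUSA", -4), ("CO Rasmussen", -4),
    ("IA SurveyUSA", -5), ("OR Rasmussen", -6),
]

def summarize(changes):
    """Return (count, mean, sample standard deviation) of the bounces."""
    values = [c for _, c in changes]
    return len(values), mean(values), stdev(values)

def text_histogram(changes, bin_width=4):
    """Group bounces into fixed-width bins and draw one '#' per poll."""
    bins = {}
    for _, c in changes:
        lo = (c // bin_width) * bin_width  # floor to the bin's lower edge
        bins[lo] = bins.get(lo, 0) + 1
    return [f"{lo:+3d} to {lo + bin_width - 1:+3d}: {'#' * bins[lo]}"
            for lo in sorted(bins)]

n, avg, sd = summarize(STATE_CHANGES)
print(f"{n} polls, mean bounce {avg:.1f}, sd {sd:.1f}")
for row in text_histogram(STATE_CHANGES):
    print(row)
```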
What about Obama's bounce in national polling?
State / Agency May June Change
US Newsweek 0 +15 +15
US Harris +4 +11 +7
US Economist -3.2* +3 +6.2
US Rasmussen -1.1* +5 +6.1
US USA Today +2.0* +6 +4
US AP-Ipsos +4 +7 +3
US Cook/RT +1 +4 +3
US Gallup +0.1* +2 +1.9
US ABC/WaPo +7 +6 -1
US Diageo/Hotline +4 +2 -2
US Zogby +8 +5 -3
US IBD/Tipp +11 +3 -8
==========================================
AVERAGE +3.1 +5.8 +2.7
* Average of all May surveys.
In national polling, Obama's average bounce has been 2.7 points. That isn't far from our state-polling bounce to begin with, but a couple of additional observations make the discrepancy even smaller.
First, the national polls that showed Obama losing ground this month all had him at a pretty high number before. ABC/WaPo had him 7 points ahead of John McCain last month, Zogby had him 8 points ahead, and IBD/Tipp had him 11 points ahead. The most recent release of each showed Obama slipping, but all he may really have been doing is regressing toward the mean. Does anybody really believe that Obama was ahead by 7, 8 or even 11 points last month? That's certainly not the impression one was getting from the state-by-state polling results we were seeing in May. If Obama wins this election by 11 points, you will see things like him winning Pennsylvania by 16 points, winning states like Texas, Arizona and Kansas, or winning Florida by high single digits. Those aren't the sorts of results we are seeing now, and they certainly weren't the sorts of results we were seeing in May.
So let's instead focus only on those pollsters that release national polls on a weekly-or-better basis. This means Rasmussen and Gallup, which release numbers daily, and Economist/YouGov, which releases numbers weekly. These polls are far less subject to small-sample problems than surveys conducted just once a month.
State / Agency May June Change
US Economist -3.2* +3 +6.2
US Rasmussen -1.1* +5 +6.1
US Gallup +0.1* +2 +1.9
===========================================
AVERAGE -1.4 +3.3 +4.7
* Average of all May surveys.
These three polls show an average bounce of 4.7 points, barely different from our finding in the state-by-state results. If you exclude the Economist's poll and focus only on the two daily trackers, Obama's present bounce is 4.0 points, which is even closer to the target established by the state polls.
To the extent that one is going to use national polling results to divine trends, one ought to give a relatively large weight to the two national tracking polls. Gallup and Rasmussen are each surveying about 25,000 voters per month for their national trackers, as compared to a once-a-month survey like ABC/WaPo, which might poll one-twentieth that many people. Although larger samples bring diminishing returns, this is not a case where all polls should be treated equally.
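The tracker-versus-monthly comparison can be made concrete with the textbook sampling error of a two-candidate margin. This is a minimal sketch that assumes simple random sampling and a 45/45 split (both assumptions made here), and it ignores design effects and house effects. A twentyfold larger sample cuts the error only by about the square root of 20 (roughly 4.5x), which is the diminishing return. Inverse-variance weighting would nonetheless give the tracker about 20 times the weight.

```python
import math

def margin_standard_error(n, obama=0.45, mccain=0.45):
    """Standard error, in percentage points, of the Obama-minus-McCain margin
    for a simple random sample of n respondents (45/45 split is assumed)."""
    # Multinomial variance of (p1 - p2): [p1 + p2 - (p1 - p2)^2] / n
    variance = (obama + mccain - (obama - mccain) ** 2) / n
    return 100 * math.sqrt(variance)

tracker_n = 25_000           # roughly one month of a daily tracker, per the text
monthly_n = tracker_n // 20  # "one-twentieth that many" = 1,250

se_tracker = margin_standard_error(tracker_n)
se_monthly = margin_standard_error(monthly_n)

# Inverse-variance weight: how much more a tracker would count in an average.
weight_ratio = (se_monthly / se_tracker) ** 2

print(f"tracker SE {se_tracker:.2f} pts, monthly SE {se_monthly:.2f} pts")
print(f"error ratio {se_monthly / se_tracker:.2f}, weight ratio {weight_ratio:.0f}")
```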
Saturday, June 21, 2008
Obama's bounce in state vs national polls
-- Nate at 2:55 PM
Labels: bounces, national polls
Wednesday, June 18, 2008
Obama's bounce in state polling
Below is a just-the-numbers list of all states that have been polled since the conclusion of the Democratic primaries and that were also polled by the same polling firm in May; positive numbers indicate an Obama lead.
State / Agency May June Change
AR Rasmussen -24 -9 +15
KY SurveyUSA -24 -12 +12
OH Quinnipiac -4 +6 +10
KS Rasmussen -19 -10 +9
FL Quinnipiac -4 +4 +8
NY Siena +11 +18 +7
WA Rasmussen +11 +18 +7
PA Quinnipiac +6 +12 +6
WI Rasmussen -4 +2 +6
IA Rasmussen +2 +7 +5
VA Rasmussen -3 +1 +4
MI Rasmussen -1 +3 +4
WA SurveyUSA +14* +17 +3
NC Rasmussen -3 -2 +1
NC Civitas -5 -4 +1
MN Rasmussen +15 +13 -2
MN SurveyUSA +5 +1 -4
OR Rasmussen +14 +8 -6
=========================================
AVERAGE -0.7 +4.7 +5.4
* Average of two May surveys.
-- Nate at 8:10 AM
Labels: bounces
Tuesday, June 17, 2008
Today's Polls, 6/17
An odd day of polling, but one attention-grabbing result dominates the rest. That is from Ohio, where Public Policy Polling has Barack Obama ahead by 11 points. While Public Policy Polling developed a reputation as being somewhat Obama-friendly in the primaries, its track record is fairly strong, and its prior Ohio poll -- taken way back in March -- had shown McCain ahead by 8 points. As Ohio is probably the single most important state in this election (it's by no means the only important state, but it's pretty darned important), this result is enough to drive Obama past the 67 percent threshold in our overall electoral projection; we presently have him as about a 2:1 favorite to win the election.
In Minnesota, however, SurveyUSA has Obama with just a 1-point lead over McCain. SurveyUSA's methodology takes a more fluid view of party identification, and so it tends to produce results that can be more encouraging for the non-dominant party in a particular state. Its most recent previous Minnesota poll, taken back in May, had shown Obama ahead by 6 points.
In North Carolina, Civitas has John McCain ahead by 4 points -- down a tick from the 5-point lead he held a month ago. Obama has yet to show a lead in North Carolina, but has trailed by somewhere between 2 and 4 points in the three most recent polls of the state.
There is also a SurveyUSA poll out in Kentucky that shows Obama trailing by 12 points. This poll made it across our wires too late to be included in our metrics, but it speaks to the extent to which Obama is starting to improve his numbers among lapsed, Clinton-leaning Democrats, particularly in Appalachia. Obama had trailed by 24 points in SurveyUSA's May poll of Kentucky, and by as many as 36 points previously.
There is also a series of national polls out, all of which have converged around Obama +4, exactly the popular-vote margin that we attribute to him based on the state-by-state polling results.
So what to make of the meme that Obama's numbers haven't been bouncing? The only way that you can come to that conclusion is if you cherrypick results. There have been a few dozen polls released since Clinton conceded the primaries, and our methodology extracts an average bounce of about 4 points between them. Four points is not so large that some individual polls won't show a bounce, particularly if the bounce is concentrated in particular states and regions. But bounce Obama has, and the longer Republicans remain in denial about it, the less time they'll have to catch up.
-- Nate at 3:11 PM
Labels: bounces, kentucky, minnesota, north carolina, ohio, today's polls
Monday, June 16, 2008
A Refinement to the Adjustment, Part II
The principal criticism of the Trend Adjustment that I introduced on Saturday is that it assumed the trend was uniform across all states. Even if we can demonstrate that Barack Obama has gained, say, 3 points in his polling on average, and even if that average was taken across a fairly robust group of state and national polls, it does not follow that the bounce would be felt as strongly in Utah as in Massachusetts.
I agree entirely with this criticism in theory. I would also argue that it is probably better to assume a uniform trend than no trend at all. The polling has become dense enough (particularly if we include national polls) that we're getting a pretty fair mix of state and national polls in any given week. It is unlikely that Obama could improve his position in, say, 10 out of 12 state polls and 5 out of 6 national polls without also having improved his position in other states that weren't polled during this period.
Nevertheless, it would clearly be best if we could have our cake and eat it too: adjust for the most recent trends (in a somewhat cautious way) without taking some of the state-by-state specificity out of our model. I think I've developed a reasonable way to accomplish that.
The basic way that we developed the trend estimator was to express each polling result as a combination of two dummy variables, one representing the state/pollster combination (e.g. "Quinnipiac-Florida" or "Zogby-Delaware") and the other the week in which the poll was conducted. Each poll in our database can thus be expressed in the form of a regression equation:
...'Margin' represents the polling result (Obama's total less McCain's), whereas the squiggly little 'e' you see is a term denoting the residual error/uncertainty. Technically speaking, there are coefficient terms on the two dummy variables, though over the long run, these coefficients will by definition equal one. Likewise, the error term will definitionally equal zero over the long run. However, just because the coefficients equal one on average does not mean that they do so in every single case. Another way to express our regression would be to embed the uncertainty term in the time-trend dummy, as follows:
In this equation, m represents a multiplier on the weekly trend variable. It is trivial to solve for m.
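The algebra for m can be written out as follows. This assumes the two regression forms are exactly as described in the text, with StatePollster and Week used here as illustrative labels for the fitted effects of the two dummy variables and the unit coefficients made explicit. The last form also shows why a tiny weekly effect makes m erratic.

```latex
\begin{aligned}
\text{Margin} &= \text{StatePollster} + \text{Week} + \varepsilon \\
\text{Margin} &= \text{StatePollster} + m \cdot \text{Week} \\
m &= \frac{\text{Margin} - \text{StatePollster}}{\text{Week}}
   = 1 + \frac{\varepsilon}{\text{Week}}
\end{aligned}
```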
In a state that is more affected by a time-dependent trend, m will be greater than one. In a state that is less affected by the trend, it will be less than one.
Once we have derived an m for each poll in our database, we can then regress it against a series of demographic variables in the state where the poll was conducted to see whether there is any pattern to the residuals. Since our particular concern is with recent trends, we weight recent polls much more heavily when conducting this analysis. (A couple of technical notes: we discard any cases in which the pollster has polled the state just once, as m will always be one in these cases. We also discard cases where the weekly dummy is a very small number, anything less than one, as this can produce very large, highly erratic values of m.)
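The per-poll step and its two discard rules can be sketched in a few lines of Python. The `PollFit` container and its field names are hypothetical, chosen for illustration, and the sample numbers are invented. The rule for small weekly effects is read literally here: any weekly effect below 1.0 is dropped.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PollFit:
    # Hypothetical container; field names are illustrative only.
    margin: float          # observed Obama-minus-McCain margin
    state_pollster: float  # fitted state/pollster effect
    week: float            # fitted weekly trend effect
    polls_by_pair: int     # times this pollster has surveyed this state

def trend_multiplier(fit: PollFit) -> Optional[float]:
    """Return m for one poll, or None if the poll should be discarded."""
    # A pollster that has surveyed a state only once always yields m == 1.
    if fit.polls_by_pair < 2:
        return None
    # Literal reading of the rule: weekly effects below 1.0 make m erratic.
    if fit.week < 1:
        return None
    return (fit.margin - fit.state_pollster) / fit.week

# Invented example polls.
polls = [
    PollFit(margin=6, state_pollster=1, week=2.5, polls_by_pair=3),  # m = 2.0
    PollFit(margin=3, state_pollster=1, week=2.0, polls_by_pair=2),  # m = 1.0
    PollFit(margin=4, state_pollster=4, week=0.4, polls_by_pair=4),  # dropped
    PollFit(margin=5, state_pollster=2, week=3.0, polls_by_pair=1),  # dropped
]
ms = [trend_multiplier(p) for p in polls]
print(ms)
```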
The demographic regression that I perform on m includes relatively few variables. This is because there aren't all that many useful data points to work with -- we need very recent polls, and for those polls to have been conducted in a state that the pollster surveyed previously -- so there is more risk of overfitting the model. The particular variables we include are a state's partisan ID index, its Kerry vote share in 2004, its black population, its Hispanic population, its average per capita income, its percentage of senior citizens, and its percentage of evangelicals. With the exception of the Kerry and 'partisan' variables, which are too fundamental to the model to be excluded, these variables have the virtue of not being strongly intercorrelated with one another.
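The recency-weighted regression of m on state demographics amounts to weighted least squares. Below is a dependency-free sketch under loudly stated assumptions. Only two invented predictors stand in for the seven variables named above. The exponential-decay weights with a 2-week half-life are a guess at "weight recent polls much more heavily," not the model's actual scheme. The data are generated from a known rule so the fit can be checked.

```python
def recency_weights(weeks_ago, half_life=2.0):
    """Exponential-decay weights; the 2-week half-life is an assumption."""
    return [0.5 ** (t / half_life) for t in weeks_ago]

def weighted_least_squares(rows, y, w):
    """Solve the weighted normal equations (X'WX) b = X'Wy, intercept first."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'WX | X'Wy].
    A = [[sum(wi * xi[a] * xi[b] for xi, wi in zip(X, w)) for b in range(k)]
         + [sum(wi * xi[a] * yi for xi, yi, wi in zip(X, y, w))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Invented columns: (partisan ID index, black population share).
rows = [(0.1, 0.05), (0.3, 0.20), (-0.2, 0.02),
        (0.0, 0.30), (0.2, 0.10), (-0.1, 0.15)]
# m generated from a known rule: m = 1 + 0.5 * pid - 1.0 * black
m_values = [1 + 0.5 * pid - 1.0 * black for pid, black in rows]
weights = recency_weights([0, 1, 2, 3, 4, 5])  # weeks since each poll
coef = weighted_least_squares(rows, m_values, weights)
print([round(c, 3) for c in coef])
```

With real polls the residuals would not be zero, and the small number of usable cases is exactly why the text keeps the variable list short.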
As it turns out, there are some patterns in where Obama's bounce is showing up. It is coming in states where Democrats have a strong party identification advantage (no surprise), and seems to be especially strong in states where many voters are registered as Democrats, but where John Kerry did not perform well in 2004. This particularly describes states like West Virginia and Arkansas, where Obama's numbers have improved significantly, and where (assuredly not coincidentally) Hillary Clinton also performed well. The other observable trend is that Obama's bounce has been larger in states where there are not a lot of African-American voters, simply because there are few marginal gains for him to make among that demographic. It will probably always be the case in this election that states with lots of African-American voters will be less responsive to trends in the polling numbers.
This demographic regression allows us to estimate a unique value of m for each state. I constrain m to lie between 0.0 and 2.0. The average value of m will not necessarily be 1.0: particular kinds of states may be especially predisposed to a bounce, and those states may also have been polled more frequently (in fact, this does appear to have been the case, to a small degree, over the past couple of weeks). The present m values for some representative states are as follows:
Kentucky 1.98
Arkansas 1.93
Massachusetts 1.76
Oklahoma 1.66
New York 1.37
Michigan 1.05
North Carolina 1.01
California 0.97
Pennsylvania 0.93
Florida 0.71
Nevada 0.70
Ohio 0.54
Arizona 0.29
Utah 0.00
In adjusting our polling numbers, we take the trend from our LOESS estimator and multiply it by m. For example, say that our LOESS curve estimates that Barack Obama is polling 3 points stronger now on average than he was three weeks ago. If we take a 3-week-old poll from Kentucky, we will adjust it upward (toward Obama) by (3 x 1.98) = 5.94 points. In California, we will adjust it by (3 x 0.97) = 2.91 points. And in Arizona, we would adjust it by only (3 x 0.29) = 0.87 points.
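The adjustment arithmetic can be written as a short Python sketch. The function names are hypothetical, and the clamp mirrors the 0.0 to 2.0 bounds on m described in the text. It reproduces the 5.94, 2.91 and 0.87 point shifts from the worked example.

```python
def clamp_m(m, lo=0.0, hi=2.0):
    """Keep a state's multiplier inside the [0.0, 2.0] range."""
    return max(lo, min(hi, m))

def adjust_poll(margin, loess_trend, m):
    """Shift an older poll's Obama-minus-McCain margin by the LOESS trend times m."""
    return margin + loess_trend * clamp_m(m)

STATE_M = {"Kentucky": 1.98, "California": 0.97, "Arizona": 0.29}
TREND = 3.0  # LOESS estimate: Obama 3 points stronger than three weeks ago

# Size of the upward shift applied to a 3-week-old poll in each state.
shifts = {state: round(adjust_poll(0.0, TREND, m), 2) for state, m in STATE_M.items()}
print(shifts)
```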
Taking into account the sensitivity of individual states to time trends produces a slightly less impressive result for Obama than we had been figuring on over the weekend, as his bounce seems to be most profound in states where he was already well ahead (like Massachusetts), or where he is probably too far behind to catch up (like Oklahoma). Still, we have seen at least some bounce for Obama across a large and relatively diverse array of states, and can expect to see that trend manifested in other states where new polls will come out unless his bounce begins to recede nationally.
-- Nate at 2:43 AM
Labels: bounces, clinton, defectors, demographics, meta, methodology, obama, site
A Refinement to the Adjustment, Part I
In consideration of everyone's feedback, I am making two refinements to the timeline adjustment that I introduced yesterday.
The first refinement is to slightly dampen the effect of the timeline adjustment at the endpoints of the curve. The second is to use a state-specific timeline adjustment, rather than a one-size-fits-all model. I will describe the first refinement in this post.
Before I continue, I want to make clear what the goal of this project is. I want to provide you, at any given moment in time, with the best possible projection of what's going to happen in the November election. This is inherently a forward-looking exercise. If what you're interested in instead is simply a summation of what the polls are telling you now, there are plenty of other websites that can provide that for you. I do require that the projections be based on objective and quantifiable evidence. For example, I'm not going to say: "McCain is awful on the campaign trail, and people don't realize it yet. Let's take 5 points off his averages". Nor am I going to say "I heard from a well-connected source that the Republicans have put together a devastating attack ad on Barack Obama. We'd better cut his win percentage by 10 points". But that doesn't mean I'm going to limit myself to simply averaging the current polls.
* * *
In the long methodological discussion that we have had over the past couple of days, there is one important point that hasn't been raised. Suppose you grant me that my timeline adjustment does an essentially optimal job of telling you what would happen if the election were held today. Does it necessarily follow that the best projection of what would happen if the election were held today is also the best projection available to us of what would happen if the election were held tomorrow?
In other words, suppose that we are holding an election for the President of Hell. The candidates are Gary Condit and Mark Foley. In June, Foley leads by 2 points. In July, Foley leads by 5 points. What is our best possible projection in July of what the outcome will be in November? There are three possible answers to that question.
1. The random walk hypothesis. There is no way to guess whether the polls will move upward or downward in any given future period. Therefore, if a candidate's current lead in the polling is 5 points, our best guess at the eventual election outcome is 5 points.
2. The bounce hypothesis. Polls have some tendency to regress back to the mean established in previous periods. Therefore, if a candidate leads by 2 points in June, and by 5 points in July, our best guess is that he will probably finish somewhere between 2 points and 5 points ahead.
3. The trend hypothesis. This is sort of the opposite of the bounce hypothesis. Polling from previous periods does tell us something, but those polls are inversely related to the eventual outcome. So if Foley leads by 2 points in June and 5 points in July, that is evidence that he is trending upward and is likely to eventually win by some number greater than 5 points.
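The three hypotheses translate into three simple forecasting rules, sketched below in Python for the Condit/Foley example. The 0.5 weight in the bounce rule and the 0.5 momentum factor in the trend rule are arbitrary illustrative values. The text does not specify how strong either effect would be.

```python
def random_walk(june, july):
    """Latest reading is the best guess for November."""
    return july

def bounce(june, july, weight_on_latest=0.5):
    """Partial regression back toward the earlier level (weight is assumed)."""
    return weight_on_latest * july + (1 - weight_on_latest) * june

def trend(june, july, momentum=0.5):
    """Extrapolate the recent movement forward (momentum factor is assumed)."""
    return july + momentum * (july - june)

foley_june, foley_july = 2, 5
for rule in (random_walk, bounce, trend):
    print(rule.__name__, rule(foley_june, foley_july))
```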
I've tried to produce an answer to this question in several different ways, revisiting it this weekend by using Andrew Gelman's dataset. In some cases, like in 1988 or the summer of 1992, when the movement in the polls was fairly unidirectional for long periods of time, the more recent your poll was, the better off you'd be. In other cases, like in 2000 and 2004, the polls tended to oscillate, as though regressing back toward the mean; a bounce was usually just a bounce.
We can model this more formally by using different LOESS curves. The smoothness of a LOESS curve is determined by something called the smoothing parameter, which is the fraction of the data used in each local fit. A smoothing parameter of .7 or .8 will give you a very conservative curve that reacts slowly to new information (put differently, it still places some value in old information). A smoothing parameter of .3, on the other hand, will give you an extremely volatile curve that gives a strong presumption to the most current information.
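To see how the smoothing parameter trades responsiveness for stability, here is a bare-bones textbook LOESS in Python: local linear fits with tricube weights and no robustness iterations. It is not necessarily the implementation behind these projections. Libraries such as statsmodels' `lowess`, whose `frac` argument plays the same role, are the usual choice in practice. The one-week spike in the demo is invented data.

```python
import math

def loess(xs, ys, frac):
    """Local linear LOESS with tricube weights (no robustness iterations).

    frac is the smoothing parameter: the share of points used in each local fit.
    """
    n = len(xs)
    k = max(2, min(n, math.ceil(frac * n - 1e-9)))
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))  # local line evaluated at x0
    return fitted

weeks = list(range(20))
margin = [0.0] * 20
margin[10] = 6.0  # invented one-week spike, like a short-lived bounce
volatile = loess(weeks, margin, frac=0.3)
conservative = loess(weeks, margin, frac=0.8)
print(round(volatile[10], 2), round(conservative[10], 2))
```

The .3 curve keeps noticeably more of the spike than the .8 curve, which spreads it across more weeks.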
I went back and tried to evaluate whether there was an optimal smoothing parameter based on the weekly national polling averages from 1988, 1992, 2000 and 2004 (skipping 1996 because my dataset is scattershot for that year). I was looking for an answer in the following form: with X weeks to go until the general election, you will minimize your error by using smoothing parameter Y. If Y is a smaller number, like .3, that would be evidence for the random walk hypothesis or perhaps even the trend hypothesis. If Y is closer to .8, that would be evidence for the bounce hypothesis.
Unfortunately, there is no clear answer to this question. Different parameters performed better or worse in different elections, and at different points in those elections. All smoothing parameters from about .3 to .8 produced roughly the same average error when applied to the weekly polling data, with a possible exception of the two weeks immediately prior to the election, when a smaller parameter (i.e., a more sensitive curve) may be more desirable.
What this tells us is that it's frankly a judgment call as to how much emphasis we want to give to the most recent polling results. Neither the random walk hypothesis nor the bounce hypothesis can really be ruled out (we can probably rule out the trend hypothesis, however, as that would require low smoothing parameters to be demonstrably better than higher ones).
What I wound up doing was using a hybrid smoothing parameter, which is conservative toward the endpoints of the curve, but more aggressive in the middle of the curve.
There is a good, logical reason to do this, namely that we have less information available to us at the endpoints of the curve than we do in the middle. We can fairly clearly isolate the impact of something like Jeremiah Wright's first appearance on the scene, because we can look at polling both before and afterward: we see Obama's polls tumbling and then recovering. However, in trying to evaluate the polls right now, we only know what the polls were in the past; we do not know in which direction they'll move in the future. The hybrid curve allows us both to be fairly aggressive in isolating events that might have affected the polls in the past and to err on the side of caution about the present direction of the polls.
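The text does not say how the hybrid smoothing parameter varies along the curve. One simple way to get "conservative at the endpoints, aggressive in the middle" is a linear ramp, sketched below. The 0.8 and 0.3 end values and the linear shape are assumptions chosen for illustration. A LOESS routine would call this per evaluation point in place of a single fixed smoothing parameter.

```python
def hybrid_frac(x0, x_min, x_max, edge=0.8, middle=0.3):
    """Smoothing parameter that is conservative near either end of the data
    and aggressive at the midpoint (linear ramp; all values are assumptions)."""
    if x_max == x_min:
        return edge
    # 0 at either endpoint, 1 at the midpoint
    centrality = 1 - abs(2 * (x0 - x_min) / (x_max - x_min) - 1)
    return edge + (middle - edge) * centrality

print([round(hybrid_frac(x, 0, 20), 2) for x in (0, 5, 10, 15, 20)])
```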
The net effect of all of this is a somewhat more conservative estimate of Barack Obama's current strength in the polling; we know he's bouncing, but we don't know how long that bounce is going to last. If his polling remains strong into next week, that will be three weeks in a row where his numbers have shown a marked improvement, and even the most conservative estimator will start to give him credit for more or less the entirety of his bounce. If he and McCain regress back to a tie, on the other hand, we may even start to take a point or two away from polls that were conducted over the past couple of weeks. This is one thing, by the way, that I think some of the McCain supporters around here are missing. If Obama's post-nomination bounce does prove to be a temporary thing, we will be able to adjust for this more quickly, and recognize that states that were polled frequently during this period may not be as strong for him as they appear.
-- Nate at 12:35 AM
Labels: bounces, history, meta, methodology, site
Monday, April 28, 2008
Clinton's Bounce
For the record, she got 2 points in Rasmussen, and 10 points in Gallup, or 6 points on average, by the standards I set up last week to evaluate this stuff. The bounce has since receded slightly to a 4.5-point average.
But what's interesting about numbers in that 4-6 point range is that this is about what she'd need to have an even-steven chance of winning the +Florida popular vote count -- which likely won't win her an argument in Denver but might get her through the doors.
It's harder to tell what's going on at the state level. In Indiana, Clinton either gained or lost points from the previous SurveyUSA poll, depending on what you consider a SurveyUSA poll and what you don't. PPP showed some significant movement to her in North Carolina, but PPP also probably had some work to do on its model after its poor performance in Pennsylvania.
Obama would benefit from a change in the media cycle -- that much I'm pretty certain about.
-- Nate at 3:16 PM
Thursday, April 24, 2008
Will Clinton get a Keystone Bounce?
If recent history is any guide -- probably not a big one.
So far, the evidence on whether there will be any post-Pennsylvania movement in the national tracking polls is mixed. Clinton has gained 5 points on Obama since Tuesday's results were released in the Gallup tracker. Remember -- that includes just one complete day of post-Pennsylvan