I agree entirely with this criticism in theory. I would also argue that it is probably better to assume a uniform trend than no trend at all. The polling has become dense enough (particularly if we include national polls) that we're getting a pretty fair mix of state and national polls in any given week. It is unlikely that Obama could improve his position in, say, 10 out of 12 state polls and 5 out of 6 national polls without his also being likely to have improved his position in other states that weren't polled during this period.
Nevertheless, it would clearly be best if we could have our cake and eat it too: adjust for the most recent trends (in a somewhat cautious way) without having to take some of the state-by-state specificity out of our model. I think I've developed a reasonable way to accomplish that.
The basic way that we developed the trend estimator was to express each polling result as a combination of two dummy variables, one representing the state/pollster combination (e.g. "Quinnipiac-Florida" or "Zogby-Delaware") and the other the week in which the poll was conducted. Each poll in our database can thus be expressed in the form of a regression equation:
Margin = (state/pollster dummy) + (weekly dummy) + ε
'Margin' represents the polling result (Obama's total less McCain's), whereas the squiggly little 'e' you see is a term denoting the residual error/uncertainty. Technically speaking, there are coefficient terms on the two dummy variables, though over the long run, these coefficients will by definition equal one. Likewise, the error term will by definition equal zero over the long run. However, just because the coefficients equal one on average does not mean that they do so in every single case. Another way to express our regression would be to embed the uncertainty term in the time-trend dummy, as follows:
Margin = (state/pollster dummy) + m × (weekly dummy)
In this equation, m represents a multiplier on the weekly trend variable. It is trivial to solve for m: subtract the state/pollster term from the margin, then divide by the weekly trend term.
In a state that is more affected by a time-dependent trend, m will be greater than one. In a state that is less affected by the trend, it will be less than one.
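To make the algebra concrete, here is a minimal sketch of solving for m on a single poll. The function and argument names are my own illustrative choices, not anything from our actual database, and the numbers in the usage lines are made up.

```python
def trend_multiplier(margin, state_pollster_effect, week_effect):
    """Solve margin = state_pollster_effect + m * week_effect for m.

    All three inputs are in percentage points of Obama-minus-McCain margin.
    The names are hypothetical; they stand in for the two dummy-variable
    terms described in the text.
    """
    if week_effect == 0:
        # m is undefined when there is no weekly trend to scale.
        raise ValueError("week_effect must be nonzero to solve for m")
    return (margin - state_pollster_effect) / week_effect


# Illustrative numbers only: a pollster that usually shows Obama +2 in a
# state reports Obama +6 in a week where the overall trend is +2.
strong_response = trend_multiplier(6.0, 2.0, 2.0)  # twice the average trend
weak_response = trend_multiplier(3.0, 2.0, 2.0)    # half the average trend
```

A result above one means the poll moved more than the weekly trend would predict on its own; a result below one means it moved less.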
Once we have derived an m for each poll in our database, we can then regress it against a series of demographic variables in the state where the poll was conducted to see whether there is any pattern to the residuals. Since our particular concern is with recent trends, we weight recent polls much more heavily when conducting this analysis. (A couple of technical notes: we discard any cases in which the pollster has polled the state just once, as m will always be one in these cases. Also, we discard cases where the weekly dummy is a very small number -- anything less than one, in fact -- as this can produce very large, highly erratic values of m.)
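The two discard rules and the recency weighting might look something like the sketch below. Two things here are assumptions on my part rather than a description of the actual process: I read "less than one" as applying to the absolute size of the weekly dummy, and I use an exponential decay with a hypothetical 14-day half-life as the weighting scheme, since the text above only says that recent polls count for much more.

```python
from collections import Counter


def usable_polls(polls, min_week_effect=1.0):
    """Keep only polls for which m is meaningful.

    polls: list of dicts with hypothetical keys 'pollster', 'state'
    and 'week_effect'.
    """
    # Count how many times each pollster has surveyed each state.
    counts = Counter((p["pollster"], p["state"]) for p in polls)
    return [
        p for p in polls
        # Rule 1: drop one-off pollster/state combinations.
        if counts[(p["pollster"], p["state"])] > 1
        # Rule 2 (assumed to be on absolute value): drop tiny weekly
        # trends, which would blow up m when we divide by them.
        and abs(p["week_effect"]) >= min_week_effect
    ]


def recency_weight(age_days, half_life_days=14.0):
    """Assumed weighting: a poll loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)
```

Any scheme that decays with poll age would serve the same purpose; the half-life form is just easy to reason about.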
The demographic regression that I perform on m includes relatively few variables. This is because there aren't all that many useful data points to work with -- we need very recent polls, and for those polls to have been conducted in a state that the pollster surveyed previously -- so there is more risk of overfitting the model. The particular variables we include are a state's partisan ID index, its Kerry vote share in 2004, its black population, its Hispanic population, its average per capita income, its percentage of senior citizens, and its percentage of evangelicals. With the exception of the Kerry and 'partisan' variables, which are too fundamental to the model to be excluded, these variables have the virtue of not being strongly intercorrelated with one another.
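For readers who want to see the mechanics, a weighted regression of m on state demographics can be sketched with NumPy as below. This is a generic weighted least squares fit, which I am assuming is close enough to illustrate the idea; the function names and the layout of the inputs are hypothetical.

```python
import numpy as np


def fit_weighted_regression(X, m, weights):
    """Weighted least squares of m on the columns of X, with an intercept.

    X: one row per poll, one column per demographic variable
    m: the multiplier derived for each poll
    weights: one nonnegative weight per poll (larger for recent polls)
    Returns the coefficients, intercept first.
    """
    X = np.asarray(X, dtype=float)
    m = np.asarray(m, dtype=float)
    # Scaling rows by sqrt(weight) turns ordinary least squares into WLS.
    w = np.sqrt(np.asarray(weights, dtype=float))
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design * w[:, None], m * w, rcond=None)
    return coef


def predict_m(coef, x):
    """Estimated m for a state with demographic values x."""
    return float(coef[0] + np.dot(coef[1:], x))
```

With only a modest number of usable polls, each extra column in X costs a degree of freedom, which is the overfitting risk mentioned above.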
As it turns out, there are some patterns in where Obama's bounce is showing up. It is coming in states where Democrats have a strong party identification advantage (no surprise), and seems to be especially strong in states where many voters are registered as Democrats, but where John Kerry did not perform well in 2004. This particularly describes states like West Virginia and Arkansas, where Obama's numbers have improved significantly, and where (assuredly not coincidentally) Hillary Clinton also performed well. The other observable trend is that Obama's bounce has been larger in states where there are not a lot of African-American voters, simply because there are few marginal gains for him to make among that demographic. It will probably always be the case in this election that states with lots of African-American voters will be less responsive to trends in the polling numbers.
This demographic regression allows us to estimate a unique value of m for each state. I constrain m to fall between 0.0 and 2.0. The average value of m will not necessarily be 1.0, as it could be the case that particular kinds of states are especially predisposed to a bounce, and those states have also been polled more frequently (in fact, this does appear to have been the case to a small degree over the past couple of weeks). The present m values for some representative states are as follows:
Kentucky 1.98
New York 1.37
North Carolina 1.01
In adjusting our polling numbers, we take the trend from our LOESS estimator and multiply it by m. For example, say that our LOESS curve estimates that Barack Obama is polling 3 points stronger now on average than he was three weeks ago. If we take a 3-week old poll from Kentucky, we will adjust it upward (toward Obama) by (3 x 1.98) = 5.94 points. In California, we will adjust it by (3 x 0.97) = 2.91 points. And in Arizona, we would adjust it by only 0.87 points.
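The adjustment arithmetic, including the 0.0 to 2.0 limits on m, can be written out as a short sketch. The function names are mine and purely illustrative; the Kentucky and California figures are the ones from the example above.

```python
def clamp_multiplier(m, low=0.0, high=2.0):
    """Keep m inside the 0.0 to 2.0 range used for the state estimates."""
    return max(low, min(high, m))


def trend_adjustment(trend_points, m):
    """Points to add to an older poll: the estimated trend scaled by m."""
    return trend_points * clamp_multiplier(m)


# A 3-point trend since the poll was taken, using m values from the text.
kentucky = round(trend_adjustment(3.0, 1.98), 2)    # 3 x 1.98
california = round(trend_adjustment(3.0, 0.97), 2)  # 3 x 0.97
```

A regression estimate outside the limits, such as a hypothetical 2.5, would be treated as 2.0 before the multiplication.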
Taking into account the sensitivity of individual states to time trends produces a slightly less impressive result for Obama than we had been figuring on over the weekend, as his bounce seems to be most profound in states where he was already well ahead (like Massachusetts), or where he is probably too far behind to catch up (like Oklahoma). Still, we have seen at least some bounce for Obama across a large and relatively diverse array of states, and can expect to see that trend manifested in other states where new polls will come out unless his bounce begins to recede nationally.