Tuesday, July 9, 2013

The quick and easy (and effective) turnout model

After the polling debacle in the British Columbia election earlier this year, many pollsters identified turnout as a major source of the error. The problem was not how many people turned out, but who turned out. I looked at this issue in an article for The Globe and Mail shortly after the B.C. election, and a new article by Angus Reid for Maclean's identifies the same problem. In that article, he says that if Angus-Reid had used a turnout model based on who went to the polls in 2009, the firm would have given the NDP a three-point lead instead of the nine-point lead it actually awarded the party.

While that still would have been off, it would have been a lot closer and would have suggested that the race was getting too close to call. The egg on the faces of the pollsters (and those who base their work on polls, such as myself) would have been a little thinner.

Ever since the federal and provincial elections in 2011, I have experimented with ways of anticipating how the polls would misjudge turnout. This took the form of adjustments to the projection based on how the polls had been wrong in similar situations before. It helped me get very close to the results for the B.C. Greens and Conservatives in the recent election, and got me a little closer to the truth in Alberta (without it, I probably would have had a Wildrose majority instead of a minority).

But the results haven't been satisfactory, and the B.C. election showed how necessary it is to account for turnout (Alberta had some other dynamics going on, not related to turnout). The 2011 federal election was also influenced by turnout, which turned what looked like a Conservative minority government into a majority.

I could try to design a complicated model for estimating turnout, but in most cases, the simpler a forecasting model is, the better. One of the criticisms of Nate Silver's model is that it is overly complicated for only slightly better performance than simpler alternatives. He has alluded to that himself, and discusses the benefits of simplicity in The Signal and the Noise (required reading).

I've determined that a very simple turnout model can, in fact, be quite effective, and it is based entirely on age. Younger Canadians do not go to the polls in as large numbers as middle-aged Canadians, who in turn do not vote in as large proportions as seniors. And, it seems, the younger Canadians who do vote tend to vote in similar ways to those who are older than them.

In order to reflect this, the simple turnout model I will be using in future elections (when the data is available, and as a separate calculation from the polling aggregation) employs the following formula: discard the results of those under the age of 35 and double the results from respondents 55 and older. That's it.
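
A minimal Python sketch of this rule follows. It assumes (the post does not spell this out) that each age group's published vote shares are combined in proportion to that group's number of respondents, with multipliers of 0 for under-35s, 1 for those aged 35 to 54, and 2 for those 55 and older. The group labels, the function name and the poll figures are hypothetical.

```python
# Sketch of the simple age-based turnout model: drop under-35s, double 55+.
# Assumption: groups are combined in proportion to their respondent counts.

# Multipliers from the rule described above (group labels are hypothetical).
TURNOUT_MULTIPLIERS = {"18-34": 0.0, "35-54": 1.0, "55+": 2.0}


def apply_turnout_model(groups, multipliers=TURNOUT_MULTIPLIERS):
    """Return turnout-adjusted vote shares (in %).

    `groups` maps an age-group label to (respondents, {party: share_pct}).
    """
    weighted_sums = {}
    total_weight = 0.0
    for label, (respondents, shares) in groups.items():
        weight = respondents * multipliers[label]
        total_weight += weight
        for party, pct in shares.items():
            weighted_sums[party] = weighted_sums.get(party, 0.0) + weight * pct
    return {party: s / total_weight for party, s in weighted_sums.items()}


# Hypothetical poll with three age groups and three parties.
poll = {
    "18-34": (300, {"A": 50.0, "B": 30.0, "C": 20.0}),
    "35-54": (350, {"A": 42.0, "B": 40.0, "C": 18.0}),
    "55+": (350, {"A": 36.0, "B": 46.0, "C": 18.0}),
}
print(apply_turnout_model(poll))  # → {'A': 38.0, 'B': 44.0, 'C': 18.0}
```

With every multiplier set to 1, the same made-up poll gives roughly A 42.3, B 39.1 and C 18.6, so in this example the rule flips the lead. That illustrates how far such a blunt adjustment can move a close race.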

This is easy to do, as most pollsters report these age groups. When they use other definitions, a little tweaking is needed. When they do not report age breakdowns at all, I will have to estimate them based on what other polls are showing, or ignore those polls entirely. The results of this method are surprisingly good.
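
When a pollster's age brackets are finer than the model's three groups but nest inside them, the "tweaking" can be as simple as pooling them. The sketch below assumes pooled shares are averaged in proportion to respondent counts. The bracket labels and mapping are hypothetical, and brackets that straddle the 35 or 55 boundaries (say, 30 to 44) would need an extra estimation step that is not shown here.

```python
# Sketch: pool finer age brackets into the turnout model's three groups.
# Assumption: brackets nest cleanly; the mapping below is hypothetical.

BRACKET_MAP = {
    "18-24": "18-34", "25-34": "18-34",
    "35-44": "35-54", "45-54": "35-54",
    "55-64": "55+", "65+": "55+",
}


def collapse_brackets(groups, bracket_map=BRACKET_MAP):
    """Merge brackets into coarser groups.

    `groups` maps bracket -> (respondents, {party: share_pct}). Returns the
    same shape keyed by coarse group, with shares averaged by respondents.
    """
    sizes = {}
    sums = {}
    for bracket, (respondents, shares) in groups.items():
        target = bracket_map[bracket]
        sizes[target] = sizes.get(target, 0) + respondents
        party_sums = sums.setdefault(target, {})
        for party, pct in shares.items():
            party_sums[party] = party_sums.get(party, 0.0) + respondents * pct
    return {
        target: (sizes[target], {p: s / sizes[target] for p, s in party_sums.items()})
        for target, party_sums in sums.items()
    }


# Hypothetical six-bracket poll, one party shown for brevity.
fine = {
    "18-24": (100, {"A": 60.0}), "25-34": (300, {"A": 40.0}),
    "35-44": (200, {"A": 30.0}), "45-54": (200, {"A": 50.0}),
    "55-64": (150, {"A": 20.0}), "65+": (50, {"A": 40.0}),
}
print(collapse_brackets(fine))
# → {'18-34': (400, {'A': 45.0}), '35-54': (400, {'A': 40.0}), '55+': (200, {'A': 25.0})}
```

The output keeps the input's shape, so it can feed whatever age-based weighting comes next.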

The chart below shows the difference between the final projection and what that final projection would have been using this simple turnout model (ignoring polls without age breakdowns) in four recent elections. A few notes: in Alberta, I have compared the turnout projection to the unadjusted final projection, since I was already employing turnout adjustments for every party in that election. And in Quebec, since Léger and CROP do not report age breakdowns, I have only applied the turnout model to Forum Research's final poll.


As you can see, the turnout model performs better in every case. In some cases, the difference is dramatic: total error would have been reduced by about a third in British Columbia and almost entirely erased in Ontario. The results in that province are particularly striking: no party would have been missed by more than 0.3 percentage points!
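
The post does not define "total error" formally. The sketch below assumes it means the sum of each party's absolute miss in percentage points, a common way to score polls, and also computes the largest single-party miss (the kind of figure behind "no party missed by more than 0.3 points"). The function names and all numbers in the example are hypothetical.

```python
# Sketch of two error measures for comparing a projection with the results.
# Assumption: "total error" = sum of absolute misses, in percentage points.


def total_error(projection, results):
    """Sum of absolute misses over the parties listed in `results`."""
    return sum(abs(projection[party] - share) for party, share in results.items())


def largest_miss(projection, results):
    """Biggest single-party miss, in percentage points."""
    return max(abs(projection[party] - share) for party, share in results.items())


# Hypothetical results and two hypothetical projections.
results = {"A": 42.0, "B": 34.0, "C": 24.0}
unadjusted = {"A": 40.0, "B": 35.0, "C": 25.0}
turnout_adjusted = {"A": 41.5, "B": 34.5, "C": 24.0}

print(total_error(unadjusted, results), largest_miss(unadjusted, results))  # → 4.0 2.0
print(total_error(turnout_adjusted, results), largest_miss(turnout_adjusted, results))  # → 1.0 0.5
```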

For British Columbia, the difference is the most consequential, since it would have changed the narrative from an easy NDP victory to a lead of only 2.3 points. As for Angus-Reid, had the firm used my simple turnout model instead of a more complicated one based on 2009's turnout figures, it would have had the NDP's lead at only two points (42% to 40%) instead of the three points reported in the Maclean's article. Ipsos-Reid would also have had a two-point lead (44% to 42%) instead of the seven-point gap it actually had, while Forum Research would have given the Liberals a two-point edge (43% to 41%) instead of the reverse.

In Quebec, Forum's adjusted numbers would have tied Léger's for the best performance, while in Alberta, Forum's final poll would have given Wildrose a one-point advantage instead of two points. The final polls of Angus-Reid and Abacus Data would have been made worse, however. Turnout, it seems, was not the issue in Alberta.

UPDATE: I had forgotten that Léger did indeed release age breakdowns with their final Quebec poll (they normally do not release this information). As you can see in the discussion in the comments, applying the turnout model to the Léger and Forum polls and averaging them gives a very good result: 33% PQ, 30.5% PLQ, 27% CAQ, and 6% QS for a total error of only 1.5 points!
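
The averaging step in the update can be sketched as a simple unweighted mean of each party's turnout-adjusted share across polls. Equal weighting is an assumption here (sample-size or recency weights are common alternatives), the function name is hypothetical, and the two polls in the example are made up, not the Léger or Forum figures.

```python
# Sketch: unweighted average of turnout-adjusted shares across several polls.
# Assumption: every poll reports the same parties; each poll counts equally.


def average_polls(polls):
    """Mean share per party over `polls` (a list of {party: share_pct})."""
    parties = polls[0].keys()
    return {p: sum(poll[p] for poll in polls) / len(polls) for p in parties}


# Two hypothetical turnout-adjusted polls.
poll_1 = {"A": 36.0, "B": 30.0, "C": 25.0}
poll_2 = {"A": 32.0, "B": 33.0, "C": 27.0}
print(average_polls([poll_1, poll_2]))  # → {'A': 34.0, 'B': 31.5, 'C': 26.0}
```

Averaging helps with the small-sample problem raised in the comments, where one poll's 65+ subsample can swing a party's adjusted number.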

For Ontario, the results would have been simply astounding. The polls by EKOS Research, Angus-Reid, Abacus Data, and Ipsos-Reid would have all been closer with this very simple turnout model.

This demonstrates that Canadian election polling needs to rely on more dramatic turnout models in order to get closer to election results. This is problematic, however, since it implies that the pollster with the best turnout model, and not the best methodology, would get the plaudits. To be acclaimed for having the best methods, pollsters should use a turnout model based on the questions in the poll itself, as Ipsos-Reid did in its most recent federal survey. If a pollster can estimate turnout correctly, as well as the results of the vote, it will have proven itself exceedingly competent. This requires full reporting, however; otherwise, the public will not be able to determine whether the model or the methodology won the day.

For ThreeHundredEight, this simple turnout model should provide readers with some decent expectations of results. At the very least, it should help prevent some surprises. But because the model is so simple, and verges more on a gimmick than on anything based on voting behaviour research, I will only be including it as an extra piece of information. The site will abandon turnout modelling for the projection itself, and instead focus on polling averages and ranges based on how polls have been wrong before. Hopefully, this will give readers the best possible understanding of the dynamics of an ongoing campaign.
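
One way to read "ranges based on how polls have been wrong before" is to shift a party's current polling average by the smallest and largest misses recorded for it in past elections. The sketch below assumes exactly that; it is a hypothetical illustration rather than the site's actual method, and the function name, the average and the past misses in the example are made up.

```python
# Sketch: a plausible range for one party from its past polling misses.
# Assumption: a "miss" is (election result - final polling average), in points.


def projection_range(average, past_misses):
    """Return (low, high) by applying the most extreme past misses."""
    if not past_misses:
        return average, average
    return average + min(past_misses), average + max(past_misses)


# Hypothetical: party polling at 30%, with past misses of -2, +1.5 and +3 points.
print(projection_range(30.0, [-2.0, 1.5, 3.0]))  # → (28.0, 33.0)
```

Using the extreme misses gives a deliberately wide band; percentiles of a longer error history would be a tighter alternative.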

21 comments:

  1. Of course, just to complicate things even more, in the 2009 BC election the polls almost all underestimated NDP support and overestimated BC Liberal support, and if those polls had taken those turnout factors into account, they would have had an even larger error. I think it's a good idea to make some allowances for different turnouts among different age groups, though I think this should only be done during election campaigns themselves, when trying to predict an outcome. It should not be done during inter-election periods, when we are not predicting a result but just measuring public opinion.

    There is always the danger of being generals trying to re-fight the last war. Mitt Romney's people were sure he would win because they were so sure that younger people would not show up at the polls - and they were wrong.

    1. My policy of not making any adjustments to polls between elections will not change, this is in reference only to actual campaign projections.

      As to 2009, it seems that this model would have made the Ipsos-Reid poll worse, but would not have changed Angus-Reid's good result. Mustel and Environics don't have age information available for their final 2009 polls.

  2. Being a fan of Nate and 538, I enjoy the fact that he notes pollsters' "biases". Can we expect to see the same moving forward? Note that Ipsos CEO Mr. Bricker co-penned "The Big Shift", and the latest revelation that a Nanos poll cited by a Tory MP in the union disclosure fight is under review by a polling standards group, etc.

    1. Those biases he records are statistical, not personal or political (i.e., he doesn't assume a Democratic pollster will be Democrat friendly, he lets the numbers determine that). We don't have enough polling here to do something similar, and it is more difficult because of the multiple parties.

  3. Angus Reid's numbers are still well outside the statistical error bars for the BC election, which suggests to me that they still have more problems to address.

  4. On the flip side, do you think that the polling adjustment could be used to highlight just how much sway the youth would have if they voted? (coming at it from the angle of trying to get young people to vote)

  5. I have tried to replicate what AR did with their data, and I cannot get there based on their final poll. The impact of young people voting or not was not enough to account for six percentage points.

    I know I do not have all their data, but even with what I have to work with, I cannot get it down to a three-point gap unless I remove all the under-35 respondents.

    1. Wouldn't surprise me if they made other adjustments.

  6. Also, Angus Reid talks about their last fieldwork being done on May 8th, but they released two surveys after that with later field dates, so I am not sure which survey he is saying they re-weighted.

  7. What happens if you apply this turnout model to the last Leger poll of the 2012 Quebec election? I'm asking because while Forum was the best at predicting the Liberals, it was the worst for the PQ. And we know that they don't weight according to mother tongue, which is crucial in Quebec.

    Would you mind doing the calculations for the last Leger (or CROP) poll? Thanks. Interesting stuff.

    1. As mentioned, Leger doesn't report their polls by age. That's why I just applied the Forum numbers.

    2. What are you talking about? They totally do. The Leger poll from September 2nd, 2012 has the complete age distribution.

      I did the math for you, it would have given:

      PQ: 30.88%
      PLQ: 29.3%
      CAQ: 29.38%

      So better for PQ-PLQ, but worse for CAQ. To be fair, this poll had the CAQ incredibly high among the 65+ demographic, probably due to a small sample. That is something that would be corrected by using many polls and averaging them.

    3. You're right! I had forgotten that. Léger never releases those details normally, so I had assumed they didn't last year and didn't double-check.

      Worse for CAQ, correct, but overall a better poll with an error of six points (QS would have been 7%) instead of seven.

    4. And yes, averaging the TO model numbers in the Forum and Leger polls gives a very good result: 33 PQ, 30.5 PLQ, 27 CAQ.

  8. Quebec Liberals will always get underpolled. It is a no-no to openly admit that you vote Liberal in Quebec, so people stay quiet about it until they are alone at the ballot box, and then vote Liberal privately.

  9. Except that the Quebec Liberals DON'T always get underpolled. In the 2007 and 2008 Quebec elections the polls actually overestimated their support, so people quite reasonably figured that old pattern had died off.

    1. More a case of governments generally being underpolled, I guess.

  10. The Quebec Liberals were the incumbent party in Quebec in the '07 and '08 elections too. There goes that theory.

    1. Bah! You're right. :/ Not sure what I was thinking.

  11. This model is frankly dumb. What's going on is that many people under 35 have a preference for, largely, parties which don't exist, and to a lesser extent (as noted by DL) the NDP. Those who prefer parties which don't exist, do not vote.

    So this model will fail spectacularly as soon as a party actually appeals to the under-35s. I don't know when that will happen because it doesn't seem likely any time soon. But pay attention because it's likely to happen sooner or later.

    1. Don't worry, I am not applying this turnout model to anything.

