Tuesday, May 10, 2011

Ranking the Pollsters (Updated)

Now that the dust has settled, we can take a look at how each of the pollsters performed in the past election.
While it is true that polls are only a snapshot in time, and so are limited in their ability to predict into the future, polls are judged by how closely they align with election results. There is no other way to judge them. Certainly, it is possible that these polls were tracking voting intentions on their field dates accurately, and so cannot be judged for missing out on changing intentions over the last days or hours of the election campaign. While that is a valid argument, a means of assessing pollsters is required, and comparing their final poll results to the actual results of the election is the only measuring stick we have. And pollsters themselves use that measuring stick, so we can certainly hold them to their own standards.

All polling firms have been assessed by their final poll results. Nanos Research has been assessed according to their final two-day report that was featured by CTV and The Globe and Mail. Crop, Innovative Research, and Environics are being assessed by their final polls, though they were all taken one week or more before the day of the vote. That may seem unfair, but these are the final numbers we have from these firms and they need to be assessed by some measure. Consider it a penalty for not releasing data closer to the end of the campaign.

Pollsters are assessed by their average error per party. In other words, being off by a total of 20 points for the five national parties combined would be the equivalent of having an average error of 4.0 points per party. The best result for each party is highlighted in white. The polling firms are placed in the order of the date of their final poll. We'll start at the national level.
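
To make the arithmetic concrete, here is a minimal sketch of the metric in Python, using the rounded official national results and a purely hypothetical final poll (not any firm's actual numbers):

    # Rounded official 2011 national results and a hypothetical final poll.
    actual = {"CPC": 39.6, "NDP": 30.6, "LPC": 18.9, "BQ": 6.0, "GPC": 3.9}
    poll = {"CPC": 37.0, "NDP": 32.0, "LPC": 20.0, "BQ": 6.5, "GPC": 4.5}

    # Average absolute error per party: the total miss across all parties,
    # divided by the number of parties.
    avg_error = sum(abs(poll[p] - actual[p]) for p in actual) / len(actual)
    print(f"Average error: {avg_error:.1f} points per party")  # 1.2 points here
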
Angus-Reid had the best result for the second consecutive election, with an average error of only 1.0 per party. Nanos Research, Ipsos-Reid, Harris-Decima, and Léger Marketing also performed well, all with an average error of 1.4 points or lower. Of those who reported close to election day, EKOS and Compas performed worst.

See the bottom of this post for a discussion of how the margin of error could be taken into account in this assessment.

Note that Compas, who was nevertheless very far off, was the only polling firm to over-estimate Conservative support. All others under-estimated their support by almost two points or more. The New Democrats were relatively well-tracked, but only Abacus and Ipsos-Reid had the Liberals lower than their actual result.

Disregarding the results from Environics and Innovative, the best polling method turned out to be the online panel, with an average error of 1.4 points per party. Traditional telephone polling scored 1.6 points, while IVR stood at 2.0 points' worth of error per party.

Also note that those pollsters who do not prompt for the Green Party (Ipsos-Reid) or any party (Nanos Research) generally predicted the Green Party's eventual tally better than those who prompted for the Greens.

Now let's move to the regional assessments, going west-to-east and so starting with British Columbia.

Here, Léger Marketing and Compas scored best with an average error of two points per party. Harris-Decima, at an error of 2.5 points, also did well in this province.

Nanos Research and Ipsos-Reid, who both put the Liberals in the mid-20s, did worst here.

As at the national level, only Compas over-estimated Conservative support. All of the others under-estimated their support, mostly to the benefit of the Liberals and New Democrats. But results varied widely, with Liberal support being pegged at between 10% and 26%, while the NDP was scored at between 25% and 40%. Small sample sizes are partly to blame. Green support, on the other hand, was well tracked.

In Alberta, Angus-Reid did best with an average error of only 1.3 points. The next best was Abacus Data, at 2.1 points. Harris-Decima did worst.

The pollsters had an easier time discerning Conservative support in this province, with two of the pollsters being exactly right. Others (Ipsos-Reid) inflated Tory support while yet others under-estimated them (Harris-Decima, EKOS, Forum). The NDP and Liberals were well tracked, though only Abacus had them in single digits. All in all, the pollsters did well in Alberta.

Compas and Nanos, however, grouped Alberta with Saskatchewan and Manitoba. In these three provinces, Nanos bested Compas by an average of 0.7 points. Compas appeared to give some of the Liberal support to the Tories, while Nanos gave some of the Tory support to the NDP.

In the more usual grouping of Saskatchewan and Manitoba, Angus-Reid did best with an average error of 1.5 points, closely followed by Ipsos-Reid. EKOS struggled here.

Generally, the pollsters had an easier time pinpointing Conservative support here, with three of the pollsters within about a point of their final result. The New Democrats were also well gauged, with the biggest error being only 3.1 points among those polling firms active in the final days of the campaign. The Liberals were also well tracked, while only Ipsos-Reid correctly had the Greens at 3% in these two provinces.

Ontario was the most important province to poll correctly. Angus-Reid did the best here, with an average error of only 1.7 points. They were closely followed by Harris-Decima, while Ipsos-Reid did the worst among those who polled in the final days of the campaign.

But the problem here, as elsewhere, was in recording Conservative support. Again, only Compas over-estimated the Tories while all others had them at 41% or lower. That error had great consequences in determining whether they would win a majority or minority government.

The pollsters, except Compas, over-estimated NDP support, but only by a little. Some (EKOS, Harris-Decima, and Angus-Reid) were very close to accurately predicting the NDP's support, but others had them well over their actual result. The Liberals were generally well polled, with two pollsters on the money and another three within a point.

Quebec provided the biggest surprise on election night, but amazingly the pollsters did very well in the province. They did better than they did in Ontario, which usually has larger sample sizes.

Ipsos-Reid takes the crown in Quebec, but was closely followed by Nanos, Forum, and Léger. Compas and Crop did the worst, though they were still relatively close.

Three pollsters were almost exactly right in predicting the NDP's result, while four others had the NDP over 40%. The pollsters had a bit more trouble with the Bloc, with only Nanos and Forum indicating that they would end up in the low 20s. The pollsters did an excellent job in recording Liberal support, but did a little worse with the Conservatives. All in all, though, the pollsters did an excellent job in Quebec.

That was not the case in Atlantic Canada, which was the worst polled region in Canada. Granted, it is usually polled with very small samples, but roughly the same number of people are surveyed in the Prairies, and there the pollsters did much better.

Harris-Decima was closest with a very good average error of 1.3 points, while Léger was close as well. But EKOS, Angus-Reid, Compas, and Ipsos-Reid were all off by six points or more, with seven pollsters putting the NDP in first in the region.

Results varied wildly, with the Conservatives pegged at between 26% and 44%, the NDP between 28% and 46%, and the Liberals between 11% and 30%, among pollsters active at the end of the campaign. Only Harris-Decima had the NDP below 30%, while most under-estimated Liberal support. Atlantic Canada was a wash.

And that brings us to our final ranking. The pollsters have been ranked on two scores: their national average error (recorded in the first chart at the top of this post) and their average regional error. Combining the two rankings anoints Angus-Reid as the best pollster of 2011, but also awards honourable mentions to Harris-Decima, Léger Marketing, and Nanos Research.
Nationally, Angus-Reid comes out on top but was only marginally better than Nanos, Ipsos-Reid, Harris-Decima, and Léger Marketing.

Forum Research and Abacus Data, in their first federal campaigns, did very respectably.

EKOS struggled while Compas was the worst pollster active in the final days of the campaign. Environics and Innovative Research might have done better had they polled closer to election day.

However, national results can be close merely because all of the regional errors cancel each other out. That was the case with Ipsos-Reid, which drops to 9th place in the average regional error ranking.
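
A toy illustration of that cancellation, with hypothetical numbers:

    # Two hypothetical regions of equal weight: a +5 miss in one and a -5 miss
    # in the other cancel out nationally, even though the regional polling was poor.
    regional_errors = [+5.0, -5.0]
    national_error = abs(sum(regional_errors)) / len(regional_errors)  # 0.0 points
    avg_regional_error = sum(abs(e) for e in regional_errors) / len(regional_errors)  # 5.0 points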

Instead, Léger Marketing takes the top spot on its regional results with an average error of 2.4 points. Harris-Decima and Angus-Reid were close behind with an average error of 2.7 and 3.0 points, respectively, while Abacus Data placed 4th on the regional ranking.

Regionally, of those active in the final days of the campaign, the online pollsters were off by an average of 2.9 points. Traditional telephone surveys were off by 3.7 points, while IVR surveys were off by 3.9 points. Though it could have been blind luck, the much-maligned online surveys performed best in the 2011 federal election.

Margin of Error Update

Some pollsters and commenters have brought up the issue of the margin of error, and whether that should be taken into account to judge the performances of the polling firms.

From a statistical standpoint, polling firms should be able to accurately predict the result of each party within the poll's margin of error. Though it is hardly the focus of any press release or media report, the margin of error is (or should be) always included in any poll report, and thus if a poll gets the result correct within the margin of error it is a technically accurate poll. But this benefits polling firms with smaller samples, who have larger margins of error within which to work.

If we use this standard, at the national level we would have to eliminate all but Ipsos-Reid, Abacus Data (assuming a random sample for this online pollster), and Nanos Research. These are the only three firms whose national poll findings were within the margin of error of the sample.

If we take it to the next level, assessing each polling firm by the margin of error for each individual party (i.e., the margin of error for the Green Party is not the same as the margin of error for the Conservative Party because of their different levels of support), we have to eliminate all but Nanos Research from this assessment. Ipsos-Reid would fail to pin down NDP support taking this margin of error into consideration, while Abacus would be wrong for the Green Party.
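
For reference, here is a minimal sketch of the standard formula behind those party-specific margins, assuming a simple random sample (an assumption that is itself debatable for panel and IVR polls):

    import math

    def moe(pct, n, z=1.96):
        # 95% margin of error, in points, for a party polling at `pct` percent
        # in a simple random sample of n respondents.
        p = pct / 100.0
        return z * math.sqrt(p * (1.0 - p) / n) * 100.0

    # From the same 1,000-person sample, a party at 40% has a much wider
    # margin than a party at 4%:
    print(round(moe(40, 1000), 1))  # ~3.0 points
    print(round(moe(4, 1000), 1))   # ~1.2 points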

But why stop at the national level? We do not elect presidents - regional data is as important, if not more important, than the national horserace numbers. If we bring it down to the regional level, then even Nanos has to be eliminated as their result of 23.6% for the Liberals in British Columbia (rather than their actual 13.4%) is outside of this particular sample's 7.7% margin of error. No pollster would survive this assessment, even taking the 95% confidence level into account.

Just as problematic methodology will not be corrected by over-sampling, it can also be masked by the large margin of error of smaller samples. In the end, Nanos should be commended for having its national poll results within the margin of error, with honourable mentions also going to Ipsos-Reid and Abacus Data. But from my perspective, polling firms that conduct surveys with large samples, thereby giving us more reliable regional results, should not be thrown under the bus. Polls are reported for the consumption of the general public, and the general public is interested in how accurately the polls predict actual outcomes, both nationally and regionally. While I agree that some consideration should be taken for the margin of error in assessing the performance of the pollsters, which I have now done, I believe that my original assessment stands.

25 comments:

  1. Any idea why EKOS was off by so much? They underestimated Conservative support by more than 10 percentage points in seven of the ten provinces, and were seriously off in their national projections. And while this could be the 1 poll out of 20 that's just going to be wrong, all of EKOS's polls in the election put the Conservatives well below where other polls from around the same time had them. That seems to indicate something was seriously wrong with their methodology. Any thoughts on what?

  2. If I recall what I have read correctly, EKOS has developed a new polling method that they couldn't use during the campaign as it hadn't been fully tested yet. According to what I've read, this new methodology should peg Conservative support more accurately.

  3. What about incorporating the MOE in to the evaluation?

    For instance, for the Conservatives, if one company polled nationally at 42.5 +/- 3 and another polled at 41.5 +/- 1 - the first company is the one that did a better job even though their result is further away from the actual result.

    Based on a company's MOE and the confidence levels they're using, you should be able to estimate the likelihood that they'd make an error as big (or small) as they did. You could then use this information to rank each of the companies. This could also be more helpful in determining how to weight your polls in the future.
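
    Something along these lines, say, assuming the sampling error is roughly normal (and taking the actual Conservative result to be 39.6):

      from math import erf, sqrt

      def miss_likelihood(error, moe, z=1.96):
          # Chance of a miss at least this large by sampling luck alone,
          # treating the error as Normal(0, (moe/z)^2).
          sigma = moe / z
          return 1.0 - erf(error / (sigma * sqrt(2)))

      # The 42.5 +/- 3 poll (a 2.9-point miss) is the more plausible one:
      print(round(miss_likelihood(2.9, 3.0), 3))  # ~0.058
      print(round(miss_likelihood(1.9, 1.0), 4))  # ~0.0002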

  4. Mike,

    That is an important factor, but I'm more concerned with results.

    The MOE can be slightly misleading. A +/- 3.1 MOE gives a lot of leeway for a national poll, but statistically speaking, if a correctly conducted poll reports a result of 40% for a party, the actual result is far more likely to be close to 40% than to 43.1%.

    Bringing MOE into the equation would benefit pollsters who use very small sample sizes - yet even if the MOE is 10 points the results are reported as being, for example, 30% rather than "between 20% and 40%". I think the pollsters should be held to that standard.

  5. "Bringing MOE into the equation would benefit pollsters who use very small sample sizes - yet even if the MOE is 10 points the results are reported as being, for example, 30% rather than "between 20% and 40%". I think the pollsters should be held to that standard."

    If you're evaluating them for the usefulness of your model, how they report the results shouldn't matter. Most polls that I read publish both their projection and the MOE - evaluating them based on a headline that is used to attract the attention of the lowest common denominator does not seem to be adding much value.

    "The MOE can be slightly misleading. A +/- 3.1 MOE gives a lot of leeway for a national poll, but statistically speaking if the poll reports a result of 40% for a party, it is far more likely that if the poll is done correctly the results will be closer to 40% than it would be to 43.1%."

    That's a beautiful little strawman that you've built ... what is your name for him?

    In this situation, you are comparing across different polling firms, not within one. So, what you are claiming in your evaluation is that you prefer a poll that projected the Conservatives nationally at 41.5 +/- 1 rather than a poll that projected the Conservatives nationally at 42.0 +/- 3.

    One of those projections is wrong. One of those is right. You prefer the one that is wrong.

    "Bringing MOE into the equation would benefit pollsters who use very small sample sizes"

    I guess it all depends on what you're looking for. As someone who is trying to project the outcome of an election, I think knowing the outcome is a possibility would be valuable. If you don't consider MOE, then you completely discard this information.

  6. Margin of error is one of three factors taken into account to weigh polls, so it is already incorporated into the weighting system. There is no reason to incorporate it into the system twice.

  7. Éric said...

    "If I recall what I have read correctly, EKOS has developed a new polling method that they couldn't use during the campaign as it hadn't been fully tested yet. According to what I've read, this new methodology should peg Conservative support more accurately."

    EKOS was playing with what was, essentially, a "likely voter" screen - They called it their "Voter Commitment Index" (A rose by any other name....)

    80% of people 65+ actually vote, less than 40% of those under 25 actually vote.... Incorporating this into your polling is, well, blindingly obvious... IMHO.

  8. I wonder if the average error really is an appropriate methodology when considering the effectiveness of the pollsters.

    A pollster who was off on the Green results by 2% would have horribly missed their actual result, but even that miss would pull down the average error if that pollster was off on the CPC by a larger absolute amount - as almost everyone was.

    I guess what I'm saying is that pollsters will get an average (absolute) error that looks more respectable if they were off on the smaller parties by amounts that could be proportionately just as far off (if not more so) as they were on the larger parties.

    I don't know if any pollster would benefit from a ranking based on average standard deviations (can that be done here? It's been a while since stats class), but I think the average error looks deceptive. People will look at the average error as if it can be applied to the larger parties, when in fact it's only relevant when pooled with smaller parties... or vice versa.

    Does this make sense?

  9. Re: margin of error concerns, the post has been updated.

  10. It would be interesting to see which pollster's regional numbers, when dropped into your seat model, came closest to predicting the final seat totals.

    For example, those Nanos numbers in BC might have a significant effect on seat numbers. And the different regions have different marginal effects; a pollster missing CPC support by 5 points in Alberta might not move a single seat, but missing CPC support by 5 points in Ontario might move 20.

  11. This might be more work than it's worth but I'd be interested to know how the average errors would come out if the results from the Green Party were omitted.

    In my view, the error for the Green Party is irrelevant. If they're off by 3 points, it might mean the difference between 0 and 1 seats. For the CPC on the other hand, that same error of 3 points is the difference between majority and minority. I think the higher the standing of each party, the more importance and weight that particular number needs to have.

    I would only use the top 4 parties for Quebec, and the top 3 nationally and in all other provinces - essentially for the parties that are actually winning seats.

  12. Kevin,

    Yes, I could have just totaled the error rather than dividing it by the number of parties, but the result would have been the same.

    I think there is a danger in over-analyzing the results, as some pollsters rounded to the nearest full number and others reported to the first decimal point. That can mean the difference between (say) 37.4% and 36.5% (which would both be rounded to 37%). Extend that over five parties, and you get a potential 4.5 points' worth of extra error.

  13. And I should point out that is only one example of why we shouldn't delve too deeply.

  14. Anonymous 15:33,

    That's a fair point, but I think it is better to assess the pollsters by what they reported.

  15. Or, I should say, everything they reported.

  16. "But why stop at the national level? We do not elect presidents - regional data is as important, if not more important, than the national horserace numbers. If we bring it down to the regional level, then even Nanos has to be eliminated as their result of 23.6% for the Liberals in British Columbia (rather than their actual 13.4%) is outside of this particular sample's 7.7% margin of error"

    I'm not sure if you were doing this to be snarky or felt that you were making a valid point, but you're wrong on both counts. MOEs are generally done at specific confidence levels (usually 95%). As you get to regional results for specific parties, the number of data points that you're evaluating gets pretty high (5-6 data points for 7-8 regions, depending on how you're counting things) - with 35-50 or so data points, you'd expect the polling company to get 2-3 results that were outside the range and this would still accurately reflect the confidence level that they're reporting.

  17. Valid point, but with Nanos in particular there are 26 data points (four parties in BC, PR, ON, AT, five in QC, "Other" in all), and with 95% confidence Nanos should only have one or at most two wrong. I didn't check all of their numbers, but I know they are outside of the MOE on at least two counts (Liberals in BC and "Others" in QC), and I'd be surprised if they weren't off on a third, especially when you take into account the MOE for individual parties within each sample.
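
    As a rough check, assuming the 26 data points were independent (they are not quite, but it is close enough for a ballpark):

      from math import comb

      def prob_at_least(k, n, p=0.05):
          # P(X >= k) for X ~ Binomial(n, p): chance of at least k results
          # landing outside a 95% margin of error by luck alone.
          return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

      print(round(prob_at_least(2, 26), 2))  # ~0.38 - two misses are unremarkable
      print(round(prob_at_least(3, 26), 2))  # ~0.14 - three start to look like more than noise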

  18. I think it's a bit unfair to even include Environics and CROP in this "competition" - both firms did no polling in the last 10 days of the campaign when we know a lot of shifting was going on. CROP deserves credit for being the first polling company to show the NDP moving into the lead in Quebec - but since they didn't do a subsequent poll on the eve of the election, when it's clear that the NDP had moved much further ahead, I think it's unfair to call them the least accurate pollster. IMHO, only pollsters who put out polls in the final week of the campaign should be compared to one another.

  19. DL,

    From the post above:

    "Crop, Innovative Research, and Environics are being assessed by their final polls, though they were all taken one week or more before the day of the vote. That may seem unfair, but these are the final numbers we have from these firms and they need to be assessed by some measure. Consider it a penalty for not releasing data closer to the end of the campaign."

  20. Where can you find Nanos' "Others" numbers for each region? Otherwise, the only other outliers I saw were the NDP/Conservatives in the Prairies (they're outside the party sample's MOE). I'd count that as one outlier though. The two numbers in the Prairies aren't entirely independent - when the Tory number is too low, then by necessity the other party numbers will be a bit too high. With only 26 data points, I don't think it's that unreasonable that there could be 2-3 outliers.

    "Just as problematic methodology will not be corrected by over-sampling, it can also be masked by the large margin of error of smaller samples."

    I don't think that point is entirely valid. It's like a pollster saying that while they had the NDP 30% too high in Ontario and 30% too low in Quebec, the two errors balanced each other out so it's ok.

    I think what you're doing really is getting accuracy and precision confused. We know the precision of each poll - that's just its MOE. What we don't know (and what you're getting at in this article) is the accuracy of each pollster. To find that you need to separate out what error is simply due to statistical noise and what is due to poor methodology.

    The simplest way I could think of to do that would be to just do a Chi-squared test such as this: http://en.wikipedia.org/wiki/Pearson%27s_chi-square_test#Goodness_of_fit or some other goodness of fit test http://en.wikipedia.org/wiki/Goodness_of_fit. You'd probably want to set anything with a Chi-squared value of below 1 to just 1 and say those polls are equally good though. Maybe someone with a better background in statistics could chime in here... I dunno. Thoughts?
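
    In Python it might look something like this, with made-up numbers and assuming a simple random sample:

      from scipy.stats import chisquare

      n = 1000  # hypothetical sample size
      actual = [39.6, 30.6, 18.9, 6.0, 3.9, 1.0]  # rounded official shares, incl. "Other"
      polled = [37.0, 31.5, 19.5, 6.0, 4.5, 1.5]  # hypothetical final poll

      # Compare the respondent counts the poll implies against the counts
      # the official result would predict for a sample of this size.
      observed = [n * s / 100 for s in polled]
      expected = [n * s / 100 for s in actual]
      stat, pvalue = chisquare(observed, f_exp=expected)
      print(round(stat, 1), round(pvalue, 3))  # a low p-value hints at more than sampling noise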

  21. Ashley Morton (11 May 2011, 21:48)

    Hey Eric - a suggestion for a future model: when we have advance polls, track down an estimate of how many voted in the advance, and "freeze" that percentage of votes at your predicted results as of that day. That might have helped with the overall under-prediction of the Conservative result, because I think they were over 40 as of that weekend. On the other hand, it might have also over-predicted the Liberal vote, which your system was already doing...

  22. Might I ask, Éric, what happened to your Provincial Coverage post? And your change of the site's format to provincial coverage generally?

    Blogger had a technical issue, and had to reset the site to the way it was on Wednesday morning. They are working to restore everything that happened between Wednesday morning and the technical error, so I assume my post on provincial coverage and the graphics will return.

  24. Pollsters need to report results to the nearest 0.1% (not a full percent).

    Otherwise you'd get an average error of 0.25% (correct me if I'm wrong) even if your poll was 100% accurate just due to the rounding.

    (If I'm correct, for Angus Reid's best poll - 25% of the error can be explained by rounding!)
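
    A quick simulation of the claim:

      import random

      # Rounding a uniformly distributed share to the nearest whole point
      # leaves an absolute error that averages 0.25 points.
      errors = [abs(x - round(x)) for x in (random.uniform(0, 100) for _ in range(100_000))]
      print(round(sum(errors) / len(errors), 3))  # ~0.25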

  25. As an American, I was shocked to see that no pollster asked interviewees whether they had already cast their ballot. Since well over 2 million Canadians cast their ballots in advance balloting, and many more probably cast their votes before Election Day using absentee votes, not asking this question raises serious questions. It would not be shocking to discover that the Tories used their money and manpower advantage to target marginal voters and get them out early - those people might have changed their minds later, but their votes were already in the box.


COMMENT MODERATION POLICY - Please be respectful when commenting. If choosing to remain anonymous, please sign your comment with some sort of pseudonym to avoid confusion. Please do not use any derogatory terms for fellow commenters, parties, or politicians. Inflammatory and overly partisan comments will not be posted. PLEASE KEEP DISCUSSION ON TOPIC.