Frank Graves, President of EKOS Research Associates, was kind enough to give me a very thorough interview over a few emails. In it, he discusses some aspects of polling, and IVR polling in particular, that I would have liked to cover in my article but could not for reasons of length. Below is the full transcript of the email interviews.
308: Several years ago, EKOS moved from traditional live-caller polling to interactive voice response. Why was that decision made?
FG: Actually, we continue to do live interviewing and we have a large random probability panel which we use for survey work. Most but not all of our media polling comes from our version of IVR, which is a series of carefully tested protocols which we are now calling HD-IVR (high definition IVR). IVR is not a survey methodology; it is a method of reaching respondents. The sampling strategy, call back regimens, data purification techniques, instrument design, etc. are all crucial and determine the accuracy of the survey results. We turned to IVR several years ago, noting the success of some American IVR pollsters. IVR has certain limitations, but for short surveys, properly designed and administered, it can be an excellent tool. We particularly like the ability to generate large random samples at far lower cost than with Live Interviewer CATI. In our experiments HD-IVR gives results that are equivalent to or better than those with Live CATI and much more accurate than any opt-in online methods.
308: Having used different methodologies in the past, what do you consider the strengths of IVR compared to those other methodologies?
FG: Once again noting that we continue to actively use many other methods (we have our own call centre for live CATI) and that we maintain a large random probability panel (PROBIT), I would give the following list of strengths for properly applied IVR techniques (this includes the application of call backs, noise detection and elimination, a dual land line and cell phone sampling frame, etc.):
- Accuracy, particularly on simple behavioural and intention measures.
- Speed: large samples can be assembled and analysed very rapidly.
- Large samples which produce lower MOE (particularly for sub population analysis and tracking).
- Economy (the live interviewer cost is replaced by robotics).
- Minimisation of undesirable mode effects from live interviewer (particularly important on questions which can produce social desirability bias).
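To illustrate the point above about larger samples producing a lower MOE, here is a minimal sketch. It assumes a simple random sample and the standard normal approximation for a proportion; the function name `margin_of_error` and the sample sizes are illustrative and do not come from EKOS.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of size n.
    p=0.5 is the conservative (widest) case."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes: a sub-population, a typical poll, a large IVR sample.
for n in (250, 1000, 4000):
    print(f"n={n:>5}: +/- {100 * margin_of_error(n):.1f} points")
```

Because the MOE shrinks with the square root of the sample size, quadrupling the sample only halves the MOE, which is why cheap large samples matter most for sub-population analysis.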
308: What are IVR's limitations, and what can be done to correct for them?
FG: Once again I want to stress the difference between HD-IVR and any properly designed IVR system and “raw” IVR. We get much better results with several important refinements which are often not applied in IVR polls. The biggest limitations of IVR are:
- Length, the survey must be quite short.
- Reputational problems: the use of IVR is associated with reputational issues (particularly in Canada) where there is lower familiarity with properly applied IVR. Sloppy applications of IVR and the nefarious connection to some vote suppression activities have done nothing to help this problem.
- Stricter limitations on calling periods.
- Programming complexity to deal with multiple random versions to eliminate response set biases and sequencing effects.
- More people are called so there is a modest increase in the intrusiveness of the research.
- In order to get sufficient representation of younger respondents we have to engage in call backs and a judicious sample of cell phone-only populations.
- Response rates are somewhat lower than those for live interviewer but with our techniques only modestly lower and with less systematic patterns of non-response.
- Our experiments show that there is more random noise in IVR than with live interviewer. This noise is easily detected with testing and can be purged.
308: What do you mean by noise?
FG: By noise I mean responses which are not measuring the concept being tested. Noise is random, meaningless data. The analogy is drawn from psychoacoustics but applies to other areas such as this (I believe Nate Silver uses the terms in the title of his last book). As an example of random noise, consider the difference between someone answering the questions thoughtfully and accurately (signal) and someone just randomly pushing numbers. We find that the share of people answering questions about fictitious events/products is higher in IVR than with live interviewer. This applies to other unwanted survey behaviour as well: what we used to call anomalous response set (yea and nay saying, and more recently speeding and straight lining). With the noise detection questions we can identify and remove these sources of noise from the sample.
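The kind of noise screening described in this answer might be sketched as follows. This is not EKOS's actual procedure, which the interview does not detail; the record fields, the 30-second speeding threshold, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Response:
    respondent_id: str
    answers: List[int]            # keypad answers to the substantive questions
    knows_fictitious_item: bool   # claimed awareness of a made-up event/product
    seconds_taken: float          # time spent completing the survey

def is_noise(r, min_seconds=30.0):
    """Flag a response as noise if it fails any of three hypothetical checks."""
    straight_lined = len(r.answers) >= 3 and len(set(r.answers)) == 1
    speeding = r.seconds_taken < min_seconds
    return r.knows_fictitious_item or straight_lined or speeding

def purge_noise(responses, min_seconds=30.0):
    """Keep only the responses that pass every noise check."""
    return [r for r in responses if not is_noise(r, min_seconds)]

# Invented example records.
sample = [
    Response("a", [1, 3, 2, 4], False, 95.0),   # thoughtful answers: kept
    Response("b", [2, 2, 2, 2], False, 80.0),   # straight lining: removed
    Response("c", [1, 4, 2, 3], True, 90.0),    # fictitious item: removed
    Response("d", [3, 1, 4, 2], False, 12.0),   # speeding: removed
]
cleaned = purge_noise(sample)
print([r.respondent_id for r in cleaned])
```

Keeping each check as a separate named condition makes it easy to report how many respondents were dropped for each reason.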
308: Generally speaking, how does IVR polling compare to other methodologies in terms of costs and effort?
FG: The front end programming and data base management is more complex but the obvious savings are in live interviewer time. Long distance charges are higher because of the greater numbers of calls. Our costs and efforts are perhaps half of what a live interviewer survey would cost and comparable to the costs of our probability panel offerings. Opt-in panels, where the respondents volunteer for surveying and have never been spoken to by the survey organization, are cheaper still than HD-IVR.
308: Can you explain how your 'probability panel' is different from 'opt-in' panels?
FG: Probability methods give each member of a population an equal probability of selection (EPSEM in Kish’s terminology). This is a canon of good sampling and the foundation of the ability to apply the central limit theorem and the law of large numbers, the foundations of inferential statistics. Each member of the population has a known probability of appearing in the sample. In the case of opt-in or convenience sampling there are (at least) two fundamental problems. The sample is NOT randomly drawn from a frame of all individuals in the population. Respondents are invited to join or come from other pre-existing lists of some other portion of the population. They therefore opt in or volunteer (typically for material incentives) and their relationship to the broader population is unclear. Since the process is not random, inferential statistics are not possible (including calculation of Margin of Error). The problem is worsened by systematic coverage errors, where those who cannot or will not do surveys online will never appear in the sample.
Now some say that as response rates decline the process of random sampling is no longer meeting the requirements of statistical inference. The hard, third party research suggests this is not true. While we have selection processes from a random invitation, this is a much smaller problem than those problems PLUS a non-random invitation. The top authorities remain unconvinced that one can achieve scientific accuracy and MOE with non-random samples (MOST online panels are non-random). Under rare and extremely stringent conditions this happens but in most cases it is wrong. By the way, the response rates with HD-IVR are close to what we get with live interviewer now. And objections from those using opt-in panels are hard to take seriously, as their response rates are incalculable and, if they could be calculated, the non-response rate would be the percentage of all those who saw the internet ad and didn’t join the panel (maybe 99.9% or higher?).
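As a rough illustration of what equal-probability selection means in practice, the sketch below draws a simple random sample from a list standing in for a sampling frame. The frame of numbered IDs, the sample size, and the function names are invented for the example; a real telephone frame and its call back rules are far more involved.

```python
import random

def epsem_sample(frame, n, seed=None):
    """Draw n units without replacement so that every unit in the
    frame has the same chance of selection (simple random sampling)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def inclusion_probability(frame_size, n):
    """Known probability that any given unit ends up in the sample."""
    return n / frame_size

# Hypothetical frame of 10,000 numbered population members.
frame = list(range(10_000))
sample = epsem_sample(frame, 500, seed=42)
print(len(sample), inclusion_probability(len(frame), 500))
```

An opt-in panel has no equivalent of `inclusion_probability`: nobody can say what chance a given member of the population had of ending up in the sample, which is the core of the objection made above.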
308: Because of the issues related to the use of robocalls in political campaigns, whether legitimate or not, and by telemarketers, there has been increased criticism of this methodology recently. What kind of problem does this pose for polling firms that use IVR?
FG: We have spoken to the CRTC on this issue, as well as the MRIA. We certainly would welcome limitations on the less savoury applications of robocalls as this would lessen our problems with public suspicions. We use a very rigorous code of application that meets the CRTC requirements for automatic dialling devices. We would welcome clarification that would distinguish legitimate research application based use of IVR from the much more common mass market uses. This distinction does apply to polling and market research in other areas and it is unclear how it would apply in the context of IVR. We would welcome sound guidelines and a demarcation between legitimate survey research and other areas of use.
308: You have recently discussed the challenges of building a representative sample of voters. But what challenges do you face in building a representative sample of the population, considering falling response rates and increased use of cell phones over landlines?
FG: We only really know who the voters are after the vote so this will remain a challenge. In the case of representative samples of known populations careful sampling, call backs and weighting can continue to produce scientific accuracy when based on random sampling, even with steeply declining response rates. Coverage errors for cell only and off line respondents can also be solved but these subpopulations are not included in lots of current work by others. Experimental testing can identify and calibrate deficiencies and patterns of selection even when using random selection. These patterns can be both demographic and psychographic; but they are correctable.
308: And where does the challenge come in building a representative sample of voters?
FG: The challenge is not one of modelling a known population but of predicting a future event. We can never know this with certainty, and guesses such as the demographic characteristics of the past vote are very limited solutions. Some things that used to work (e.g. enthusiasm) no longer work, and demographic turnout can and will vary from election to election. Asking people their certainty to vote is basically useless for predicting who will vote. Past voting patterns are of some assistance, as are questions about whether you know where your polling station is. But these are highly limited aids in those situations where more than half of the eligible population isn’t voting and they are systematically different in their vote intentions than those who show up. Increasingly, political campaigns are all about getting out your vote, and keeping home the opponents' vote. Mandatory voting would eliminate this problem but I am not holding my breath on that one.
At the federal level one should be able to accurately forecast the outcome with sufficient tracking and diagnostic tools. And we have correctly forecast all federal elections save the ‘miss’ on the 2011 election which got the winner right but not the majority. In fairness, no one else predicted a majority that time and our poll was within the MOE of all polls for that election. We have been working extensively to understand the issues of turnout (which is the key - NOT late switching or undecided movements as some have claimed). We are very confident that we will get the next federal election outcome accurately as we did in all previous attempts.
308: What role does weighting play in producing accurate results?
FG: Weighting is very important but it should be a fairly light touch with crystal clear guidelines. It should never be used to correct huge deficiencies (e.g. weighting younger respondents by a factor of several). Our unweighted HD-IVR gives very similar results to our weighted version (age, gender, household size). One should definitely not root around in the weighting bin until things look okay. And pollsters should produce or have available both weighted and unweighted results. If weighted results look really different than the unweighted then something is wrong with the sample.
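The "light touch" check described in this answer can be sketched with simple post-stratification on a single variable. Everything here is an assumption for illustration: the age groups, the population shares, the answers, and the rule of thumb that a weight near or above 2 deserves scrutiny. EKOS's actual scheme (age, gender, household size) is not specified in enough detail to reproduce.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

def weighted_share(sample_groups, answers, weights, choice):
    """Weighted proportion of respondents giving `choice`."""
    total = sum(weights[g] for g in sample_groups)
    hits = sum(weights[g] for g, a in zip(sample_groups, answers) if a == choice)
    return hits / total

# Invented sample of 100 respondents and invented population shares.
groups = ["18-34"] * 20 + ["35-54"] * 35 + ["55+"] * 45
answers = (["A"] * 10 + ["B"] * 10      # 18-34
           + ["A"] * 14 + ["B"] * 21    # 35-54
           + ["A"] * 18 + ["B"] * 27)   # 55+
population = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}

weights = poststratification_weights(groups, population)
unweighted = answers.count("A") / len(answers)
weighted = weighted_share(groups, answers, weights, "A")

print({g: round(w, 2) for g, w in weights.items()})
print(f"unweighted A: {unweighted:.3f}, weighted A: {weighted:.3f}")
```

In this made-up case no group is weighted by more than about 1.4 and the weighted and unweighted figures differ by less than a point, which is the pattern the answer describes as healthy. A large gap between the two would point to a problem with the sample rather than something weighting should paper over.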
308: EKOS has been in the business for a very long time. How has political polling changed over the years?
FG: That is an essay in itself, but I would say that the methodological challenges we have been discussing and the transformation of methodologies are very important.
I think that the media-pollster relationship is in a state of disrepair. I think there are inadequate budgets and I think the statistical fluency in the media and possibly the public has declined. The role of the aggregators is another new feature; something I find to be a mixed blessing (although I do think you give a really good effort here, Eric). I detest the conflation of polling accuracy with forecasting the next-day election. This yardstick comes from a time when most voted and those who didn’t weren’t particularly different. The correspondence between the election and final polls was a great way to check a pollster’s accuracy. When half or more aren’t voting and those who don’t vote have different political preferences, this becomes a lousy yardstick for “polling accuracy”. There is a continued need for forecasting, but the tasks of forecasting and modelling the population should be seen as related but separate tasks.
308: If elections are no longer good ways to gauge a pollster's accuracy, how else can the accuracy of a pollster's work be tested?
FG: Pollsters should conduct proof of concept testing with known external benchmarks to show that they can achieve representativeness. Important polls should at least occasionally include a basic inventory of benchmark indicators of representativeness such as: do you smoke? Own a valid Canadian passport? Rent or own your home with or without mortgage? Heating fuel type, etc. And the unweighted raw data should look like the population on key demographic measures and these external benchmarks.
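A benchmark check of the kind described in this answer could look like the sketch below. All the rates are invented placeholders, not real Canadian statistics or EKOS data, and the three-point tolerance is an arbitrary assumption.

```python
def benchmark_gaps(sample_rates, benchmark_rates, tolerance=0.03):
    """Compare unweighted sample rates with known external benchmarks.

    Returns {indicator: (gap, within_tolerance)}, where gap is the
    sample rate minus the benchmark rate.
    """
    gaps = {}
    for name, known in benchmark_rates.items():
        gap = sample_rates[name] - known
        gaps[name] = (gap, abs(gap) <= tolerance)
    return gaps

# Placeholder figures for illustration only.
benchmarks = {"smokes": 0.16, "has_passport": 0.65, "owns_home": 0.67}
raw_sample = {"smokes": 0.17, "has_passport": 0.71, "owns_home": 0.68}

gaps = benchmark_gaps(raw_sample, benchmarks)
for name, (gap, ok) in gaps.items():
    status = "ok" if ok else "CHECK"
    print(f"{name:<13} gap {gap:+.2f}  {status}")
```

In this invented example the passport indicator falls outside the tolerance, which would be a prompt to look at who the sample is over-representing before trusting the vote intention numbers.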
308: If the media does not have the budget for polling or dedicated poll analysts (and that will not be changing), what are pollsters to do? Should they back away or do they have a responsibility of some sort to put out numbers?
FG: They should probably limit their activities to those areas where they can put best effort forward. The media should pay (as they do in the US) as this is an area which really does generate viewership and readership. The industry could consider a consortium of players to offer this up as an industry service during elections. Or perhaps we could look at alternative models such as Rasmussen.com, which successfully sells directly to consumers with subscriptions.
308: How has the business of polling in general changed?
FG: The ‘business’ of polling has changed dramatically. We have discussed some of the methodological and technological transformations. Political polling really isn’t a ‘business’ for any of those doing it in Canada. Historically, we have probably been the largest supplier of polling services to the federal government. The federal polling budget has dropped from over $30M in 2006 to under $4M last year. This is a rather breathtaking elimination of what was non-partisan policy and communications work based on listening to Canadians. Interestingly while "listening" to Canadians has all but disappeared “persuading" Canadians has burgeoned. In 2006 there were roughly similar expenditures. Today, there is probably 30 to 40 times the expenditures federally on advertising that there is on polling. Fortunately our investments in new survey technologies have strengthened our other markets and we are now experiencing growth and profits. While we no longer depend on federal markets it is our hope that the federal government will return to listening to Canadians again.
308: In your polls, particularly of the general population, EKOS has tended to have larger proportions of people supporting the Greens or 'Others' than other firms. Why is that?
FG: Our polls (particularly between elections) are focussed on all eligible voters. We believe that our polls accurately measure the percentages of all eligible voters who support the Green Party on the day of the polls. If one doesn’t prompt for the Green Party one will get lower incidences, as one would if one dropped any of the other party prompts. The simple fact is that many GPC supporters don’t bother voting. They are younger, and younger voters vote at half the rate of older voters. They also correctly note that under the first-past-the-post voting system they are unlikely to see any electoral results if they do vote, so this is a further de-motivating factor. In 2008, nearly 7 per cent of all voters voted for the GPC. If you don’t mention GPC in your prompting you may get a number closer to the election (or you may well be lower than that). But I don’t like to mix in ad hoc adjustments for the fact that GPC supporters don’t vote as much when measuring all eligible voters. We carefully note that GPC support historically translates into fewer actual voters. Other pollsters have their own legitimate views on how this problem should be handled.
308: What changes, if any, need to be made to ensure that IVR polling produces good results in the future?
FG: Our HD-IVR has been refined to provide scientifically accurate models of all eligible voters. We have the experimental evidence to show that. If you are separating out the question of how to make better forecasts of turnout, there is lots of work needed there and we and others are focusing on this challenge. As Yogi Berra noted, ‘prediction is really hard, particularly when it’s about the future’.