Monday, November 2, 2015

A different approach to seat projections

One lesson learned from the 2015 federal election is the difficulty of projecting outcomes when the voting base of a party has dramatically shifted from one election to the next. In this case, it was the Liberal Party that attracted new and lapsed voters to the polls. Many of these voters appeared in unexpected numbers and in unexpected places, which is why virtually all seat projectors estimated that the Liberals were on track to win only a minority government, with a majority government being possible but unlikely.

Turnout in most elections does not dramatically shift from one campaign to the next, making it much easier to project outcomes at the seat level based on regional or provincial changes in support. It is no accident that the model has performed best when changes in turnout were lowest, while it has struggled more in elections with big changes in turnout.

But this is not all about new voters. More people voted in 2006 than in 2008 or 2011, meaning that a lot of people who voted in the 2006 election stayed home in the next two elections as the Liberals plummeted in public support. The Liberals did attract a lot of first-time voters in this campaign, but they were also successful in bringing back a lot of voters who had stayed home in 2008 and 2011.

Though there were a few surprises, the vast majority of the ridings the Liberals won in this campaign were ridings they had held in the past. Many of their pick-ups in Quebec, for instance, were won by the Liberals in 2000.

Taken together, this suggests that looking further back into a riding's history than just the most recent election can tell us a lot about how likely a riding is to go one way or another in the next election. If the basic principles of the swing model can work when looking at just one election, then they should be able to work when applied to elections further back in time, swinging the results from two or three elections ago according to where the polls are today.

This is what I tested with the 2015 results, swinging the results in each riding over the last three elections. I tried various weightings (looking at just the last two elections, giving all three elections equal weighting, trying proportions of 4/2/1, etc.), but the one that worked best was also the one that was most intuitive. It weighted the three elections with a proportion of 3/2/1, or 50% for 2011, 33.3% for 2008, and 16.7% for 2006.
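To make the weighting concrete, here is a minimal sketch in Python. It assumes a simple proportional swing (a party's past riding share scaled by the change in its regional support), which may differ from the details of the actual model. The function names and all vote shares are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: blend proportionally swung results from three elections.
# Names, data, and the proportional-swing formula are illustrative assumptions.

WEIGHTS = {2011: 3, 2008: 2, 2006: 1}  # the 3/2/1 proportion described above


def normalized(weights):
    """Turn raw proportions into fractions that sum to 1 (3/2/1 -> 1/2, 1/3, 1/6)."""
    total = sum(weights.values())
    return {year: w / total for year, w in weights.items()}


def swing(riding_share, past_regional, current_regional):
    """Scale each party's past riding share by its change in regional support."""
    return {
        party: riding_share[party] * current_regional[party] / past_regional[party]
        for party in riding_share
    }


def project_riding(riding_history, regional_history, current_regional, weights=WEIGHTS):
    """Weighted average of the swung results, rescaled to sum to 100%."""
    fractions = normalized(weights)
    blended = {party: 0.0 for party in current_regional}
    for year, fraction in fractions.items():
        swung = swing(riding_history[year], regional_history[year], current_regional)
        for party, share in swung.items():
            blended[party] += fraction * share
    total = sum(blended.values())
    return {party: 100 * share / total for party, share in blended.items()}


# Made-up numbers for one riding and its region (percent of the vote).
riding_history = {
    2011: {"LPC": 20.0, "CPC": 45.0, "NDP": 35.0},
    2008: {"LPC": 30.0, "CPC": 45.0, "NDP": 25.0},
    2006: {"LPC": 40.0, "CPC": 40.0, "NDP": 20.0},
}
regional_history = {
    2011: {"LPC": 25.0, "CPC": 45.0, "NDP": 30.0},
    2008: {"LPC": 33.0, "CPC": 40.0, "NDP": 27.0},
    2006: {"LPC": 40.0, "CPC": 35.0, "NDP": 25.0},
}
current_regional = {"LPC": 45.0, "CPC": 35.0, "NDP": 20.0}

projection = project_riding(riding_history, regional_history, current_regional)
for party, share in sorted(projection.items(), key=lambda kv: -kv[1]):
    print(f"{party}: {share:.1f}%")
```

The final rescaling is a design choice in this sketch: a proportional swing does not guarantee that the swung shares in a riding add up to 100%, so the blended figures are normalized before being compared.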

As shown below, the results were very positive.

Using the actual provincial and regional vote shares (which is the real test of the projection model), the old model would have delivered the Liberals 154 seats, the Conservatives 120 seats, and the NDP 56 seats. The actual results for the Liberals and Conservatives would have fallen outside of the likely ranges, and into the maximum and minimum ranges.

But with the new model, the Liberals would have been projected to take 166 seats, the Conservatives 116, and the New Democrats 45. It would have been almost right on the mark for the NDP, and exactly on the mark for the Bloc Québécois. Most importantly, the Conservative and Liberal results would have fallen within the likely ranges, with the Liberal result falling well within that mark.

The regional projections would have been improved significantly, particularly in Quebec. The actual result there was 40 seats for the Liberals, 16 for the NDP, 12 for the Conservatives, and 10 for the Bloc. The new model would have given 38 seats to the Liberals in Quebec, with 18 going to the NDP and 12 to the Tories. Only in Atlantic Canada would the results have not fallen within the maximum and minimum ranges, with the Liberals topping out at 30 seats and the NDP bottoming out at two. Peter Stoffer and Jack Harris would have still been projected to survive.

Even considering where the vote projection model stood on the eve of the election, this new seat projection model would have given the Liberals a range of between 128 and 177 seats — meaning their final tally of 184 seats would have fallen outside the likely range, but the potential for a Liberal majority victory would have been a big part of the final analysis (rather than the marginal part it was actually given).

Accuracy would have been higher at the riding level, increasing from 81.4% to 82.5%. But most significantly, the potential winners would have been identified in the high-low ranges in 91.4% of cases, meaning in only 29 ridings would the potential winner have been missed, compared to 43 with the old model.
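As a quick check of the arithmetic, the short Python sketch below converts the number of missed ridings into a coverage rate. It assumes the 338 ridings contested in 2015. The function name is hypothetical.

```python
TOTAL_RIDINGS = 338  # ridings contested in the 2015 federal election


def coverage(missed, total=TOTAL_RIDINGS):
    """Percentage of ridings where the eventual winner fell inside the high-low ranges."""
    return 100 * (total - missed) / total


print(f"new model: {coverage(29):.1f}%")  # 91.4%
print(f"old model: {coverage(43):.1f}%")  # 87.3%
```

On these figures, the old model's high-low ranges would have captured the winner in roughly 87.3% of ridings.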

This new model requires a few other changes, including some judgement calls in terms of how to handle special cases. This involves ignoring the results of some past elections, such as the 2006 and 2008 elections in Saanich–Gulf Islands, where Elizabeth May did not run, or cases where an independent took an out-sized portion of the vote (as in 2006 in Portneuf–Jacques-Cartier, when the Conservatives ran a candidate against independent André Arthur, or in 2008 when Bill Casey ran as an independent in what is now Cumberland–Colchester).

In addition, the incumbency effect built into the model is now redundant, as the results of the previous elections already take into account an incumbent's ability to withstand wider trends. The effect of losing an incumbent, however, still applies.

Note that I do intend to run this new model through more tests as time permits to ensure that its improved performance isn't due to a fluke related just to the 2015 federal election. But time is an issue, because the next provincial election is just weeks away.

The next test: Newfoundland and Labrador

This new model may never be tested for real at the federal level, as the next election may not be decided according to the first-past-the-post system. Elements of it could be part of a model designed to project the outcome of a ranked ballot election, however.

The real tests will come in the provincial elections that will still use the first-past-the-post electoral system. And the next one is coming very soon: Newfoundlanders and Labradorians are heading to the polls on November 30.

How would the new model have performed had it been used back in 2011? Again, it would have done a better job.

The actual results of that election delivered 37 seats to the Progressive Conservatives, six seats to the Liberals, and five to the New Democrats. This despite the NDP finishing ahead of the Liberals in the popular vote by over five percentage points.

With the actual results plugged in, the 2011 model awarded 41 seats to the PCs, four to the Liberals, and three to the NDP. Not a bad showing, considering it still would have given the Official Opposition nod to the Liberals. But the riding-level accuracy of just 75%, or 36 out of 48 ridings, left a lot to be desired.

The new model would not have improved upon the 2011 performance, at least in the top-line numbers: it still would have been 41 Tories, four Liberals, and three New Democrats. But the riding-level accuracy would have increased to 81.3%, with correct calls in 39 out of 48 ridings.

The election in Newfoundland and Labrador will pose a few problems, in that the number of seats has been reduced from 48 to 40. There have also been a large number of floor-crossings and by-elections since 2011, further complicating matters. These are the sorts of things that can throw any seat projection model for a loop.

But I will put the new model to the test nevertheless and see how it does. The same principles behind what has been, I believe, a very effective model (which has been used in 17 provincial and federal elections) are still in place, so I consider this a refinement rather than a wholesale change. We'll see how it does soon.

32 comments:

  1. Since there is detailed polling information available leading up to the 2008 and 2011 elections, might I suggest you try this 3/2/1 weighting there as well?

    It would be helpful to ensure that you're not building a model just to fit one election. That would make it overfit, and an overfit model is a bad model.

    Replies
    1. I don't need polling data to test this model, I just need actual results! I do plan on running more tests, but not necessarily with past federal elections. I can apply the test to every province as well, and this is my intention once I have the time. I'll put a note about this.

    2. Also note, as mentioned above, that I ran the model for the 2011 NL election as well.

    3. Good point. And thanks for thinking of it.

      It always irritates me when people have predictive models that they haven't tested against historical data. Climate models do this all the time - there's no evidence that they've been tested to see if they can predict current temperatures. But if what you're doing isn't falsifiable, it's not science.

      So thank you, Éric, for doing science.

  2. Hi Éric - it seems to me that this builds more inertia into the model, or at least biases it toward historical norms in individual ridings. How does the new model do with backtesting the results of the 2011 Federal election, especially in predicting the NDP results?

    Replies
    1. I would argue that results are heavily influenced by historical norms. I noted the example of where the Liberals won in Quebec, for instance. And even in 2011, most of the NDP's best results were in ridings in which the party did best in Quebec in 2008.

      What this avoids is over-estimating the amount of swing in some ridings while under-estimating it in other ridings. It takes into account the incumbency effect over several elections, for instance for the Liberals in Montreal. The old model over-estimated the amount of increase the Liberals would experience there because it was based on just one election, and under-estimated it elsewhere in the province. The same would happen in Ontario and the ridings in which the Liberals did well in 2011.

    2. Point taken, and I don't mean to handwave away a riding's history. I'm only raising a similar point to Ira above, and hoping you don't overfit a model just based off the Liberals in 2015. Which it fortunately doesn't sound like you're doing! :)

      It sounds to me like you're modeling the concept of riding/voter elasticity, as FiveThirtyEight calls it. I'm not sure if the work Nate has done over there could be applied to your model? The US obviously has the advantages of much better data, larger 'ridings' (state level for his analysis), and only two parties, but perhaps there are some takeaways there in terms of approach.

      http://fivethirtyeight.com/features/swing-voters-and-elastic-states/

  3. While this is interesting and useful information now, as we attempt to understand the dynamics of elxn42, it may not mean much come the next federal election. First, and most obviously, if the election is held under a new regime different from FPTP, all bets are off. Voter behaviour should be expected to change based on the new reality, and a potential lack of understanding of the new dynamics.

    Second, I still see the actual granting of a vote as involving a certain amount of "magical thinking". The only groups who can be reliably predicted are those die-hard partisans, for whom no other factors than party or candidate matter. Most Canadians have (fortunately) not been so partisan, meaning that elections come down to whatever narratives are established and what the zeitgeist brings with it. Some of us were predicting a pretty decent Liberal majority as a likely outcome by the Friday before the election. This was based in large measure on polling, but also on some intangibles that were about choosing the right narrative. And in this election's case, the right narrative was determined only partly before the final week; that final week drove it home.

    So why did I see a majority? First, I accepted Greg Lyle's notion (borne out in the data) that Liberals were underestimated by most models, since 2008 and 2011 were abnormally poor exemplars. Second, I was supporting a candidate on the ground and hearing from more than a few voters who seemed to be "grudgingly" voting Liberal. This said to me that Liberals could now count on a host of new voters who were not established as likely Liberal voters. Third, the trend in the last week kept getting amplified day by day, and the conclusion was inescapable. Last, the Conservative ads, events and rallies were demonstrating a hunkering down impulse, and featuring crazy stunts that appealed uniquely to the motivated base. They also drove non-Harper voters to become more motivated, against the Conservatives.

    The point is this: I guessed right based on an interpretation of the data based on some actual and anecdotal experience, that aligned with the prevailing narrative. Even if we were stuck with another FPTP election next time, an entirely new narrative would have to be discerned from the evidence and anecdotes, and it might be very different again. The new model will produce more accurate data and interpretations, but without capturing the right narrative, it may still fail to capture significant movement in the last days and hours.

    Replies
    1. Unless that "actual interpretation" has some describable methodology, you were just guessing.

      If you can't teach a computer to draw the same conclusions from the same data, you weren't interpreting it in a broadly applicable way.

    2. Even if the voting system changes in four years, which it probably won't, there are ten provinces which will still have plurality voting systems.

      As for momentum, consider the UK election earlier this year that surprised everybody because they didn't detect any momentum towards Conservatives.

    3. I was not surprised by the Tory victory in May. The polls throughout the campaign showed a Conservative Government was the most likely outcome. The major mistake of UK pollsters was their seat projections, which completely failed to model Scotland correctly and made it appear Labour was capable of winning 250+ seats, but this assumption rested on Labour maintaining a majority of Scottish seats. If one removed Scotland from the polls it soon became obvious that the Tories were closing in on 40% in England and a majority Government.

      Most importantly, during Miliband's leadership Labour was unable to hold the 35% popular vote mark for a sustained period of time. Even during the austerity protests Labour's polling numbers were in the low thirties. The inability of Labour to consistently poll in the mid-to-high thirties says all any well-attuned observer needed to know about momentum before, during and after the 2015 election campaign.

  4. Turnout dynamics look like an interesting avenue of inquiry.

    Dormant LPC voters turning out while some CPC voters stayed home is, in hindsight, something that was not measured or taken into account before the election.

    Harper's majority contained the seeds of his downfall. The excesses of his bills would have been contained in a minority parliament and not have been there to remotivate dormant LPC voters.

    It reminds me of Walpole's dictum: Let sleeping dogs lie.

    Replies
    1. There's little evidence that CPC voters stayed home. They got almost exactly the same number of votes as in 2011.

    2. The Tories retained 96% of their 2011 vote in 2015.

    3. That's a huge number. Given the age bias in their support, CPC voters die more frequently than typical Canadians. Retaining 96% is a roaring success.

  5. Eric, did you consider looking at the implication of weighting past results by age? So 2004-2008 would be closer to equal weighting while 2008-2015 would be closer to the 3/2/1 scale?

    Replies
    1. That might be something to look into.

  6. Hey Éric, I think you're absolutely right about all bets being off if electoral reform happens.

    I notice that the news sites are saying that FPTP favours the Conservatives, proportional representation favours the smaller parties (Greens, NDP, etc.), and preferential balloting favours the Liberals (since they are the most centrist party and would in theory gain the most from 2nd place votes).

    The real world is not so simplistic because people will vote strategically no matter what system is used. Take the 1952 election here in BC. BC had gone to a form of preferential balloting because both main parties (Liberals and Conservatives) figured that second place votes would go to them, and thus the rising NDP would get decimated.

    It backfired big time as voters voted strategically for the fringe socially conservative Social Credit party as their second place, unexpectedly handing them the government.

    If history repeats itself, and the Liberals opt for preferential balloting to increase their chances of winning, it could also backfire, giving the victory to someone like the Greens.

    Replies
    1. A very important point: voters act unexpectedly. Moving to a preferential ballot is very likely to have unforeseen results, as in 1952.

      A slight correction: the 1952 B.C. General Election did not "hand" the Government over to Social Credit; quite the opposite. The BC General Election of 1952 may be the last example of the Crown choosing the Government.

      The C.C.F. had won the largest share of the popular vote with 30% and 18 seats. The Coalition Government secured 40.3% of the vote, but this was split between the Liberals and Progressive Conservatives and resulted in only 10 seats. The Socreds came out of nowhere to capture 27% of the vote and 19 seats. There was a single Labour member, Tom Uphill. All things considered, this should have resulted in a CCF Government with Uphill voting with the CCF. However, W.A.C. Bennett convinced Uphill to vote with Social Credit and then convinced the Lieutenant Governor to appoint him as premier before the Legislative Assembly met, and the rest, as they say, is history.

      My point: on election night and for a month afterwards it was not clear who would be premier. The possibilities included a continuation of the Liberal-PC Government with either CCF or Socred support, Harold Winch, or the Socred leader, who did not run and was not a member of the Legislature after the election.

  7. I've been thinking about how an election like this, with one party more than doubling its vote share, illustrates the limitations of your proportional-swing model.

    Your model assumes that when a party's support grows, its growth is concentrated where its previous support was. I've always been skeptical about that, largely because it's not a zero-sum formula -- in any riding the net gains for the growing parties needn't match the net loss for the shrinking parties. However, I haven't crunched as many numbers as you, so I can accept that you've gotten good results in the past.

    But a party that doubles its vote is not going to double it by going from 40% to 80% in the few areas where it was already stronger. It's going to win its victories by bigger margins, while also going from distant-third to competitive.

    Parties that decline may decline something like proportionally -- you can't lose votes where you don't have votes to lose. But any party that goes from 18.9% to 39.5% does it by gaining strength in areas where it wasn't already strong. That's where there's room to grow.

    So I'm wondering: if considering earlier elections in establishing the baseline improves the accuracy, is that because the earlier elections had something closer to the results this year? If so, I can envision a model with a dynamic baseline. Any previous election's weighting would be a function of two variables, the time passed and a metric for the closeness of the popular vote totals.

    Replies
    1. I think that including results from previous elections will avoid some of these problems. The principle behind the swing model works, but it can have problems when you see big changes from one election to the next. A model that incorporates older data will mean that when a party does register big gains, it will often go to the ridings in which they lost a lot of support in previous elections, rather than go primarily to the ones in which they did not lose a lot of support.

    2. Before you finalize your next update, it would be worthwhile to see what Forum, Cvm and Vox Pop included in their models. They did better than 308.

      In addition, the 308 polling average itself could be improved. I noticed that a simple average of all 11 polls with polling ending on the 16th or later has less error than 308. An average of the other estimates from Tctc, cvm, Cew 308, Saunders prediction, Saunders Daily and weekly does as well.

    3. I don't think there was much secret to Forum's numbers, it was just a product of their final poll which over-estimated Liberal support in B.C., the Prairies, and Quebec. If I used only their numbers, I would have had almost identical results.

      I don't know enough about the CVM and VoxPop models to comment on their methodology. But CVM in particular had much greater errors than I did at the regional level - it just added up to better overall numbers. Look at their region-by-region results, and it does not look like the 2015 election very much.

      And of those 11 polls in your simple average, six of them are Nanos/EKOS rolling polls, so they can't be double-counted. You're also including Mainstreet, which was not released before Election Day (so it was impossible to include). A simple average of just the final polls that were conducted on or after October 16, excluding rolling polls, was only marginally better. But you're right, sometimes simple is better - we just never know when that will be the case.

  8. How would this new model fare in elections in which parties see a growth in support that they never had historically? I'm thinking specifically about the recent Alberta election, where the NDP received 41% of the vote, even though the highest result in the past three elections was only 10%. Wouldn't the new model then have severely underestimated NDP support?

    Replies
    1. There is no reason why it should. Again, it is applying the swing to past elections as well. In 2008, the NDP won 8.5%. So swinging them up to 41% would not be much different from swinging them up to 41% from the 10% of 2012.

    2. The new model will tend to regress unexpected results back to the historical mean when projecting a subsequent election.

      I love this idea. This is something sports statisticians have been doing for years.

    3. Oh, my apologies, I understand now. I like this new idea.

  9. I love this idea. Was this based on my suggestion??

    Replies
    1. Great minds think alike and fools never differ!

  10. Try testing against the 1993 election data ....

    Replies
    1. I expect most projection systems will handle the 1993 election badly. There's a reason the 1993 election is taught in political science courses around the world.

  11. I am wondering if the three election swing model incorporates too much in terms of improving upon the original model. The new model's ability to account for changes in the immediately preceding election that persist to the present one relies mostly on the regional adjustments, which it seems to me would have limitations. To consider an example, the 1990 Ontario election saw the breaking up of longer run party incumbency factors in a number of ridings. As a consequence of this, if the three way swing model, or any such regression model had been applied to forecast the 1995 Ontario election, it would have performed significantly worse at predicting individual riding outcomes than the original model extrapolating from 1990, in spite of the fact that the 1995 Ontario election, like the 2015 federal election, saw a one time spike of the NDP vote in the preceding election collapse back to traditional levels, and a traditionally governing party from the recent past, in this case the Progressive Conservatives, rise from third place to a majority. This would seem to argue that the three election swing model may be generally inadequate.

    In view of this, I would speculate on a revised model, starting from the last paragraph of the comments of MGK above. A series of previous elections would be regressed upon, but only to make predictions based on the current popular support of the parties more robust to previous elections where the parties had similar levels of support. The actual ranking of ridings for a party by degree of its support, with respect to the party's so adjusted riding distribution vote profile, would for each party, then be matched as closely as possible to the party's corresponding riding result rankings drawn strictly from the preceding election. This alternative model would avoid the problem of over-correcting for short term local trends in the immediately preceding election, while responding to what may be the underlying issue of how the riding distribution vote profiles for a party are qualitatively different depending upon the party's overall popular vote.

