Forecasting and Projection Methodology

The following is a detailed explanation of the forecasting and projection methodology used for the 2016 Saskatchewan and Manitoba elections. The fundamentals of the model were also employed in all federal and provincial elections held since 2011. Improvements, updates, and new features were added after each election season.

This methodological explanation covers projections and forecasts for an upcoming election. The methodology for seat projections based on individual polls, and for poll aggregations for Canada outside of an election campaign, is slightly different and is explained here and here.

Poll aggregation

The projection model starts with the aggregation of all publicly available opinion polls. Polls are weighted by their age and sample size, as well as by the track record and past performance of the polling firm.

The weight of a poll is reduced by 35% with each passing week outside of an election campaign and each passing day once a campaign has officially begun. The 'date' of the poll is determined by the last day the poll was in the field.
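
As a minimal sketch of this decay rule (not the model's actual code), the age weighting could look like the following. The function name age_weight, the compounding of the 35% reduction, and the handling of partial weeks are assumptions made for illustration.

```python
from datetime import date

DECAY_PER_PERIOD = 0.35  # from the text: weight drops 35% per period


def age_weight(last_field_day: date, today: date, in_campaign: bool) -> float:
    """Relative weight of a poll based on its age.

    Assumes the 35% reduction compounds: one period per day during a
    campaign, one period per week otherwise. Fractional weeks are allowed,
    which is an assumption; the text does not say how partial weeks are
    handled.
    """
    age_days = max((today - last_field_day).days, 0)
    periods = age_days if in_campaign else age_days / 7
    return (1 - DECAY_PER_PERIOD) ** periods


# A poll that left the field three days ago, during a campaign.
print(age_weight(date(2016, 4, 1), date(2016, 4, 4), in_campaign=True))
```

The age is measured from the last day in the field, as described above, rather than from the release date.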

The sample size weighting is determined by the margin of error that would apply to the poll, assuming a completely random sampling of the population. The margin of error for a poll of 1,000 people, for example, is +/- 3.1%. A poll with a sample of 500 people has a margin of error of +/- 4.4%. Rather than giving the poll of 500 people half the weight of the poll of 1,000 people, the smaller poll would be weighted at 70% (3.1/4.4) of the larger poll.
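
The arithmetic can be checked with a short sketch. It assumes the standard 95% margin of error for a simple random sample with a proportion of 50%; the function names are illustrative.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level


def margin_of_error(n: int) -> float:
    """Maximum margin of error, in percentage points, for a simple random
    sample of size n at 95% confidence (proportion of 50%)."""
    return 100 * Z_95 * math.sqrt(0.25 / n)


def sample_size_weight(n: int, reference_n: int) -> float:
    """Weight of a poll of size n relative to a poll of size reference_n,
    taken as the ratio of the two margins of error."""
    return margin_of_error(reference_n) / margin_of_error(n)


print(round(margin_of_error(1000), 1))  # 3.1
print(round(margin_of_error(500), 1))   # 4.4
print(sample_size_weight(500, 1000))
```

Because the margin of error shrinks with the square root of the sample size, the ratio reduces to the square root of 500/1000, or about 0.71 before the margins are rounded.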

An analysis of a polling firm's past experience in a province or at the federal level has suggested that polling firms that were not active in a jurisdiction's previous election have a total error 1.18 times that of firms that were active in the previous election. Accordingly, polling firms with prior experience in a jurisdiction are weighted more heavily than those that have none.

Polling firms are also weighted by their track record of accuracy over the last 10 years. Their accuracy rating is determined by three factors: 1) the last poll the firm released in an election campaign, 2) their average error for all parties that earned 3% or more of the popular vote, and 3) the amount of time that has passed since the election. In order to take into account changes of methodology or improvements made over time, the performance of a polling firm in a recent election is weighted more heavily than their performance in an older election. The difficulty of each election is also taken into account: elections where the average error was lower are weighted more heavily than elections in which the error was higher. This is meant to take into consideration elections in which there were particular factors contributing to pollster error that were outside of the pollster's control. Conversely, in elections where the consensus was close to the mark a pollster has fewer excuses for higher error levels.

The accuracy rating is determined by comparing the average error, weighted by how recent the election is, of the best performing polling firm to others. For example, if the best performing firm had an average error of 1.5 points per party, a firm with an average error of three points per party would be given half the weight.
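
A rough sketch of that comparison follows. The firm names, the recency weights, and the function names are hypothetical; only the ratio of the best firm's error to each firm's error comes from the text.

```python
def recency_weighted_error(errors: list[float], recency_weights: list[float]) -> float:
    """Average error per party across elections, weighted so that recent
    elections count for more. The weights themselves are hypothetical."""
    total = sum(recency_weights)
    return sum(e * w for e, w in zip(errors, recency_weights)) / total


def accuracy_ratings(avg_error_by_firm: dict[str, float]) -> dict[str, float]:
    """Rate each firm relative to the best performer: the best firm gets
    1.0, and a firm with twice its error gets 0.5."""
    best = min(avg_error_by_firm.values())
    return {firm: best / err for firm, err in avg_error_by_firm.items()}


# Hypothetical firms matching the example in the text.
print(accuracy_ratings({"Firm A": 1.5, "Firm B": 3.0}))
```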

All of these ratings are combined to give each poll in the projection model a weight (no poll is ever awarded more than 66.7% of the total weight, unless there have been no other polls done recently). In short, this means that newer polls with larger sample sizes from experienced polling firms with a good accuracy record are weighted more heavily than older and smaller polls from inexperienced firms with a bad track record.
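
One plausible way to combine the ratings is sketched below. It assumes the individual ratings are multiplied together and that any weight above the cap is redistributed proportionally to the other polls; the text does not specify either step, so both are assumptions, as are the function names.

```python
import math

MAX_SHARE = 0.667  # from the text: no poll gets more than 66.7% of the total


def cap_largest_share(shares: list[float], cap: float = MAX_SHARE) -> list[float]:
    """Cap the largest share and scale the others up to fill the remainder.

    Only one poll can exceed a cap above 50%, so one pass is enough.
    If there are no other polls carrying weight, the cap is not applied.
    """
    top = max(range(len(shares)), key=lambda i: shares[i])
    rest = sum(s for i, s in enumerate(shares) if i != top)
    if shares[top] <= cap or rest == 0:
        return list(shares)
    scale = (1 - cap) / rest
    return [cap if i == top else s * scale for i, s in enumerate(shares)]


def poll_shares(component_weights: list[tuple[float, ...]]) -> list[float]:
    """Turn each poll's ratings (age, sample size, experience, accuracy)
    into a share of the total weight. Multiplying them is an assumption."""
    raw = [math.prod(components) for components in component_weights]
    total = sum(raw)
    return cap_largest_share([r / total for r in raw])


# Three hypothetical polls: one recent and large, two older and smaller.
print(poll_shares([(1.0, 1.0, 1.0), (0.2, 0.5, 0.5), (0.1, 0.5, 1.0)]))
```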

The vote projection

After weighting all the polls to determine the average result and estimating likely support for independents and smaller parties (based on performance in the last election and the number of candidates running this time), the projection model gives the best estimate of the support that each party is likely to get in an election.

But rather than suggest that the poll aggregation, after adjustment, reflects the results of an election "held today", the projection is instead a reflection of the result as of the last day of polling in the projection model. For example, if a poll is released on April 14 but the last day the poll was in the field was April 12, the vote projection will be presented as being the best estimate of what the result of an election held on April 12 would have been.

The provincial or national vote projection, however, has little bearing on the seat projection. That is because the seat projections in most elections are calculated regionally, using the same methods described above to estimate support in each region of a province or of the country. Polls whose regional definitions do not exactly match the projection model's definitions are adjusted accordingly, using the difference between the election result in the region as it is defined by the model and as it is defined by the polling firm. However, in some provincial campaigns regional data is not regularly available, and so the projection is based entirely on the province-wide estimates.

The performance of this method

This adjusted and weighted poll aggregation performs better than most individual polls and better than an unweighted and simple averaging of the last polls of a campaign. In 19 federal, provincial, and municipal elections, ThreeHundredEight.com's vote projection model has outperformed the average error of the final polls conducted by all pollsters during a campaign 17 times and has, on average, had an error level of 2.13 points per party compared to 2.82 points per party for the polls.

Recognizing the limitations, and vote ranges

But despite performing better than most individual polls and the average of the polls, the vote projection is still heavily dependent on what the polls show. It can thus fail catastrophically when the polls do, as occurred in the provincial elections in Alberta in 2012 and British Columbia in 2013. A measure of the likely error in the vote projection needs to be made, using the degree of past error polls have had in recent elections.

This is calculated based on a party's position in the legislature at dissolution: the governing party, the Official Opposition, a third party with multiple seats, a third party with a single seat, and parties without a seat in the legislature. The electoral outcome for each of these parties in recent elections is then compared to the polling average.

All cases in which a party in a particular position in the legislature was under-estimated in the polls are then used to calculate the average "High" range. For example, the average under-estimation (when polls under-estimated a party's support) in recent elections for the governing party has been by a factor of 1.1. That means that the weighted polling average is adjusted by a factor of 1.1 to arrive at the "High" range. The same is done for cases of over-estimation to calculate the "Low" range.

The minimum and maximum projections ("Min." and "Max." on the chart) are calculated to show the range of outcomes likely to occur 95% of the time, or 19 times out of 20. These are calculated based on the standard deviation of over- and under-estimations from the average over- and under-estimations in the past.
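
The ranges described in the last two paragraphs can be sketched as follows. The "High" and "Low" factors are averages of past under- and over-estimation factors, as in the text. The formula for "Min." and "Max." is an assumption: the text only says they rest on the standard deviation of past misses, so the sketch uses the average factor plus or minus 1.96 standard deviations. The historical factors in the example are hypothetical.

```python
import statistics


def vote_ranges(polling_avg: float,
                under_factors: list[float],
                over_factors: list[float],
                z: float = 1.96) -> dict[str, float]:
    """Build min/low/average/high/max vote estimates for one party.

    Each factor is (actual result / polling average) in a past election:
    above 1 when the polls under-estimated the party, below 1 when they
    over-estimated it. The use of z standard deviations for the outer
    bounds is an assumption of this sketch.
    """
    high_f = statistics.mean(under_factors)
    low_f = statistics.mean(over_factors)
    max_f = high_f + z * statistics.stdev(under_factors)
    min_f = max(low_f - z * statistics.stdev(over_factors), 0.0)
    return {
        "min": polling_avg * min_f,
        "low": polling_avg * low_f,
        "average": polling_avg,
        "high": polling_avg * high_f,
        "max": polling_avg * max_f,
    }


# Hypothetical history for a governing party polling at 40%.
print(vote_ranges(40.0, [1.05, 1.10, 1.15], [0.95, 0.90, 0.85]))
```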

This gives readers a full understanding of the range of outcomes that are possible, based on past polling performance and what the data is showing. My role is not to make bets, but to try to figure out what the polls are saying and what they aren't saying, and to give people an idea of what to expect. But to narrow it down a little, the projection also calculates the range of most likely outcomes for each party. The chart below spells this out for the 2016 Saskatchewan and Manitoba votes:

As the governing party, there is a 55% chance that the electoral outcome for the Saskatchewan Party and Manitoba NDP will fall within the average-to-high range. If we want to extend that further, we can say there is a 73% chance it will fall between the high and maximum marks.

For the Saskatchewan New Democrats and Manitoba PCs as the Official Oppositions, the range is not so tight. The most likely individual outcome is for the result to fall within the average-to-high range (42%), but it is more likely that it will fall outside of that range (the remaining 58%). To find the smallest range that incorporates the most likely outcome, we have to stretch that to the low-to-high range. There is a 65% chance that the outcome will fall within that range.

For the Manitoba Liberals as a third party with a single seat, there is an 82% chance that the result will fall within the minimum to average range.

For the Saskatchewan Liberals and Greens in both provinces, parties without a seat in the legislature, the most likely individual outcome is that the result will fall within the low-to-average range (52%). There is an 80% chance that the outcome will land in the minimum to low range.

It is, of course, possible that the outcome will fall outside of even the maximum and minimum projected ranges.

Seat projection methodology

Once the vote projection and likely ranges for each party are determined, the model then makes a seat projection. This seat projection is based on the vote projection: if the first is wrong, the second will be as well. If the vote projection is accurate, the seat projection will also be accurate. With completely accurate polls, the seat projection model would have a margin of error of only 4.1 seats per party and make the right call in each riding 83.8% of the time, and identify the winner via the ranges 89.5% of the time.

At its core, the seat projection model uses a simple proportional swing method based on the difference between the results of the last three elections and current polls. Prior to the 2015 Newfoundland and Labrador election, the model swung the results from only the most recent election, rather than the last three. The swing is weighted as follows: 50% for the most recent election, 33.3% for the next most recent, and 16.7% for the election before that.

Put simply, if a party managed 20% in a given region in a previous election and is now polling at 40% in that same region, their results from that election in each individual riding would be doubled. If the party managed 10% in the election prior to that, their results from that election would be quadrupled. And if the party managed 40% in the election prior to that, their results from that election would not be swung at all.
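
A minimal sketch of the three-election proportional swing for one party in one riding follows. It assumes the three swung results are combined as a weighted average using the 50%, 33.3% and 16.7% weights; the riding results in the example are hypothetical, while the regional figures (20%, 10%, 40%, now polling at 40%) come from the example above.

```python
# Weights for the most recent election, the one before, and the one before that.
ELECTION_WEIGHTS = (1 / 2, 1 / 3, 1 / 6)


def proportional_swing(riding_results: list[float],
                       regional_results: list[float],
                       current_regional: float,
                       weights: tuple[float, ...] = ELECTION_WEIGHTS) -> float:
    """Estimate a party's riding support from its last three results.

    Each past riding result is scaled by the ratio of current regional
    polling to the regional result in that election, then the scaled
    results are averaged with the election weights. Lists are ordered
    from most recent to oldest.
    """
    estimate = 0.0
    for riding, regional, weight in zip(riding_results, regional_results, weights):
        estimate += weight * riding * (current_regional / regional)
    return estimate


# Regional results of 20%, 10% and 40% give swing factors of 2, 4 and 1.
# The riding results (25%, 12%, 45%) are hypothetical.
print(proportional_swing([25.0, 12.0, 45.0], [20.0, 10.0, 40.0], 40.0))
```

With these inputs the swung riding results are 50, 48 and 45, and the weighted estimate is roughly 48.5%.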

The image below shows how this method would have estimated the Liberals' support in the riding of Kingston and the Islands in the 2015 federal election.

This swing is applied to every party in each riding. As this will sometimes result in total support of more or less than 100%, the numbers are adjusted upwards or downwards proportionately to equal exactly 100%.
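
The proportional adjustment to 100% is a simple rescaling, sketched here with hypothetical party labels and numbers.

```python
def normalize_to_100(shares: dict[str, float]) -> dict[str, float]:
    """Scale every party's swung share by the same factor so the
    riding total is 100%."""
    total = sum(shares.values())
    return {party: 100 * share / total for party, share in shares.items()}


# Swung shares that add up to 110% are each scaled down by 100/110.
print(normalize_to_100({"Party A": 50.0, "Party B": 40.0, "Party C": 20.0}))
```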

This model is in contrast to the uniform swing method popular in the United Kingdom. With that method, in the example of Kingston and the Islands and swinging only the last election, the Liberals' increase of 19.5 percentage points in Ontario would have simply been added to the Liberal result in 2011 in Kingston and the Islands, estimating that the party would capture 59.5% of the vote instead of 57.2%, as the three-election proportional swing would suggest. In this one case, that would put the error of uniform swing at more than double the error using the three-election proportional swing method.
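
The difference between the two methods can be seen in a short sketch. The uniform swing call uses the figures from the text: a 19.5-point regional gain and a 2011 riding result of 40.0%, which is implied by 59.5 minus 19.5. The regional figures in the proportional call are hypothetical.

```python
def uniform_swing(previous_riding_result: float, regional_change_points: float) -> float:
    """Uniform swing: add the regional change, in percentage points,
    to the party's previous result in the riding."""
    return previous_riding_result + regional_change_points


def proportional_swing_single(previous_riding_result: float,
                              previous_regional: float,
                              current_regional: float) -> float:
    """Proportional swing from one election: scale the previous riding
    result by the ratio of current to previous regional support."""
    return previous_riding_result * current_regional / previous_regional


print(uniform_swing(40.0, 19.5))  # 59.5, as in the text
# Hypothetical regional support rising from 20% to 30%.
print(proportional_swing_single(40.0, 20.0, 30.0))  # 60.0
```

Under uniform swing every riding gains the same number of points, while under proportional swing a riding where the party is already strong gains more points than one where it is weak.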

The proportional swing method is a better estimation of how support changes between elections, reflecting that a party with a large base of support in a riding is more likely to grow by larger proportions than a party with no real support. It can also perform well when parties make major gains - with the actual provincial results of the 2011 federal election plugged into the model, it would have projected 60 seats for the NDP in Quebec to four for the Bloc Québécois, instead of the actual result of 59 to four.

Taking other factors into account

The swing model alone, however, cannot take into account the individual characteristics of each riding. Other factors need to be taken into account.

Incumbency is a very important factor, and old versions of the model took it into account directly. The new three-election swing model makes that redundant. However, when a party lacks an incumbent (because the incumbent is not running for re-election or has moved to a different party), it is penalized at just under 10% of what it would otherwise have been projected to take.

New to 2015 is a factor for leaders. These are special incumbents. When a leader is running for re-election in the same riding as before, they lose far less support than other incumbents when the party is dropping overall. At the same time, however, they gain less support than other incumbents when the party is increasing overall.

When an MP is running for the first time as leader, but not for the first time in his or her riding, the bonus is even larger when the party is dropping in support overall. When the party is gaining, new leaders see a larger boost than other incumbents. This also goes for leaders running for the first time both as a leader and in a given riding.

And similarly to when an incumbent MP decides not to run again, there is a steep penalty when a leader vacates a riding, either because they lost it in the previous election or because they are not running for re-election.

Star candidates improve their party's performance in the vast majority of cases. The classification of star candidates is one of the purely subjective aspects of the model, as I have to determine whether a candidate should be considered a "star" or not. This is usually quite obvious. One of the biggest determining factors is whether a candidate is widely treated as a star by the media, which has its own effect on how the candidate is perceived by voters, or by the party itself, in terms of the profile and resources it pours into the candidate's riding. Star candidates are usually former MPs or cabinet ministers, party leaders, or well-known figures from the private sector.

Floor crossing is difficult to take into account, and has been dropped as a factor in 2015. Instead, the floor-crosser causes a no-incumbent penalty to be applied to the party the candidate crossed from, and a star candidate bonus is applied to the crosser.

The presence of independents can also be difficult to model. If an independent politician is running for re-election as an independent, their vote is dropped marginally from the previous election, as has occurred in other cases. The same penalty is applied to popular independent candidates who were never elected. Politicians who left or were forced out of their party caucuses and are running for re-election as independents are treated differently. Based on an analysis of previous cases, these candidates take a proportion of their vote share from the previous election based on the circumstances of their departure from caucus. Those who depart for positive reasons retain much more of their support than those who leave in disgrace. When the circumstances are hard to define, an average proportion is used. A no-incumbent penalty is applied to the party the candidate left.

Comeback attempts, in which an ex-MP tries to return as an independent, are another factor, new in 2015. Using past cases as a guide, the independent ex-MP candidate retains a portion of the vote that they received the last time they stood as a candidate.

By-elections are also taken into account when the result involved the incumbent party losing the seat or, when the by-election occurred since the last general election, if the results were significantly different from that previous vote. The swing from by-elections is calculated by how current polling levels differ from where the parties stood in the polls at the time of the by-election.

When available, and when they differ from the projection's estimations, riding polls are also added to the projection for an individual riding. The weight of the poll is determined by the number of respondents (e.g., 431 respondents would give the riding poll a weight of 43.1%) but is capped at 50%. The riding poll's results are used as a new baseline, from which the numbers are adjusted as regional polling changes. In other words, just as the standard model adjusts party support in a riding by the proportion that the party's support has shifted in the region since the last election, the riding poll is adjusted by the proportion that the party's support has shifted in the region since the date that the riding poll was conducted. When multiple riding polls are released during an election, only the latest one is taken into account.
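
A sketch of how a riding poll might be folded in is shown below. The weight rule (respondents divided by 1,000, capped at 50%) and the regional adjustment come from the text. Blending the adjusted poll with the model's own estimate as a weighted average is an assumption, and the function names are illustrative.

```python
def riding_poll_weight(respondents: int) -> float:
    """Weight of a riding poll: 431 respondents give 43.1%, capped at 50%."""
    return min(respondents / 1000, 0.5)


def blend_with_riding_poll(model_estimate: float,
                           riding_poll_result: float,
                           respondents: int,
                           regional_now: float,
                           regional_at_poll_date: float) -> float:
    """Combine the swing model's riding estimate with a riding poll.

    The poll result is first moved by the proportional change in regional
    support since the poll was conducted, then averaged with the model's
    estimate (the averaging step is an assumption of this sketch).
    """
    adjusted_poll = riding_poll_result * regional_now / regional_at_poll_date
    weight = riding_poll_weight(respondents)
    return weight * adjusted_poll + (1 - weight) * model_estimate


print(riding_poll_weight(431))  # 0.431
print(blend_with_riding_poll(30.0, 40.0, 500, 20.0, 20.0))  # 35.0
```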

Unique circumstances are also taken into account when looking at the three prior elections: for example, the presence of a very popular independent candidate may invalidate that election's results for the use of the swing model. A floor-crosser or leader that dramatically changed the outcome of an older election may also cause that election to be dropped from consideration.

The particularities of an election

When necessary, the projection model takes into account the individual particularities of an election campaign. One common particularity is the presence of a new party, or a formerly fringe party running a full (or almost full) slate of candidates.

When a party is running candidates where they did not have a name on the ballot in the previous elections (whether that be limited to a handful of ridings, as often occurs with smaller parties, or in the bulk of ridings, as occurred in the 2012 election in Alberta for Wildrose), the regional vote projection for the party is applied directly to the riding. For example, if a party is polling at 20% in a region it will be projected to have 20% in each riding in that region. However, that number can be adjusted by any of the factors listed above and is always adjusted when the model makes all of the projections add up to 100%. In this example, in ridings where there is little room for the party to have 20% their vote will be adjusted downwards. When there is a lot more room, the vote will be adjusted upwards. This system performed well when the real results of the 2012 Alberta election were applied: Wildrose would have been projected to win 18 seats (instead of the actual result of 17).
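
The effect of "room" in a riding can be illustrated with a short sketch. It assumes the only adjustment is the proportional rescaling to 100%; the party labels and riding numbers are hypothetical.

```python
def add_new_party(swung_shares: dict[str, float],
                  new_party: str,
                  regional_projection: float) -> dict[str, float]:
    """Give a party with no prior result its regional projection in the
    riding, then rescale all parties so the total is 100%."""
    shares = dict(swung_shares)
    shares[new_party] = regional_projection
    total = sum(shares.values())
    return {party: 100 * share / total for party, share in shares.items()}


# Little room: the existing parties already add up to 95%.
crowded = add_new_party({"Party A": 50.0, "Party B": 45.0}, "New Party", 20.0)
# More room: the existing parties add up to only 70%.
open_riding = add_new_party({"Party A": 40.0, "Party B": 30.0}, "New Party", 20.0)
print(crowded["New Party"], open_riding["New Party"])
```

In the first riding the new party ends up below its regional 20%, and in the second it ends up above it.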

Likely seat ranges

In order to take into account error in the polling and in the seat projection model itself, the vote projection ranges are used to determine likely seat ranges. These are applied directly to each party's projected results in each riding. For example, if the high projected vote for a party in a given region is 5% higher than the most likely projection, then the projected vote for the party in each riding in that region is increased by a factor of 1.05. How these high and low results for each party in each riding compare determines whether a seat is "in play". If the projected high result for a party in a riding is higher than the projected low result for the party expected to win the seat, the seat is then potentially winnable for the trailing party.
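
The "in play" test can be sketched as follows. The function names and the riding numbers are hypothetical; the scaling rule and the comparison of the trailing party's high against the leader's low follow the description above.

```python
def riding_bound(riding_projection: float,
                 regional_projection: float,
                 regional_bound: float) -> float:
    """Scale a riding projection by the ratio of the regional high (or low)
    to the regional projection, e.g. a factor of 1.05 if the high is 5%
    above the most likely projection."""
    return riding_projection * (regional_bound / regional_projection)


def parties_in_play(projected: dict[str, float],
                    low: dict[str, float],
                    high: dict[str, float]) -> list[str]:
    """Trailing parties whose high result beats the projected winner's
    low result in the riding."""
    leader = max(projected, key=projected.get)
    return [p for p in projected if p != leader and high[p] > low[leader]]


projected = {"Party A": 45.0, "Party B": 40.0, "Party C": 15.0}
low = {"Party A": 41.0, "Party B": 36.0, "Party C": 12.0}
high = {"Party A": 49.0, "Party B": 43.0, "Party C": 18.0}
print(parties_in_play(projected, low, high))  # ['Party B']
```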

This gives the seat projection a confidence interval, based on likely results if the polls over- or under-estimate party support by the same degree as they have historically. The maximum and minimum vote ranges are used in the same fashion to calculate maximum and minimum seat ranges if the degree of error approaches or matches historically poor performances by the polls.

Probability of a correct call

One new feature added to the model in 2013 was the probability that a call made by the seat projection model will be correct. This is based on an analysis of the seat projection model's performance in the elections for which it has made projections for individual ridings. This probability is determined by the margin the projection model estimates the winner will win by. The following chart tracks how the projection has performed in the past, based on the projected winning margin in each riding.

Each red dot shows the percentage of calls that were correct, while the blue line shows the general trend. As the data is somewhat noisy (but the trend is still clearly visible), the trend line has been used to determine what probability of a correct call should be applied to every riding.
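
The points behind such a chart could be computed along the following lines. This is only a sketch: the bucket width, the function name, and the past calls in the example are hypothetical, and the fitting of the trend line itself is not shown because the text does not describe its form.

```python
from collections import defaultdict


def hit_rate_by_margin(calls: list[tuple[float, bool]],
                       bucket_width: float = 5.0) -> dict[float, float]:
    """Share of correct calls, grouped by projected winning margin.

    Each call is (projected margin in points, whether the projected
    winner actually won). Keys are the lower edge of each margin bucket.
    """
    totals: dict[float, int] = defaultdict(int)
    hits: dict[float, int] = defaultdict(int)
    for margin, correct in calls:
        bucket = int(margin // bucket_width) * bucket_width
        totals[bucket] += 1
        hits[bucket] += int(correct)
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Hypothetical past calls: close ridings are missed more often.
past_calls = [(2.0, True), (3.0, False), (11.0, True),
              (12.0, True), (13.0, True), (14.0, False)]
print(hit_rate_by_margin(past_calls))  # {0.0: 0.5, 10.0: 0.75}
```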

If the riding projection shows that a party leading in a riding by 12 points has a 74% chance of winning, that means that based on past performance the model will be right about 74% of the time when it chooses a winner by a margin of 12 points. It does not mean that there is a 74% chance that the projection for every party in the riding will be correct, or that the trailing party has a 26% chance of winning (a third place party could win as well). It is referring to the odds that the party projected to win will win. This model has performed well, such as in the 2015 federal election, as the following chart shows:

Hopefully, this should provide a complete explanation of ThreeHundredEight.com's projection methodology during and in the run-up to election campaigns.