Monday, December 3, 2012

BC New Democrats hold wide lead in first projection

The next election in British Columbia is less than six months away, and in ThreeHundredEight's first projection and forecast for the province, the New Democrats hold a wide lead over the governing B.C. Liberals. Even six months out, they are heavily favoured to win.

An analysis of the projection results themselves can be read in The Globe and Mail. Rather than go over the results a second time, I will explain some of the projection's new features.

A complete explanation of every aspect of how the projection and forecasting models work is available separately. For the most part, the basic methodology has not changed, so the bulk of that explanation covers what I have already described in detail before. In this post, I'll go over the features that have not been presented in past elections.

With this new model, I wanted to tackle a few problems and also take a somewhat different approach to presenting projections. The most important problem is how to bridge the gap between what the polls show and what the election results end up being. The amount of polling data available in Canadian elections is relatively small, especially compared to the United States and particularly at the tail end of a campaign. The final polls of a campaign are commonly conducted days before the actual vote, leaving a big gap between the last available information and the time people cast their ballots.

The model now makes two calls. The first, which I call the projection itself, is the best estimate of a likely outcome if an election were held on the last day that the most recent poll was in the field. In the case of British Columbia, that is Angus-Reid's poll of November 21. The projection, then, is the best estimate of the outcome of an election held November 21.

In order to measure the degree of error that can be expected in the polling, high and low projection ranges are calculated. These are determined by applying the margin of error for the estimated "sample" of the projection itself. This sample is measured by comparing the weights and sample sizes of the other polls to those of the most heavily weighted poll in the projection. For example, assume that a projection has three polls in the database, each with a sample of 1,000 people. One poll is weighted at half of the most heavily weighted poll, while the third is weighted at one-quarter of it. The estimated "sample" of the projection would then be 1,750 people (1,000 from the top-weighted poll, 500 from the half-weight poll, and 250 from the quarter-weight poll). That 1,750 sample is then used to determine the margin of error for each party, which varies with the level of support a party holds. A party with 2.5% support, for instance, has a different margin of error than a party with 25%: if the margin of error in a poll is 3.1%, the range of results for the small party cannot extend as low as -0.6%.
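
The weighting arithmetic above can be sketched in a few lines. This is a minimal illustration, not the site's actual code: the function names are hypothetical, and the margin-of-error formula (the standard simple-random-sample formula at 95% confidence) is an assumption, since the post does not say exactly which formula is applied.

```python
import math

def effective_sample(polls):
    """Estimate the projection's "sample" from (sample_size, weight) pairs.

    Each poll's sample is scaled by its weight relative to the most heavily
    weighted poll, and the scaled samples are summed.
    """
    top_weight = max(weight for _, weight in polls)
    return sum(size * weight / top_weight for size, weight in polls)

def margin_of_error(share, n, z=1.96):
    """Margin of error, in percentage points, for a party at `share` percent.

    Assumes simple random sampling and a 95% confidence level (z = 1.96).
    """
    p = share / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)

# The three-poll example from the text: full, half and quarter weight.
polls = [(1000, 1.0), (1000, 0.5), (1000, 0.25)]
n = effective_sample(polls)
print(n)  # 1750.0
for share in (2.5, 25.0):
    print(share, round(margin_of_error(share, n), 1))
```

Under this assumed formula, 3.1% is the margin for a party near 50% in a 1,000-person sample, while a party at 2.5% in the same sample has a margin of only about one point, which is why the small party's range never approaches negative territory.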

The margin of error attributed to the projection's "sample" is then used to determine the likely ranges of the projection itself. In theory, this range assumes that the polls have accurately reflected the mood of the electorate, within their respective margins of error.

The current projection is heavily weighted towards Angus-Reid's last poll, since the amount of data is rather thin. This makes the projection's "sample" relatively small, not much larger than the 800 people surveyed in that poll, so the high and low vote projections are rather wide. If more polls were available, that margin would be narrower. The high and low seat projections are based on these high and low vote ranges, so they will also narrow once more polls are available.
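
To show why a thin "sample" produces wide ranges, here is a small sketch that turns a party's projected share into a high and low range and compares an 800-person sample with a larger one. The 40% party and the 3,000-person sample are hypothetical, and the 95% simple-random-sample margin of error is again an assumption rather than the model's documented formula.

```python
import math

def vote_range(share, n, z=1.96):
    """Return the (low, high) vote range in percent for a party at `share`.

    Uses the simple-random-sample margin of error at 95% confidence (an
    assumption) and keeps the range inside 0% to 100%.
    """
    p = share / 100
    moe = 100 * z * math.sqrt(p * (1 - p) / n)
    return max(0.0, share - moe), min(100.0, share + moe)

# Hypothetical party at 40%: a projection resting on one 800-person poll
# versus one whose effective sample has grown as more polls were added.
for n in (800, 3000):
    low, high = vote_range(40.0, n)
    print(n, round(low, 1), round(high, 1))
```

Because the margin shrinks with the square root of the sample, roughly quadrupling the effective sample cuts the range about in half.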

The second call made by the model is what I refer to as the forecast. With a background in history, I am partial to the idea that the past can tell us a lot about the future. The forecasting model is based entirely on this premise.

In addition to the projection, the model gives the plausible high and low results each party might be able to reach by election day, tentatively scheduled for May 14, 2013. Unless the polls begin fluctuating wildly, this range will narrow as election day approaches.

The ranges are determined by measuring past polling volatility over a period equal to the time remaining before the next election. For example, the next election is scheduled for 174 days from the date of the projection, so the ranges are determined by the difference between each party's highest and lowest poll result over the last 174 days. This is a measure of what kind of change in support is plausible, based on how much that support has changed in the past. Of course, what is plausible is not the same as what is possible: in theory, a party can get anywhere from 0% to 100% of the vote. It is possible that a party at 5% could win 75% of the vote six months from now, and vice versa. It is not plausible, however, if that party's 5% has varied by only three points over the previous six months of polling.
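
The look-back rule can be sketched as follows. The 174-day figure checks out as the gap between the November 21 poll and May 14, 2013. Everything else is an assumption for illustration: the function names are hypothetical, the poll history is invented, and adding and subtracting the full high-low spread around the current projection is only one plausible reading of how the swing is applied.

```python
from datetime import date, timedelta

def lookback_days(projection_date, election_date):
    """Days from the projection to the election; also used as the look-back window."""
    return (election_date - projection_date).days

def forecast_range(current, polls, projection_date, election_date):
    """Plausible (low, high) vote range for one party on election day.

    `polls` is a list of (field_date, share) pairs for that party. The swing
    is the gap between its highest and lowest result inside the look-back
    window; applying that swing on both sides of the current projection is
    an assumption of this sketch.
    """
    days = lookback_days(projection_date, election_date)
    start = projection_date - timedelta(days=days)
    recent = [share for field_date, share in polls
              if start <= field_date <= projection_date]
    swing = max(recent) - min(recent)
    # Support cannot fall below 0% or rise above 100%.
    return max(0.0, current - swing), min(100.0, current + swing)

print(lookback_days(date(2012, 11, 21), date(2013, 5, 14)))  # 174

# Invented poll history for a hypothetical party; the January poll falls
# outside the 174-day window and is ignored.
polls = [(date(2012, 1, 10), 30.0), (date(2012, 7, 1), 10.0),
         (date(2012, 9, 15), 14.0), (date(2012, 11, 21), 12.0)]
print(forecast_range(12.0, polls, date(2012, 11, 21), date(2013, 5, 14)))  # (8.0, 16.0)
```

Tying the window to the days remaining is what makes the range narrow automatically as the vote approaches: a shorter window usually contains less movement.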

This forecast also cannot account for truly exceptional events, but it is a best estimate of how plausible it is for a party to gain or lose a certain amount of support.

The forecast will be continually updated to reflect what should be expected based on current information. It will therefore change, and today's forecast ranges may not even overlap with those issued one month before the election. It is not a fixed prediction but a best guess of what to expect based on what we know now.

In the same way that the seat projection model gives a range of likely outcomes based on the vote projection, the seat forecasting model gives a range of likely outcomes based on the vote forecast. These ranges are, of course, wide because the election is still far away. They show the range of plausible outcomes for the next election based on the information available right now.

Note that the seat forecast is not the same kind of assessment as the seat projection in terms of which seats are in play. Currently, the Conservatives have a high forecast of 23 seats and the Greens one of six seats. This does not mean that we should expect the Conservatives and Greens to win this many seats. In the case of the Conservatives, it means that if the party does end up at 25.8% support (their forecast high, which would almost certainly mean the B.C. Liberals have dropped considerably), they could win as many as 23 seats. In the case of the Greens, it means that if the party ends up at 16% support (their forecast high), they would be in play in as many as six seats. The seat range depends on the vote range. The Greens will not be in play in six seats at 7% support, or even at 8.7%, their high projected result in the first projection. At current polling levels, the model considers them in play in no seats.

One new feature of the model is the probability that each call made by the seat projection model will be correct. This is based on an analysis of the seat projection model's performance in the eight elections for which it has made riding-level projections. The probability depends on the margin by which the model projects the winner to win. The following chart tracks how the projection has performed in the past, based on the projected winning margin in each riding.

Each red dot shows the percentage of calls that were correct, while the blue line shows the general trend. As the data is somewhat noisy (though the trend is still clearly visible), the trend line is used to determine the probability of a correct call applied to every riding.
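
A rough sketch of this kind of calibration: group past riding calls by projected margin, compute the share that were correct (the red dots), fit a trend line (the blue line), and read a probability off that line. The post does not say what shape the trend line takes, so the straight least-squares line here is an assumption, as are the 5-point buckets, the function names and the sample calls.

```python
def hit_rate_by_margin(calls, bucket=5):
    """Group past calls into margin buckets and compute the share correct.

    `calls` is a list of (projected_margin_points, was_correct) pairs.
    Returns {bucket_midpoint: fraction_correct}.
    """
    groups = {}
    for margin, correct in calls:
        # Midpoint of the bucket this margin falls into, e.g. 0-5 -> 2.5.
        key = (margin // bucket) * bucket + bucket / 2
        hits, total = groups.get(key, (0, 0))
        groups[key] = (hits + int(correct), total + 1)
    return {k: hits / total for k, (hits, total) in sorted(groups.items())}

def fit_line(points):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def call_probability(margin, slope, intercept):
    """Read the trend line at `margin`, clamped to a valid probability."""
    return min(1.0, max(0.0, intercept + slope * margin))

# Invented calls for illustration: (projected margin in points, correct?).
calls = [(1, True), (2, True), (3, True), (3, False), (4, False),
         (6, True), (7, True), (8, True), (9, True), (9, False),
         (11, True), (14, True)]
rates = hit_rate_by_margin(calls)
slope, intercept = fit_line(list(rates.items()))
print(rates, round(call_probability(5, slope, intercept), 2))
```

Fitting a line rather than using the raw bucket rates smooths out noisy buckets, which mirrors the reason given above for relying on the trend. A weighted fit or a curve that levels off near 100% would likely suit real data better.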

If the riding projection shows that a party leading in a riding by 12 points has a 73% chance of winning, that means that, based on past performance, the model will be right about 73% of the time when it calls a winner by a margin of 12 points. It does not mean that there is a 73% chance that the projection for every party in the riding will be correct, or that the second-place party has a 27% chance of winning (a third-place party could win as well). It refers only to the odds that the party projected to win will win.

Another new feature of the model is the ability to calculate the probability of a party winning the next election as of the date of the projection.

This is also based on the projection model's past performance. In short, it determines the probability that the error in the seat projection will be smaller than the margin between the leading party and the other parties. A margin as large as the current projection's 40-seat gap between the NDP and the B.C. Liberals has been overcome in only 2.4% of cases. That means that if this were the final projection before the election, the NDP would have a 97.6% chance of winning. This takes into account the potential for an Alberta-sized error, but a call made with a 40-seat margin would be right virtually all of the time.
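
One way to express "the probability that the error will be smaller than the lead" is sketched below. It assumes, purely for illustration, that past errors in the projected seat gap are roughly normal and centred on zero; the post does not say how the 2.4% figure was derived, and the function name and error values are hypothetical.

```python
import math
from statistics import NormalDist

def lead_holds_probability(seat_lead, past_gap_errors):
    """Probability that a projected seat lead survives projection error.

    `past_gap_errors` are signed errors (projected gap minus actual gap, in
    seats) from past projections. This sketch assumes they are roughly normal
    and centred on zero, so their spread is estimated as a root-mean-square.
    The lead is overturned only if the error exceeds it.
    """
    sigma = math.sqrt(sum(e * e for e in past_gap_errors) / len(past_gap_errors))
    # Probability that the projection error stays below the lead.
    return NormalDist(0, sigma).cdf(seat_lead)

# Invented errors with a spread of 20 seats, for illustration only.
errors = [-20, 20]
print(round(lead_holds_probability(40, errors), 3))  # 0.977
```

With only a handful of past elections, a smooth distribution like this is one way to put a probability on leads larger than any error seen so far, rather than simply counting past cases.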

However, this is not a forecast of the future. It is the confidence that can be placed in a call if the election were held on the day of the projection. Forecasting the probability of a future event is an entirely different matter.

The greater challenge is predicting the probability that a party will win an election at a future date based on current information. The seat forecast shows which outcomes are plausible, but not which are most likely to occur.

After analyzing almost 6,000 data points from polls conducted in more than 20 federal and provincial elections since 2004, I have been able to determine the probability that the margin between any two parties can be overcome in a given period of time. Because this analysis is based on the difference between polls and final outcomes, it captures both the past amount of error in the polls and the real change in voting intentions that has occurred.

This model suggests how likely it is for the B.C. Liberals to overcome a 19-point deficit six months before the election, based on how often that kind of shift has taken place in the past. A margin of this size has been overcome in the six months prior to an election only 4.3% of the time, giving the NDP a 95.7% chance of winning the popular vote in May 2013 based on the polling of November 2012.
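
A minimal sketch of the "how often has this been overcome" lookup: among past polls taken at a similar distance from election day showing a lead at least as large, count how often the leader still lost the popular vote. The record format, the 30-day tolerance, the "at least as large" rule and the sample records are all assumptions for illustration; the post does not describe how its 6,000 data points are grouped.

```python
def comeback_rate(records, deficit, days_out, tolerance=30):
    """Share of comparable past cases in which a polling deficit was overcome.

    `records` holds (days_before_vote, lead_in_points, leader_won_vote) tuples.
    A case is comparable if it was polled within `tolerance` days of
    `days_out` and showed a lead at least as large as `deficit`.
    """
    comparable = [leader_won for days, lead, leader_won in records
                  if abs(days - days_out) <= tolerance and lead >= deficit]
    if not comparable:
        return None  # no comparable history to learn from
    return comparable.count(False) / len(comparable)

# Invented records for illustration only.
records = [(180, 20, True), (170, 19, True), (175, 25, False),
           (60, 20, False), (175, 10, False)]
rate = comeback_rate(records, deficit=19, days_out=174)
print(rate, 1 - rate)  # chance the deficit is overcome, chance the leader holds on
```

Because each record compares a poll with the final result, the rate absorbs both polling error and genuine shifts in opinion, as described above.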

The calculations do take into account the role played by third and other parties when they have a significant level of support. When the margin between the leading party and a third or other party is very large, the model assigns that party a (nearly) 0% chance of winning. When three or more parties are competitive, the probability is calculated accordingly.

Calculating the probability of a party winning a future election should be very familiar to readers of Nate Silver's FiveThirtyEight blog. In his excellent book, The Signal and the Noise, Silver includes a chart showing the probability of a Senate candidate winning an election based on their polling on a given date. This gave me the opportunity to check my math, as it were. Here is a comparison of FiveThirtyEight's probability ratings to those employed by ThreeHundredEight:

As you can see, the probabilities are almost identical in most cases, and where they differ, ThreeHundredEight's are often lower than FiveThirtyEight's. This may reflect the complications caused by our system of three or more parties. The numbers in the above chart assume a two-party race, which of course is not always the case in Canada, but these numbers can also be used to calculate the odds of a party winning a multi-party race.
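
As a rough illustration of how two-party probabilities might be combined in a multi-party race, the sketch below multiplies the leader's chances of beating each rival. This is a simplification chosen for the example, not necessarily how ThreeHundredEight combines them: treating the match-ups as independent ignores that a drop for the leader helps every rival at once, so the product likely understates the leader's chances somewhat. The function name is hypothetical.

```python
import math

def multiparty_win_probability(pairwise):
    """Approximate chance that the leader finishes ahead of every rival.

    `pairwise` holds the two-party probability of the leader beating each
    rival. Multiplying them assumes the match-ups are independent, which is
    a simplification.
    """
    return math.prod(pairwise)

# A leader almost certain to beat one rival and 95.7% likely to beat the
# other is left with roughly the same 95.7% overall.
print(multiparty_win_probability([0.957, 1.0]))
```

When a third party is far back, its match-up contributes a factor close to 1, which matches the behaviour described earlier of assigning such parties a (nearly) 0% chance.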

This calculation is based solely on the probability of a party winning the popular vote. In U.S. Senate races, that is also what determines who wins the election. That is not the case in Canada, where a party can win the most seats with fewer votes. The number of variables involved in determining which party will win the most seats, based on popular support in polls taken months before an election, is of course enormous. But generally speaking, the party with the most votes will win the most seats, so while the probability of a party winning the popular vote does not necessarily equal its probability of winning the election, it is a close enough proxy. This is particularly the case in British Columbia, where neither the Liberals nor the NDP appear to have an intrinsic advantage in vote efficiency.

The projection will be updated continuously as the vote approaches, with the forecast probabilities changing as new information becomes available. The B.C. model has been launched six months ahead of the vote because B.C. has the only upcoming election that is actually scheduled. Nova Scotia will be heading to the polls at some point soon, and that province's model will launch in the first half of next year if an election hasn't already been called. Ontario and Quebec could head back to the polls even earlier than British Columbia, and this model will be tweaked accordingly and applied to those campaigns. But as B.C. is the only province we know will hold an election in the next six months, the site will focus on that province's politics.

It should be an interesting campaign. The odds heavily favour the New Democrats, as a swing of 19 points, even six months before an election, is very rare. Much of the Liberals' hope lies in their ability to put the Conservatives back in their place. The Liberals aren't in much danger of being overtaken by the Conservatives anymore, but with the Conservatives at 12.8%, it is impossible for the Liberals to beat the New Democrats. Christy Clark will need to bring those disaffected Liberals back into the fold if she is to have any hope of staying on as Premier.