Thursday, April 15, 2010

Polling in Canada

I raised this issue some time back, but I raise it again: there is an ongoing large error in the polls conducted in Canada. A lot more people say they are going to vote than will actually vote in an election.  Most pollsters are finding 80-90% of respondents are decided in how they will vote in an election, but we can be quite certain that fewer than 60% of people will vote in the next election.

This means that, at a minimum, one in three of the responses the pollsters are getting is WRONG.

Educated guessing would do as well.   Anecdotal surveys are not much less accurate.

The pollsters list the statistical margin of error; they should also be listing a much bigger source of error: false answers.  It is safe to assume that one in three answers is not accurate, which means we can assume an error of plus or minus 15% of the value of each party's result. A 40% result in a poll would then be a range of 34% to 46%.

I can hear some of you say "but the polls reasonably reflect the voting in the election".  I would argue they are not nearly as close as people assume.  As an example, the Conservatives managed to get close to 10% more votes than the median of the polls in the week before the vote.  The Greens managed to be 20% lower than they were polling.  Given the large number of polls and the narrow margins of statistical error, the pollsters in aggregate were outside their 95% confidence level.

In each election some polling company happens to be close to the final result, but it should be all of them if their work is done correctly.

One can also look at the polls lately and see how things are bouncing around.  They are not staying within the bounds of the statistical error.  Polling is seen as a scientific endeavor, but given that the companies' results do not fall within each other's margins of error 95% of the time, there is something wrong.
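As a rough illustration of what "within each other's margins of error" means, here is a minimal Python sketch. It assumes simple random samples and uses the textbook 95% margin of error for a proportion; the function names and the sample numbers are my own illustrations, not anything a pollster publishes.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(share, sample_size, z=Z_95):
    """Statistical margin of error for one party's share (0..1) in one poll."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)


def polls_agree(share_a, n_a, share_b, n_b, z=Z_95):
    """True if two independent polls differ by no more than sampling error allows."""
    combined = z * math.sqrt(
        share_a * (1.0 - share_a) / n_a + share_b * (1.0 - share_b) / n_b
    )
    return abs(share_a - share_b) <= combined


# Hypothetical example: two polls of 1000 people each, same party.
print(polls_agree(0.33, 1000, 0.35, 1000))  # a 2 point gap
print(polls_agree(0.30, 1000, 0.38, 1000))  # an 8 point gap
```

If sampling were the only source of error, pairs of polls taken at the same time should pass a check like this about 95% of the time.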

If we do use the wonderful world of science and document the sources of error, we can compare the known turnout of voters at the last election to the number of decided voters in the poll and get a number we can use as a measure of this error.  If we use the 87.3% decided voters from the latest Ekos poll, and assume a 59% turnout in the next election based on the last one, the share of decided responses that cannot be right is (87.3 - 59) / 87.3. That is an estimated error of 32.4%, or plus or minus 16.2% of the value each party got in the poll.  This gives us the following national results for the poll:

  • Conservatives - 26.3% - 36.5%
  • Liberals - 24.3% - 33.7%
  • NDP - 13.7% - 19.1%
  • Bloc - 7.5% - 10.1%
  • Green - 9.4% - 12.9%
  • Other - 2.8% - 3.8%
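The arithmetic behind these ranges can be sketched in a few lines of Python. The function names are my own, and the 40% party result is just an illustrative input, not a number from the Ekos poll.

```python
def false_decided_error(decided_pct, turnout_pct):
    """Share of 'decided' responses that cannot be right, given expected turnout."""
    return (decided_pct - turnout_pct) / decided_pct


def adjusted_range(party_pct, error):
    """Spread the error as plus or minus half of it around the party's poll value."""
    half = error / 2.0
    return party_pct * (1.0 - half), party_pct * (1.0 + half)


# 87.3% decided (latest Ekos poll) and an assumed 59% turnout.
error = false_decided_error(87.3, 59.0)
low, high = adjusted_range(40.0, error)  # 40% is a hypothetical party result

print(f"estimated error: {error:.1%}")               # estimated error: 32.4%
print(f"40% poll result: {low:.1f}% to {high:.1f}%")  # 40% poll result: 33.5% to 46.5%
```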

The error we have to assume from the wrong responses in decided voters means we have a much larger error than the statistical error.  In fact, the statistical error is lost in the noise of the much larger error in decided voters.  If we add the statistical error from the latest Ekos poll, the numbers only move marginally, though the range does get bigger:

  • Conservatives - 25.5% - 37.3%
  • Liberals - 23.5% - 34.5%
  • NDP - 13.3% - 19.5%
  • Bloc - 7.3% - 10.3%
  • Green - 9.3% - 13.2%
  • Other - 2.8% - 3.8%

The ranges are large enough now that all the polling company results are within the margin of error.  All the pollsters in 2008 also managed to be within this margin of error.

It is because of this large margin of error that I have made very few projections for any election.  It is all a crapshoot at the moment unless there is some large movement in the polls.  Polling between elections also has an added problem: with no campaign, most of the public does not have a strong opinion on political parties and their responses are weaker.

I assume one of the major reasons people tell pollsters they are decided voters when they are not going to vote is that people still feel a civic duty to the idea of voting.   They do not wish to publicly state they do not plan on voting.

Pollsters should add another set of questions to their polls:

  • Did you vote in the last election?
  • If you did vote - how did you vote?
  • Did you vote in the 2006 election? and for whom?
  • Did you vote in the 2004 election? and for whom?

From this the pollsters could assign values to the responses of decided voters.  Someone who did not vote in 2004, 2006 or 2008 would be assigned a lower value for their response; as an example, their response would be counted as 80% not planning on voting and 20% for their choice.  Other people would be weighted based on how often they vote.  The weighting needs to reflect the fact that more than four in ten people will not be voting.  The weighted results of a 2000 person poll should have no more than 1200 decided voter results.
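Here is a minimal Python sketch of that kind of weighting. Only the 20% weight for people who voted in none of the last three elections comes from the example above; the other weights, the party labels and the sample responses are placeholders I made up, and a real pollster would have to calibrate them so the weighted total stays at or below about 60% of the sample.

```python
from collections import defaultdict

# Weight = assumed chance that a "decided" respondent actually votes, keyed by
# how many of the last three elections they say they voted in.
# 0.2 is from the example in the text; 0.5, 0.75 and 0.9 are hypothetical.
TURNOUT_WEIGHT = {0: 0.2, 1: 0.5, 2: 0.75, 3: 0.9}


def weighted_results(responses):
    """responses: iterable of (party, past_votes) pairs, past_votes in 0..3."""
    totals = defaultdict(float)
    for party, past_votes in responses:
        totals[party] += TURNOUT_WEIGHT[past_votes]
    return dict(totals)


def weighted_shares(responses):
    """Each party's share of the weighted decided vote."""
    totals = weighted_results(responses)
    decided = sum(totals.values())
    return {party: weight / decided for party, weight in totals.items()}


# Hypothetical sample: ten habitual voters for party A, ten never-voters for party B.
sample = [("A", 3)] * 10 + [("B", 0)] * 10
print(weighted_results(sample))
print(weighted_shares(sample))
```

In the raw responses the two parties are tied at ten each, but once the never-voters are discounted party A carries most of the weighted decided vote.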

Asking who people voted for in past elections gives you another margin of error to measure.  The responses to that question should match the actual election results within the statistical error.  If not, the pollster needs to adjust the relative weights of the responses to match the previous election result and account for the error this reveals.  The reality is that a lot of people will lie about their past votes in elections and there has to be a mechanism to capture this.
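One simple way to do that adjustment is to scale each respondent by the ratio of the actual past result to the recalled one. The Python sketch below assumes that approach; the function names, party labels and all the numbers are made up for illustration and are not real election results.

```python
def recall_factors(recalled_counts, actual_shares):
    """Factor per party: actual past share divided by the share respondents recalled."""
    total = sum(recalled_counts.values())
    factors = {}
    for party, count in recalled_counts.items():
        recalled_share = count / total
        factors[party] = actual_shares[party] / recalled_share
    return factors


def adjusted_counts(recalled_counts, factors):
    """Apply the factors so the recalled past vote matches the actual result."""
    return {party: count * factors[party] for party, count in recalled_counts.items()}


# Hypothetical poll of 1000 past voters and a hypothetical actual result.
recalled = {"A": 500, "B": 300, "C": 200}
actual = {"A": 0.40, "B": 0.35, "C": 0.25}

factors = recall_factors(recalled, actual)
print(factors)                             # A is over-recalled, B and C under-recalled
print(adjusted_counts(recalled, factors))  # roughly 400, 350 and 250
```

The same factors would then be applied to each respondent's answer about the next election, and the size of the factors is itself a measure of how far recalled votes drift from reality.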

Pollsters have to come to terms with the fact that a lot of people are lying to them.  Not to do that means they are producing reports that are at best misleading. In fact, I would call them fraudulent, because the pollsters know there is something wrong and they are not correcting for it.

Thoughts?  Comments?