Thursday, April 15, 2010

Polling in Canada

I raised this issue some time back, but I raise it again: there is an ongoing large error with the polls conducted in Canada - a lot more people say they are going to vote than will actually vote in an election.  Most pollsters are finding 80-90% of respondents are decided in how they will vote in an election, but we can be quite certain that less than 60% of people will vote in the next election.

This means that, at a minimum, one in three of the responses the pollsters are getting is WRONG.

Educated guessing would do as well.   Anecdotal surveys are not much less accurate.

The pollsters list the statistical margin of error; they should also be listing a much bigger source of error: false answers.   It is safe to assume that one in three answers is not accurate, which means we can assume there is a +- 15% value in each of the party results, taken as a percentage of the result itself - this means a 40% result in a poll would be a range of 34% to 46%.

I can hear some of you say "but the polls reasonably reflect the voting in the election".  I would argue they are not nearly as close as people assume.   As an example, the Conservatives managed to get close to 10% more votes than the median of the polls in the week before the vote.   The Greens managed to be 20% lower than they were polling.   Given the large number of polls and the narrow margins of statistical error, the aggregate of pollsters was outside their 95% confidence level.

In each election some polling company happens to be close to the final result, but it should be all of them if their work is done correctly.

One can also look at the polls lately and see how things are bouncing around.  They are not staying within the bounds of the statistical error.  Polling is seen as a scientific endeavor, but given that the companies do not come within each other's margins of error 95% of the time, there is something wrong.

If we use the wonderful world of science to document the sources of error, we can take the known turnout of voters at the last election and compare it to the number of decided voters in the poll to get a number we can use as a measure of this error.  If we use the 87.3% decided voters from the latest Ekos poll, and use a 59% turnout in the next election based on the last one, we get an estimated error of 32.4% (1 - 59/87.3), or plus or minus 16.2% of the value each party got in the poll.   This gives us the following national results for the poll:

  • Conservatives -  26.3% - 36.5%
  • Liberals - 24.3% - 33.7%
  • NDP - 13.7% - 19.1%
  • Bloc - 7.5% - 10.1%
  • Green - 9.4% - 12.9%
  • Other - 2.8% - 3.8%
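
The arithmetic behind these ranges can be sketched in a few lines of Python. The function names are only illustrative, and the 31.4% input is an assumption: it is not quoted from the poll, it is simply the midpoint of the Conservative range listed above.

```python
def turnout_error(decided_pct, turnout_pct):
    """Share of 'decided' answers that cannot turn into actual votes."""
    return 1 - turnout_pct / decided_pct


def adjusted_range(poll_pct, error):
    """Spread half of the error on each side of the polled value."""
    half = error / 2
    return poll_pct * (1 - half), poll_pct * (1 + half)


# 87.3% decided and 59% turnout are the figures used in the text.
error = turnout_error(decided_pct=87.3, turnout_pct=59.0)

# 31.4 is an assumed poll value (midpoint of the Conservative range).
low, high = adjusted_range(31.4, error)
print(f"error: {error:.1%}, range: {low:.1f}% - {high:.1f}%")
```

Note that the error is applied as a percentage of each party's own result, not as percentage points, which is why the ranges are wider for the larger parties.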

The error we have to assume from the wrong responses in decided voters means we have a much larger error than the statistical error.  In fact, the statistical error is lost in the noise of the much larger error in decided voters.   If we add the statistical error from the latest Ekos poll, the numbers only move marginally, though the range does get bigger:

  • Conservatives - 25.5% - 37.3%
  • Liberals - 23.5% - 34.5%
  • NDP - 13.3% - 19.5%
  • Bloc  - 7.3% - 10.3%
  • Green - 9.3% - 13.2%
  • Other - 2.8% - 3.8%

The ranges are large enough now that all the polling company results are within the margin of error.  All the pollsters in 2008 also managed to be within this margin of error.

It is because of this large margin of error that I have made very few projections for any election.   It is all a crapshoot at the moment unless there is some large movement in the polls.  Polling between elections also has an added problem: with no campaign, most of the public does not have a strong opinion on political parties and their responses are weaker.

I assume one of the major reasons people tell pollsters they are decided voters when they are not going to vote is that people still feel a civic duty to the idea of voting.   They do not wish to publicly state they do not plan on voting.

Pollsters should add another set of questions to their polls:

  • Did you vote in the last election?
  • If you did vote - how did you vote?
  • Did you vote in the 2006 election? and for whom?
  • Did you vote in the 2004 election? and for whom?

From this the pollsters could set values to the responses of decided voters.   Someone that did not vote in 2004, 06 or 08 would be assigned a lower value for their response; as an example, their response would be counted as 80% not planning on voting and 20% for their choice.  Other people would be weighted based on how often they have voted.  The weighting needs to reflect the fact that more than four in ten people will not be voting.   The weighted results of a 2000 person poll should have no more than 1200 decided voter results.
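
A minimal sketch of this kind of weighting, in Python, might look like the following. Only the 20% weight for someone who voted in none of the three elections comes from the example above; the other weights, the party labels and the six-person sample are made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical turnout weights keyed by how many of the last three
# elections (2004, 2006, 2008) a respondent says they voted in.
# Only the 0.2 for "none" comes from the example in the text.
TURNOUT_WEIGHT = {0: 0.2, 1: 0.5, 2: 0.75, 3: 0.9}


def weighted_shares(responses):
    """responses: iterable of (party, past_elections_voted_in) pairs."""
    totals = defaultdict(float)
    for party, past_votes in responses:
        totals[party] += TURNOUT_WEIGHT[past_votes]
    effective = sum(totals.values())  # weighted count of decided voters
    shares = {party: w / effective for party, w in totals.items()}
    return shares, effective


# Made-up sample: three respondents each for parties "A" and "B".
sample = [("A", 3), ("A", 0), ("B", 3), ("B", 2), ("A", 0), ("B", 1)]
shares, effective = weighted_shares(sample)
print(shares, effective)
```

Unweighted, this sample is an even split. Once the habitual non-voters are discounted, party "B" comes out ahead and the effective number of decided voters drops well below six, which is the same effect as a 2000 person poll shrinking to no more than 1200 decided results.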

Asking who people voted for in past elections gives you another margin of error to measure.   The results of the responses to that question should match the actual election results within the statistical error.  If not, the pollster needs to adjust the relative weights of the responses to match the previous election result and account for error because of this.  The reality is that a lot of people will lie about their past votes in elections and there has to be a mechanism to capture this.
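
One simple way to do that adjustment is to weight each respondent by the ratio of the real result to the recalled result for the party they say they voted for. The sketch below assumes that approach; the function name and all of the shares are hypothetical and are not real election or poll data.

```python
def recall_weights(recalled, actual):
    """Per-party weight that makes recalled past vote match the real result."""
    return {party: actual[party] / recalled[party] for party in actual}


# Hypothetical shares, for illustration only.
recalled = {"A": 0.45, "B": 0.35, "C": 0.20}  # what respondents say they did
actual = {"A": 0.40, "B": 0.35, "C": 0.25}    # what the official count showed

weights = recall_weights(recalled, actual)

# Applying the weights brings the recalled vote back to the real result.
reweighted = {party: recalled[party] * weights[party] for party in recalled}
print(weights)
```

In this made-up case, people who recall voting for "A" are over-represented and get a weight below 1, while those who recall voting for "C" get a weight above 1. How far the weights sit from 1 is itself a rough measure of how much misreporting is going on.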

Pollsters have to come to terms with the fact that a lot of people are lying to them.  Not to do that means they are producing reports that are at best misleading.  In fact, I would call them fraudulent, because they know there is something wrong and they are not correcting for it.

Thoughts?  Comments?


Election Watcher said...

A few comments:

- When pollsters cite a range of +/- 3%, they mean 3 percentage points. That is, 30% means 27-33 (not 29.1-30.9).

- Polls are inherently a snapshot of the electorate. Thus, even if there were no undervoting (or sampling or dishonesty) issues, using polls to predict election results means that one should take a larger margin of error into account.

- Your methodology for calculating the error due to undervoting is unfortunately invalid. If actual voters are randomly drawn from the population that pollsters sample from, the answer should be much much lower. If not, then it is more of an issue with the representativeness of the sample rather than pure statistical error due to undervoting (which is what the 3% is about).

- I wouldn't be surprised if some Canadian pollsters actually do weight people according to their voting likelihood. Pollsters in the US do so in "likely voter" polls, and some UK firms do it as well. We don't know if Canadian pollsters do since they reveal less about their methodology.

Bernard said...

Actually, your statistical error bars change based on the number you are at. The +- number they state is for a result at the 50% level, if a party comes in at 30%, the error bars change.

My point is that the results do not reflect the reality of who is voting, they are not representative samples. Until pollsters fix this major error, one has to calculate how wrong they are. The methodology I have shown here is the best I can come up with to reflect the error of the pollsters.

I have yet to see any Canadian pollster indicate they are weighting likely voters in any way or form. Ultimately, asking people whether they are likely to vote is not a good question; asking when and how often people have voted is a much better one. No one asks that question.

Given that the results of the different polling firms are not as narrowly clustered as they should be based on the statistical measure, there is something wrong. You can calculate the 90% confidence level for each poll or the 75% confidence level. The statistical margin of error becomes very small once one goes to the 90% level.

The polls nationally should be very, very close together. The reality is that they are not and they are way out based on the pure statistical model. There is other error going on and that is the error of representativeness in the polls.

Where does the error come from? People lying to the pollsters.

Election Watcher said...

True, the error changes according to where you are - but not proportionally to your vote share. (For example, at 30%, the margin for a 1,000-person poll is about 2.9%.) Just thought it was confusing (and a bit misleading) that you used percentages rather than percentage points, which people are most used to.
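
For anyone who wants to check figures like these, here is a minimal sketch using the usual normal approximation for a simple random sample. That is an assumption on my part: pollsters do not always say exactly which formula they use, and the function name is just illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the normal distribution


def margin_of_error(share, n, z=Z_95):
    """Normal-approximation margin, in percentage points, for one party."""
    return 100 * z * math.sqrt(share * (1 - share) / n)


# Margin for a 1,000-person poll at a few different vote shares.
for share in (0.50, 0.30, 0.10):
    print(f"{share:.0%}: +/- {margin_of_error(share, 1000):.1f} points")
```

The margin is widest at 50% and shrinks only slowly as the share moves away from it, so at 30% it is still a little under 3 points rather than 30% of the headline figure.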

The problem with trying to quantitatively account for sampling bias is that, almost by definition, there's no good way of doing so without very strong assumptions. The problem with your methodology is that it doesn't have a quantitative interpretation: are you trying to get the 75% confidence interval or 99%? As a result, comparing your numbers to the margins given by polling firms is like comparing apples and meat pies.

I like your suggestion of new questions for applying a likely voter screen. Some US pollsters use it already. However, almost invariably, a greater proportion of people "recall" voting than actually voted...

Btw, two 1,000-person polls are within the margin of error if their numbers are within about 4% of each other (multiply the margin for a poll by sqrt(2)). Thus, a poll showing 34% for a party is not statistically significantly different from a 38% poll.

In terms of the margin between two parties, the relevant figure is about 6% (margin of a single poll multiplied twice by sqrt(2) - this is actually an underestimate due to the negative correlation of mistakes on figures for different parties). So a 33-30 poll and a 36-27 poll are again not statistically significantly different.
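
These two rules of thumb can be sketched as follows, assuming two independent 1,000-person polls and the same normal approximation for the single-poll margin. The variable names are illustrative only.

```python
import math


def margin_of_error(share, n, z=1.96):
    """Normal-approximation margin, in percentage points, for one party."""
    return 100 * z * math.sqrt(share * (1 - share) / n)


single = margin_of_error(0.5, 1000)    # one party in one poll
between_polls = single * math.sqrt(2)  # same party in two independent polls
lead_between_polls = single * 2        # sqrt(2) applied twice, per the rule above

print(f"{single:.1f} {between_polls:.1f} {lead_between_polls:.1f}")

# The two examples from the comments, checked against these thresholds.
print(abs(38 - 34) < between_polls)                    # 34% vs 38% for a party
print(abs((36 - 27) - (33 - 30)) < lead_between_polls) # 33-30 vs 36-27 leads
```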

I don't think we get results that are farther apart than that much more than once out of 20 (on polls that are in the field at the same time). A little more, maybe: after all, methodologies do vary a bit. But I disagree with your assertion that national poll results are "way out" (except for Greens and Others).

Bernard said...

I was meaning to go to the 90% or 75% confidence level to show that the companies should be closer together in their results with the more precise result of the lower confidence level.

At the 90% confidence level only one poll in 10 should be outside of the margin of statistical error, an error that is smaller at 90% than the normal 95% that is reported.

The margins between the companies are too large to be accounted for by the statistical error alone. I have not done the math to be certain, and I am unlikely to do it, but just looking at the polling numbers, there are ongoing inconsistencies in the numbers. The margin of error means for all of the parties, one party out means this poll is not within the one in 20.

Without some sort of screen for people not voting and a way to have a moderate degree of confidence in the answers, there is a big problem in how the polls are run and the results that come out of them.

Even when the polls are 'accurate' with each other, we know that at least one in three people is not responding how they will actually act in an election. If it was a small amount, no big deal, but this is a large number.

Unknown said...

I think the 'likely voters' issue is the big one. If I recall correctly in the US, the results were often quoted as being 'amongst likely voters'. These were determined not by questions about previous voting, but by demographics - folks in age groups, social standing, income level and so on. Judgment calls about likely voters probably make up some of the spread in results, assuming random sampling is done correctly. The other source of error I think is that sampling 1000 voters (say) with very small numbers in some communities which have strong demographic biases to one party or another may skew the results and not give a truly random sample.

I'd also argue that you can't really tell whether the fact that more people have voting intentions than will actually vote tells you anything about whether the rest of their answers are false. It only says that you can't tell - without additional information - which of the respondents will vote. I can't see how you can assume anything about the errors on this.

Bernard said...

All we can assume is that there is a much larger margin of error in the results than just the statistical one.

We do know that the answer from the people indicating they are a decided voter and who will not be voting is false, because they will not be choosing the party they indicated at the polls.

I think you are asking if the non-voters answering are of the same distribution as the actual voters. We have no data to indicate that this would be the case. In fact we have nothing to indicate anything about them other than they are lying when they say they are decided voters.

Once again, think about this: we are reasonably certain that 1/3 of the respondents to a political poll in Canada are lying. One in three lies; that has to have some sort of impact on the results.

Election Watcher said...

Just want to respond to a few points:

- "I was meaning to go to the 90% or 75% confidence level"

Sure - just multiply all the margins I gave by 0.84 or 0.59 respectively.
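
Those multipliers are just ratios of normal critical values, which can be checked with Python's standard library. The helper name below is illustrative.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided normal critical value for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


z95 = z_value(0.95)
for level in (0.90, 0.75):
    ratio = z_value(level) / z95
    print(f"{level:.0%}: multiply the 95% margin by {ratio:.2f}")
```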

I still don't think that you'd get results that are much more erratic than statistical error suggests after correcting for fixed house effects, at least for the 3 major parties (Green and Other numbers do seem more erratic). But I guess we'll have to agree to disagree, since neither of us is inclined to crunch the numbers :)

Also, your perception of polls being all over the place is essentially unrelated to the undervoting issue: all pollsters may be polling the wrong population (overall instead of voting), but they are polling the same wrong population, so they should get consistent results!

- "The margin of error means for all of the parties, one party out means this poll is not within the one in 20."

No. If you look at how the margin is calculated, it is just for one party.

- "All we can assume is that there is a much larger margin of error in the results than just the statistical one."

One really needs to distinguish between bias and precision. The loss in precision from 1/3 of people lying randomly about whether they'll vote is EXTREMELY small.

The resulting bias can indeed be big. But "margin of error" is not a concept that's meant to account for bias, because you can't statistically tell how big various biases are.
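
To make the distinction concrete, here is a minimal sketch with made-up numbers: a hypothetical party at 35% among people who will really vote, with one third of the sample made up of people who will not. The 20% figure for non-voters is purely an assumption for illustration.

```python
import math


def polled_share(voter_share, nonvoter_share, nonvoter_fraction):
    """Expected poll result when non-voters are mixed into the sample."""
    return ((1 - nonvoter_fraction) * voter_share
            + nonvoter_fraction * nonvoter_share)


def margin_of_error(share, n, z=1.96):
    """Normal-approximation margin, in percentage points, for one party."""
    return 100 * z * math.sqrt(share * (1 - share) / n)


# Case 1: non-voters answer like voters, so the poll is unbiased.
same = polled_share(0.35, 0.35, 1 / 3)

# Case 2: non-voters answer differently (assumed 20%), so the poll is biased.
different = polled_share(0.35, 0.20, 1 / 3)

bias_points = 100 * (different - 0.35)
margin = margin_of_error(different, 1000)
print(f"bias: {bias_points:.1f} points, stated margin: +/- {margin:.1f}")
```

In the second case the poll is off by several points, yet the stated margin for a 1,000-person poll barely changes, because the margin only describes sampling noise and says nothing about who ended up in the sample.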

Bernard said...

On the wrong population, it is more than that. They all report people as decided voters who have not voted in the past and will not vote in the next election. These people are unlikely to give a consistent answer as a group since they have no personal attachment to the voting process.

Including non-voters in their sample of peoples' voting intentions will automatically skew any result. It is like asking non-smokers their favorite cigarette brand.