Thursday, January 26, 2012

The weakness of polling - two polls released in Alberta at the same time

I have a lot of skepticism about polling because the results are so often a very weak reflection of how the public actually votes. Pollsters all act as if their results are good and there are no systemic problems with how they poll the public. Most political polls are commissioned by the media, so the media have an incentive not to find fault with the polls they paid for.

This week we see a classic example of the problem with polling: two polls of Alberta voters released at the same time.

Forum Research and Leger Marketing both polled in Alberta last week:

| Party | Forum (Jan 17) | Leger (Jan 13-18) |
|---|---|---|
| PC | 38% | 53% |
| Wildrose | 29% | 16% |
| Liberals | 14% | 11% |
| NDP | 13% | 16% |
| Alberta Party | 3% | 2% |
| Others | 4% | 6% |
| Sample size | 1077 | 736 |
| Margin of error | 2.99% | 3.61% |

I am not sure whether the Forum sample size is for the whole poll or only the decided voters. The Leger poll has 900 respondents and 736 decided voters. Since I am not certain what the Forum sample represents, the math below could be somewhat off.
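
Both reported margins of error match the textbook formula for a simple random sample, 1.96 × √(p(1 − p)/n), evaluated at the worst case p = 0.5. The minimal sketch below assumes that formula (neither pollster says exactly how it computed its figure) and also shows what Leger's margin would be if it were based on all 900 respondents rather than the 736 decided voters.

```python
import math

Z_95 = 1.96  # two-sided critical value of the standard normal for 95% confidence

def margin_of_error(n, p=0.5, z=Z_95):
    """95% margin of error (as a fraction) for a simple random sample of size n.

    p = 0.5 maximises p * (1 - p), giving the worst-case margin that is usually
    quoted as 'the' margin of error for a whole poll."""
    return z * math.sqrt(p * (1 - p) / n)

samples = {"Forum (1077)": 1077, "Leger, decided (736)": 736, "Leger, all (900)": 900}
for label, n in samples.items():
    print(f"{label}: +/- {100 * margin_of_error(n):.2f}%")
```

This prints ±2.99%, ±3.61% and ±3.27%, so the first two reproduce the published figures, and the Leger number is consistent with the margin being based on decided voters only.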

How do you explain two pollsters getting results that different?

A poll's margin of error describes the range on the bell curve that holds 95% of the probable outcomes. If the range from one survey only just overlaps the range from another, the overlapping area represents a very low-probability outcome.

[Figure: bell curves for different sample sizes, showing why polls really need sample sizes larger than 1000 to reflect the public.]
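
How the bell curve narrows with sample size can be made concrete by turning the worst-case margin-of-error formula around: the sample needed for a target margin E is n = 0.25 × (1.96 / E)². This is a sketch assuming simple random sampling; `required_sample_size` is a hypothetical helper name, not something the pollsters publish.

```python
import math

def required_sample_size(target_moe, p=0.5, z=1.96):
    """Smallest simple-random-sample size whose 95% margin of error is at most
    target_moe (a fraction, e.g. 0.03 for +/- 3 points). p = 0.5 is the worst case."""
    return math.ceil(p * (1 - p) * (z / target_moe) ** 2)

for moe in (0.05, 0.04, 0.03, 0.025):
    print(f"+/- {100 * moe:.1f}%: n >= {required_sample_size(moe)}")
```

Because the margin shrinks with the square root of n, halving it from ±5% (385 respondents) to ±2.5% (1,537 respondents) takes roughly four times the sample.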

So is there any overlap in the results? Applying the margin of error at the 95% confidence level to each of the top four parties gives the following:

| Party | Forum | Leger |
|---|---|---|
| PCs | 35.1%-40.9% | 49.4%-56.6% |
| Wildrose | 26.8%-31.2% | 13.3%-18.7% |
| Liberals | 11.9%-16.1% | 8.7%-13.3% |
| NDP | 11.0%-15.0% | 13.3%-18.7% |

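
These ranges use each party's own share as p in the formula rather than the worst case p = 0.5, which is why the smaller parties get narrower ranges. The sketch below recomputes them, assuming the sample sizes from the first table are the right n for each poll (uncertain for Forum, as noted above). Its output matches most rows of the table; where it differs, it is by a few tenths of a point. It also flags which parties' ranges intersect.

```python
import math

def interval(share, n, z=1.96):
    """95% confidence interval, in percentage points, for one party's share.

    Uses the party's own share as p, so small parties get narrower ranges
    than the poll's headline (p = 0.5) margin of error."""
    p = share / 100
    moe = 100 * z * math.sqrt(p * (1 - p) / n)
    return round(share - moe, 1), round(share + moe, 1)

FORUM_N, LEGER_N = 1077, 736  # assumed n; Forum's may include undecided voters
forum = {"PCs": 38, "Wildrose": 29, "Liberals": 14, "NDP": 13}
leger = {"PCs": 53, "Wildrose": 16, "Liberals": 11, "NDP": 16}

for party in forum:
    f_lo, f_hi = interval(forum[party], FORUM_N)
    l_lo, l_hi = interval(leger[party], LEGER_N)
    overlaps = max(f_lo, l_lo) <= min(f_hi, l_hi)  # do the two ranges intersect?
    print(f"{party:9} Forum {f_lo}-{f_hi}%  Leger {l_lo}-{l_hi}%  overlap: {overlaps}")
```

Only the Liberal and NDP ranges come out as overlapping, matching the discussion that follows.
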
Look first at the NDP, where the ranges do overlap: how much of each survey's bell curve does the overlap represent? About 10% of the possible outcomes for each survey fall in the overlap range. In other words, each survey is saying that 90% of its range does not overlap with the other survey at all. Basically, we have two bell curves that share only a small tail at their upper or lower limits.

The Liberal ranges overlap as well, but the overlap covers only 7.5% of the probable outcomes, roughly a one-in-13 chance.

In theory, because the curves extend out to infinity, the PC and Wildrose results also overlap. For Wildrose the overlap occurs 0.0001% of the time, a probability of one in a million. For the PCs it is 0.005% of the time, a one-in-20,000 chance.
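
Comparing overlapping ranges is only a rough test. A more standard check, and a different technique from the overlap areas quoted above, is an unpooled two-proportion z-test on the gap between the two polls for each party. This sketch assumes both polls are independent simple random samples of the sizes listed in the first table; `two_poll_z_test` is a hypothetical helper name.

```python
import math

def two_poll_z_test(p1, n1, p2, n2):
    """Unpooled two-proportion z-test for the same party in two independent polls.

    Returns (z, two-sided p-value): how surprising the gap p1 - p2 would be if both
    polls were simple random samples of the same population."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

FORUM_N, LEGER_N = 1077, 736  # assumed n; Forum's may include undecided voters
forum = {"PCs": 0.38, "Wildrose": 0.29, "Liberals": 0.14, "NDP": 0.13}
leger = {"PCs": 0.53, "Wildrose": 0.16, "Liberals": 0.11, "NDP": 0.16}

results = {party: two_poll_z_test(forum[party], FORUM_N, leger[party], LEGER_N)
           for party in forum}
for party, (z, p) in results.items():
    print(f"{party:9} z = {z:+.2f}  two-sided p = {p:.2g}")
```

Under those assumptions the PC and Wildrose gaps produce z-scores above 6, far beyond anything sampling error explains, while the Liberal and NDP gaps give two-sided p-values between 0.05 and 0.10.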

The NDP and Liberal results are close enough that people will assume the midpoint between the two results is the correct one, but statistically that is improbable. It highlights the danger in how people view and manipulate poll data. People want to see a pattern where the statistics say there is none, and they assume the improbable overlap is what actually happened. Much more likely, the error introduced by each pollster's methodology is many times larger than the statistical margin of error. The real range for the NDP is more likely 10% to 20%, but that is only a rough guess on my part, because I have no quantifiable data on how much error each pollster's methods introduce.

People are also driven by what they expect the result to be. The NDP getting 14.5% (the midpoint of the two polls) sounds realistic, and I have no data to prove that it is wrong. People's assumptions about public electoral behaviour are very conservative: past results are treated as the most likely future results.

Too often the public seems to accept that any given poll could be the one time in 20 when the result falls outside the 95% confidence range on the bell curve. That should be a rare occurrence. Since October 2009 there have been 20 provincial polls in Alberta, so we would expect only about one of them to have an anomalous result. The problem is that most of the time we have no data to test how accurate the pollsters are; that test only comes on election day.
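
The "one in 20" expectation can be made concrete with a binomial model. This sketch assumes each of the 20 polls independently has exactly a 5% chance of missing its 95% range, which is a simplification, since polls of the same electorate are not fully independent; `prob_k_anomalies` is a hypothetical helper name.

```python
from math import comb

def prob_k_anomalies(k, n_polls, rate=0.05):
    """Binomial probability that exactly k of n_polls polls land outside their
    95% range, assuming each misses independently with probability `rate`."""
    return comb(n_polls, k) * rate**k * (1 - rate) ** (n_polls - k)

N_POLLS = 20  # provincial polls in Alberta since October 2009, per the post
print(f"expected anomalies: {N_POLLS * 0.05:.1f}")
for k in range(4):
    print(f"P(exactly {k}) = {prob_k_anomalies(k, N_POLLS):.2f}")
```

Under this model the expected number of misses is 1.0, and the chances of exactly 0, 1, 2 or 3 misses are about 0.36, 0.38, 0.19 and 0.06.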

In Alberta there have been some significant disagreements between pollsters in the last 20 polls, not just these latest two.

Environics and Think HQ Public Affairs both polled in July 2011, and the differences in PC and Wildrose support between those two polls were as large as in these latest two. One of the two is outside the 95% confidence range.

From early October to early November 2011, Lethbridge College, Angus Reid Strategies and Environics all polled. Their ranges are closer, but the overlapping parts of the curves are small; at least one of the three has to be outside its 95% confidence range.

What this means is that there is something fundamentally wrong with the polling process. Over and over again, the "1 in 20" polls crop up much more often than they ever should. Of the last nine polls, at least three fall into that rare category. Nine polls should produce fewer than one such miss (9 × 5% = 0.45), so these "rare" polls occurred at least 6.7 times more often than they should have.
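
The 6.7 figure is an observed-to-expected ratio. A binomial sketch can also put a probability on seeing that many misses by chance alone. It again assumes the polls are independent and each misses its 95% range with exactly 5% probability, and it takes the "at least three" count as given; `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(k, n_polls, rate=0.05):
    """P(at least k of n_polls independent polls miss their 95% range), binomial model."""
    return sum(comb(n_polls, j) * rate**j * (1 - rate) ** (n_polls - j)
               for j in range(k, n_polls + 1))

n_polls, observed = 9, 3
expected = n_polls * 0.05  # expected number of misses under the 5% rate
print(f"expected misses: {expected:.2f}")
print(f"observed / expected: {observed / expected:.1f}")
print(f"P(at least {observed} misses by chance): {prob_at_least(observed, n_polls):.4f}")
```

Under those assumptions, three or more misses in nine polls would happen less than 1% of the time (about 0.84%).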

Effectively, polls are little more than manufactured news. These two latest polls were commissioned by two different newspapers, which means neither paper has any interest in reporting the other's poll or questioning the accuracy of its own. That explains why pollsters get no scrutiny in the media, even though they are wrong far more often than anyone admits.
