Tuesday, November 26, 2013

Problems with pollsters, again

I think at this point accurate polling in Canada that can predict elections on the large scale is no longer possible.

Forum Research polled people in the four recent by-elections four or five times during the campaigns, and their results were not close to the final results.   There is a systemic problem with polling in Canada, and I am not convinced that any pollster other than Frank Graves at EKOS has any real interest in addressing it.

So people understand that I do have some idea what I am talking about: in the past I have run numerous telephone polls, and I came close to launching a polling company in the late 1990s.  My educational background used a lot of statistical methods to analyse topics from Canadian history.   I started university with the intention of a physics or math degree before choosing history.

The idea with a poll is that you are getting a random sample which will reflect the opinion of your whole population.  The two important factors at work are: knowing what population you want to sample and how to get a representative random sample.

Random Sample
Telephone polling worked reasonably well for close to 50 years when almost all homes had a phone and were in the phone book.   You could use the phone book as your randomizer and be fairly sure you would get a decent representative random sample.   Since most homes had more than one voter in them, you did have to ask for specific people when calling such as the youngest female voter or oldest male voter to ensure the random nature of the sample.

One reason in 1948 there was the famous Dewey Defeats Truman headline in the Chicago Tribune is because they relied on polling that indicated Dewey had a strong lead over Truman.  One of the problems with the polling at that time was that many homes, especially poorer ones, did not have phones and were not polled.  1948 was too early for telephone polling.

With the rise of mobile phones, the phone book and landlines are no longer a reasonable random sample of the public.   Without a real random sample you have no idea if your sample reflects the overall population.

To get a good sample you also need to have most people answer the call.  Caller ID and voice messaging have reduced the number of people that will answer calls.  Interactive Voice Response polling (aka robocalls) has further reduced response rates to polls.   IVR polls can have refusal rates as high as 95%.   What this means is that the majority of the random numbers you call do not respond, so your sample is no longer random.
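
To show why low response rates matter, here is a rough Python sketch of a poll where supporters of one party are slightly more likely to pick up the phone. All of the numbers (40% true support, 7% and 4% answer rates) are invented for illustration and are not measurements from any real poll.

```python
import random

def simulate_poll(n_calls, true_support, answer_rate_a, answer_rate_other, seed=1):
    """Dial n_calls random people and return (response rate, polled support for party A).

    true_support       -- real share of the public backing party A (assumed)
    answer_rate_a      -- chance a party A supporter answers (assumed)
    answer_rate_other  -- chance anyone else answers (assumed)
    """
    rng = random.Random(seed)
    answered = 0
    answered_a = 0
    for _ in range(n_calls):
        backs_a = rng.random() < true_support
        answer_rate = answer_rate_a if backs_a else answer_rate_other
        if rng.random() < answer_rate:
            answered += 1
            if backs_a:
                answered_a += 1
    if answered == 0:
        raise ValueError("nobody answered; increase n_calls")
    return answered / n_calls, answered_a / answered

# Hypothetical scenario: party A really has 40% support.
response_rate, polled_support = simulate_poll(200_000, 0.40, 0.07, 0.04)
print(f"response rate: {response_rate:.1%}")
print(f"true support: 40.0%, polled support: {polled_support:.1%}")
```

With these made-up rates roughly 95% of calls go unanswered, and the people who do answer put party A at about 54% in expectation instead of 40%. Calling more numbers does not fix this, because the error comes from who answers, not from how many.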

These two factors mean telephone polling can no longer deliver a random sample of the public and is therefore no longer useful as a way to poll.

I will not go into the use of Bayesian credibility intervals as a substitute for a statistical margin of error other than to say the AAPOR is not too keen on it.

Population You Want to Sample
Elections are decided by the people that vote so those are the people you want to sample.   When voter turnout rates were close to 80%, a random sample of the general public was a reasonable substitute for the voting public.

These days, with close to half the people not voting, and two-thirds not voting in by-elections, the general public is no longer a reasonable representation of the voting public.   The problem is how do you know who will vote and who will not vote?  How do you demographically weight the sample?  Do you base it on the voter demographics of the previous election?

Most pollsters get 80-90% of their respondents expressing an opinion on how they will vote.  This is clearly not accurate when only 55% of the people vote.  Either the pollsters are only reaching the voting public, which I do not believe for a moment, or they are ignoring the fact that 1/3 or more of their respondents are not telling them the truth.  OK, that is not entirely true, many people will answer with an opinion and are intending on voting but then do not vote.
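
The arithmetic behind that "1/3 or more" can be checked with a few lines of Python. This is only a sketch, and it assumes the sample mirrors the general public and that every person who actually votes gave an opinion, which makes the result a lower bound. The function name is made up for this example.

```python
def min_nonvoting_share(opinion_rate, turnout):
    """Smallest possible share of opinion-givers who will not actually vote.

    Assumes the sample mirrors the general public and that every real voter
    was among those who expressed an opinion (the most generous case).
    """
    return max(0.0, (opinion_rate - turnout) / opinion_rate)

# 55% turnout, with 80%, 85% and 90% of respondents expressing an opinion.
for opinion_rate in (0.80, 0.85, 0.90):
    share = min_nonvoting_share(opinion_rate, 0.55)
    print(f"{opinion_rate:.0%} express an opinion -> at least {share:.0%} of them will not vote")
```

Across the 80-90% range this works out to somewhere between about 31% and 39% of the people who gave an opinion.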

Ideally you want to sample the people that will vote in this election, but as work by Elections BC shows, a significant portion of the public that intends to vote will change their mind on election day.

What all this means is that we have no functional way to figure out who should be in the population we want to sample.

Online Opt-in Panels are not a Solution
The problem with all the online panels is that they are a self-selecting part of the public.   They are in no way a representative sample of the public overall or the voting public.  Online panels make surveying easy, but this does not mean the data is at all useful.

I have been trying to work out a functional online model for polling, but I cannot find a workable way to get a random sample.

Frank Graves at EKOS has a probability-based panel - this means they choose you to be in the panel pool and not the other way around.   I do not have data to know if his model works well or not.

Hybrid Approaches
This means using a number of different sources to get respondents.   It is one way to get around some of the issues, but the added complexity of the hybrid models makes them much more prone to problems in the sampling process and in how the data from the different parts of the survey is weighted.

The hybrid model should make the survey more closely reflect the population you are sampling, but the problem is how do you get anything approaching a random sample?   I do not know if this second issue can be overcome.

That said, I suspect any functional model moving forward will likely be hybrid in nature.  Not only that, the methods I am trying out are hybrids.

My Current Best Possible Solution
What I am working on testing is a model that combines home visits and street interviews.   The idea is that in neighbourhoods with single family homes you have people canvass a specific number of randomly selected houses.   You then supplement this with street level interviews of random people at random times in random locations where you find apartments and condos.
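
Picking the houses can be as simple as drawing from a list of addresses. The sketch below uses Python's standard random.sample; the street name and house numbers are placeholders, and a real canvass would need a complete address list for the neighbourhood.

```python
import random

def pick_houses(addresses, how_many, seed=None):
    """Draw how_many addresses at random, without repeats."""
    rng = random.Random(seed)
    return rng.sample(addresses, how_many)

# Placeholder address list for one neighbourhood.
addresses = [f"{number} Example Street" for number in range(100, 300, 2)]
to_visit = pick_houses(addresses, 10, seed=2013)
for address in to_visit:
    print(address)
```

The important part is that the list decides who gets a visit, not the canvasser. If a canvasser skips the house with the barking dog and knocks next door instead, the sample stops being random.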

This model seems to be able to produce a reasonable random sample of the general public, but it is unclear if there is a way to capture only the voting public.   What I am trying is to sample according to past election turnout rates by age.    I am not certain whether this will capture the actual voting public.
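
One way to turn past turnout into interview targets is sketched below in Python. The age groups, population shares and turnout rates are invented for illustration; real figures would have to come from census and election agency data. The function name is also made up for this example.

```python
def voter_quotas(population_share, turnout, sample_size):
    """Split sample_size across age groups in proportion to expected voters.

    population_share -- share of adults in each age group (assumed figures)
    turnout          -- past turnout rate for each age group (assumed figures)
    """
    expected_voters = {
        group: population_share[group] * turnout[group]
        for group in population_share
    }
    total = sum(expected_voters.values())
    # Rounding means the quotas can be off from sample_size by one or two.
    return {
        group: round(sample_size * voters / total)
        for group, voters in expected_voters.items()
    }

# Hypothetical city: made-up population shares and turnout by age group.
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
turnout = {"18-34": 0.35, "35-54": 0.55, "55+": 0.70}

for group, quota in voter_quotas(population_share, turnout, 400).items():
    print(f"{group}: interview {quota} people")
```

With these made-up numbers the oldest group ends up with more than twice as many interviews as the youngest, even though the two groups are close in size. This only corrects for age; it says nothing about which individuals within an age group will actually vote.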

This model should work for polling within a single city but is not easy to scale up to the provincial or nationwide level.
