Wondering why US pollsters got it wrong? Non-response bias is the primary reason.
In 2024, the vast majority of election polls indicated that Kamala Harris would win and become the next president of the United States, although all showed that it would be a very tight race. The main reason why US pollsters got it wrong was non-response bias.
What is Non-Response Bias?
In opinion surveys, like election polls, phone numbers are randomly selected and individuals invited to participate. There is always a proportion of the randomly selected individuals who choose not to respond to a survey. A non-response bias is introduced into the survey data if these non-respondents disproportionately share a common attitude. Non-response bias is invisible, unknown and unknowable, since we cannot know the attitudes of individuals who did not participate in a survey.
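To see how this plays out, here is a minimal simulation sketch with made-up numbers (the support level and response rates are assumptions for illustration, not real polling figures): the sample is drawn perfectly at random, yet the poll estimate drifts away from the truth simply because one candidate's supporters decline the survey more often.

```python
import random

random.seed(42)

TRUE_SUPPORT_A = 0.50    # candidate A's true share of voters (hypothetical)
RESPONSE_RATE_A = 0.60   # A's supporters agree to be surveyed 60% of the time
RESPONSE_RATE_B = 0.40   # B's supporters agree only 40% of the time
CONTACTS = 10_000        # randomly selected phone numbers that are called

responses = []
for _ in range(CONTACTS):
    # Random selection works exactly as survey sampling theory assumes...
    supports_a = random.random() < TRUE_SUPPORT_A
    # ...but the selected voter may decline, and the refusal rate differs
    # by candidate preference. That difference is the non-response bias.
    rate = RESPONSE_RATE_A if supports_a else RESPONSE_RATE_B
    if random.random() < rate:
        responses.append(supports_a)

estimate = sum(responses) / len(responses)
print(f"True support for A:  {TRUE_SUPPORT_A:.1%}")   # 50.0%
print(f"Poll estimate for A: {estimate:.1%}")          # roughly 60%, not 50%
```

The skew is invisible from inside the data: the completed interviews look like a clean random sample, and nothing in them reveals that the refusals leaned one way.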
Pollsters' reactions to a potential non-response bias
The pollster story begins in 2016. Donald Trump was the surprise winner of the 2016 U.S. presidential election. It was a surprise because the polls had overwhelmingly indicated that Hillary Clinton would win the election. It turned out that pro-Trump voters disproportionately did not participate in surveys prior to the election, but they showed up to vote.
In the 2024 presidential election, most pollsters believed that Trump voters were again being undercounted, but they felt that it was by much less than in 2016. That is, the pollsters felt that the stigma of being a Trump voter had diminished. Some pollsters believed the 2016 issue no longer existed at all, and a few thought that Harris voters might be underrepresented in the 2024 polls.
Given that there is no way to know whether there was a non-response bias or to estimate its size, pollsters had to make a judgement call that was not based on fact, evidence, or survey science:
- Some pollsters who believed Trump supporters were underrepresented in the 2024 election polls weighted their survey data using their best guess of the proportion of uncounted Trump voters, based on the 2016 election polling and/or the current political climate.
- Other pollsters who believed Trump supporters were underrepresented chose not to adjust by weighting, either because they didn't want to guess the size of the underrepresentation or because they felt the undercounting of Trump voters in polls was negligible.
- Some pollsters didn't believe there was a non-response bias in 2024, so they did not adjust/weight their survey data.
Everyone got it wrong in 2024. Trump voters again participated in election polls at a lower rate and made up a much larger proportion of the non-respondents than any pollster accounted for. This created a sizable non-response bias in the polling data, and the polls did not predict the magnitude of Donald Trump's win in 2024.
Science of Surveys
The 2024 US presidential election polls were not incorrect because of survey science. There is a gap between the mathematics of survey sampling and the real-world context, because randomly selected respondents have the right not to participate in a survey. The mathematics/statistics behind random sample surveying does not allow for the possibility of non-response; rather, it assumes a perfect random sample selection.
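To spell that gap out with a worked formula (my own illustration, not a formula pollsters publish): suppose candidate T's true share of voters is p, T's supporters respond to the survey at rate r_T, and all other voters respond at rate r_O. Conditional on responding, the expected share of T supporters in the sample is

```latex
\hat{p} = \frac{p \, r_T}{p \, r_T + (1 - p)\, r_O}
```

which equals p only when r_T = r_O. For example, with p = 0.5, r_T = 0.4 and r_O = 0.6, the poll is expected to show T at about 40% even though the race is truly 50/50. Classical sampling theory effectively assumes r_T = r_O = 1, i.e., everyone who is selected responds.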
Blame the Pollsters?
In the weeks leading up to the presidential election, pollsters were vocal in the media about the fact that there could be a systematic bias in their polling data and that it could not be relied upon.
And still… these pollsters continued to produce election polls, and the media continued to discuss them daily.
Because the two candidates were neck-and-neck in the surveys (within the statistical margin of error) and because pollsters believed a non-response bias was likely, surveys ceased to be a scientific tool for predicting the outcome of the 2024 U.S. presidential election.
It is my view that pollsters who believed their polling data might be subject to a non-response bias should have stopped publishing polling data of unknown accuracy. Surely, pollsters have a professional responsibility not to publish data they believe may be biased.
Blame the Media?
For the most part, I blame the media for publishing polls of unknown accuracy. In the final weeks of the US presidential election campaign, pollsters stated clearly that their polls might have a non-response bias, but the media spent a lot of time discussing the poll findings anyway.
The media need to start treating election polls (and all survey data) like any other information they publish: it needs to be verifiably accurate.
Elevate your Game
Every time there is an Attempted Census survey or a Random Sample survey in which the response rate is less than 100%, you must consider whether the non-respondents are a random group of people, which will not impact the findings, or a group of people who disproportionately hold a particular attitude, which will skew the data.
A common action to address a potential invisible non-response bias is to weight the sample data to known population parameters (e.g., age, gender identity, region, education, gender within region, etc.), on the understanding that attitudes and behaviours are very often related to demographic characteristics. Fix the demographic skews in the survey sample data -> fix the attitude/behaviour skews in the survey sample data. A minimal sketch of this kind of weighting appears below.
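As a concrete illustration of that kind of demographic weighting, here is a minimal post-stratification sketch. The cell names, population shares, and preference figures are hypothetical, and real pollsters typically rake across several variables at once; this only shows the core arithmetic, where each respondent is weighted by their group's known population share divided by its share of the sample.

```python
from collections import Counter

# Known population shares for one illustrative weighting variable,
# e.g. education level (hypothetical figures, not census data).
population_shares = {
    "no_college": 0.60,
    "college":    0.40,
}

# A toy sample in which college-educated respondents are over-represented.
sample = ["college"] * 550 + ["no_college"] * 450

# Weight for each cell = population share / sample share.
sample_counts = Counter(sample)
n = len(sample)
weights = {
    cell: population_shares[cell] / (sample_counts[cell] / n)
    for cell in population_shares
}

# Hypothetical candidate preference by cell, to show the effect of weighting.
support_for_trump = {"no_college": 0.60, "college": 0.45}

unweighted = sum(support_for_trump[cell] for cell in sample) / n
weighted = sum(support_for_trump[cell] * weights[cell] for cell in sample) / n

print(f"Unweighted estimate: {unweighted:.1%}")  # skewed toward the over-sampled group
print(f"Weighted estimate:   {weighted:.1%}")    # matches the population mix
```

Note what this does and does not fix: weighting corrects the demographic skew you can see, and only repairs the attitude skew to the extent that the attitude tracks those demographics. It cannot correct a non-response pattern that cuts across the weighting variables.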
Real World Example
MEDIA ARTICLE: Iowa Poll: Kamala Harris leapfrogs Donald Trump to take lead near Election Day. Here’s how, Des Moines Register, Nov 2, 2024
A few days before the November 2024 presidential election, highly regarded pollster Ann Selzer published a poll showing that Kamala Harris had leapfrogged past Donald Trump to take the lead in Iowa. This change turned out to be erroneous. Ann Selzer subsequently wrote: “Polling is a science of estimation, and science has a way of periodically humbling the scientist. So, I’m humbled, yet always willing to learn from unexpected findings.”
I disagree with Selzer. I don’t think it was the survey science that tripped her up.
Selzer's polling error likely comes from three sources, the most significant of these being ignoring non-response bias. Selzer uses a consistent approach to election survey sampling and weighting:
- Selzer publishes her survey sample data without adjusting for any unknown non-response bias. As Selzer explained, she doesn’t make any “assumptions of what is or isn’t going to happen and then […] weight down the minority vote because you don’t think they’re going to show up.” Selzer’s view is that pollsters should stick to the science of surveys.
I agree with Selzer's point of view that it is best not to guess what the unknown non-response bias is and then weight sample data based on that guess. But… I also hold the view that if a professional pollster believes there may be a substantial skew in the data due to a non-response bias, they should not publish the polling data. That is, if a professional pollster is uncertain about the accuracy of their data, they shouldn't publish it.
- Selzer applies minimal demographic weighting that excludes education. In the 2024 election, males with lower levels of education skewed heavily towards Donald Trump. Had Selzer weighted by education, or even better by gender within education, the Selzer poll's skew towards Kamala Harris may have been minimized.
- Selzer defines "likely voters" as those who say they have already voted or say they will definitely vote, and does not use respondents' past voting behaviour to identify "likely voters". In election polls, pollsters typically survey "likely voters", not all adults, because they are interested in predicting how people will actually vote. In their surveys of "likely voters", most pollsters include whether the respondent voted in a past election. This is because behavioural psychology studies have consistently shown that past behaviour is the best predictor of future behaviour. Despite Selzer's extraordinary accuracy in previous presidential election polls, her definition may have contributed to the anomalous result.