Date: 2020-10-10

A discussion prompted the thought that there are lots of misconceptions about polling and what it is designed to do, which in turn lead to incorrect suggestions that polls are wrong and therefore useless. So, independent of specific polling discussions about this year's elections, I thought it would be useful to think more broadly about how polls work and how we should approach reading them. I don't claim to be a mathematical expert, so others more steeped in stats and probability will have more to add than I do, and will probably correct some of my thoughts. But given that polls remain the best source of evidence on how an election race is going, I have invested a bit of time in understanding how they work, why they are sometimes perceived as wrong, and why those perceptions are sometimes correct and sometimes not.



The key lessons for me in reading polls are...



Polls ≠ Punditry

Polls and punditry are different things. When people tell us how wrong the polls were in 2016, what they very often mean is that the consensus among political commentators was wrong. That criticism is probably fair. The polls, when we look back on them, showed considerable uncertainty for a few reasons. First, they were quite volatile through the campaign. Second, neither candidate could break beyond an average of about 45% (at least not for very long) in swing states or nationally, so there were always large pools of undecided voters that created uncertainty. It is true that pundits probably didn't properly reflect that uncertainty in their commentary or in many of the models that they built. But a failure of pundits to properly analyse and reflect what polls are telling us is not a failure of the polls; it is a failure of punditry.



Polls are snapshots, not crystal balls

Polls measure sentiment now, not in the future. They cannot predict what events might occur to shift opinion. If I'm running in an election and I'm 47-40 up three weeks out, and then I'm 45-44 up one day out from election day, and then I lose 44-45, that doesn't mean the poll three weeks ago was wrong. Three weeks ago, 47% might have planned to vote for me, and then over the course of three weeks, some of them changed their minds. When we look at polls and try to predict what they mean for the election, we are entering the realm of punditry - we can use our knowledge of the campaign and candidates to guess whether the polls might shift, but the polls themselves cannot tell us that. So polls in early October 2016 couldn't predict that Comey would reopen the Clinton investigation, only to close it again a couple of days before the election. That fact, and the effect it had on the polls, doesn't make the polls in early October wrong.



Polls don't claim to be perfect

Polls have margins of error. Decent polls usually have MOEs of around 3%. Worse ones sometimes come out with MOEs of 5% or 6%. Those MOEs apply to each candidate's vote share, not to the lead. Even then, the MOE comes with a confidence level - usually 95% - which means that if the same poll were run many times, the true figure would fall within the margin of error about 95% of the time. So a poll with a 3% MOE that shows Biden at 50% and Trump at 40% is actually saying that, with 95% confidence, Biden is somewhere between 47 and 53, and Trump is somewhere between 37 and 43. So the 10-point headline lead could actually be anything from roughly a 16-point lead to a 4-point lead. And of course there is a 5% chance that the result might fall outside that range.
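
For anyone who wants to check that arithmetic, here is a rough sketch in Python. It uses the textbook formulas for a simple random sample, which is an assumption on my part: real pollsters weight their samples and usually report a somewhat different figure. The sample size of 1,067 and the function names are made up for illustration.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for one candidate's share (a fraction)."""
    return z * math.sqrt(share * (1 - share) / sample_size)

def lead_margin_of_error(share_a, share_b, sample_size, z=1.96):
    """Approximate 95% margin of error for the lead (share_a - share_b)
    when both shares come from the same poll."""
    variance = (share_a + share_b - (share_a - share_b) ** 2) / sample_size
    return z * math.sqrt(variance)

# Hypothetical poll: 1,067 respondents, 50% to 40%.
n = 1067
moe_share = margin_of_error(0.50, n)
moe_lead = lead_margin_of_error(0.50, 0.40, n)

print(f"MOE on a 50% share: +/- {moe_share * 100:.1f} points")
print(f"MOE on the 10-point lead: +/- {moe_lead * 100:.1f} points")
```

Under those assumptions, the uncertainty on the lead comes out at nearly double the headline MOE, which is why a 10-point lead with a 3% MOE is better read as "somewhere between about 4 and 16".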



Vote share matters more than leads

If it's election day, would you rather have a 10-point lead at 40% to 30%, with 30% still making up their minds? Or would you rather have a 5-point lead at 50% to 45%, with 5% still making up their minds? Clearly the second is a better position to be in. The big takeaway from 2016 is that we should pay attention to the undecideds. Polls don't predict how they will vote. Again, we can use some punditry here to take a guess. For example, we knew in 2016 that the undecideds were (on average) older, more conservative voters. We could take from that, using what we knew about the campaign, that they might have been the type of folk who would have supported Ted Cruz and heard his Convention message to 'vote your conscience', or Republicans who flirted with Never Trumpism. So punditry might have told us there was a good chance that undecided voters would break heavily for Trump. But polls can't tell us that, and they don't claim to. So if you have a poll that shows Candidate A getting 45% and Candidate B getting 35% but 20% of voters are undecided, the poll can't predict how that 20% will vote. If they break slightly better than three to one for Candidate B, so that A gets 49.9% and B gets 50.1%, that doesn't make the poll wrong (even leaving aside the MOE). And in most states in 2016, the undecided share was considerably larger than the margin between the candidates, and as it turned out, that vote broke for Trump. The lesson for 2020, I think, is that we should be looking at (a) how big the undecided vote is, and (b) what its demographic characteristics are. The size will tell us if it has the potential to change the outcome of the race, and the characteristics might help us to predict, or guess, how the undecided vote might break.
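
The undecided arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and it assumes every undecided voter ends up voting for one of the two candidates. An exact three-to-one break in the 45-35-20 example produces a 50-50 tie, so Candidate B needs to do slightly better than that to win.

```python
def final_shares(share_a, share_b, undecided, undecided_to_b):
    """Allocate undecided voters between A and B.
    Shares are in percentage points; undecided_to_b is the fraction going to B."""
    final_a = share_a + undecided * (1 - undecided_to_b)
    final_b = share_b + undecided * undecided_to_b
    return final_a, final_b

def break_even_split(share_a, share_b, undecided):
    """Fraction of undecideds the trailing candidate B needs just to tie A."""
    return (share_a - share_b + undecided) / (2 * undecided)

# The 45-35 example with 20% undecided, breaking exactly three to one for B.
a, b = final_shares(45, 35, 20, 0.75)
needed = break_even_split(45, 35, 20)
print(f"A: {a:.1f}%, B: {b:.1f}%, B needs {needed:.0%} of undecideds to tie")

# The two election-day scenarios: 40-30 with 30% undecided vs 50-45 with 5% undecided.
big_lead = break_even_split(40, 30, 30)
small_lead = break_even_split(50, 45, 5)
print(f"40-30-30: trailing candidate needs {big_lead:.0%} of undecideds to tie")
print(f"50-45-5: trailing candidate needs {small_lead:.0%} of undecideds to tie")
```

In the 40-30 scenario the trailing candidate only needs about two thirds of the undecideds to draw level, whereas in the 50-45 scenario they need every single one, which is why the smaller lead is the safer one.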



Averages > Individual Polls

Individual polls have individual issues. First, not all polls are created equal. Some polls are conducted using more reliable methodology. For example, in 1948, the polls showing Dewey beating Truman used quota sampling, which left interviewers free to pick whom they spoke to within broad demographic quotas. The samples matched the population on those characteristics, but the fundamental flaw remained: the people who were picked skewed Republican. If we were to put that kind of poll beside a poll conducted using random-digit-dial, live-interview polling, it would be nonsense to say they should have equal weight in our assessment. But even high-quality polls can get it wrong. They all have margins of error, and they all have that 1 in 20 chance of landing outside the margin of error. So by averaging polls and looking at the average rather than placing too much weight on any one poll, we get a better picture of the overall race, because the effect of outlier polls is smoothed.
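
Here is a minimal Python sketch of why averaging helps. The five polls are invented numbers, and the calculation assumes each poll is an independent simple random sample. That assumption is generous: real polls share methods and blind spots, so the true uncertainty on an average is larger than this suggests.

```python
import math

def poll_average(polls):
    """Simple unweighted average of one candidate's share across polls."""
    return sum(share for share, _ in polls) / len(polls)

def moe_of_average(polls, z=1.96):
    """Approximate 95% MOE of the unweighted average,
    assuming independent simple random samples (an idealisation)."""
    k = len(polls)
    variance = sum(s * (1 - s) / n for s, n in polls) / k ** 2
    return z * math.sqrt(variance)

# Hypothetical (share, sample size) pairs for one candidate.
polls = [(0.52, 800), (0.49, 1000), (0.50, 1200), (0.47, 900), (0.51, 1100)]

avg = poll_average(polls)
moe = moe_of_average(polls)
print(f"Average share: {avg * 100:.1f}%, MOE of the average: +/- {moe * 100:.1f} points")
```

With these made-up figures, each individual poll has an MOE of around 3 points, while the average of the five has a sampling MOE of well under 2. Sampling error shrinks as polls are combined; shared methodological error does not.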



Look at the quantity and diversity of polling

Compare polling in Wisconsin in 2016 with 2020 and there's a massive difference. In 2016, Wisconsin was not perceived as competitive, so polling companies didn't invest much money in polling it. The polling average comes from about thirty polls conducted by about five or six pollsters over the course of the campaign. In 2020, there have been, I would guess, over a hundred polls conducted by a much wider range of pollsters. That diversity and volume of polling suggests that the averages should be more reliable.





So with all of those limitations, are polls useless because there are so many caveats? No. They have considerable value. Read with those caveats in mind, they can tell us lots about the state of the race. That is why campaigns, and not just media/political geeks, use them extensively: they tell campaigns where to invest resources, which messages or decisions are working well or badly, and so on.



I know there is a category of Trump supporter totally uninterested in any of this: the evidence doesn't support their claim that Trump is winning and that his messages are landing well, so the evidence has to be dismissed as wrong. Those posters have no interest in actually dissecting polls. But for the rest of us, who actually do want to follow the election over the coming few weeks, and want to understand what the polls tell us (and what they don't tell us), it might be worth a discussion about how polls actually work.