Data crunching is only as good as the data you use.
If Barack Obama’s 2012 presidential victory proved that big data could accurately predict elections, Donald Trump’s 2016 presidential win could demonstrate the opposite.
Prior to Trump’s upset win, virtually all national polls showed the businessman and reality television star trailing Democratic nominee Hillary Clinton. Her win was considered inevitable, with prominent pollsters and pundits merely arguing about how big her guaranteed victory would be.
And then on Tuesday, voters proved the experts wrong.
But before you lose faith in statistics, data analysis, and basic math, it’s important to realize a fundamental truth about crunching numbers. The results are only as good as the data that is used.
In computer science, there’s a saying, “garbage in, garbage out,” that highlights the dangers of bad data. This is why meticulously organized data lies behind every landmark achievement by companies like Facebook and Google in training computers to recognize objects and understand language.
These companies clean, add context to, and refine the data they use to feed their algorithms to help computers better recognize cats in photos, for example. Polling data, on the other hand, is quite a different animal.
According to The Washington Post, Clinton’s campaign used a custom algorithm called Ada that staff fed “a raft of polling numbers, public and private” to help Clinton’s team decide where they should dedicate their resources. But while Ada could help the Clinton campaign best determine when and where to trot out pop stars Jay Z and Beyoncé to campaign rallies, it apparently overlooked “the power of rural voters in Rust Belt states,” the report said.
Trump’s campaign, as The New York Times reported a few days prior to the election, seemed to have relied on much more primitive methods for determining where best to concentrate resources. As the Times report describes, they seemed to base their decisions on the emotions of crowds attending Trump’s rollicking campaign events.
In Pennsylvania, which polls projected Clinton to win, Trump’s digital director was reported to have felt optimistic because, as he put it: “You can almost slice the excitement with a knife. You can feel it in the air there.” Trump ended up winning the Keystone State by a thin margin.
At its core, accurate polling data depends on whether the person being questioned is truthful. We’re left to speculate, but it’s worth noting that many Trump supporters greatly distrust the national media that often helps conduct these polls, and even more of them are deeply suspicious of Clinton.
Would Trump supporters in Michigan lie to pollsters about who they supported for president? One of Trump’s top pollsters, Adam Geller, did recently cite so-called “undercover Trump voters” as one of the reasons his campaign won.
Another possible problem was with Clinton supporters overstating how likely they were to vote. In the end, turnout was lower for Clinton in certain key areas compared with Obama in prior elections.
Analysts have long debunked the notion of voters who hide their true intentions from pollsters. But Trump’s rise to president-elect is sure to cause people to rethink everything they thought to be true. And that question of what makes “data” accurate is what makes reliable data crunching tough for pollsters and politicians.
If a picture has been labeled as a cat, it’s probably correct to say that the picture is indeed of a cat. The truth of accurate polling data, however, is far murkier.