Can social media predict election outcomes? You’ve seen this headline before, maybe here, here, here, or here. Though the question remains unanswered, many have tried to predict the outcome of GOP primaries using sentiment analysis of social media mentions. In the wake of Super Tuesday, let’s see how some of these social predictions compare to the primary results.
On March 1, media analytics firm OhMyGov Inc. released a sentiment analysis of 500,000 social media mentions, gathered between February 22 and February 28. They found Mitt Romney to be the uncontested leader with slightly more than 40% of his social mentions categorized as positive. Breaking it down by state, they found Romney leading in positive mentions in 6 out of the 10 Super Tuesday states: Georgia, Idaho, Massachusetts, North Dakota, Ohio, and Virginia. Ron Paul took the lead in Oklahoma, while Newt Gingrich took the lead in Alaska, Tennessee, and Virginia. Rick Santorum did not lead in positive mentions in any Super Tuesday state.
Another media tracking company, SocialMatica, provided a prediction for the Super Tuesday winners by utilizing data from Facebook, Twitter, LinkedIn, online forums and blogs. Their results varied from OhMyGov’s, but they too had Mitt Romney winning the lion’s share of Tuesday’s delegates. According to SocialMatica’s data, 7 of the 10 states in play would go to Romney: Alaska, Massachusetts, Ohio, Oklahoma, Tennessee, Vermont, and Virginia. They predicted that Gingrich would win in his home state of Georgia, that Paul would take North Dakota and Idaho, and that Santorum would fail to secure a single state.
In a social media survey conducted for USA Today, data software company Attensity also predicted that Romney would win 7 states, though the states in question differed. They predicted Romney would take Idaho, Ohio, Oklahoma, Massachusetts, North Dakota, Tennessee, and Virginia. Gingrich would win in Georgia, Paul in Alaska and Santorum in Vermont.
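To make the "states in question differed" point concrete, here is a minimal sketch that lines up the SocialMatica and Attensity predictions exactly as described in the two paragraphs above and counts where they agree. The dictionary layout and the `compare` helper are illustrative assumptions, not anything either firm published; OhMyGov's breakdown is left out because it measures positive mentions rather than naming predicted winners.

```python
# Predicted winner per state, transcribed from the two paragraphs above.
socialmatica = {
    "Alaska": "Romney", "Massachusetts": "Romney", "Ohio": "Romney",
    "Oklahoma": "Romney", "Tennessee": "Romney", "Vermont": "Romney",
    "Virginia": "Romney", "Georgia": "Gingrich",
    "North Dakota": "Paul", "Idaho": "Paul",
}
attensity = {
    "Idaho": "Romney", "Ohio": "Romney", "Oklahoma": "Romney",
    "Massachusetts": "Romney", "North Dakota": "Romney",
    "Tennessee": "Romney", "Virginia": "Romney",
    "Georgia": "Gingrich", "Alaska": "Paul", "Vermont": "Santorum",
}


def compare(a, b):
    """Split the states both sources cover into agreements and disagreements.

    Hypothetical helper for illustration only.
    """
    shared = sorted(set(a) & set(b))
    same = [s for s in shared if a[s] == b[s]]
    different = [s for s in shared if a[s] != b[s]]
    return same, different


same, different = compare(socialmatica, attensity)
print(f"Agree on {len(same)} of {len(same) + len(different)} states")
print("Disagree on:", ", ".join(different))
```

Both firms give Romney seven states, yet they name the same winner in only six of the ten contests. The same helper could score either set of predictions against the actual results by passing a results dictionary as the second argument.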
Now that the votes have been counted, let’s see how accurate the predictions were. Blue indicates a failed prediction:
As predicted, Romney was the big winner on Super Tuesday. However, when it comes to where he won, there were significant discrepancies.
Alongside their presentation of OhMyGov’s data, the Huffington Post offers the following caveat:
Although public sentiment on social media is a useful indicator of each campaign’s success in branding their candidate positively and for assessing how affected each candidate is by mudslinging and political advertising, there are issues with using public sentiment as a measure for predicting election outcomes.
USA Today presents a similar point, quoting Carnegie Mellon computer science professor Noah Smith: “It’s a fascinating area of research, but it’s not yet mature.”
Maturity may not be the only issue. Tweeting about a candidate is significantly easier than heading to a poll to vote for him, and many social media users may not even be eligible to vote for the candidate in question. Researchers are unlikely to be able to divine whether a tweeter is a registered Republican (which is mandatory in many contests) or even a registered voter. Even if every mention were made by a registered and eligible voter, there is little evidence of any correlation between the people who discuss a candidate online and the people who cast ballots, making social media metrics a woefully inaccurate measurement of potential electoral success.
This question is particularly pertinent to this primary, as voter turnout has been shockingly low, significantly lower than in the 2008 primaries. At this point in 2008, 5.5 million people had voted in GOP primaries, while this year only 4.6 million people have voted. In other words, almost one million more people had cast primary ballots by this point in 2008 than have done so in this year’s contest.
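As a quick check on those turnout figures, the short sketch below works out the size of the drop. It uses only the two numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the paragraph above, in millions of votes.
votes_2008 = 5.5
votes_2012 = 4.6

drop = votes_2008 - votes_2012        # absolute decline, in millions
pct_drop = drop / votes_2008 * 100    # decline relative to the 2008 total

print(f"{drop:.1f} million fewer votes ({pct_drop:.1f}% below 2008)")
```

On these numbers, turnout is down by roughly 0.9 million votes, or about 16% of the 2008 total.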
There is one other glaring flaw in these Super Tuesday social media predictions, hinted at by Forrester analyst Zach Hofer-Shall. “Social media,” he says, “is not perfectly projectable to the whole overall population.”
That is certainly true. Twitter users, for example, are younger and more likely to belong to a minority group than the average American. A given Twitter user is almost twice as likely to be African-American or between the ages of 18 and 24 as an Internet user chosen at random. African-American and Hispanic users are also significantly more likely to use the platform daily than Caucasian users.
Compare that to the average GOP primary voter, who looks even less like the average American.
In a February column entitled, “The Electoral Wasteland,” New York Times columnist Timothy Egan said this about the GOP primary voters:
[W]hen you look at the numbers, it’s stunning how little this Republican primary electorate resembles the rest of the United States. They are much closer to the population of 1890 than of 2012.
Of those who voted in the South Carolina primary, 98% were white (the state is only 66% white) and 72% were 45 or older (the state’s median age is only 36). In Florida, 78% of primary voters were 45 or older. To compare, in 2008 only 59% of voters were 45 and over. A mere 5% of Nevada primary voters were Latino (the state’s population is 26% Latino), and a whopping 99% of caucus voters in Iowa were white.
The primary voters are significantly older and whiter than the average American, while Twitter users are slightly younger and more likely to belong to a minority group. Back in January, another sentiment analysis was conducted proclaiming Ron Paul winner of a hypothetical Twitter-based primary. So, why has Ron Paul failed to win so many actual primaries? Twitter skews young, as do Ron Paul supporters. Exit polls thus far have told us that Ron Paul frequently sweeps the youth vote, and his performance on Super Tuesday was no exception. It should come as no surprise, then, that his Twitter success doesn’t translate to primary wins. The young people who support him online are not voting in the primaries.
No matter how sophisticated the algorithm, factors such as influence and context are insufficient to predict actual voter turnout, and sentiment analysis is a woefully poor predictor of election success. Any similarities between mention-based predictions and real results are probably little more than coincidence.