Does Brexit Herald A New Era For Big Data-Driven Forecasting?

As the world woke up Friday morning to the news that the United Kingdom had voted to leave the EU (the “Brexit” option), the result also marked a stark failure of all the modern trappings of data-driven forecasting: opinion polling, prediction markets and even “superforecasters,” nearly all of which offered overwhelming odds that the UK would Remain. Even the forecasting luminaries appear to have been caught off guard. FiveThirtyEight called the results “a surprise,” PredictWise asked “where did prediction markets go wrong,” and myriad blog posts pondered the fate and future of prediction. As the postmortems have rolled in over the last few days, what can we learn about the future of using data for forecasting?

Public opinion polls have become increasingly unreliable in forecasting the outcomes of major elections. In the case of Brexit, the polls were sharply divided: telephone polls showed a large margin of victory for Remain, while online polls showed a closer finish, some with a slight advantage to Leave. Going into the vote, Bloomberg’s poll compilation showed Remain with a 2-point lead, while placing the likelihood of Brexit at just 25%. YouGov’s day-of poll showed Remain ahead 52% to 48%.

Thus, the old-fashioned approach of just walking up to people on the street or ringing them up at home and asking them their thoughts doesn’t seem to be working like it used to. What is intriguing is that at least some of the online polls seem to have done a markedly better job of estimating the actual outcome. This is all the more intriguing given the similarly stark difference between telephone and online polls in the US presidential race.

In contrast to polls, where there is no penalty for a wrong response, prediction markets have emerged as a growing alternative. The idea is that when actual money is at stake, participants will bet on what they actually believe will happen, rather than what they would like to happen. Yet, as with the polls, betting markets got things quite wrong. Betfair put the odds of Remain at 88%, while Ladbrokes had 90% as the polls closed. Even PredictWise offered 75% odds of Remain.

Financial markets told a similar story, with most indicators pointing to Remain: the pound was gaining and the “fear index” was calming. Currency trading implied a 90% probability of Remain. This suggests that it was not a small number of bettors massively skewing prediction market outcomes, but rather a more endemic process at work.

In contrast to the general wisdom of the crowd, so-called “superforecasters” are a small cadre of individuals known for highly accurate forecasts of major societal events. Bloomberg notes that by mid-May they put the probability of Remain at 77%, and still at 72% just a week before the vote. As one superforecaster put it, “I would say ‘expect the status quo’ reasonably confidently … Most of the time, independence campaigns do not succeed.” Until, of course, they do.

Intriguingly, in his postmortem, PredictWise’s David Rothschild pondered whether “traders do not have the pulse of working masses” and thus reflect a fundamentally different worldview from that of the average voter. Bloomberg echoed this sentiment, arguing that experts “have become more removed from the rest” and are missing the issues that matter most to everyday voters. This raises critical questions about the kinds of biases that may be creeping into expert and market-driven forecasts.

Yet the online world offered many hints of the eventual outcome. On Twitter, Leave tweets outnumbered Remain tweets nearly two to one (66% to 33%) in the lead-up to the referendum, while the split in unique user accounts discussing each side was much closer: 53.8% to 46.2%. Of course, it is impossible to know from volume alone whether those tweets supported Leave or opposed it, since volume demonstrates only interest, not support.
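The gap between tweet share and account share can be turned into a rough per-account figure. The short Python sketch below is only an illustration. It assumes that each account discussed just one side and that both shares were measured over the same period, and the underlying data guarantees neither. The function name is hypothetical.

```python
def tweets_per_account(tweet_share, account_share):
    """Relative tweets per account for one side (both shares given as fractions)."""
    return tweet_share / account_share

# Shares quoted in the text; assumes no account is counted on both sides.
leave = tweets_per_account(0.66, 0.538)
remain = tweets_per_account(0.33, 0.462)

# How many tweets the average Leave-discussing account posted per Remain-discussing account's tweet.
ratio = leave / remain
print(f"Leave accounts tweeted {ratio:.2f}x as often as Remain accounts")
```

Under those assumptions, Leave-discussing accounts posted roughly 1.7 times as many tweets apiece as Remain-discussing ones. This is why raw tweet volume overstated Leave's share of the people taking part in the conversation.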

That last point cannot be emphasized enough: discussion does not imply support. In the Iowa caucuses, Sanders led Clinton in Facebook mentions by 73% to 25%, yet the actual vote had them nearly tied. In 2012, Twitter conversation showed Obama dominating the Southern states that Romney ultimately won. Most recently, Facebook discussion showed Sanders beating Clinton by a landslide, though it did also show Trump leading on the Republican side. Of course, social media data is also becoming increasingly difficult to access as a data source.

Web searches are increasingly being used as a metric to understand society. Google Trends published a map of searches across the UK in the first week of June, showing that Leave dominated searches across the entire country outside of a handful of pockets. Even Scotland was overwhelmingly searching about Leave. In reality, the final voting results looked quite different. As with social media conversation, heavy search interest simply indicates that people are intensely interested in a topic, not that they support or condemn it.

Interestingly, the timeline of search intensity for the two terms within the UK offers a slightly different picture. UK users searched for Remain and Leave at nearly equal rates right up until the morning the polls opened, at which point Remain pulled 8% ahead of Leave. Yet around 4:30PM local time, Leave suddenly surged to 15% ahead. By 8:30PM local time Leave was 59% ahead, and by 10:30PM it was 79% ahead, before beginning to head back down.

Before one uncorks the champagne, there are caveats, just as with social media conversation. Search data did not show Leave pulling ahead until polling was nearly over, so it was not a useful early indicator of the outcome. More importantly, the candidate or outcome with the most intense searching is not always the winner. In 2012, Ron Paul received twice as much search interest as Mitt Romney, while in Egypt, presidential candidate Amr Moussa dominated domestic Egyptian Arabic searches, even though Muhammad Morsi eventually won.

Putting this all together, what can we learn from Brexit about the state of the big data world of forecasting? Perhaps most importantly, Brexit defied all the traditional data sources, from opinion polls to prediction markets to superforecasters. Yet online indicators offered a strong hint of the outcome. Some online opinion polls produced estimates not far from the actual result, social media favored Leave, and Google’s search volume timeline in the week before the vote was relatively close, even if the geography of those searches was not. At the same time, social media volume and search interest have also favored losing candidates many times in the past.

Perhaps the biggest takeaway is that when it comes to “real world” forecasting (predicting close outcomes, rather than whether Putin or Assad will win reelection or whether France will undergo a military coup in the coming week), it can be quite difficult to get the right result. Here, nearly every tool in the forecasting toolbox got things wrong. Yet the hints from social media and search that Leave would win are highly intriguing. They fit into a growing literature suggesting that new kinds of “big data” measurements may offer powerful benefits over traditional forecasting metrics. At the very least, Brexit offers a provocative glimpse of how new forecasting approaches may shape our “big data” future.
