Can algorithms automatically determine the trustworthiness of news stories?

The Internet is full of fake news stories. Tons of content containing false information spreads across news websites and social media every day, whether for entertainment, propaganda or just plain trolling. And with news distributed across so many channels (online, TV, print, social media), the snowball effect of this toxic avalanche of information not only erodes trust in news agencies and newspapers, but also shapes society and the public’s view on key issues. But is there a way to “detoxify” information and automatically flag each news story as “true” or “fake”?

According to the Reuters Institute for the Study of Journalism, the issue of trust in news, combined with the huge amount of information produced every day at high velocity, has such an impact that “[…] in the face of an expanding universe of information, every day increasingly feels like April Fool’s Day”. The 2016 Digital News Report paints a grim picture of how trust in news has become a problem that severely affects news brands: only 50% of Brits trust traditional media outlets, and in Greece that figure drops to just 20%! And trust in newspapers, whether in their printed or digital form, shapes the media industry as a whole (readers are buyers, and buyers only buy from trusted brands), affects businesses and sways the public’s opinion on critical matters.

The tremendous volume of news items created every day in multiple languages and distributed across many channels makes schema-based fact checking or human editorial review hopelessly inadequate. Neither approach can determine the trustworthiness of news stories developing in real time, which is exactly when fake news or propaganda spreads and influences readers. To back up this claim, a very prominent example of a “journalism fail” comes to mind: on September 24, 2015, the German tabloid AUTO BILD published a story falsely claiming that BMW had falsified its exhaust emissions data, just as Volkswagen had. The story was later retracted, but not before it was picked up by 500,000 newspapers and blogs around the world (AUTO BILD exclusive: BMW diesel exhaust emissions exceed limits significantly). As a result, BMW stock fell by almost 10%! Even though the stock has since recovered, there is no knowing how many readers worldwide were led to believe (and still do!) that BMW falsified the emissions tests on its premium automobiles.

There is a critical need to determine the trustworthiness of news stories automatically, through specialized algorithms. I’m not saying that a computer program should determine what is true or not (“truth” remains a vastly complicated philosophical and religious concept), but it can identify the elements within a news item or story that lay the foundation of trust. Concepts from critical reading and investigative journalism can be automated; mass data can be collected, structured, categorized and labeled. Articles can be linked to each other to determine where information originated, how it built up into a story and how the content was altered or enhanced along the way. Text analytics algorithms can extract named entities, detect similarities and common citations, spot odd writing techniques, label the sentiment and put every single news article into perspective, assessing whether it has the key elements that brand it as “trustworthy” or not. Most importantly, every article that builds on a story originating from an untrustworthy news item will be marked accordingly, so that the spread of false information can be contained before it goes viral (see the sketch below).
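To make this concrete, here is a minimal sketch of two of those building blocks in Python: extracting named entities and scoring article similarity (using the off-the-shelf spaCy library, purely as an illustration), plus a simple breadth-first propagation of a “not trustworthy” flag through a citation graph. The function names, the toy graph and the choice of library are my own assumptions for illustration, not TrustServista’s actual implementation.

```python
# Illustrative sketch only -- not TrustServista's actual algorithm.
# Assumes spaCy is installed with its medium English model
# (pip install spacy && python -m spacy download en_core_web_md),
# which ships word vectors for document similarity.
from collections import deque

import spacy

nlp = spacy.load("en_core_web_md")

def extract_entities(text: str) -> set[tuple[str, str]]:
    """Named entities are one trust signal: articles telling the same
    story should largely agree on the people, places and organizations
    they mention."""
    return {(ent.text, ent.label_) for ent in nlp(text).ents}

def article_similarity(text_a: str, text_b: str) -> float:
    """Vector-space similarity between two articles (roughly 0..1),
    usable for linking articles that build on the same story."""
    return nlp(text_a).similarity(nlp(text_b))

def propagate_distrust(cites: dict[str, list[str]], flagged: set[str]) -> set[str]:
    """Mark every article that builds on an untrustworthy origin.
    `cites` maps an article id to the ids of the articles it derives from."""
    # Invert the citation graph: origin -> articles that build on it.
    derived_from: dict[str, list[str]] = {}
    for article, origins in cites.items():
        for origin in origins:
            derived_from.setdefault(origin, []).append(article)
    tainted = set(flagged)
    queue = deque(flagged)
    while queue:  # breadth-first walk downstream from each flagged origin
        origin = queue.popleft()
        for article in derived_from.get(origin, []):
            if article not in tainted:
                tainted.add(article)
                queue.append(article)
    return tainted

# Toy citation graph: "c" builds on "b", which builds on the flagged "a".
citations = {"b": ["a"], "c": ["b"], "d": []}
print(propagate_distrust(citations, {"a"}))  # -> {'a', 'b', 'c'}
```

In a real system, of course, the citation graph itself would first have to be inferred from the articles (shared quotes, links, publication timing), which is the genuinely hard part of the problem.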

TrustServista promises to do exactly that. Funded by Google through the DNI Innovation Fund, this unique software solution will be available as a prototype in early 2017 and cater to news agencies and investigative journalists (for English-only content, for starters). TrustServista will determine the trustworthiness of news items in a fully automated way and allow users to perform investigations on trending stories or individual articles, uncovering “the hidden part of the information iceberg”, as its tagline states.