Dark Reading is part of the Informa Tech Division of Informa PLC


Threat Intelligence

5/14/2020
04:35 PM

Project Aims to Unmask Disinformation Bots

BotSight, a machine learning research project, rates Twitter users based on the likelihood that there is a human behind the keyboard. Could such technology blunt the impact of disinformation campaigns?

Aiming to combat disinformation on social media, a research team published plug-ins for major Web browsers on Thursday that give users a score as to the likelihood that a Twitter handle is a bot or a human. 

Dubbed BotSight, the project is the brainchild of machine learning researchers at security firm NortonLifeLock, formerly Symantec, and aims to help users determine which accounts are authentic and which are not. The tool draws on 4TB of data collected by the researchers over the past six months and examines 20 features — from the randomness of the Twitter handle to the rate of follower acquisition — to classify each handle as a bot or a human.
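One of the features the researchers cite, the randomness of a Twitter handle, can be approximated with a simple character-entropy measure. The sketch below is purely illustrative — BotSight's actual feature extraction is not public — but it captures the intuition: auto-generated handles tend toward higher entropy than human-chosen names.

```python
import math
from collections import Counter

def handle_entropy(handle: str) -> float:
    """Shannon entropy (bits per character) of a Twitter handle.

    Auto-generated bot handles (e.g., random alphanumeric strings)
    tend to score higher than human-chosen names, which reuse
    letters and follow word-like patterns.
    """
    counts = Counter(handle.lower())
    total = len(handle)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A word-like handle vs. a random-looking one (both hypothetical):
print(handle_entropy("darkreading"))   # lower entropy
print(handle_entropy("x9k2qv81mz"))    # higher entropy
```

Entropy alone is a weak signal — a real classifier would weigh it alongside the other features, such as follower-acquisition rate and account age.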

Through the project, the researchers hope to help people understand when they might encounter bots, says Daniel Kats, principal researcher with the NortonLifeLock Research Group (NRG).

"People have a conception that bots exist," he says. "But ... when you go on your timeline and you look around — people don't have a sense of where they are most likely to find bots. We want to give people that intuitive sense of where bots live and what they are."

The tool is the latest attempt to apply machine learning and artificial intelligence techniques to the thorny problem of the often-viral propagation of false information on social networks. While Facebook, Twitter, and other social networks have their own technology for flagging posts, tweets, and messages for further scrutiny, giving users their own ability to evaluate messages is important, says Chris Meserole, the deputy director of the Artificial Intelligence and Emerging Technology Initiative at the Brookings Institution in Washington, DC. 

"The more users of Twitter are aware that bot networks exist, the more aware they can be of whether the accounts they are interacting with are human or bot, the more they will understand how trustworthy the information they encounter might be," he says. "It also provides a service by giving users tools to evaluate the claims of public figures who are retweeting misinformation from bot accounts."

The BotSight plug-ins insert a human or bot icon next to each Twitter handle, along with a color indicating whether the algorithm believes the account to be human (green) or bot (red), as well as a measure of confidence. (BotSight has classified this reporter — perhaps, not inaccurately — as only 88% human.)

A beta feature allows users to also see some of the criteria — currently, somewhat cryptic — for the machine learning model's determination. 

The technology uses 20 different attributes and metadata from tweets and user accounts to determine whether the user is more likely to be human or a bot. While the technology is focused on detecting bots that retweet and amplify certain messages and disinformation, the system can also detect certain types of human-operated disinformation networks, such as sock puppets, Kats says.
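How such attributes might combine into a single bot-likelihood score can be sketched with a toy logistic model. The feature names and weights below are invented for illustration and bear no relation to BotSight's actual 20-feature model:

```python
import math

def bot_score(entropy: float, followers_per_day: float,
              tweets_per_day: float, account_age_days: float) -> float:
    """Toy logistic classifier: weights are made up for illustration.

    Returns a probability-like score in (0, 1); higher means
    more bot-like. A production model would learn these weights
    from labeled data rather than hard-code them.
    """
    z = (1.2 * entropy                 # random-looking handle
         + 0.002 * followers_per_day   # rapid follower acquisition
         + 0.05 * tweets_per_day       # high-volume posting
         - 0.001 * account_age_days    # older accounts look less bot-like
         - 5.0)                        # bias term
    return 1 / (1 + math.exp(-z))

# Hypothetical human-like vs. bot-like accounts:
print(bot_score(2.9, 5, 3, 2000))     # low score
print(bot_score(3.3, 500, 200, 30))   # high score
```

A confidence display like BotSight's could then threshold this score, e.g., green below 0.3 and red above 0.7, with the raw value shown as the confidence percentage.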

"You have these central accounts that collect a lot of followers — they are influencers in the network," he says. "And then you have a bigger network of amplifier accounts. The larger network, the amplification network, is what BotSight specifically targets. However, it is also capable of capturing some of the influencer accounts."

Bot networks on Twitter amplify the messages of specific "influencer" accounts, often pushing particular hashtags to promote a political issue. During the current pandemic, Twitter has been aggressive in finding and deleting false information about symptoms and treatments for COVID-19, but the company has taken a less focused approach to bot activity on the platform in general.

"There is a background radiation of, oh, around 6% of bot activity in general on Twitter," Kats says. "But when you look at certain trending hashtags, it can go up to 20% when you look at some hot-button issues, but it really depends on the issue."

However, identifying Twitter bots is not the same as identifying the sources of disinformation, since bots typically only relay or amplify information. A similar effort, Bot Sentinel, attempts to identify untrustworthy Twitter accounts and "trollbots" using a machine learning model built from some 5 million tweets.

Such tools can help users better gauge what information online can be trusted, says the Brookings Institution's Meserole.

"Right now, everything we are doing is dancing around the margins and putting Band-Aids on things," he says. "The question is how do we ensure the credibility of our political discourse. For that, these types of tools can be helpful."

NortonLifeLock researchers intend to add more features to the tool, but categorizing the content of posts through machine learning analysis is not currently a focus, Kats says.

"We don't want to block people from accessing certain content," he says. "We want to provide people with enough information to help them make wiser decisions for themselves."

