
AI Tool Predicts Whether Online Health Misinformation Will Cause Real-World Harm

A new AI-based analytical technique reveals that the specific phrasing of Reddit misinformation posts foreshadowed people's rejection of COVID vaccinations

Photo illustration: the Reddit interface displayed on a mobile phone screen against a red background. Credit: Ahmet Serdar Eser/Anadolu via Getty Images

The flood of misinformation online inevitably produces adverse consequences in key measures of public health—and death from COVID among unvaccinated people stands out as perhaps the most prominent example. That cause-and-effect relationship—that scrolling through endless postings about hydroxychloroquine, ivermectin and vaccine conspiracies can lead people astray—seems more than obvious. But it isn’t straightforward to determine scientifically.

Clear linkages between misinformation and adverse consequences have proved very difficult to find—partly because of the complexity of analyzing the workings of a public health system and partly because most social media companies do not allow independent outside parties to analyze their data. One exception is Reddit, a platform that has begun to emerge as a place where, with the company’s blessing, social media research can flourish. Now studies using Reddit posts may be bringing scientists closer to pinning down that missing link between misinformation and real-world harm.

A new analytical framework that combines elements of social psychology with the computational power of a large language model (LLM) could help bridge the gap between online rhetoric and real-world behavior. The results were recently posted to the preprint server arXiv.org and presented at the Association for Computing Machinery’s CHI Conference on Human Factors in Computing Systems in Hawaii this week.


Eugenia Rho, a computer scientist at Virginia Tech and senior author of the new study, wanted to pin down whether a tie exists between people’s behavior and the type of language encountered on a site such as Reddit.

Along with her Ph.D. student Xiaohan Ding and their colleagues, Rho began her research by first tracking down thousands of Reddit posts from banned forums opposing vaccines and COVID prevention measures. Next, the team trained an LLM to recognize the “gist” of each post—the message’s underlying meaning, as opposed to the literal words it is composed of. “That’s sort of the secret sauce here,” says Valerie Reyna, a psychologist at Cornell University and a co-author of the study.
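
The study’s exact prompts and model aren’t reproduced here, but the core step is straightforward to sketch: ask an LLM to compress each post into its bottom-line meaning and flag any cause-and-effect claim. Below is a minimal illustration in Python, assuming an OpenAI-style chat API; the model name, prompt wording and `extract_gist` helper are hypothetical stand-ins, not the team’s actual pipeline.

```python
# Minimal sketch of prompt-based gist extraction (an illustrative
# assumption, not the authors' published pipeline). Uses the openai
# v1 Python client; model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GIST_PROMPT = (
    "State the bottom-line meaning ('gist') of this Reddit post in one sentence. "
    "If the post implies a cause-and-effect claim, phrase it as 'X leads to Y'; "
    "otherwise reply 'no causal gist'.\n\nPost: {post}"
)

def extract_gist(post_text: str) -> str:
    """Ask the LLM for a one-sentence gist, flagging causal claims."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": GIST_PROMPT.format(post=post_text)}],
        temperature=0,  # deterministic output keeps labels reproducible
    )
    return response.choices[0].message.content.strip()

print(extract_gist("Had my Pfizer jab last Wednesday and have felt like death since."))
# Expected shape of output: "Getting the Pfizer vaccine leads to feeling ill."
```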

“Fuzzy-trace theory” suggests that people pay more attention to the implications of a piece of information than to its literal meaning. This helps explain why people are more likely to remember an anecdote about someone getting robbed than a dry statistic about crime rates or why gamblers are more apt to place a bet when folding is framed as possibly losing them money rather than potentially gaining it. “People are more moved by certain kinds of messages than others,” says Reyna, who helped pioneer fuzzy-trace theory in the 1990s.

This careful choice of wording enhances persuasiveness. “Over and over and over, studies show that language in the form of a gist is stickier,” Rho says. Her team’s analysis found that in the context of social media, this seems to be especially true for causal gists: information that implies a direct link between two events. A post might tie vaccination to getting sick in wording that packs a lot of rhetorical punch. For example, one Reddit user posted, “Had my Pfizer jab last [Wednesday] and have felt like death since.” Rho’s team found that whenever the causal gists in posts opposing COVID precautions grew stronger, COVID hospitalizations and deaths spiked nationwide, even after the forums themselves were banned. The researchers pulled their data from nearly 80,000 posts spanning 20 subreddits active between May 2020 and October 2021.
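
One way to picture that kind of analysis: aggregate the extracted gists by week and ask whether weeks heavy in causal gists precede spikes in hospitalizations. The sketch below, in Python with pandas, assumes hypothetical CSV files and column names and uses a simple lagged correlation; the study’s actual statistical model is not reproduced here.

```python
# Hedged sketch: do weeks heavy in causal gists precede hospitalization
# spikes? File names, column names and the lag range are hypothetical;
# this is a simple lagged correlation, not the study's actual model.
import pandas as pd

posts = pd.read_csv("reddit_posts_with_gists.csv", parse_dates=["created"])
hosp = pd.read_csv("national_hospitalizations.csv", parse_dates=["week"])

# Weekly share of posts whose extracted gist is causal ("X leads to Y").
posts["week"] = posts["created"].dt.to_period("W").dt.start_time
weekly = posts.groupby("week")["is_causal_gist"].mean().rename("causal_share")

joined = hosp.set_index("week")["admissions"].to_frame().join(weekly).dropna()

# Correlate gist prevalence with admissions k weeks later. A peak at a
# positive lag is consistent with rhetoric preceding outcomes, though it
# cannot by itself establish causation.
for lag in range(5):
    r = joined["causal_share"].corr(joined["admissions"].shift(-lag))
    print(f"lag {lag} weeks: r = {r:.2f}")
```

A more rigorous treatment would use something like Granger-style lagged regressions rather than raw correlations, but the basic logic (rhetoric first, outcomes later) is the same.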

By using this newly developed framework to monitor social media activity, scientists might be able to anticipate the real-world health outcomes of future pandemics—or even other major events, such as elections. “In principle, it can be applied to any context in which decisions are made,” Reyna says.

But such a framework might not make equally good predictions across the board. “When there is no discernible gist, the approach might be less successful,” says Christopher Wolfe, a cognitive psychologist at Miami University in Ohio, who was not involved in the study. This could be the case when studying the behavior of people seeking treatment for common health issues, such as breast cancer, or of those trying to view sporadic, ephemeral events, such as auroras.

And the approach doesn’t necessarily distinguish what specific type of cause-and-effect relationship exists. “It seems that gists from social media may predict health decisions and outcomes, but the reverse is true as well,” says Rebecca Weldon, a cognitive psychologist at SUNY Polytechnic Institute, who did not contribute to the new research. Rather, the findings suggest that the relationship between social media rhetoric and real-world behavior may be more of a feedback loop, with each side strengthening and reinforcing the other.

Both Wolfe and Weldon praised the authors for their innovative analytical approach. Wolfe calls the framework a potential “game changer” for helping navigate complex information ecosystems online. And Rho’s team hopes that it can help large social media companies and public health officials come together to develop more effective strategies for moderating content. After all, being able to identify the type of misinformation most able to influence people’s behavior is the first step toward being able to combat it.