AI makes fake news more credible
Fake news generated by AI is often perceived as more credible than texts written by humans. That worries linguist Silje Susanne Alvestad.
In 2017, ‘fake news’ was chosen as the new word of the year by the Language Council of Norway.
But what are the linguistic features of fake news? And can fake news be uncovered based on linguistic characteristics?
This is what linguist Silje Susanne Alvestad has taken a closer look at. She and her research colleagues have studied the language of fake news in English, Russian, and Norwegian.
Lied in the present tense, told the truth in the past tense
The Fakespeak project is based in part on research from the University of Birmingham, where researchers examined articles by former New York Times journalist Jayson Blair.
He lost his job in 2003 after it was revealed that he had written fake news.
“The researchers compared his true and false articles to see whether they could find differences. An interesting finding was that he predominantly wrote in the present tense when he was lying, and in the past tense when he was writing genuine news,” says Alvestad.
They also found differences in the use of pronouns. Furthermore, the genuine articles had a higher average word length.
More informal style in untrue articles
The fabricated texts had a more conversational and informal style.
The researchers also found extensive use of so-called emphatic expressions in these texts, such as ‘truly,’ ‘really,’ and ‘most.’
Alvestad and her colleagues have compared the Blair texts with similar collections in which the same person wrote both genuine and fake news.
They have found that the linguistic features of fake news vary with the writer’s motivation for deceiving readers.
“Blair says in his autobiography that his motivation was primarily money. We found, for example, that his fabricated news contained few metaphors. When the motivation is ideological, on the other hand, more metaphors are used, often from domains such as sport and war,” says Alvestad.
More categorical in fake news
Another key finding is that fake news can have a more categorical tone.
The researchers have examined stance – the ways in which the writer expresses attitudes, perceptions, and thoughts.
“In fake news, the writer often gives the impression of being absolutely certain that what's being reported is true. This is called epistemic certainty. There's an overrepresentation of expressions of such certainty, for example ‘obviously,’ ‘evidently,’ ‘as a matter of fact,’ and so on,” says Alvestad.
This tendency is stronger in Russian than in English.
“We asked ourselves whether there's a universal language of fake news. We concluded that there isn’t. The linguistic features of fake news vary within individual languages and between languages. They depend on context and culture,” she says.
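To make the idea of counting such markers concrete, here is a purely illustrative sketch, not the Fakespeak project's actual method. The marker list and the per-1,000-words rate are assumptions for the example, and a real analysis would need a curated, language-specific inventory.

```python
import re

# Illustrative only: a hand-picked list of English certainty markers of the
# kind the article mentions. A real study would use a curated inventory,
# and the list would differ for Russian or Norwegian.
CERTAINTY_MARKERS = [
    "obviously", "evidently", "as a matter of fact",
    "clearly", "undoubtedly", "certainly",
]

def certainty_rate(text: str) -> float:
    """Occurrences of certainty markers per 1,000 words."""
    lowered = text.lower()
    hits = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in CERTAINTY_MARKERS
    )
    words = len(re.findall(r"\w+", lowered))
    return 1000 * hits / words if words else 0.0

# A categorical-sounding snippet scores higher than a hedged one.
print(certainty_rate("Obviously the plan failed. As a matter of fact, it clearly never began."))
print(certainty_rate("The plan may have failed; the evidence is mixed."))
```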
Created a fact-checking tool
This makes it even more challenging to develop fact-checking tools for fake news based on linguistic features.
This is what the researchers set out to do, together with computer scientists from the research institute SINTEF. Despite the challenges, they have managed to build a fact-checking tool, which can be tested on SINTEF’s website.
“From a linguistic perspective, we have been critical of the fact that the definition of fake news has, in practice, encompassed too many genres. That means you don’t really know what actually accounts for the differences between fake and genuine news,” says Alvestad.
She adds that developing robust fact-checking tools requires good, balanced datasets, as well as a targeted and sophisticated linguistic approach.
Disinformation from AI
While the researchers have been working on this project, developments in artificial intelligence have accelerated rapidly. The landscape of fake news has changed.
That laid the groundwork for a new project, which focuses on identifying AI-generated disinformation. Alvestad and the other researchers are using material from Fakespeak to pinpoint its linguistic features.
“Purely fabricated news, of which there was quite a lot six or seven years ago, may not have that much influence. A bigger problem is fake news that mixes true and false,” says Alvestad.
In the new project, NxtGenFake, they move away from the term fake news and instead talk about disinformation.
“Some of the information is true, but the whole truth is not included. The information is slanted, often placed in the wrong context, and frequently overlaps with propaganda,” says the researcher.
This mixture slips more easily under the radar of the online mechanisms meant to verify content, which makes disinformation especially challenging to uncover, she explains.
Less variation in AI
The new project will run until 2029, but the researchers already have some findings.
One example is that there is less variation in the use of persuasive techniques in AI-generated propaganda, compared with propaganda written by humans.
Two techniques stand out in the AI-generated texts, says Alvestad. One is what the researchers call Appeal to Authority, a reference to where the information supposedly comes from.
“We notice that these references are generic, meaning they typically appear in the indefinite form. It might, for example, say ‘according to researchers’ or ‘experts believe’,” she says.
“Large language models likely make such moves because they have no relationship to the world and do not know what's true and what's not. In this way, the claims become very difficult, if not impossible, to verify,” says Alvestad.
Appeals to values
The second technique is that AI-generated news with elements of propaganda ends differently from propaganda produced by humans.
"They often end with formulations the researchers call Appeal to Values. Here the argument is that something must be done to ensure, for example, increased growth, greater fairness, or greater public trust," she says.
Compared disinformation from AI and humans
How do people respond to AI-generated disinformation compared with disinformation written by humans?
The researchers conducted an experiment in which American respondents were asked to rate AI-generated texts and texts written by humans on three parameters: credibility, emotional appeal, and informativeness.
The respondents were not told which texts were which.
The AI-generated disinformation was rated as both more credible and more informative than disinformation written by humans.
The researchers also asked which of the excerpts the respondents would prefer to continue reading. Significantly more people said they would continue reading the AI-generated texts.
Preferred AI-generated disinformation
The research team were not surprised that the respondents preferred the AI-generated texts.
"But I was personally a little surprised that the AI-generated texts did not score highly on emotional appeal. Instead, they were perceived as both more informative and more credible than texts written by humans,” says Alvestad.
This suggests that AI-generated disinformation may be harder to detect. Large language models can embed misinformation and disinformation within formats we trust by default.
Alvestad believes it's important that we are aware of this.
“I hope the results can help raise awareness of the risks associated with large language models, especially at a time when such tools are increasingly being adopted,” she says.
This content is paid for and presented by the University of Oslo
This content is created by the University of Oslo's communication staff, who use this platform to communicate science and share results from research with the public. The University of Oslo is one of more than 80 owners of ScienceNorway.no.