
Why ChatGPT is bad at imitating people

Large language models like ChatGPT are useful for many things. But they are still not good enough to imitate the way humans talk.

Researcher Lucas Bietti finds that language models often trip themselves up. In doing so, they reveal that they are not human.

It is easy to be impressed by artificial intelligence. Many people use large language models such as ChatGPT, Copilot, and Perplexity to help solve a variety of tasks – or simply for entertainment purposes.

But just how good are these large language models at pretending to be human?

Not very, according to recent research.

“Large language models speak differently than people do,” says researcher Lucas Bietti from NTNU's Department of Psychology.

Bietti co-authored a recently published research article on the topic.

Tested several language models

The researchers tested the large language models ChatGPT-4, Claude 3.5 Sonnet, Vicuna, and Wayfarer.

First, they compared transcripts of real phone conversations between humans with simulated conversations generated by the large language models.

Then they checked whether other people could tell the difference between the human phone conversations and the models' conversations.

For the most part, people aren't fooled – or at least not yet. So what are the language models doing wrong?

ChatGPT and other large language models are truly useful. But they're not entirely human just yet.

Too much imitation

When people talk to each other, there is a certain amount of imitation that goes on. We slightly adapt our words and the conversation according to the other person. But the imitation is usually quite subtle.

“Large language models are a bit too eager to imitate, and this exaggerated imitation is something that humans can pick up on,” says Bietti.

This is called exaggerated alignment.

But that's not all.

Incorrect use of filler words

Movies with bad scripts usually have conversations that sound artificial. In such cases, the scriptwriters have often forgotten that conversations don't consist only of the necessary content words.

In normal conversations, most of us include small words called discourse markers.

These are words like ‘so’, ‘well’, ‘like’, and ‘anyway’.

These words have a social function because they can signal interest, belonging, attitude, or meaning to the other person. They can also be used to structure the conversation.

Language models are still terrible at using these words.

“The large language models use these small words differently, and often incorrectly,” says Bietti.

This helps to expose them as non-human. But there's more.

Opening and closing features

When you start talking to someone, you probably don't get straight to the point. Instead, you might start by saying ‘hey’ or ‘so, how are you doing?’ or ‘oh, fancy seeing you here!’ 

People tend to engage in small talk before moving on to what they actually want to talk about.

This shift from introduction to business happens more or less automatically for humans, without being explicitly stated.

“This introduction and the shift to a new phase of the conversation are also difficult for large language models to imitate,” says Bietti.

The same goes for the end of the conversation. We usually don't end a conversation abruptly as soon as the information has been conveyed to the other person. Instead, we often end the conversation with phrases like ‘talk to you later’ or ‘see you soon.’

Language models don't quite manage that part either.

Better in the future? Probably

Altogether, these features cause so much trouble for the large language models that the conclusion is clear:

“Today’s language models are not yet able to imitate humans well enough to consistently fool us,” says Bietti.

Developments in this field are now progressing so rapidly that large language models will most likely be able to do this quite soon – at least if we want them to. 

Or will they?

“Improvements in large language models will most likely manage to narrow the gap between human conversations and artificial ones, but key differences will probably remain,” says Bietti.

For the time being, large language models are still not human-like enough to fool us. At least not every time.

Reference:

Mayor et al. Can Large Language Models Simulate Spoken Human Conversations? Cognitive Science, 2025. DOI: 10.1111/cogs.70106
