LLMs Are Average Machines
An LLM does not have an opinion, it has an average.
When you ask a model to write a story, it does not reach for the strange choice or the risky one.
It reaches for the most probable one, the centre of the distribution…the safe middle of everything it has ever read.
This is easy to say and hard to prove…but a new study called StoryScope proves it.
Humans are rare. Machines are central.
Background
Something I realised from this study is that, unlike AI, human writers are comfortable introducing ambiguity into their stories.
They are also comfortable with flashbacks, with time that folds back on itself.
Think of a Christopher Nolan movie; his work is particularly good at bending time and keeping viewers guessing where they are in the story.
A human-written mystery opens at the funeral and spirals back and forth through decades.
These human-introduced elements add a particular pleasure to reading: as the reader, you are not sure whether some insight you gleaned was intended by the author or not, and that uncertainty is part of the enjoyment.
There is also the element of allowing readers to draw their own conclusions…something which is absent in AI writing.
The machine typically starts at the beginning and ends at the end.
The setup
Researchers from the University of Maryland and Google DeepMind took 10,272 human-written short stories. For each one they reverse-engineered a writing prompt, then handed that prompt to five models: Claude, DeepSeek, Gemini, GPT and Kimi.
Six versions of every story were created…one human, five machines.
61,608 stories in total, each around 5,000 words.
Then they did something most detection work avoids: they threw away the style.
Why style is the wrong place to look
Most AI detectors hunt for surface tells: words like delve or tapestry, the em dash…or the tidy three-part rhythm of a machine sentence.
This works until it does not.
GPT 5.4 quietly cut its em dash habit, and when you fine-tune a model on real authors, creative writing detection collapses from 97 percent to 3 percent.
Style sits on the surface and can be manipulated. LLMs also change their writing style over time, so a telltale sign of AI-generated content today may be gone down the road.
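To make the fragility concrete, here is a minimal sketch of a surface-tell detector. It is a toy heuristic written for this article, not any real detector and not StoryScope's method; the tell list, the threshold and the function names are all assumptions made for illustration.

```python
import re

# Hypothetical tell list, taken from the examples named in the text.
TELL_WORDS = ("delve", "tapestry")
EM_DASH = "\u2014"  # written as an escape so the literal is easy to spot


def surface_tell_score(text: str) -> float:
    """Return surface tells per 1,000 words (toy heuristic)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Prefix match so "delves" and "delved" are counted too.
    word_hits = sum(1 for w in words if w.startswith(TELL_WORDS))
    dash_hits = text.count(EM_DASH)
    return 1000.0 * (word_hits + dash_hits) / len(words)


def looks_machine_written(text: str, threshold: float = 20.0) -> bool:
    """Flag a text when its tell rate reaches an arbitrary, assumed threshold."""
    return surface_tell_score(text) >= threshold


original = "We delve into a rich tapestry of memory \u2014 and loss."
restyled = "We dig into a rich weave of memory, and loss."

print(looks_machine_written(original))  # flagged
print(looks_machine_written(restyled))  # same sentence, light rewording, not flagged
```

Swapping two words and one punctuation mark is enough to slip past this kind of check, which is the weakness the study sidesteps by ignoring style altogether.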
Forget how the story sounds & look at how it was conceived
Plot structure, character agency, time, revelation and the decisions underneath the prose are elements to consider and measure. You can think of this as one level deeper.
They built 304 of these narrative features and trained simple classifiers on them.
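As a rough illustration of what "simple classifiers on narrative features" can look like, the sketch below fits a tiny Bernoulli naive Bayes model on four yes/no features. The feature names are loosely inspired by the findings discussed in this article, the training rows are invented toy data, and the choice of naive Bayes is an assumption; the article does not say which classifiers the researchers used.

```python
import math

# Hypothetical binary narrative features (the study used 304 of them).
FEATURES = ("states_moral", "has_subplot", "nonlinear_time", "emotion_via_body")

# Invented toy rows, one per story, in FEATURES order. Not data from the study.
ROWS = [
    [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 0],  # human
    [1, 0, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 0, 1],  # ai
]
LABELS = ["human"] * 4 + ["ai"] * 4


def fit(rows, labels):
    """Estimate a log prior and smoothed P(feature = 1 | label) for each label."""
    model = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        n = len(members)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        probs = [(sum(r[i] for r in members) + 1) / (n + 2)
                 for i in range(len(FEATURES))]
        model[label] = (math.log(n / len(rows)), probs)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one story."""
    def log_score(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if x else 1 - p)
                               for p, x in zip(probs, row))
    return max(model, key=log_score)


model = fit(ROWS, LABELS)
print(predict(model, [1, 0, 0, 1]))  # moral stated, no subplot, linear, bodily emotion
print(predict(model, [0, 1, 1, 0]))  # no moral, subplot, nonlinear, named emotion
```

Nothing in this model looks at wording. It only sees decisions about the story, which is why rewording the prose leaves its input unchanged.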
The machine gives itself away
Narrative choices alone separated human from AI at 93.2%, almost the same as models that also read the style.
You cannot edit your way out of this…they ran the AI stories through a professional-grade rewriting tool and detection dropped by just 1.6 points, because the clues were never the words but the thinking.
What an average machine actually does
AI over explains
The AI narrator states the moral of the story 77% of the time, while human narrators do so only 52% of the time. The machine does not trust you to understand, so it tells you what to feel.
A big part of human writing and joke telling involves insinuation (not in a bad way) and innuendo, which bring the reader pleasure when detected and picked up on.
The reader is not 100% certain that it was the writer's intention, but still derives a certain personal pleasure from spotting it.
AI tidies
AI favours single-track plots…one thread, pulled straight from first clue to grand reveal…79% of AI stories have no subplots, and resolutions are driven by the protagonist 69% of the time.
So everything resolves and nothing dangles at the end.
AI avoids the open ending
Humans are comfortable with ambiguity, flashbacks, time that folds back on itself. Again, I always think of Christopher Nolan movies; his work is particularly good at this.
A human mystery opens at the funeral and spirals backward through decades. The machine starts at the beginning and ends at the end.
AI performs feeling instead of naming it
I think this part is particularly interesting: AI renders fear as a tightening chest, cold sweat, a dimming lamp. 81% of the time, emotion arrives through the body.
Humans just write "he felt afraid"…the machine is always showing, never trusting the reader to perceive the message and story.
It writes as if no one is watching
Humans break the fourth wall, talk to you, name real books and real places. The machine avoids the real world and avoids the reader. It performs into an empty room. This is something I find particularly interesting…I have often heard the question asked: who are you writing for?
None of these is a matter of style; each is a judgement, and the machine keeps making the same judgement.
The AIs all make the same choices, or closely related ones
The five AI systems cluster together in narrative space, rather tightly. They are nearer to each other than any of them is to a human.
AI converges
Five different labs, five different training runs, and they arrived at the same narrow region of storytelling. Human stories sit outside that region and scatter widely. A human story was ranked the rarest of all six versions 57.8% of the time.
Humans are rare. Machines are central.
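One way to picture "rare" versus "central" is to treat each version of a story as a point in feature space and measure distances. The sketch below does this with plain Euclidean distance on invented four-number vectors. Both the vectors and the distance-to-centroid definition of rarity are assumptions for illustration; the article does not describe the study's actual rarity measure.

```python
import math
from itertools import combinations

# Invented toy vectors for the six versions of one story. Not study data.
versions = {
    "human":    (0.0, 1.0, 1.0, 0.0),
    "claude":   (1.0, 0.0, 0.0, 1.0),
    "deepseek": (1.0, 0.0, 0.0, 0.8),
    "gemini":   (0.9, 0.0, 0.0, 1.0),
    "gpt":      (1.0, 0.2, 0.0, 1.0),
    "kimi":     (1.0, 0.0, 0.0, 1.0),
}


def centroid(vectors):
    """Component-wise mean of a collection of equal-length vectors."""
    vectors = list(vectors)
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def rarest(story_versions):
    """Name of the version farthest from the centroid of all versions."""
    center = centroid(story_versions.values())
    return max(story_versions,
               key=lambda name: math.dist(story_versions[name], center))


ai = [vec for name, vec in versions.items() if name != "human"]
ai_to_ai = [math.dist(a, b) for a, b in combinations(ai, 2)]
ai_to_human = [math.dist(a, versions["human"]) for a in ai]

print(rarest(versions))
print(max(ai_to_ai) < min(ai_to_human))  # every AI pair is closer than any AI-human pair
```

In this toy setup the five machine versions pull the centroid toward themselves, so the human version ends up as the outlier. That is the shape of the result the study reports, here reproduced only by construction.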
Each model still has a tic
They are average, but they are not identical; each leaves a fingerprint.
Claude keeps it cool
The most distinctive of the five…defined by restraint. Event intensity escalates less than in any other source, it honours literary tradition rather than subverting it, and it favours epilogues. It avoids dream sequences and prefers quiet endings over avalanche endings.
GPT likes to gossip
Rumour as a plot engine. Stories framed as reflections on events from decades ago. The most willing to subvert expectations.
Gemini writes the tidiest endings and the bleakest settings
88 percent tagged bleak and oppressive.
DeepSeek front loads the crucial context everyone else withholds
Kimi has the fewest fingerprints of all
It sits at the dead centre of the machine distribution with no distinctive choices to speak of. The average of the averages.
Conclusion
We keep asking whether AI writing is good, but that seems to be the wrong question.
The right question is whether AI writing is conceived, whether there is a point of view underneath it that chose this story over every other possible story.
StoryScope (the study) says there is not: the model is not opinionated.
It is rather a convergence engine that finds the safe centre of human narrative and stays there, because the safe centre is what probability rewards.
Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.
StoryScope: Investigating idiosyncrasies in AI fiction (arxiv.org)
Cobus Greyling (www.cobusgreyling.com)




