LLMs and the Business of Fiction (bonus): How good are they really?
The WGA have won their strike action, and author Mark Lawrence ran an informal experiment to see how good ChatGPT really is at writing fiction. Here's my summary.
I wasn’t planning to come back to this topic quite so soon, but there’s been some interesting news in the last couple of weeks and I thought I should cover it while it was current. So let’s break it down.
WGA Win
The WGA have won their action against the AMPTP and the news outlets are crowing about how good the new contract is, which includes better pay, better profit share, better benefits and improved prospects for screenwriters in the American industry.
It also includes some provisions that prevent studios from using LLMs to eliminate writers or reduce their pay. Writers are not prohibited from using LLMs in their work, but they cannot be forced to use LLMs, either. Also, work created under the new contract cannot be used to train AIs without the writers’ consent. Of course we already know that work generated by an LLM cannot be copyrighted, which makes it less attractive for a studio to exploit.
This sounds good on paper, but I (again—not a lawyer) am sceptical that it will be so easy to implement. Say you are a studio who has trained an AI to tune some work into a particular style that you like. You can’t force a writer to use it… but that won’t stop you from selecting compliant writers.
Also, there is nothing to stop studios bringing in straw-man “writers” who will use LLMs to generate garbage first-draft screenplays, which can then be used to dilute the pay of the skilled writers who are hired to get the work up to standard.
Finally… the screenplays used to train the current generation of LLMs were used without permission from the studios. I can’t see any reason the AI vendors would stop doing that because of a contract between the writers and the studios. Sure, the studios can’t use their writers’ new screenplays to fine-tune their own LLMs, but unless the studios are willing to enforce this against the AI companies, this does nothing to address the broader issue: the LLMs are trained with stolen data.
Meanwhile, the idea of using LLMs to write fiction isn’t going away. I have seen a renewed wave of advertising for products that say straight up that their AIs will turn your ideas into a bestseller. I also see products that will typeset the output of ChatGPT as an ebook, so you can publish said bestseller with a minimum of friction.
This is a fantasy, of course. Writing bestsellers is hard, and publishing a book is more than just formatting an ebook. But these companies aren’t advertising to people who have a realistic understanding of how these things work. Grifters gonna grift.
Just How Good Are LLMs at Writing Fiction?
I have said before that I don’t believe that LLMs can generate quality fiction. I’m less concerned about that than about what the deluge of garbage is going to do to author wages and the already diminished discoverability of work by human authors. But in the meantime, well-known fantasy author Mark Lawrence has gone looking for evidence.
In addition to being a bestselling SFF author, Mark has a PhD in mathematics and has worked as an AI researcher. Last month, Mark conducted an experiment to try to gauge how good LLMs are at writing flash fiction. This is not a formal, peer-reviewed study, but Mark knows what he’s doing when it comes to stats. So let’s have a look.
In the experiment, Mark posted eight flash fiction stories about ‘meeting a dragon’ and asked readers to identify which they thought were written by a human, and which by an AI. Four of the stories were written by professional writers and four by ChatGPT.
I won’t rehash the results in detail here, but the short version is that readers correctly attributed 5 of the 8 stories. Two of the stories were incorrectly attributed (one each way) and the eighth story did not yield a statistically meaningful result. Mark breaks down the nuances in a lot more detail, so instead I want to look at the AI stories and see what features are common there.
I expected the AI stories to have minimal dialogue, but some of them had just as much as the human-written stories. The dialogue was also generally coherent and mixed well through the action. None of the AI stories had good dialogue, though; for the most part, what there is has a generic fantasy melodrama feel. One of the AI stories, which is in a whimsical tone, contains words like “blimey”. Not all writers are good at writing dialogue, but in the eight stories, the only pieces that did have good dialogue were ones written by a human.
Thematically, most of the AI pieces have the dragons representing a metaphorical experience of wonder or magic, which I guess is a well-used trope in dragon-related fiction. The other popular trope is dragons burninating the countryside, and the remaining AI piece (which was incorrectly identified as being human-authored) used this as a plot twist.
Style-wise, only one of the AI pieces really stood out. This was a piece which Mark prompted ChatGPT to render in 19th-century language. This was also the only piece with no dialogue in it. A statistically significant proportion of readers identified it as AI-written, however, which suggests that asking an AI to do a style does not necessarily yield a more convincing result. I’m sure an LLM fine-tuned to a style would do a better job, but it wouldn’t address the other problems.
There is a vague lack of coherence in the more complex AI stories. Characters appear to deliver a line and then disappear. None of the characters feel real or show any kind of an arc. The AI piece that readers mistook for being authored by a human was the most complex, in terms of action, but the story is very stilted and you need to squint a bit to follow it through.
It’s also worth noting that this is flash fiction, which, due to its short nature, gives ChatGPT its best shot to look good. The bigger a work, the less likely it is to be coherent. Flash is also a very constrictive format for human writers, and to make a flash fiction piece work we must often eschew style or grace in the doing. Flash fiction gives LLMs their best shot and humans their poorest. But I admit the LLM has done better than I expected.
What does this all mean?
Well. Even when an LLM produces a coherent story, it’s going to feel very generic. AIs know what stories look like but, even with your prompting, they have no understanding of what a story is. Stories are complex systems that must arrange and communicate the interactions of characters with each other, with environments, and with history… and they must communicate it to the reader in a way that is dramatic. Writing a story is more than just formulating some grammatically correct text. Writing is hard. Ask any writer.
I have no doubt that LLMs will improve at these tasks, but remember, they are trained to generate convincing text. Although they can give the appearance of being smart, they have no ability to plan or to reason. They can’t do arithmetic. All they can do is try to adapt similar inference processes they may have seen elsewhere. If I was a writer looking for ways to compete I would be honing the skills that LLMs are never going to be good at.
LLMs do not understand character and cannot write good dialogue. They’re unable to choreograph an action scene or maintain continuity through a carefully designed environment. They can be fine-tuned to match individual authors’ styles, but they will never have a style of their own. It sounds like basic new-writer advice, but if you want to stand out, develop your own voice and style. Learn to write good dialogue. Most of all, craft original stories in which the plots are driven by, and essential to, character arcs. Many sectors of publishing try to homogenize writer voices, but I have never believed that is a winning gambit and now even less so. If you want to stand out from the noise you need to offer something new.
If you can write comedy, I think you’re way out in front. Jokes are the conjunction of words or ideas that don’t normally go together, but, when aligned, create a logic that surprises you. This is the opposite of how LLMs work: they find the words and phrases that are most likely to go together, and if they surprise you it’s almost certainly due to an error. But even if they did know what was funny, they lack the planning capabilities to set up and deliver a punchline.
If you write erotica, well, you’re doing fine also, because most commercial LLMs have been constrained from emitting anything risqué. The same applies to the more splattery forms of horror. Psychological horror, I expect, is uncontested ground.
The primary threat that LLMs offer to us is in terms of volume. It’s already difficult to discover good new books because of the increasing volume of garbage, and LLMs are going to amplify that problem. So learn to write well, because the only other way to distinguish yourself is to spend a shit ton of money on advertising—and nobody can compete against the big players when it comes to that.
Conflux
In other news, I had a blast at Conflux, hanging with old friends and making some wonderful new ones. There are far too many people to name, but it was a particular pleasure to meet Lee Murray and Ellen Datlow in the flesh, and to panel with Amie Kaufman again after, I think, 6 or 7 years.
My panels all went well, and, when folks learned about the comic I’ll be kickstarting shortly, I was invited to join a panel about, um, cannibalism. Which was pretty cool.
Another unexpected pleasure was being invited to do a reading for the Narrative Library, who are collecting audio of a great many wonderful Australian writers doing a bit from their work. I read a five minute slice of Bloody Waters, and there’s a very short interview to go with it. You can find my reading under F.
Gourmand Go
Thanks to everybody for your feedback on the Gourmand Go rewards. It looks like the most practical and popular option will be to have a POD offering for backers outside of Australia, with book plates available. We’re looking good for a 3-week campaign, launching on the 25th of October, so hang in there. Perhaps next week I’ll show you guys a preview of the campaign.
Once again, the pre-launch signup is here:
Franksly yours,
— Jason