A Large Language Model Use Case

If you are not familiar, Dungeons & Dragons (D&D) is a game of storytelling and adventure. It is perhaps the ultimate open-ended game. You do not “win” a game of D&D so much as you engage in a story as one character within it, shape it from within, and take action within new and ever-changing circumstances as the game continues over weeks, months, or years. One of the players takes the role of Dungeon Master (DM). This person is responsible for the greater share of establishing the environment, the world in which the story takes place. The DM also plays every character in the world not played by the other players, such as the village blacksmith, the wizard on the hill, the innkeeper, and the band of murderous Orcs. The DM establishes the overall story; between all the players, and some dice, you jointly discover how it turns out.

This creates a most delightful burden on the Dungeon Master: coming up with backstories for different characters, and creating locations, interesting villains, plots, social and political hierarchies, geography, and so on. “Delightful” because it can be a very enjoyable, though never-ending, task. I have been running D&D games for decades, and my latest game was becoming very detailed and in-depth. Certain storylines revolved around uncovering elements of the history of the world, which created a LOT of background information to generate. One thing computers are really good at is doing a lot of work with little complaint.

Large Language Model AI systems such as ChatGPT had recently been in the news, so I decided to try using ChatGPT to help flesh out the bare bones of what I had already mapped out. These systems use a kind of predictive logic to determine what would be a pleasing response to what they have been asked. After a little experimentation I established with ChatGPT that we were talking about creating content for a D&D game, and the style of responses I wanted. It has been trained on much of the text on the internet, and it turns out there is a lot of information about Dungeons & Dragons and similar games on the Internet, so it was able to generate a lot of material suitable to the genre.

This was also an advantageous use of the platform because I was asking for ideas, rather than facts. The problem (which Microsoft and Google have both spectacularly suffered from in recent weeks) is that just because something is repeated many times on the internet, that doesn’t make it more “factual” than something repeated fewer times. Giving these systems ways to determine what is fact and what is not is crucial not just for ranking search results, but for presenting data-driven responses.

For example: what is the height of the Eiffel Tower? It turns out Wikipedia and https://toureiffel.paris both agree on the current total height. That is probably correct; they are good sources. But if a different height appeared on 3, or 300, other websites, teaching these language models which figure to treat as fact and which to ignore is obviously a challenging process.

My interest was only in facts regarding a world that was, itself, imaginary, which completely sidesteps that issue. These were things I could specifically instruct ChatGPT to know and to take into account; it could then use them as established context for the responses it gave me. This is something that was specifically improved in the releases of late December and mid-January. As a result, I have a saved chat session in my account that runs to many dozens of “pages” and has a huge established context. Though, it is an experimental system, and it will, from time to time, need reminding of certain facts.
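The “established context” idea above can be sketched in a few lines of code. This is a hypothetical illustration, not any real chat API: the class name, method names, and the kingdom of “Veldrath” are all invented for the example. The point is that world facts are simply accumulated as conversation turns and re-sent with every new question, which is also why a very long session can eventually need reminding.

```python
# A minimal sketch of the "established context" pattern (hypothetical
# names throughout; this is not a real chat service API).

class ConversationContext:
    def __init__(self, system_prompt):
        # The opening instruction that sets the genre and style.
        self.messages = [{"role": "system", "content": system_prompt}]

    def establish_fact(self, fact):
        # World facts are just more conversation turns the model will see.
        self.messages.append(
            {"role": "user", "content": f"Established fact: {fact}"}
        )

    def build_request(self, question):
        # Everything accumulated so far is sent along with the new question.
        return self.messages + [{"role": "user", "content": question}]

ctx = ConversationContext("We are creating content for a D&D campaign.")
ctx.establish_fact("The kingdom of Veldrath fell a thousand years ago.")
request = ctx.build_request("Suggest three ruins the party might explore.")
```

In a real session the `request` list would be sent to the model; because the model only “knows” what fits in that accumulated context, facts that scroll out of it are forgotten, hence the occasional reminders.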

This has become a fairly pleasing, and surprisingly functional, ongoing discussion in which I instruct it with details of upcoming plots or characters and ask it to fill in blanks. In this way it acts like a tireless co-writer, always willing (outages aside) to come back with yet another idea: yet another list of name options for a new important character, faction, or place, or a whole new list of important landmarks for a town being added to the map. I take these, select what I want, adjust, and fold it into my planning. If I don’t like what comes back, I can ask for adjustments, change my approach, or discard it entirely. Or I simply use it as a springboard for my own design.

Because ChatGPT uses vast amounts of existing text as the basis for what it generates, one could argue it tends to lean into the tropes of the genre, even coming up with similar names again and again. For example, it is not hard to sense the influence of Tolkien and his languages, as well as previous D&D publications, as you would expect. But I see this as suitable for its role in the process. If you defy the genre too much, you are simply telling a story in a different genre. Having a co-writer that will adhere to the expectations of the genre allows me to choose when and how to subvert it, or take surprising turns from it.

Further, the rapidity with which this can be done has allowed me to plan elements further into the future, and to flesh out more of the invented past than I normally would, at a level of detail I had not managed before. I can still feel free to improvise as the story unfolds (and still sometimes have to, when story turns are unexpected). And, because the work overhead is low, anything I discard, radically change, or use only a small part of is perfectly fine; no great cost in time or effort is wasted. I can delve far into the backstory of a character that has turned up only a couple of times, to give myself a greater understanding of them, even though the story may never lead back to them. If it does, I have plenty of details filed away to draw on as they become relevant.

One interesting experience I had was with an upcoming storyline. I had laid a lot of groundwork for a certain event to happen to the characters, at a particular place and time, and had planned for several other characters to be present, variously as antagonists, observers, and potential allies. After laying out the characters (some of them built with the help of ChatGPT) and the location and event they would be attending, I informed it of the specific plot turn that was planned to occur. As part of its typically upbeat and affirming response, it went on to outline how it felt each of those characters would react, and what it thought their priorities would be as they responded. Very pleasingly, this was very close to what I had planned for those characters: responses I had been careful not to forecast to it previously.

One thing I believe about good storytelling is that characters should act consistently within themselves, even when their circumstances are unexpected. When characters make choices we don’t understand, it is hard to empathise with them or to take what they are responding to seriously. This indicated to me that we had a shared understanding of these characters, and had reached the same conclusions about how they would respond. That is, they would act consistently. Hopefully the plot turn itself will be the surprise, but the world around it will seem all the more real because its inhabitants will respond in character.

Of course, it is possible that this response had been seeded in the information I had provided, and I was experiencing a mirroring in this response. Even if so, I felt it indicated a degree of internal consistency in my plans, characters and world-building.

There has been a lot of talk about the power of Large Language Model AI chatbots: what they can do and whose jobs they will replace. Because we as humans like to have interactions in language, in text, the very convincing way these pieces of software respond to us makes it easy to believe they are capable of far more than they truly are. What they are really capable of is emulating things they can absorb vast amounts of, like an infant generating a first attempt at a word after listening to months of human speech. When all you are asking of it is to generate content of a certain style within some parameters, it is very well suited to purpose; that process of absorption and emulation is ideal for the task, since you want the output to be unexpected in some ways. When asked to make real-world decisions, or to determine specifics based on the sum total of the text on the internet, it becomes very problematic. Large Language Model developers are already working on ways to exclude earlier outputs of Large Language Model AIs from new training sets; they don’t want newer versions of the AIs to be swayed by the incorrect output of older ones. I think this is indicative of where Large Language Model training can be useful, and where it should be avoided.
