(These are a series of postings I made in the textadventures.com (Quest) forums almost three years ago. The ideas put forth here eventually led to my Quest “Response Library”, which was then rewritten in JavaScript as ResponsIF. Not everything in it is still relevant for me, but I’m including here because it has some nascent ideas I think are worth preserving as well as including in this sequence of blog posts. I’m concatenating the posts together, as they’re all related.)
(Intro)
I wanted to start a dialogue (no pun intended) about conversations. Part of this is philosophical/theoretical, which can be fun on its own, but I also wanted to possibly work toward the creation of something new – a Quest conversation engine.
First, the idea bantering phase… So, what exactly do I mean by a conversation?
Part of that should be fairly obvious: the player having communication with other characters (also known as “non-player characters” or “NPCs”). There is already code in Quest – plus an extended such library written by The Pixie in the “Libraries and Code Samples” forum – which supports basic “ask” and “tell”. This takes the form of something like “ask Mary about George” or “tell janitor about fire”. You have a target character and what’s known as a “topic”. Typically, this basic implementation responds with a fixed response (e.g. look up string “topic” in character’s “ask” string table), with an occasional implementation of selecting randomly from a set of fixed responses.
But we usually want more…
There have always been questions, wishes, requests for NPCs to be more interactive. Now, this is black art at this point, and I don’t propose to have a definitive answer (maybe we can move toward that through this), but the point is, game author’s want more dynamic responses, based on any number of variables including previous topics, the time of day, what has occurred in the game, the state of play, and possibly even the emotional state of the character.
My own (limited) game design and wishes took me further.
Consider related desires. It has often come up that author’s want, say, room descriptions to change based on, well, more or less the same things listed above (though emotions come into it less). Similar design choices could be made for object descriptions, NPC descriptions, etc.
I had this revelation one day, as I began to see patterns in this. It finally occurred to me:
In a text adventure (or “interactive fiction”), everything is a conversation.
Let’s list some examples.
1) NPC conversation. The player wishes to ask characters in the game about things, requesting responses to topics. The character responds with information (or some non-committal shrugs or “I don’t know”, if not).
2) Room descriptions. The player makes queries to the game about the room. We ask the room, “Tell me about yourself,” and it responds. This extends as well to scenery. We often will put scenery objects into a room just to hang descriptions off of. These are not interactive objects as such – they exist in the game framework just so the user can type something like “x carpeting” and get a response. In the mindset of conversations, this could be thought of as asking the room about its carpeting. (In other words, the room is the character we’re conversing with, and the “carpeting” is the topic.)
3) Object descriptions. Similarly, when we “x” an object, we are asking it to describe itself.
4) NPC descriptions. Sort of like object descriptions but with a different feel.
5) Hints. Hints are really a bit of a one-way conversation, a running commentary by the game on what you’re doing. Certainly, there could be a “hint” command that forces a response, but there is also the system whereby hints are offered unbidden at the appropriate times. This, in a real sense, is the game conversing with you based on the state of play.
So you might be wondering why I’m trying to lump everything into conversation. The reason why was mentioned before: because we want all of the above things to have a certain dynamic to them, and that dynamic is common across them. We want them to not be static. We want them to change over time, not only randomly but in response to the game world. In a very real sense, as we are playing a game, we are in a conversation with the game itself, and we want this conversation to be alive and engaging.
Currently in Quest, the above areas are handled separately. Room, character and object descriptions (well, rooms and NPCs are objects, after all) are handled via a “description” attribute, which can be a string or a script. Ask/tell are handled via string dictionaries. And hints are basically “roll your own.” Which means that if you wanted to add dynamic responses for all of them, you’d need to implement separate code in separate places.
I’d like to change that. I know in my games (largely contemplated and only minimally implemented, mainly due to wanting too much), I have made stabs at implementing the above pieces.
One example: I wanted my rooms to have “directional descriptions”. By that, I mean that you would get a different description depending on which way you came into a room (or “area”). As an example, let’s say you’re outside in a clearing and you see a path to the north, so you type “n”. A stock description once you get there would be “You’re on a path. To the north, you can see a pond. A clearing lies to the south.” That bugged me a bit. Of course a clearing lies to the south – I was just there. I have walked north. In my head, I’m facing north.
What I wanted was that if you came north onto the path, it would say something like, “The path continues north. You can see a pond in the distance.” Whereas, if you came onto the path from the north, heading south, it would say, “The path winds further south, leading to a clearing.” Now, we can get into debates about where the full set of exits is listed. My option was to have a ubiquitous “exits” command that would dump the out (e.g. “You can go north and south.”) There are other ways to do it. But the point is: the game responds in context. To me, that is much more immersive than “push level B and get response Q”.
Now even if directional descriptions don’t make sense to you, other things undoubtedly will. For example, if you have a game that spans days within the same environment, you might want the descriptions to change based time of day or what the weather is. A whole host of possibilities come up. The same might be true even of NPC descriptions: in the beginning, the sidekick is immaculately dressed in blue. After the flight from the enemy agents, her top is torn and her hair is askew. The next day, having changed, she’s in black jeans, and she has donned a graying wig and wrinkled skin prosthetics like in “Mission Impossible.”
How would you do all that? Lots of “if” statements? I must confess – I love writing code, but it makes my skin crawl to write lots of special-case code, because it’s incredibly brittle and requires all sorts of mental gymnastics to keep straight. I’d rather write some nice general, data-driven code that seemingly by magic does the right thing.
So now we’re down to the proposal part – I’d like to begin working on a community Quest “conversation” engine. Not a library. Not “ask/tell”. I mean a full-on, topic-based, evolving over time sort of engine that anyone can use and enjoy. To that end, I’d like to get any thoughts that people have on the subject, any design considerations, possibly even any code designs that you’ve tried yourself. For anyone who’s interested, let’s put our heads together (and chickens are welcome as well as pigs – if you don’t know what that means, either skip it or look it up).
There are different approaches to NPC conversation, but I want this engine to be something that can be used universally in a game as well as (on some level) for all those things listed above. Perhaps that’s too big an ask, but I want to set the bar high going in. Then we can see where we get to.
I’m going to follow this post up with some of my own (soon), but I wanted to get this started.
(First of several parts)
What we will call “conversing” (I’m going to stick with that, as perhaps it will someday become more like that) is in reality selecting text to display to the user – choosing something for a character to “say” – either in response to a query (direct trigger) or based on conditions (indirect trigger).The trigger might have a topic or topics associated with it (e.g. a query about a topic), or the trigger might be neutral (e.g. a hint engine that runs on each turn searching for phrases to present, or a particularly chatty NPC who offers up information in his/her own when the time is right).
How do you select what to say? For things like object/room/character descriptions, the standard is either a hard-coded string or some code which typically selects and assembles the output based on whatever specific logic is programmed in. For something like ask/tell, the implementation is a string lookup in a dictionary.
That is the current state-of-the-art in Quest. There’s any number of ways to go as far as how we can implement string selection. I’ll cover a couple of directions I have explored.
Boolean Logic
The most straightforward is to implement some sort of hard “yes or no” selection. That is, we look through the available phrases and see which ones match. Either a phrase matches or it doesn’t. What to do if more than one matches depends on the situation (and so complicates things a little bit). For interactions with an NPC, you’d probably want it to spit out a phrase at a time in response. You could randomly select one, for example. Another possibility is to show all matching phrases.
The latter is what I did in my room description implementation. Each room could have various “topics”. A topic would then have conditions. Standard topics included “base” (the base description), “first” (text to show the first time), “later” (text to show when not the first time), “north”/”south”/”west”, etc (text to show based on arrival direction), etc. Custom topics could also be added, depending on conditions and needs.
A sample room might have text like:
<baseDescription>"This is the second floor landing."</baseDescription>
<westDescription>"Further to the west you can see an archway leading to another hallway. A broad staircase leads up and down."</westDescription>
<eastDescription>"The hallway continues east. A broad staircase leads up and down."</eastDescription>
<upDescription>"The stairs continue up. A hallway heads off to the east, and through an archway to the west, you can see another hallway."</upDescription>
<downDescription>"The stairs continue down. A hallway heads off to the east, and through an archway to the west, you can see another hallway."</downDescription>
The description would be a combination of the base description and any directional ones. So, for example, if you move west to get the landing it will show:
This is the second floor landing. Further to the west you can see an archway leading to another hallway. A broad staircase leads up and down.
which is a combination of the “base” and “west” descriptions. (The fact that strings are surrounded by quotes is due to the fact that I would “eval” the attribute to get its real value. This allowed me to put expressions in the strings.)
Here are some of the conditions (in the base room type) that trigger the various pieces:
<baseCondition>true</baseCondition>
<northCondition>lastdir="northdir"</northCondition>
<northwestCondition>lastdir="northwestdir"</northwestCondition>
<westCondition>lastdir="westdir"</westCondition>
<firstCondition>not GetBoolean(room, "visited")</firstCondition>
<laterCondition>GetBoolean(room, "visited")</laterCondition>
Now, this is all well and good and works fine for rooms. The conditions can vary, and the text shown will vary accordingly. But it is very static. What it’s missing is any sense of history, any inclusion of what has been shown (to some extent, what has been “discussed”). For NPC interactions, I wanted more.
Fuzzy Logic
Moving beyond the yes/no, hot/cold, show/don’t show of binary, Boolean logic, we arrive at another alternative – that of fuzzy logic. I have only toyed with a test implementation of this; with no real world uses, it might end up needing some more design to make it all work properly. I’ll describe what I have considered so far.
The basic idea behind this is the notion of a current “conversation context”. This context holds information about the current set of topics. This set of topics changes over time as different topics are explored.
Each NPC would have its own conversation context. Each possible phrase would also have related topics. Triggering a phrase will update the conversation context, which will then influence subsequent phrase selection. Let’s see how this might work with an example.
Here are the phrases. The first part is the string. After that is a list of topic weights associated with that phrase. Weights range from 0 to 100. (There might be a use for negative weights as well.) I hope this isn’t too contrived…
[id=1] “My father was a farmer.”, father=100, farming=100, family=50, history=80
[id=2] “I grew up on a farm.”, farming=100,history=80,home=100
[id=3] “My brother’s name is John”, brother=100, family=50, john=100
Let’s look at these a bit. For the first one (“My father was a farmer.”), we have the topics “father” and “farming” being a strong match. The topic “family” matches 50%. The weights have two purposes:
1) They help in matching by showing how much a topic matches the phrase.
2) They are used in modifying the conversation context when they are used. (More on this below.)
The idea behind 2) is that our minds are very associative – when we discuss topic “A”, it brings in other related topics (“B”, “C”, and “F”). We want to have a similar behavior in our phrase selection.
Initially, the conversation context is empty. Nothing has been discussed.
Context = {}
Let’s say the player enters “ask about father”. Due to this, the “father” topic gets injected into the current conversation context:
{ father = 100 }
Now, we search. The search is done by generating scores for each phrase via a sort of “dot product”. (If you don’t know what that is, don’t worry.) Basically, we multiply the context with each phrase’s topics and generate a score. In the following, the first number multiplied is the value in the context; the second number is the number in the phrase weights. A missing weight has value 0.
[id=1] score = (father) 100*100 + (farming) 0*100 + (family) 0*50 + (history) 0*80 = 10000
[id=2] score = (father) 100*0 + (farming) 0*100 + (history) 0*80 + (home) 0*100 = 0
[id=3] score = (father) 100*0 + (brother) 0*100 + (family) 0*50 + (John) 0*100 = 0
In this case, phrase 1 is the clear winner. If the topic were “brother”, I hope it’s clear that phrase 3 would be the winner.
Let’s see what happens if we have “ask about family” as the topic. In this case, the context would be {family=100}. Running scores, we get:
[id=1] score = (family) 100*50 + (father) 0*100 + (farming) 0*100 + (history) 0*80 = 5000
[id=2] score = (family) 0*0 + (farming) 0*100 + (history) 0*80 + (home) 0*100 = 0
[id=3] score = (father) 100*50 + (brother) 0*100 + (John) 0*100 = 5000
In this case, both phrases 1 and 3 match “family” about the same. What happens is to be defined. Either it could spit out one phrase (chosen randomly perhaps, or in order), or it could spit the both out (“My father was a farmer. My brother’s name is John.”).
Let’s go back to the “father” case. In that case, we would have a result of phrase 1. So the NPC would say, “My father was a farmer.” Based on standard human conversational associations, we would want to update the conversation context with any related topics brought in due to that phrase. I’m not sure of the ideal algorithm for that. For this case, let’s assume we generate averages. (A better weighting scheme might make more sense.)
Context = {father = 100}
Phrase = {father = 100, farming = 100, family=50, history = 80 }
new “father” = (100 + 100)/2 = 100
new “farming” = (0 + 100)/2 = 50
new “family” = (0 + 50)/2 = 25
new “history = (0 + 80)/2 = 40
New Context = {father = 100, farming = 50, family = 25, history = 40)
This is now the current conversation state. What I had wanted was for NPCs to be able to initiate conversation as well, not just respond to player queries. If the player were idle in this case, the NPC might decide to come back with its own search for something to say. Performing the search again with the current conversation context (let’s assume we won’t repeat a phrase), we get these weights:
[id=2] score = (father) 100*0 + (farming) 50*100 + (family) 25*0 + (history) 80*40 + (home) 0*100 = 8200
[id=3] score = (father) 100*0 + (farming) 50*0 + (brother) 0*100 + (family) 25*50 + (John) 0*100 + (history) 40*0 = 1250
In this case, phrase 2 matches, so the NPC will follow up with: “I grew up on a farm.” After this phrase is uttered, the conversation context will be updated to:
New Context = {father = 50, farming = 75, family = 12.5, history = 60, home = 50)
Note that we didn’t discuss “father” again, and so its weight has decreased. Also note that farming has been emphasized. And we now have a new topic (“home”) which may trigger additional, subsequent phrases.
There are many variants to this, possibilities for manipulating the parameters. Perhaps the new conversation context should not be a simple average but a weighted one. Perhaps the conversation context should “decay” over time, if there is no conversation (that is, if you cease talking, the weights gradually move to 0 and disappear with each command). Perhaps a new topic injected should enter into the context with some lower weight than 100. As far as not repeating phrases, perhaps the “memory” of a phrase being spoken decreases over time (possibly at a rate determined by the age of the NPC), such that it will eventually be repeated. There would also need to be some determination of what to do if either no phrases match (“I don’t know about that”), or all matching phrases have already been spoken (“I don’t have any more to say about that.”)
There is one downside to these weights which needs to be addressed. A queried subject might want to be searched more forcefully than one where the NPC follows up on its own. For example, after “ask about father”, if the player enters “ask about the moon”, it wouldn’t make sense for the NPC to come back with “I grew up on a farm” (which it would if straight weights were used and the priority of the topic wasn’t taken into account). One way to work that out is that a new query from the player generates a new temporary search context with just that topic, with the weights from any chosen phrase adding into the current context afterwards.
Next: Bringing in the world state, the best of both worlds, and some emotions.
(Continued)
(Note – this article is referenced below: http://emshort.wordpress.com/how-to-play/writing-if/my-articles/conversation/)
Assuming what we have been discussing so far actually works, then we now have some sort of scheme for modeling an evolving conversation. It does mean that you as the author have to actually go in and create all the response phrases – and decide all the topic weights. While the thought of having NPCs actually generating utterances on their own (putting words together to form coherent sentences) is a wild pipe dream, that is not what we’re talking about here.
(Aside: In Emily Short’s article listed above and in her other ones, she uses the word “quip” for what a character says. While that is a bit of a fun word, it had a connotation to me of levity which I didn’t think was generally applicable. So, being the programmer that I am, I am going with the more generic “phrase”. Even that might not be entirely accurate, but we need something.)
What we have gone over attempts to address selecting phrases based on topics, but it’s fairly self-contained. The only inputs are topics. We also want to interact with the game world, to somehow have the world state influence phrase selection.
By “world state,” I mean pretty much anything of interest in the game world. It can be things like:
– where the player is
– where they have been
– what they possess
– what they have possessed
– whether a murder (or some other significant event) has occurred
– the time of day
– the presence or absence of other NPCs
– the emotions of the various NPCs
– who has just walked past in the corridor
– choices the player has made in the past
– anything else relevant to the story being told (or the puzzles therein)
You could, in theory, use weights for that, and inject state in as topics. I tried that for my room descriptions at first but quickly abandoned it. There are two problems:
1) You need to create a topic for each piece of world state (“has_stake”, “visited_crypt”, “saw_vampire”, “killed_vampire”, etc or, in my case, “east”, “west”, “north”, etc ), which can quickly become unwieldy.
2) Such state is typically binary – we only want certain phrases to be eligible when certain conditions are true. In order to do binary in a weighted scheme, you have to not only add weights for what you want but also (short of some notation which I tried and gave up on) add negative weights for what you don’t want, to prevent the phrase from being selected when those topics are present. It’s possible but, again, quickly becomes painful.
What I have come to is that we want to somehow utilize both schemes, “binary” and “fuzzy”. The “binary” part would be used for phrase eligibility, based on world state; the “fuzzy” part would be used for selecting from among the eligible phrases, based on topic or other fuzzy considerations. This is the next area of thought for me, so I don’t have any ready answers yet. Perhaps some sort of uniform notation could be adopted. I’m open to that if it would work; in fact, I’ll probably revisit it myself, as I like to unify things.
One piece of “world state” we might want to consider is the emotions of the NPC. For example, if I ask a character about her father, if she loves me, she might respond, “I can’t wait for you to meet Papa!” whereas if she is feeling antagonistic towards me, such queries might be met with, “That’s none of your business!” or something even more venomous. Whether such state is “hard” (binary) or “fuzzy” is really a question of design, depending on how the emotions are modeled. That’s another discussion to be had. It could be implemented as simply (but crudely) as a boolean “inLove”, if that’s all you care about, or as a variable “loveState” ranging from 0 to some upper number, or as a scale ranging from hate to love. You can also have different ranges for different kinds of emotions. The point is: you will need to determine how that is going to be modeled before you can decide how to integrate it with a conversation model. Hopefully, if this all comes together, you’ll have plenty of flexibility to do it the way you want.
(Continued)
A random, somewhat related thought: this one’s about room descriptions.
Typically, room descriptions are centralized. When I say “look”, it goes to the room object (in Quest) and says, “Give me your description”, and that description is either a string or a script. And there is boilerplate code that dumps out the objects in the room, the exits, etc.
Now, let’s say that conditions can change in the room. Maybe a chair is there sometimes and other times not. How do you handle it?
The typical way in Quest is via an “if”. You make your room description a script, you dump out the stock part of the room description, and then you put in an “if”: “if the chair is in the room, also print *this*”. It’s all a bit clunky, and it’s also rather centralized. The room has to know about the possibility of the chair. It’s all very static, very hard-wired, very brittle.
Let’s turn the notion on its head a bit. What if we look at the command “look” as an invitation from the player for description, which goes not just to the room but to all within it. Now, instead of the room having to know that a chair is possibly there in order to add in custom text, we will have the room respond with its description and we’ll let the chair chime in as well with its own description (with such description being the minimal “There is a chair in the corner” sort of thing, with a full description forthcoming if the player directly queries the chair further). And if the chair isn’t present, well, then nothing is there to say it. Imagine the room saying in a booming voice, “You’re in the library” and then you hear a small voice from the corner say, “There’s a plush chair in the corner.” They get written out together in one block of text. Now, instead of a single voice, we have mutiple voices all responding together. Instead of having to put “To the east, there is a foyer”, the east exit will itself add in the text to alert the player to its presence – if it wishes. “Hey look! You can go this way.”
A bit bizarre on first thought, perhaps, but I really like this sort of thing…
(Finally)
I’m in the process of working on a general “response” library, which I’m using in a fledgling game. The game is forcing me to actually use it in a real-world way, and it’s helping to point out where the design has flaws or can be expanded. Lots of uses cases being folded in…
To back up, I had a bit of turn of thought, based on (what else?) terminology. I came to realize that the word “phrase” was not very good, as it just didn’t fit with what it actually was, since a phrase is a piece of a sentence. So I went in search of a better word. I came across “response” and it stuck. But that word opened up new directions in thought. A response need not be verbal. It could be a reply, but it could also be a shrug, a smile, or even a punch in the face.