Narrascope 2019 - David Kuelz - Designing Games that Listen
I'm going to talk about a game that I did a lot of the writing and narrative design for recently called Starship Commander, a virtual reality game that is driven entirely by human speech, which means that there is no controller. Instead you talk out loud to each character via the microphone. We run that audio through our system and then the character you spoke to responds, so it's a game that you play by having an elongated conversation with the script.
This speech recognition technology we're working with was disruptive. It gave players a lot of agency. They could say anything they wanted at any point they wanted. There's no point at which the player can't talk. That opened up a lot of cool doorways to things that only we could do, but it also opened up a couple of Pandora's boxes of real design and production problems that we knew we had to approach carefully.
So almost every decision I made about the narrative design was trying to work with the technology and figuring out how we could highlight the parts we wanted to and either avoid or at least contain potential problems that were coming up.
THE DIALOGUE STORY IS THE GAMEPLAY
We built Starship Commander using the Unreal engine because it had two features that we really wanted. First was the Blueprint system. If you guys aren't familiar with Blueprint, it is kind of like having Twine embedded into the engine, and I say kind of because it behaves very similarly: it's about creating a flow of logic. Once you're used to nodes and branches and flow, you can learn Blueprint very easily. The trick is that while Twine is really for story logic and story flow, Blueprint is really for game logic and gameplay flow. It's about what buttons the player is pressing and how that changes all the math and the numbers we're keeping track of, and what that means for our animations and sound effects and enemy AI and all of that. So you build it in a similar way, but it's for a different type of logic. And usually those two logics are kept separate from each other.
In most games the combat, or whatever it is you're doing, usually doesn't affect the story. So we'll pause the gameplay, nothing's going to attack you, and you can decide which dialogue options you want to choose.
But in our case the only way that we had for the player to interact with the game was to speak to a character or their ship… So we needed one spot where we could build both of those in the same place and see the entire flow all at once. And Blueprint was the answer to that for us.
The other thing that Unreal did that we really wanted is it allowed us to build our own tools. We actually initially attempted to build our own voice and speech recognition system before eventually deciding, let's just use something someone else has already built, that's cleaner and quicker. But in that case, Unreal allowed us to build those outside services into the engine's code in a way that was effortless.
As soon as the Oculus microphone picked up what people were saying, we would send it out to Microsoft and to Facebook's Wit.ai program. We used those two services because they had the least amount of internet latency. And then they would send back text.
NATURAL DIALOGUE AND MANAGEABLE INTENTS
We really wanted the dialogue to feel natural, which meant there were hundreds of thousands of potential things that anybody could say at any point in time, so we needed to funnel those down to a limited number of things we could actually script.
So our programmers built a toolkit and a flow of logic that we called the intent system. It works this way: any individual idea or concept that the player could say was labeled an intent. So up there, you're looking at a question, "Where are we going?", as an intent. It could also be "Who are you?" It could be a comment, like "I think that's a bad idea." Any individual thing that somebody said and that we wanted to have a unique response to was an intent. Inside each intent there was a list of utterances, and utterances were essentially all of the different ways somebody could phrase that intent. So if we use "Who are you?" as an example, some of the utterances inside it might be "Tell me about yourself," "What's your story?" or "Tell me your story." Those are all different ways of phrasing the same idea. So ultimately, utterances became all of the different ways a player could activate one intent.
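As a rough illustration of the intent and utterance relationship described above, here is a minimal Python sketch. The `Intent` class, the intent names, and the `intents_by_utterance` helper are assumptions made for illustration; the talk does not show how the studio's toolkit actually stored this data.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """One idea the player can express, plus the phrasings that activate it."""
    name: str
    utterances: list[str] = field(default_factory=list)


# Example intents; the phrasings are the ones quoted in the talk,
# the intent names are hypothetical.
who_are_you = Intent(
    name="who_are_you",
    utterances=[
        "Who are you?",
        "Tell me about yourself",
        "What's your story?",
        "Tell me your story",
    ],
)
where_are_we_going = Intent(
    name="where_are_we_going",
    utterances=["Where are we going?"],
)


def intents_by_utterance(intents: list[Intent]) -> dict[str, str]:
    """Map each lower-cased utterance to the name of the intent it activates."""
    return {u.lower(): i.name for i in intents for u in i.utterances}


lookup = intents_by_utterance([who_are_you, where_are_we_going])
```

Many utterances point at one intent, so the writer only has to script one response per idea rather than one per phrasing.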
And the last thing we needed to do was take those and place them in the context of a specific conversation, which we called a model. Because what we're saying can mean very different things depending on the context, like whether a provocative comment is a joke or an insult. So somebody would speak into the microphone. We would send it out to Facebook and Microsoft and get back text. The computer would then figure out which model it was in, which is something I would determine for it in Blueprint. Then it would check all of the intents inside that model, and all of the utterances inside all of those intents, and run an accuracy test. The utterance that was most similar to the text that Microsoft sent back determined which intent would activate. And then we had that whole system hooked up to a custom node here in Blueprint that, depending on what the player said, would take it off on different branches of logic. That's how our game would react to the player's speech.
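The matching flow just described (pick the model, compare the recognised text against every utterance, activate the intent that owns the closest one) can be sketched as follows. This is only an approximation: the talk does not say how the "accuracy test" was computed, so Python's `difflib.SequenceMatcher` stands in for it, and the model names, intent names, sample utterances, and the `threshold` cut-off are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical data: each model (conversation context) holds its own intents,
# and each intent lists the utterances that can activate it.
MODELS = {
    "briefing": {
        "who_are_you": ["who are you", "tell me about yourself", "what's your story"],
        "where_are_we_going": ["where are we going", "what's our destination"],
    },
    "bomb_scene": {
        "cut_the_wire": ["cut the wire", "cut it"],
        "run_away": ["let's get out of here", "run"],
    },
}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_intent(model: str, text: str, threshold: float = 0.6):
    """Return the intent whose utterance is most similar to the text.

    Returns None when even the best utterance scores below the threshold
    (the threshold is an assumption, not something stated in the talk).
    """
    best_intent, best_score = None, 0.0
    for intent, utterances in MODELS[model].items():
        for utterance in utterances:
            score = similarity(text, utterance)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None
```

Because the lookup is scoped to one model, the same words can activate different intents, or none at all, depending on which conversation the player is in.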
FULL MOTION VIDEO
We created our content using full motion video, partially because it was really quick, cheap, and easy to do.
BUILDING STORY TO FIT THE TECH
Once the tech was in place I was hired to build a plot to match. There were a lot of really cool, exciting things that we could do with our system and we wanted to make sure we had a story that capitalized on it. And there were also a lot of pitfalls that we thought we could use the story to kind of help mitigate. So I ended up coming up with a plot that was not necessarily about how excited I was as a writer to tell the story, but that I thought would fit the technology. And I was aiming to do it by thinking about these four things, which I will go through one at a time.
IF YOU'RE USING BRANCHING DIALOGUE, KEEP YOUR EXPERIENCE SHORT
If you have a story that branches heavily, a section that takes the average player roughly 15 to 25 seconds to get through demands a lot of writing, recording, and programming. So we knew we weren't going to be able to keep this up for a long experience. Instead we aimed for a short experience of 90 minutes to two hours.
YOUR BUSINESS MODEL CAN BE A CREATIVE PROMPT.
We wanted players to be comfortable dropping a little bit of money on our game, even though it was really short. So we made it replayable, because we already had all of this alternate dialogue anyway, just by virtue of the technology. We did things like enabling players to find branches just by saying something specific, not because they were presented with a choice. This was simultaneously a design and business decision.
TRACK AND PLAY TEST REAL CONVERSATIONS AND USE CONTEXTUAL CONSTRAINTS TO HELP MANAGE THOSE CONVERSATIONS:
During every play testing session we recorded all conversations, and we kept uncovering utterances that I hadn't thought of. That worked, but it was still a time-intensive process. So I also needed to find ways to contain the number of intents that we could plausibly be expected to have. So I created specific contexts around scenes. Essentially, I decided to write a thriller, where every single scene has an imminent point of conflict and danger with a ticking clock element, because in that situation it does not occur to players to ask, "What's your mom's favourite colour?" This also gave us an easy out anytime someone made a comment or had a question that we hadn't thought of: a very logical character response would be "What does that matter right now? Can you please focus on the bomb?"
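The "easy out" described above amounts to giving each scene an in-character fallback line for anything that matches no scripted intent. A minimal sketch, assuming a hypothetical `SCENES` table; the fallback line is the one quoted in the talk, while the scene name, intent names, and scripted replies are invented placeholders.

```python
# Hypothetical scene script: replies per intent, plus an in-character fallback.
SCENES = {
    "bomb_scene": {
        "responses": {
            "cut_the_wire": "Cutting it now. Hold on.",  # placeholder line
            "how_long": "Less than a minute!",           # placeholder line
        },
        "fallback": "What does that matter right now? Can you please focus on the bomb?",
    },
}


def respond(scene: str, intent):
    """Return the scripted line for an intent, or the scene's fallback line.

    `intent` is None when nothing the player said matched a known intent.
    """
    script = SCENES[scene]
    return script["responses"].get(intent, script["fallback"])
```

The tighter the dramatic pressure in the scene, the more natural it sounds for the character to brush off an unrecognised remark.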
By the way, we would play test in New York, and because of the way people in New York naturally spoke, a scene would flow well. Then we would take it out to the west coast, where people had different ways of putting the same idea together, and the system would get stumped. So a big part of what we discovered through the play testing process was that we needed to take it around to a lot of different local areas.
The biggest problem that we had was ambient noise in the room disrupting what people said.
There was also a delay in the dialogue. We couldn't avoid that, but we disguised it a little. Because you are in your ship for the entire show, we disguised it as static. Anytime we needed to buy a second, the screen would fizzle, and then the character would have heard you and would talk to you.
USE SURPRISE BRANCHES.
When your player can say anything at any time, at some point some player is going to say something that disrupts the plot. On one hand that was a problem, but on the other hand it could be an opportunity. If a player is clever enough to disrupt what's supposed to happen with genuinely good story logic, we wanted to be able to reward them for that with a concrete change in the story. But that also put us in a weird situation where, all of a sudden, we were at risk of players creating endless new branches, so I wanted a plot structure where we could add branches as we needed without having to build a whole lot of new content.
We also needed to account for the knowledge gap between the player and the avatar character. In our situation, the player was writing their own lines, without knowing any of their avatar’s backstory. And we needed to make that plausible or help the player figure out how to be this person.
So I created a MacGuffin. This is the anomaly…
At the beginning of the story humanity controls the MacGuffin, but they don't know what it is or what it does. In the opening scene, players role play a scientist trying to figure out what it is. They fly a one-person ship into the anomaly, merging with it, which potentially explains any strange memory loss and also potentially explains why the player knows things about the future that the character shouldn't: maybe they've experienced alternate timelines.
In the middle of this experiment, everything is disrupted. Everybody is attacked by a race of warlike aliens and the anomaly disappears. It blinks out of existence and goes somewhere else within the Milky Way galaxy. The rest of the story is about you, the one person with this mysterious connection to the anomaly, and your military escort, Sergeant Sara Pearson, exploring the Milky Way, trying to figure out what the anomaly is, where it's gone, and what the human race wants to do with it when they find it.
This structure gave us a lot of freedom to improvise later on. ... A mystery is made up of clues. Those clues don't necessarily make sense individually, but when you put them together, they coagulate into the bigger picture of what is happening and why it matters. As long as the clues are structured in a way that's clever, you don't necessarily need to have all of them either. Every single time you start, you start out in the prologue scene, where you have an experiment with the anomaly. Then, depending on the choices you make in that scene, we send you to one of four different tier one scenes. In any individual playthrough you will only ever play one of these missions, and these missions are all isolated from each other. The plot of each mission you go on depends on where you are in the Milky Way galaxy and what is happening in that part of the galaxy, and the missions don't directly relate to one another.
So once you get to the rescue mission, that mission will proceed as usual, regardless of what missions you've happened to play before it. But the reward for each one of those missions is a clue in the greater story of what's happening with the anomaly, and why.
There were different combinations of five clues that I had to make sense of, and I always had to get the player to the finale of where the anomaly was and why it's there. But once I knew there would only ever be one tier one clue and only ever one tier two clue, I could create a sense of progression. A good mystery isn't, "Oh, I have one clue and it doesn't make sense; only when I have five can it be solved." Instead you want to slowly build and curate a picture of what's happening and then start to correct it. "Now that I have two clues, I get the sense that these are connected in this way, but I don't know how yet." "And now that I have a tier three clue, oh, that's different from what I thought, and it's starting to get clear." Depending on the progression the player took through the game, they would get a slightly different version, a different perspective of what was happening and why. You couldn't understand everything about the story until you played it through multiple times.
Let's walk through an example. Let's say the player has played one playthrough of the game, and in that first playthrough they participated in the alien ship mission and got that clue about the anomaly. When they restart the game, technically, that fact doesn't exist in the game, but the player still knows it. The story is the same every time; the player just gets different pieces of it. So potentially, the player might be able to use the clue they learned in the alien ship to change the way the traversal Memorial pans out. They might want to create a surprise branch there. And we could then reward them for that by making a branch that goes from the traversal Memorial sit down to Caltech runes, which would otherwise not be connected. So we can rearrange the flow.
Also at any point, if we're running out of time, or there's a tech problem that we don't know how to solve, we can just say, you know what, let's get rid of that scene. …and the reality of the story structure is still intact. You could make loads of cuts and as long as you don't make too many cuts from any one branch, the actual story is still playable. So we could sort of decide as we went and adjust ourselves from there.
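The tiered mystery structure described above (a fixed prologue, one mission per tier, one clue per mission, a fixed finale, and the freedom to cut scenes) can be sketched roughly like this. Everything concrete here is a placeholder: the talk mentions four tier one scenes, but the number of tiers, the mission names, and the clue names are assumptions, and a seeded random choice stands in for the player decisions that actually select each mission.

```python
import random

# Hypothetical tiers: each maps a mission name to the clue it rewards.
TIERS = [
    {"mission_a": "clue_a", "mission_b": "clue_b", "mission_c": "clue_c", "mission_d": "clue_d"},
    {"mission_e": "clue_e", "mission_f": "clue_f"},
    {"mission_g": "clue_g", "mission_h": "clue_h"},
]


def playthrough(tiers, rng):
    """Pick one mission per tier and collect the clue each one rewards."""
    path, clues = ["prologue"], []
    for tier in tiers:
        mission = rng.choice(sorted(tier))  # stand-in for player choices
        path.append(mission)
        clues.append(tier[mission])
    path.append("finale")
    return path, clues


def cut_scene(tiers, mission):
    """Remove a mission; the story stays playable while every tier keeps one."""
    trimmed = [{m: c for m, c in tier.items() if m != mission} for tier in tiers]
    if any(not tier for tier in trimmed):
        raise ValueError("cutting this mission would leave a tier empty")
    return trimmed


path, clues = playthrough(TIERS, random.Random(0))
```

Because missions within a tier are isolated from each other, removing one only shrinks that tier's options; every remaining path still runs from prologue to finale with one clue per tier.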
Certain types of stories are more meandering than others, and you do not want that. When I think of sci-fi and fantasy, I usually think of the setting as a huge star, but ultimately we found that meandering was not a fun activity and was very difficult to build. So in terms of genres, I would say plot-driven stories, like thrillers specifically, or maybe a very strong character-driven story like a romance, might work really well for this approach.
BUT PLAY TESTS SURPRISED US
As it turns out the players have a much better experience when you actually do have a lot of control, and you really can guide them through something specific.
One thing we had learned from our very first brief tech demo scene was that we really needed a way of allowing the conversation to continue regardless of what the player said if they happened to get stuck. In a system this open ended, we didn't have a plausible way of suggesting to the player what they should say to another human being without breaking the fourth wall in a way we didn't want to.
But when it is that open ended, and you are waiting for the player to say something specific, like "I'm ready" or "Let's go," it's not going to occur to a lot of people to do that.
A lot of people are going to get stuck, not know what to say, and loop around in an endless spiral. So we tried to have a way of progressing automatically. At some point, if enough time had passed, we would have the character say, "You know what, the anomaly's out there, we've got to keep going and we've got to move." Down here, that's a different phase of the conversation, with different contexts and different reactions to what you might say. We did that in a very linear way.
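The automatic progression described above can be sketched as a small decision function: advance when the player says one of the expected things, and advance anyway, with a nudge from the character, once enough time has passed. The function name, the return labels, and the 20-second default are assumptions; the talk only says "if enough time had passed".

```python
def next_step(matched_intent, seconds_waiting, advancing_intents, timeout=20.0):
    """Decide whether a scene advances, advances with a prompt, or keeps waiting.

    matched_intent:    intent recognised from the player's last line, or None
    seconds_waiting:   time since the character last finished speaking
    advancing_intents: intents that move the scene on (e.g. "I'm ready")
    timeout:           assumed cut-off after which the character moves on
    """
    if matched_intent in advancing_intents:
        return "advance"
    if seconds_waiting >= timeout:
        # The character says something like "we've got to keep going"
        # and the scene moves to its next phase.
        return "advance_with_prompt"
    return "wait"
```

For example, `next_step(None, 25.0, {"ready", "lets_go"})` moves the scene on even though the player never said the expected line.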
But it is very, very difficult to write compelling dialogue when you only know what one half of the conversation is going to be.
I didn't know what my characters were reacting to. I didn't really know what the player had just said. I knew the idea. I knew the intent. But I didn't know the words. I didn't know the tone. I didn't know the utterance. And so all of the dialogue that I was putting out was very generic. It was very vanilla. I wasn't in love with these people. It wasn't good writing.
Eventually I realised that this was a gameplay problem, not a writing problem. It wasn't that the writing was bad. It's that the flow of the activity was not fun. I took the time to do some extra drafts and try to find voice. But what I should have been doing, what would have been more productive, was treating my initial draft in Twine not like a draft of a script but like a paper prototype for the gameplay, to try and figure out whether the actual activity of this specific conversation, and the way it moves, is fun.
When we did start play testing the scenes, we discovered that players are actually very bad at coming up with things to say. Most of the things that they said were incredibly short and very simple: a couple of words, or one-word answers. 90% of players just did whatever they were told, without getting into it. Only about 10% of players actually explored.
Unlimited freedom shuts down our ability to actually make a decision.
So in open ended sections of the script, we would say, "Do you have any questions? Do you want to know about me? Do you want to know about your ship? Do you want to know about this planet? The aliens? What do you want to know about? Ask." People go, "I don't know. I don't have any questions." And we keep moving.
The concept of asking anything turned out to be an inherently unfun activity. The more intents we have, the more utterances we have to have for each one. The more directions we go, the more general we have to be. But as it turns out, freedom wasn't what players wanted. They wanted to be listened to, very, very specifically.
Instead, the consistently successful part of our play test was a very brief moment early in the briefing scene, where Sergeant Pearson is filling out some of your information and asks, "How old are you?" We had written out a response for every age from like five to 95. Nobody came in with an answer we weren't prepared for. What worked about that moment was that the constraints of the scene were very tight, so we could build out 90 different personalised answers to this one question.
And in that moment, players felt listened to. More importantly, because we knew what was going to happen, we had a funny little joke for every age. If you came in and said, "I'm 43," we had a funny little joke about 43 year olds for you. This call and response dialogue did a couple of things for us, but most importantly, it gave us a feedback loop.
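The age question works because a tightly constrained prompt has a small, enumerable answer space, so every answer can get its own scripted reply. A minimal sketch under some assumptions: the replies are placeholders (the game's actual jokes are not given in the talk), the recognised text is assumed to contain the age as digits, and `reply_to_age` is a hypothetical helper.

```python
import re

# Placeholder replies: the real game had a hand-written joke for each age.
AGE_REPLIES = {age: f"(scripted joke for age {age})" for age in range(5, 96)}


def reply_to_age(text: str):
    """Find a number in the recognised text and return the reply for that age.

    Returns None when no number is found or the age has no scripted reply.
    """
    match = re.search(r"\b(\d{1,3})\b", text)
    if match is None:
        return None
    return AGE_REPLIES.get(int(match.group(1)))
```

If the speech service returned numbers as words ("forty three"), a conversion step would be needed before the lookup; that step is left out here.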
The idea of a feedback loop is that it's the physical, concrete process by which people play a game, which usually means that they push a button. That action has an effect inside the world of the game, but the loop isn't complete until the player is informed about the effects of their action through feedback. The player presses a button and attacks the goblin. Once we have an animation where the goblin staggers back, a little sound effect where it yells, and a blood texture that shows up on the floor, that's when the player really understands what they did and is equipped to take another action.
This is where controlling the context of each scene became very important. Your character could legitimately take the reins to focus in on the one thing that's very important and that we have to do. Then, because we're not doing anything else, we can create all of these different options and listen to the specific things that each player says. Each utterance can become its own intent, and we can respond to exactly what they said and the exact words that they used, in a way that's immediately engaging and rewarding. Once I did this, I immediately knew how to write again. I had the ability to look at exactly what a player had said and bounce off of it in a way that used their own words, and I could create dialogue that was engaging. The writing became a reward for the player's creativity. When the player said something specific, a little bit unique to them, we could respond in a way that was actually entertaining, in a way that had value in and of itself. And when we rewarded players for being creative, all of a sudden players started being creative. They started opening up. They started talking a little bit more as if they were talking to an actual person.
So overall lessons:
- Pick technology that you can control; pick technology you can build your own tools for.
- Pick a plot where you can tightly control the context of a scene.
Kuelz, David. 2019. Designing Games that Listen. U.S.: Narrascope 2019/YouTube.
The USW Audience of the Future research team is compiling a summary collection of recent research in the field of immersive and enhanced reality media.