The audio mechanics of Quantum Break
John Broomhall talks to Remedy Entertainment’s audio director Richard Lapington about the sound of the long-awaited new Xbox One title that combines gaming with live action TV.
Quantum Break is a big deal involving a serious audio production, all funded by a massive investment from Microsoft. This futuristic interactive/linear hybrid for Xbox One marries a videogame with a live-action TV show featuring a parallel story. The hero, Jack, has to survive ‘Stutters’ – time breaking down and making different game-world objects jump backwards and forwards. He can also make everything slow down around him and use his powers to stop enemies and certain objects in their tracks – all interesting propositions for sound design, which unsurprisingly played a key narrative role.
Yet according to Richard Lapington, ace game developer Remedy Entertainment’s audio director, by far the most challenging aspect of the project’s development was the branching dialogue system via which players can navigate the story in different ways as they make their own choices of verbal responses to questions and observations thrown up by the various game characters they encounter.
“From a purely audio perspective the branching dialogue wasn’t all that problematic. We marked up which dialogue belonged to which branch in the screenplay, then recorded and named the files accordingly,” he says. “We created a bespoke internal tool called Dialogue Writer, which proved invaluable for organising dialogue lines, wav files, facial animation files, and integration into the game. It helped speed up and keep track of production and implementation. Actually the branching dialogue was more of a headache per se for the level designers and quality assurance testers – and of course for the writers, who had to ensure all narrative threads made sense in both the game and the TV show.
“For me, the bigger challenge was dealing with the time travel elements and that cross-pollination with the show – because as you progress through the game, you re-visit scenes. But you’re re-visiting them from alternative perspectives. It’s entirely possible to hear the same scene’s dialogue from two or even three perspectives. For instance, there could be a main scene in the show that you can ‘accidentally’ overhear in the game. Another example would be a scene encountered early in the game, which you later witness from a completely different spatial perspective. This made things quite complex – say, recycling dialogue originally recorded on-set for the show for use in the game – needing to make it all work together sonically.
“Plus ADR from the show, speech captured on the mocap (motion capture) stage for the game, some recorded in our own in-house studio and then some more traditionally recorded at a VO studio for the game. Of course, we tried to match mics and signal chains across as much of it as possible – but perhaps unsurprisingly it all sounded slightly different. There was plenty of cleaning-up and match EQ’ing…
“Logistically, working with such high-profile actors and actresses could be challenging. They’re all incredibly professional and talented but making a new game with new IP including time-travel and branching dialogue inevitably required iteration. Sometimes getting studio time with our super-busy cast when it suited us wasn’t so easy and that meant we always had to nail it – no second chances – for either voice or simultaneous facial capture, the majority of which we did in-house using a specially created comprehensive pipeline for recording and editing VO.
“We had a very specific software and hardware setup for this. It had to be robust and reliable. We also had a lot of ADR to do in-house with the facial movement capture system, which always involved a super-quick turnaround. We could not have lived without our Sound Devices 744T (which we used for every single I/O in conjunction with our facial recording system). Add to that EdiCue and EdiPrompt, both invaluable in VO shoot prep and keeping our sessions on track. Then we used Reaper which, combined with some custom Python scripts, made managing all these recordings less painful.”
When it came to sound design for the project’s aforementioned time stutters, Lapington’s aim was clear – that players should recognise instantly what ‘state’ the game-world is in – even with their eyes shut: “We needed ‘Stutters’ to totally contrast aurally with the normal game-world, so we decided to run two sets of audio side-by-side – a ‘normal’ set and a ‘stutter’ set, and switch between them. Aesthetically, this was, shall we say, interesting – trying to concoct sounds that resembled what you see yet were ‘fractured in time’. Music also plays a huge part in communicating time ‘breaking’ – music and rhythm are based on repeated patterns and by breaking those patterns in sync with the game-play we can emphasise time breaking and create dis-ease. Music stretches and filters, as do dialogue and most other sounds when time shifts.
“We went through many variations and concepts along the way – the audio team would make assets and we’d see which ones worked and which didn’t. We created an audio reference guide of descriptors for Stutters, for example ‘violent’ and ‘unpredictable’ and, even more important, words that described what Stutters were not and should be avoided, such as ‘sci-fi’ or ‘digital’ – really important with quite a few sound people working in different locations, for instance MS Redmond Central Media and later Soundcuts in London.”
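The idea of running a ‘normal’ set and a ‘stutter’ set side by side and switching between them can be illustrated with a simple equal-power crossfade. This is only a minimal sketch under stated assumptions, not Remedy’s implementation: the function names and the single 0-to-1 ‘stutter_amount’ control are hypothetical, and the article does not say how the switch was actually performed in Wwise.

```python
import math


def crossfade_gains(stutter_amount: float) -> tuple[float, float]:
    """Return (normal_gain, stutter_gain) for a blend value in [0, 1].

    Equal-power law: the squared gains always sum to 1, so perceived
    loudness stays roughly constant while the world changes state.
    'stutter_amount' is a hypothetical control, not a real Wwise parameter.
    """
    x = min(max(stutter_amount, 0.0), 1.0)  # clamp out-of-range input
    angle = x * math.pi / 2
    return math.cos(angle), math.sin(angle)


def mix_sample(normal: float, stutter: float, stutter_amount: float) -> float:
    """Blend one sample from each audio set according to the world state."""
    normal_gain, stutter_gain = crossfade_gains(stutter_amount)
    return normal * normal_gain + stutter * stutter_gain
```

With stutter_amount at 0 only the normal set is heard, at 1 only the stutter set, and values in between give a short transition rather than a hard cut.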
The Wwise choice
Lapington’s creative direction cross-pollinated to the live action show’s audio team too, helping ensure consistency in environment and time-related audio treatments. Meanwhile, delivering his audio ambitions in-game entailed some bespoke augmentations to chosen middleware Wwise to assist with the time manipulation aspects and audiovisual sync: “We created Q-grain, a real-time granular synth plug-in. It went through several iterations but really took shape when we started working with Vesa Norilo from the Sibelius Academy. We wanted to closely link animation timing with sound. Right from the outset we knew we had this crazy challenge of matching sound to objects moving backwards and forwards in all sorts of different timescales, all at once. Granular synthesis was the obvious choice for a plug-in, our criteria being that it should ‘sound natural’ and that we wanted to control every parameter in real-time using RTPCs.
“We also needed to run multiple versions simultaneously in any scene, using compressed sound files not PCM to conserve memory. Designing sounds for use in the synth to get the outcome you wanted proved something of an art form – especially considering you’re matching some mad animations, for example a ship crashing into a bridge. Certain ‘shapes’ of sound just don’t work, and you have to be pretty clever with frequency movement-blending to make it sound convincing.
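For readers unfamiliar with the technique, granular synthesis in general can be sketched as follows: short windowed ‘grains’ are read from a source sound at positions taken from a timeline, then overlap-added into the output. Because each grain’s read position is independent, the timeline can run forwards, backwards or stand still. This is a generic illustration only and makes no claim about how Q-grain works internally; every name and default value here is hypothetical.

```python
import math


def hann_window(length: int) -> list[float]:
    """Periodic Hann window; at 50% overlap adjacent windows sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def granulate(source: list[float], positions: list[float],
              grain_len: int = 64, hop: int = 32) -> list[float]:
    """Overlap-add one grain per entry in 'positions'.

    'positions' are normalised read positions in [0, 1], for example an
    animation timeline sampled once per grain. They may increase, decrease
    or repeat, which is what lets the sound follow time running both ways.
    """
    if len(source) < grain_len:
        raise ValueError("source must be at least one grain long")
    if not positions:
        return []
    window = hann_window(grain_len)
    out = [0.0] * (hop * (len(positions) - 1) + grain_len)
    last_start = len(source) - grain_len
    for grain_index, pos in enumerate(positions):
        start = int(min(max(pos, 0.0), 1.0) * last_start)
        offset = grain_index * hop
        for i in range(grain_len):
            out[offset + i] += source[start + i] * window[i]
    return out


# Example: a timeline that plays forwards, then rewinds.
tone = [math.sin(2 * math.pi * 5 * n / 1024) for n in range(1024)]
timeline = [0.0, 0.25, 0.5, 0.75, 0.5, 0.25, 0.0]
scrubbed = granulate(tone, timeline)
```

A production plug-in would add per-grain pitch, jitter and density controls, which is presumably where the real-time parameters mentioned in the interview come in.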
“We call the second plug-in we created Q-analyzer. It analyses audio in real-time and passes a resulting signal out of the audio system for use in the game. Visual manipulation of an audio signal is something our art director Janne Pulkkinen has been working with for a while. From an audio perspective it’s really simple – we add a plug-in to a sound in Wwise, and with a line of script in our game engine we can drive visual effects, for example, drive an animation timeline from a sound’s RMS value. Driving visual effects from audio forms the cornerstone of the game’s Stutters. All the wavy visual distortions you see in the environment are driven by sound effects – and audio and visuals are always in sync, which makes the game feel really holistic and connected.”
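The principle of driving a visual parameter from a sound’s RMS value can be shown in a few lines. The sketch below is an offline illustration under stated assumptions, not the Q-analyzer plug-in or Remedy’s engine script: it computes a block-by-block RMS envelope and maps each value onto a normalised 0-to-1 timeline position. The function names, block size and floor/ceiling mapping are all hypothetical.

```python
import math


def rms(block: list[float]) -> float:
    """Root-mean-square level of one block of samples."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def rms_envelope(samples: list[float], block_size: int = 256) -> list[float]:
    """One RMS value per consecutive block of 'block_size' samples."""
    return [rms(samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]


def to_timeline(value: float, floor: float = 0.0, ceiling: float = 1.0) -> float:
    """Map an RMS value to a clamped 0..1 animation timeline position."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    return min(max((value - floor) / (ceiling - floor), 0.0), 1.0)


# Example: a decaying burst pushes the timeline forward, then lets it fall back.
burst = [math.exp(-n / 400) * math.sin(n * 0.3) for n in range(2048)]
timeline_positions = [to_timeline(level, 0.0, 0.7) for level in rms_envelope(burst)]
```

Because the visual value is derived from the audio signal itself, the two cannot drift apart, which is the sync property the interview describes.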
Planning the mix and target audio levels was an early consideration, Lapington reveals: “We were bringing in external freelance sound teams early in production, so set guidelines for them for different asset types – for example, a background ambience should have an LU value of X and a peak of Y, an explosion should be a louder X and Y etc. Our overall target was -24 LUFS over a half-hour’s game-play – a number that turned out to be really important for ensuring consistency with the live action show.
“To aid our final mixing, we consciously designed the structure of the audio assets in Wwise so we could mix linear-style, scene by scene through the game. Scheduling mixing time at the end of the project was a challenge (and involved a calculated risk we’d actually get the time to get the mix right after the game was fully playable at target frame rate!). We monitored on multiple speaker systems as a team, both on consumer and pro-audio equipment – stereo TV, 5.1, 7.1 etc. We aimed for the best all-round mix for the majority of people – we wanted the game to sound great on a 7.1 system – but we still wanted it to feel compelling on a normal TV.”