At Melodrive we are constantly trying to push the limits of the sonic and musical experience in modern games, particularly in the next generation of VR immersion. Game audio has come a long way since the early frontier days of tapes and cartridges, and in this post we take some time to look back at the history of machine-assisted, machine-generated or procedural music, highlighting its many challenges and innovations through some key examples.
Procedural Content Generation
In general terms, procedural generation refers to any aspect of game development that is deferred to computer algorithms rather than manual creation by a game developer or designer. For instance, in a space simulator one might decide to write code that generates a huge galaxy automatically with some element of randomness, instead of exhaustively determining what each planet or system looks like as well as its physics and attributes. In fact, this is exactly how the ground-breaking Elite series managed to create its rich, sprawling planetary systems for player exploration. In these early games processing power and memory were at a high premium, and procedural generation enabled huge experiences to spawn from a single floppy disk.
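The seeded approach Elite pioneered can be sketched in a few lines. This is a hypothetical illustration of the general idea, not Elite's actual algorithm: the naming scheme, economy types and seed arithmetic are all invented for the example. The key property is that the same seed and coordinates always regenerate the same system, so nothing has to be stored.

```python
import random

def generate_system(galaxy_seed: int, x: int, y: int) -> dict:
    """Derive a star system deterministically from the galaxy seed and
    its grid coordinates, so the whole galaxy needs no storage at all."""
    # Made-up mixing constants; any scheme that gives each cell its own
    # deterministic seed would do.
    rng = random.Random(galaxy_seed * 1_000_003 + x * 1_009 + y)
    name = "".join(
        rng.choice("aeiou" if i % 2 else "ptkbdg") for i in range(6)
    ).title()
    return {
        "name": name,
        "planets": rng.randint(1, 9),
        "economy": rng.choice(["agricultural", "industrial", "high-tech"]),
    }

# The same inputs always reproduce the same system:
assert generate_system(42, 3, 7) == generate_system(42, 3, 7)
```

Because the generator is a pure function of the seed, a player can fly to any corner of the galaxy and the game simply recomputes what is there.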
Nowadays, as game designers, we are less limited by computing resources than by our imaginations, so procedural generation is mostly used to create unique, dynamic and individual experiences for players. Maxis' Spore takes inspiration from DNA sequencing and fractal theory to create exotic creatures on the fly. Minecraft uses Perlin noise formulae to generate vast swathes of landscape and terrain elements with complex flora and fauna.
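To give a flavour of noise-based terrain, here is a simplified value-noise sketch: random heights at integer points, smoothly interpolated in between. This is a toy illustration of the family of techniques, not Perlin's actual gradient-noise algorithm and certainly not Minecraft's generator.

```python
import math
import random

def value_noise(x: float, seed: int = 0) -> float:
    """1-D value noise: a deterministic random value at each integer
    lattice point, smoothly blended for positions in between."""
    def lattice(i: int) -> float:
        # Deterministic pseudo-random height in [0, 1) for lattice point i
        return random.Random(seed * 911 + i).random()
    x0 = math.floor(x)
    t = x - x0
    t = t * t * (3 - 2 * t)  # smoothstep easing, so there are no sharp corners
    return lattice(x0) * (1 - t) + lattice(x0 + 1) * t

def heightmap(width: int, scale: float = 0.1):
    """Sample the noise to get a strip of terrain heights."""
    return [value_noise(i * scale) for i in range(width)]

terrain = heightmap(50)
assert all(0.0 <= h <= 1.0 for h in terrain)
```

Real terrain generators layer several octaves of such noise at different scales to get both rolling hills and fine detail, but the principle is the same.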
Procedural Generation of Music
Procedural music – you will also hear this term in conjunction with non-linear, dynamic, interactive or adaptive music – variously refers to programmed music within a game that can change or respond to different states or events to varying degrees, usually in realtime. In games we want the audio experience to be non-linear, as opposed to, say, a film, which is a decidedly linear experience. A player is probably going to spend a lot more time in a game environment, possibly retracing and replaying specific stages multiple or consecutive times. A linear soundtrack that never changes over time can easily become boring, or even worse, irritating and distracting for the player.
Karen Collins distinguishes between interactive audio and adaptive audio. Interactive audio she defines as being directly influenced by the player's actions or input in game (for example Super Mario hitting a question mark block to reveal a coin, accompanied by that instantly recognised sound effect). She differentiates this from adaptive audio, which is not directly influenced by the player's actions but rather by more complex states and events, such as the in-game time, location or other factors that are not always transparent to the player.
At Melodrive, of course, we describe our approach as deep adaptive music, to reflect how our system can dynamically react to deeper multi-dimensional states that encompass the very human facets of emotion and arousal. You can read and explore more about our ongoing work and research related to that on this very blog or by signing up for our free whitepaper.
As a final aside, in wider academic and musical contexts, procedurally adaptive music is inextricably related to algorithmic or generative music, which also uses a formalised set of rules or procedures to relinquish musical and compositional decision-making to some machine. Many techniques exist for generating music automatically, often drawing inspiration from natural or biological phenomena. For example, Al Biles borrows from natural selection to trade jazz fours using genetic algorithms, and Google recently made headlines with a deep learning system that simulates neural processes to spit out piano phrases!
The biggest challenge with developing games is having to reconcile your great ideas with the limitations of the platform you are developing for at the time. While early video game sound and music may sound crude and primitive to our sophisticated ears today (although chiptune culture revels in its retro appeal), a lot of groundbreaking innovations arose out of this era that have proven to be a profound influence on subsequent generations of game audio methods.
Any glimpse into the past regarding procedural music invariably begins with LucasArts, the video gaming arm of George Lucas' interactive media empire that also encompasses Skywalker Sound and Industrial Light and Magic. In the early 90s the studio released a string of point and click adventure games with rich storytelling, fiendish puzzles and unique humour. Building on prior work done by Peter Langston on the BallBlazer games in the 80s, composers Michael Land and Peter McConnell developed the iMUSE system that enabled them to smoothly transition through variations on themes depending on different game scenes. You can hear iMUSE across many titles such as X-Wing, Sam & Max and Grim Fandango, but perhaps most memorably on the Monkey Island series, where it served up endless pastiches of Caribbean easy listening to accompany the adventures of hapless pirate Guybrush Threepwood.
Horizontal and Vertical Stem Mixing
Part of the reason why early games sound dated today is that the audio was synthesised artificially using the hardware of the time rather than played from existing recorded or sampled sound. With the proliferation of digital audio, the compact disc and MPEG compression, "real" audio soon became available for consumer PCs and consoles. I can still remember to this day my amazement hearing for the first time on my PlayStation the big beat sounds of actual Leftfield and Chemical Brothers tracks in the high octane racetracks of Wipeout, or Public Enemy as I jumped 360s in Tony Hawk's Pro Skater. But after a while hearing the same tracks umpteen times grew tiresome, and I soon availed myself of the trick of swapping out the disc for another in order to hear something different.
Once the novelty wore off, game developers quickly realised that fixed, linear tracks of existing music were not flexible or interactive enough for game purposes. To remedy this, they worked with their composers and sound designers to use stems of music rather than final mixes.
A stem is simply audio parlance for a group of one or more tracks that contributes to a final mix or track. So the drums could be considered one stem, but so too could all the guitars, all the keyboards or all the guitars and keyboards together. The exact contents of a stem are up to those using them. Stems can be combined in many different ways to come up with endless arrangements in a process known as vertical layering, since we are stacking different stems on top of each other to be heard at the same time.
Stems can also be repeated (in small loops or sections) or changed and branched sequentially in various combinations, and this is what is known as horizontal mixing. By combining horizontal and vertical layering we can begin to envisage building an extended, dynamic arrangement of ever changing music from a very small set of elements. This is exactly how it is done, for example, in the Forza Motorsport series of racing games (full disclosure: I worked on several of these titles). Audio Lead Mike Caviezel says:
The last couple of Forza titles, 3, 4, 5 were very EDM-friendly. Electronic music lends itself well to being remixed. So if you’re an electronic artist who wants to pitch your music to a game, make sure that you have your individual stems available, so that if we ever would want to create alternate mixes or pull out the drums in the middle of a racing game, we can do that on the fly. That’s very important for us.
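The two layering ideas described above can be sketched in a few lines of code. This is a toy illustration with made-up numeric samples standing in for real audio buffers, not any shipping engine's mixer:

```python
def mix_vertical(stems, active):
    """Vertical layering: sum only the currently active stems,
    sample by sample, to vary the arrangement's intensity."""
    length = len(next(iter(stems.values())))
    return [sum(stems[name][i] for name in active) for i in range(length)]

def arrange_horizontal(sections):
    """Horizontal mixing: concatenate (possibly looped) sections
    to build an extended arrangement from a few elements."""
    out = []
    for section, repeats in sections:
        out.extend(section * repeats)
    return out

# Toy 4-sample "stems" (real stems would be audio buffers)
stems = {"drums": [1, 0, 1, 0], "keys": [0.5] * 4, "gtr": [0.2] * 4}

calm = mix_vertical(stems, ["keys"])                       # sparse layer
intense = mix_vertical(stems, ["drums", "keys", "gtr"])    # full layer

# Loop the calm section twice, then branch into the intense one
track = arrange_horizontal([(calm, 2), (intense, 1)])
assert len(track) == 12
```

In a real game the choice of active stems and the branching order would be driven by game state – race position, lap number, proximity to rivals – rather than hard-coded as here.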
Composer Lance Hayes has a good video of how this works in practice:
But it need not be just for EDM! One of the most compelling applications of stem mixing in recent years has been Rockstar's Red Dead Redemption, documented beautifully in the video below. Gathering a host of legendary musicians and a wealth of authentic instruments, they perfectly recreate the sepia-tinged deserts of Leone and Morricone's Wild West fantasy. But if you watch the video you will notice that they constrain everything to one single key: A minor.
Modern audio middleware systems such as FMOD and Wwise serve as a bridge between developers and audio people. One of their unique strengths is being able to facilitate exactly the kind of non-linear and responsive musical demands of interactive games, features that are inherently lacking in traditional timeline-focussed environments such as Pro Tools or Logic.
Pure Procedural Music
Stem mixing and layering of audio assets certainly increases engagement and variety in the game music experience, but can still fall short in comparison to 'purer' procedural audio systems that can link up all parameters of the music generation process to in-game events. Game audio expert and researcher Andy Farnell has for many years been espousing a revival of early "embedded" synthesis techniques controlled procedurally. He argues that procedural synthesis, with its fine-grained control over every parameter of a sound, can potentially offer the richest integration with a complex game world.
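To picture what this means in practice, here is a toy "embedded synthesis" voice in the spirit of Farnell's argument: the sound is recomputed on every call, with its pitch and harmonic content tied directly to game parameters, rather than played back from a recording. The RPM-to-frequency mapping and the harmonic weighting are invented for the example.

```python
import math

SAMPLE_RATE = 8000

def engine_tone(rpm: float, load: float, duration: float = 0.05):
    """Synthesise a short engine-like tone whose pitch tracks RPM and
    whose brightness (2nd-harmonic level) tracks engine load."""
    freq = 30.0 + rpm / 60.0  # invented mapping: RPM -> fundamental in Hz
    n = int(SAMPLE_RATE * duration)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        s = math.sin(2 * math.pi * freq * t)
        # More load -> stronger second harmonic -> brighter, harsher tone
        s += load * 0.5 * math.sin(2 * math.pi * 2 * freq * t)
        samples.append(s / (1 + 0.5 * load))  # normalise to [-1, 1]
    return samples

idle = engine_tone(rpm=800, load=0.0)
redline = engine_tone(rpm=7000, load=1.0)
assert all(-1.0 <= s <= 1.0 for s in idle + redline)
```

Because every parameter is exposed, any in-game quantity – speed, damage, surface – can continuously reshape the sound, which is exactly the integration sampled playback cannot offer.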
There are some examples of modern games that use procedural content generation to completely create the music composition as well. Demoscene shooter .kkrieger harnesses PCG in all aspects of the game, and the soundtrack generates streams of MIDI data for its own V2 synthesiser. Rez Infinite quantises note events to ensure that the EDM-heavy music stays in sync with the player's actions. Spore teamed up with Brian Eno and used a heavily customised version of the interactive media software Pure Data to create its immersive ambient musical soundscape. Finally, No Man's Sky's composer Paul Weir joined forces with UK post-rock outfit 65 Days of Static, who used their prior knowledge of live coding and modular synthesis to create a truly unique music experience.
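The beat quantisation mentioned above for Rez Infinite is simple to sketch: snap each player-triggered note event to the nearest point on a rhythmic grid so it always lands in time with the music. The grid subdivision here is illustrative, not Rez's actual implementation.

```python
def quantise(event_time: float, bpm: float, subdivision: int = 4) -> float:
    """Snap a note event (in seconds) to the nearest grid point, so
    player-triggered sounds always land on the beat."""
    beat = 60.0 / bpm          # seconds per quarter note
    grid = beat / subdivision  # sixteenth-note grid at subdivision=4
    return round(event_time / grid) * grid

# At 120 BPM a sixteenth-note grid point falls every 0.125 s
assert quantise(0.30, bpm=120) == 0.25   # snaps back to the nearest point
assert quantise(0.32, bpm=120) == 0.375  # snaps forward instead
```

The perceptual trick is that the small delay between the player's input and the quantised sound reads as musical tightness rather than latency.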
The Best of Both Worlds
To build truly rich and compelling soundtracks for modern games and interactive media, we need to have interactive, ever-changing experiences that react to myriad in-game minutiae and player progress. They also need to draw from a wealth of musical knowledge and a variety of sources that comprise high quality sampled and synthesised sound.
At Melodrive we're hard at work offering a marriage of both worlds: combining sophisticated algorithmic composition and parametric synthesis of new sounds with complex samples of instruments from the real world, processed with state of the art DSP effects. Not only that, hundreds of gamers have told us that they'd love to create their own music in VR and games, if they were given the means to do it. We're excited to share more about that with you in our next post…