The famous futurist Ray Kurzweil predicted that by 2029 human brains will merge with machines, making people smarter than ever. Even if we don’t realise it most of the time, machines and artificial intelligence (AI) are already extending our capabilities. Think of the last time you visited a website in a language you can’t speak. I would guess you understood its content anyway, thanks to the decent translation provided by Google. What about the last time you asked an AI assistant (Siri, Alexa, Cortana, etc.) to find information for you?

In this blog post series, I outline how AI can augment human composers. In particular, I’ll touch on the techniques and opportunities that AI opens up for game composers working on adaptive music. (If you don’t know what adaptive music is, have a look at this post I wrote a few months ago for a brief introduction.) This first post sets the stage by discussing some of the limitations composers face when working with adaptive music.

As creative people, we’re often reluctant to accept the idea that machines can generate things with any artistic value. There’s a whole field called computational creativity devoted to proving this wrong — and I proudly contribute to this lively research area! But even if the argument that a machine can’t create real art held true, computers would still have an edge over humans: machines can create an almost unlimited amount of content with near-zero effort. And that’s very handy for adaptive soundtracks.

The issue is that it’s impossible for a human composer to create a score that changes dynamically to address all the possible decisions and behaviours of a player in a game. To put it bluntly, it’s too much to ask! Sure, a composer can use traditional adaptive techniques like vertical remixing and horizontal re-sequencing. To me, these are musical shortcuts developed to work around composers’ limitations. What kind of limitations? Limitations in creating effective transitions between cues, in the level of adaptation a composer can put into a soundtrack, and in the amount of musical variation a composer can reasonably hope to create.

Let’s start with music transitions. In adaptive music, transitions are used to bridge two musical cues. Let’s assume we need to shift from a calm village cue to a tense fight cue when the player bumps into some enemies. As good musicians, (most of the time) we’d like the transition to be as smooth and as musically meaningful as possible. This means that the melodic, harmonic and instrumental content of the village cue should (slowly?) morph into that of the fight cue. The main problem here is that the shift can happen at any moment in the village cue, and this explodes the possibilities for the music. To achieve a smooth musical link, you ideally need a different transition for each point in the village cue where the shift can occur.

An example can help us understand how this affects a composer’s workload. Let’s say the village cue is 120” long at 60 bpm. At that tempo there’s one beat per second, so the cue contains 120 beats. If we want to be able to shift from the village cue to the fight cue at any beat, the composer has to create 120 transitions. Of course, they could prune some of these by writing the music in such a way that the same transition makes sense at different points in the cue. But this would drastically constrain the musical possibilities offered to the composer. The main point is that transitions should be granular enough to achieve natural-sounding musical bridges between cues. Now let’s imagine we have 3 cues in a game level, say village, fight and palace cues, and that we can freely move from any cue to any other. Let’s assume the fight and palace cues have the same length and tempo as the village cue. The composer is in big trouble! With 3 cues there are 3 × 2 = 6 ordered cue pairs, each needing 120 transitions, so they now have to compose 720 transitions. To calculate the total number of transitions for a soundtrack, we need to know how many levels are in the game. If we have 5 levels — each with 3 cues of the same length and tempo used so far — we need 3,600 transitions. Writing thousands of transitions isn’t feasible for a composer.
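For the sceptics, here’s a quick back-of-the-envelope check of these numbers in Python (the function name and parameters are mine, purely for illustration):

```python
def transitions_needed(num_cues: int, beats_per_cue: int, num_levels: int = 1) -> int:
    """Transitions required when any beat of any cue can jump
    to the start of any other cue within a level."""
    ordered_cue_pairs = num_cues * (num_cues - 1)  # e.g. village->fight, fight->village, ...
    return num_levels * ordered_cue_pairs * beats_per_cue

beats = 120  # a 120-second cue at 60 bpm has one beat per second

print(1 * beats)                                            # village -> fight only: 120
print(transitions_needed(num_cues=3, beats_per_cue=beats))  # 3 cues, one level: 720
print(transitions_needed(3, beats, num_levels=5))           # 5 levels: 3600
```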

Another major issue is that the level of adaptation in human-composed soundtracks is limited. Composers can move from one cue to another depending on some logic (horizontal re-sequencing). For example, if the character is attacked by enemies, the music moves from the village cue to the fight cue. Composers can also add multiple instrumental layers depending on some game parameters (vertical remixing), like the character’s health. For instance, if a character is healthy, the music can consist of a very light guitar arpeggio only. If the character is wounded, some percussion instruments kick in to raise the musical tension. Finally, if the character is in a near-death state, orchestral hits with strings and brass instruments can be added on top of the guitar and percussion to make the music even more dramatic.
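Here’s a minimal sketch of that vertical-remixing logic in Python. The stem names and health thresholds are illustrative assumptions of mine, not taken from any real engine or middleware:

```python
def active_layers(health: float) -> list[str]:
    """Return which stems to play for a health value in [0.0, 1.0].
    Thresholds are hypothetical, chosen just for this example."""
    layers = ["guitar_arpeggio"]          # always present
    if health < 0.6:
        layers.append("percussion")       # wounded: raise the tension
    if health < 0.2:
        layers.append("orchestral_hits")  # near death: strings and brass on top
    return layers

print(active_layers(0.9))  # ['guitar_arpeggio']
print(active_layers(0.4))  # ['guitar_arpeggio', 'percussion']
print(active_layers(0.1))  # ['guitar_arpeggio', 'percussion', 'orchestral_hits']
```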

The problem with these techniques is that this type of adaptation is very high level and considers only a few musical parameters. What adaptive music should do is follow and score the action second by second. This should involve changes in harmony, melody and instrumentation at a very low musical level, say beats and notes. We’re used to this in films, where the music depicts the emotional setting of a scene at a very granular level. You may think the task is easier for movies, because a film is linear and the composer already knows what’s going to happen next with 100% accuracy. You’d be right! But this shouldn’t stop composers from aspiring to the same level of granularity in the mapping between music, visuals and storyline in non-linear experiences, like video games.

Let me clarify what deep musical adaptation should look like with a couple of examples. In a game, if a character takes a hit from an enemy, the next chord could be highly dissonant to highlight the distressing event. If a character who has their own theme randomly encounters two enemies with their own specific themes, the music should dynamically blend these themes together in a sort of Wagnerian fashion. And that should work for all the possible combinations of characters and enemies with assigned themes. As you can gather from these examples, such a deep level of adaptation is very difficult to obtain in human-composed scores, simply because there’s no way a composer can create music able to cope with all of the possible in-game situations.
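To make the first example concrete, here’s a toy sketch of event-driven harmony in Python. The event names, tension values and chord list are all hypothetical simplifications of mine; a real system would need a proper harmonic model:

```python
# Map game events to a tension value in [0.0, 1.0] (illustrative numbers).
EVENT_TENSION = {
    "exploring": 0.1,
    "enemy_spotted": 0.5,
    "player_hit": 0.9,
}

# Chords ordered roughly from consonant to dissonant (in C).
CHORDS = ["Cmaj", "Am7", "Dm7b5", "Bdim7", "C7#9b13"]

def next_chord(event: str) -> str:
    """Pick the chord whose dissonance matches the event's tension."""
    tension = EVENT_TENSION.get(event, 0.0)
    index = round(tension * (len(CHORDS) - 1))
    return CHORDS[index]

print(next_chord("player_hit"))  # 'C7#9b13': a dissonant chord for a distressing hit
print(next_chord("exploring"))   # 'Cmaj': consonant while nothing is happening
```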

This brings us to the third issue: composers can’t provide the amount of musical variation needed to keep the player engaged in a vast interactive experience. Think of The Lord of the Rings Online, where players can spend hundreds of hours playing. In these massive games, the soundtrack, which at most is a few hours long, gets repetitive after a while. The music, rather than reinforcing the game, becomes annoying. The player may even turn it off. What the player experiences is called listener fatigue. This concept isn’t new to video games. In the 1980s, game soundtracks usually consisted of a few cues looped endlessly during gameplay. Because of these insistent musical repetitions, listener fatigue was common among players.

In The Lord of the Rings Online players can spend hundreds of hours playing, but the amount of original musical content is quite limited.

The ideal adaptive soundtrack continuously varies a cue’s musical material over time to keep it fresh but memorable at the same time. This means that micro-variations in musical parameters like note duration, pitch, chords and instrumentation should be employed all the time. Again, it’s impossible for a composer to create infinite variations of the cues they write. Methods that use randomised playback of musical phrases and events only mitigate the issue: to avoid listener fatigue, musical variation should be continuous.
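As a minimal sketch of what continuous micro-variation could look like, here’s a Python fragment that perturbs a phrase slightly on every repeat. The note representation and probabilities are my own simplifying assumptions:

```python
import random

def vary_phrase(phrase: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Apply small random changes to (midi_pitch, duration_in_beats) pairs,
    so no two repeats of the phrase sound exactly alike."""
    varied = []
    for pitch, duration in phrase:
        if random.random() < 0.2:
            pitch += random.choice([-2, -1, 1, 2])  # occasional neighbour tone
        if random.random() < 0.3:
            duration *= random.choice([0.5, 1.5])   # occasional rhythmic tweak
        varied.append((pitch, duration))
    return varied

motif = [(60, 1.0), (62, 0.5), (64, 0.5), (67, 2.0)]  # C, D, E, G
print(vary_phrase(motif))  # a fresh micro-variation on every call
```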

All of the limitations outlined so far imply that, as of now, composers cannot really create deeply adaptive soundtracks. What do I mean by a deeply adaptive soundtrack? It’s a score that responds dynamically to all possible events and emotional cues in a game, and that is perpetually varied to keep the player’s interest alive. So far, all of the dynamic soundtracks released on the video game market are only shallowly adaptive. So, how can composers make the leap to deeply adaptive scores? They have to embrace AI. As Ray Kurzweil would put it, they must merge with the machine.

Composers shouldn’t think of AI as a threat to their jobs, but rather as an opportunity to augment their compositional skills. The composer will always have the upper hand, retaining control over the machine and directing it to augment their musical materials. Think of AI as a technological innovation similar to the piano in the early 18th century. When the piano came out, it opened up a series of compositional and performance possibilities (e.g., expressive dynamics) that were unthinkable on previous keyboard instruments.

Several game composers I spoke to welcome the advent of AI in music. When I interviewed Guy Whitmore, for example, he said that AI is going to have a huge impact on the compositional process. For the better! Regardless of our personal convictions, it’s likely that AI is going to change how we compose and experience music, as the growing number of startups focused on generative music seems to indicate. In any case, composers should be prepared for this eventuality and be ready to embrace the change!

In the next posts of this series, I’ll give practical examples of how AI can help composers create deeply adaptive soundtracks. Stay tuned!

Do you think AI can be used to augment composers? Is AI going to change music composition?