In the first part of this blog series, I talked about the streamer subculture and the significant figures who have risen to prominence within it. I looked at the common genres of music used, and discussed some key issues that have started to be associated with this niche activity. This post looks more deeply into one of those problems, the use of automatic filtering, which is a game-changer for many users. I want to explore the creative, economic, and philosophical implications of this technology in more detail, and suggest possible solutions.

Copyright Issues

As noted in the first part of the blog series, streamers are going to have great difficulty using musical content over the coming years, owing to copyright restrictions. Some of the songs used by Twitchers have YouTube view counts in the hundreds of millions, so they could potentially generate a huge profit for their owners.

It’s easy to see why streamers are starting to run into trouble — owners naturally want revenue for their work. But many streamers seem not to understand copyright laws, which prompted a senior manager at Twitch to Tweet the following:

Streamers big and small, such as Jay Won, have been given 24-hour suspensions for using copyrighted music on Twitch (read about this here). The problem is likely to grow in coming years, as the music industry becomes wiser to the use of copyrighted music in streaming, not just in VODs but in live streams too.


Automatic Filtering

Indeed, more and more streamers with VODs and live streams that feature copyrighted material are being temporarily banned on Twitch. The way they are detected is fascinating, and an important development in the history of music AI.

Twitch and YouTube both use automated audio filtering based on acoustic fingerprinting, a landmark approach to dealing with copyright. Twitch's filter is supplied by Audible Magic, which specialises in automated content recognition: it deterministically produces a condensed digital summary of an audio signal, which is then checked against a database of reference files. YouTube's system is known as Content ID, a name I'll use below as shorthand for this kind of filtering in general. The process has three steps: audio files are reduced to a spectrogram and stored in a database, uploaded files are checked against that database, and any flagged matches are automatically removed. However, the technology is not yet able to check live streaming. Part of the reason for this is down to current problems in music copyright in general: there are too many new technologies, extensive overlaps in electronic media, and an abundance of mixed media platforms for any notion of intellectual property to be properly serviceable.
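To make the fingerprinting idea concrete, here is a toy sketch in Python. It is emphatically not how Audible Magic or YouTube actually do it; their algorithms are proprietary. Everything in it (the function names, the 64-sample frame, the 0.5 threshold, the synthetic one-tone-per-frame "songs") is my own assumption for illustration. It reduces each frame of audio to its strongest frequency, hashes consecutive pairs of those peaks, and looks the hashes up in a small reference database.

```python
import cmath
import math

FRAME = 64  # samples per analysis frame (illustrative value, not from any real system)


def tone_sequence(bins, volume=1.0, frame=FRAME):
    """Synthesise a toy 'track': one pure tone per frame, given as DFT bin numbers."""
    samples = []
    for b in bins:
        samples.extend(volume * math.sin(2 * math.pi * b * n / frame)
                       for n in range(frame))
    return samples


def dominant_bins(samples, frame=FRAME):
    """Crude spectrogram: for each frame, find the strongest frequency bin."""
    peaks = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        mags = []
        for k in range(1, frame // 2):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame)
                        for n, x in enumerate(chunk))
            mags.append(abs(coeff))
        peaks.append(1 + mags.index(max(mags)))  # +1 because k starts at 1
    return peaks


def fingerprint(samples):
    """Condense a track into a set of hashes: pairs of consecutive peaks."""
    peaks = dominant_bins(samples)
    return set(zip(peaks, peaks[1:]))


def best_match(query, database, threshold=0.5):
    """Return the reference title sharing most of the query's hashes, or None."""
    q = fingerprint(query)
    best_title, best_score = None, 0.0
    for title, ref in database.items():
        score = len(q & ref) / len(q) if q else 0.0
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= threshold else None


# Hypothetical reference database of two 'songs'.
database = {
    "song_a": fingerprint(tone_sequence([3, 7, 5, 9, 4, 8])),
    "song_b": fingerprint(tone_sequence([10, 2, 12, 6, 11, 13])),
}

quiet_excerpt = tone_sequence([7, 5, 9, 4], volume=0.3)  # part of song_a, turned down
unrelated = tone_sequence([20, 15, 25, 18])

print(best_match(quiet_excerpt, database))
print(best_match(unrelated, database))
```

Even this toy shows why the approach tolerates some variation: turning the volume down does not change which frequency is strongest, so the quiet excerpt still produces the same hashes. It also glosses over everything that makes the real problem hard, such as noise, arbitrary time offsets, pitch shifts, and remixes.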

The use of Content ID to automatically check and remove content, without human supervision, has caused controversy, with users responding that background music in streaming comes under ‘fair use’. While live broadcasts are not subjected to the automated filters that VODs are, recent reports say that streamers can be ousted from Twitch if they are found live streaming copyrighted material. A Twitter spat on the use of copyrighted music in streams and VODs testifies to this. Ninety9Lives, a provider of royalty-free music that prompted the Twitter controversy, wrote the following on respecting copyrighted music:

Those who say the removal of videos is unfair point out that the music is in the background: it’s not the aim of the video to showcase the music, so streamers shouldn’t be penalised. Lots of folks are complaining that their content is being pulled down, on VODs and live streams, when they think their musical content is perfectly legal. Some streamers say that their videos contain material that is merely similar to copyrighted material. However, the bottom line is that if you’re streaming musical content that is even remotely similar to copyrighted material, you might be in trouble.


A Music Theorist’s Perspective on Automatic Filtering

Now, being an expert in music theory and cognition, and working in AI, I’m sceptical of an AI’s ability to find the essence of a piece of musical content, which is what is required if it is to regulate it. While the technology supposedly detects content with a high tolerance for variation, what counts as musical essence and what counts as acceptable variation seems to be difficult if not impossible to determine. Perhaps the AI’s effectiveness should be taken with a pinch of salt. Without wanting to get too philosophical here, there may not be any such things as essences (see Lyotard or Wittgenstein).


However, I think we should be careful about declaring off the bat that there are no essences, because of the simple fact that we human beings do have a knack for identifying specific content. But either way, essences are very difficult to pin down. And even if essences can be pinned down, there is a deep suspicion that only humans can pin them down, through introspection; that is, machines can’t do it yet. If machines do solve the problem of essences, this would shed new light on the subject, and show that previous suspicions were misplaced, and that there was a mechanical, naturalist way to specify essences or content after all.

At present, even the courts have difficulty with the notion of intellectual property. The US Court of Appeals has recently upheld a 2015 verdict which found that Robin Thicke and Pharrell Williams’ controversial 2013 hit, Blurred Lines, infringed the copyright of Marvin Gaye’s 1977 song Got To Give It Up. Thicke and Williams now have to pay five million dollars to the Gaye estate. When it comes to demarcating essence, this is not even a clear case of blurred lines. Many commentators, particularly musicians, are up in arms about the decision. Briefly, the bottom line is that the songs are nothing alike, except with respect to the odd rhythm or use of cowbell, etc. Most musicians unconnected with the case agree that while the songs have a shared style, they do not share content, or essence. They have different melodies, harmonies, and rhythms, which are primary musical features. Case closed. Yet, according to the ‘expert’ witnesses arranged by the Gaye family (musicologists on the payroll), the song has been plagiarised.

The importance of introspection is my main issue with Content ID, so I’ll belabour it a bit if I may. Say someone comes through the door and says he has been outside and the people are zombies. Only through introspection can you determine whether there has actually been a zombie apocalypse or whether he is being subtly critical of the people he has in mind. That is, we need to introspect to understand what his beliefs are regarding those people. And we need to be able to consider the non-local context; that is, information that might at first glance seem to have nothing to do with what he’s talking about. All sorts of things that seem to be off the menu might need to be understood when trying to work out what he means by zombies: his attitudes towards life, whether he is a joker, what he knows about current affairs, and his mental state. Is he happy? Is he being ironic? You can potentially add to these infinitely many other things that the mind might need for understanding but which cannot be straightforwardly read off the superficial circumstances.

AI music (and AI in general) has only come so far in approaching issues of intelligence and introspection, bit by bit. It still seems to be some way off solving the pretty hard problem of artificial general intelligence. In terms of Content ID, then, anyone should be rightly sceptical of an algorithm’s ability to find the essence of content in the same way that humans do. Cognition has a way of directly finding the relevance of content that goes beyond the abilities of any current machine.

If your vid gets pulled from YouTube or Twitch, you have to go through the painful process of contesting the DMCA takedown. Anything that is similar to a piece of musical content is at risk. It’s useless to litigate over this at present, because the bottom line is that there’s no overall rule yet for defining the similarity of different types of things. This echoes Jerry Fodor’s ideas on the subject in LOT 2: The Language of Thought Revisited (2008): the ways in which different things are similar are not similar. Or, the ways we classify different things are different depending on the things being classified. Yet another way of saying this is that knowledge has structure. And so the fact that “[a]utomatic filtering systems are unable to tell a transformative mix from a copyright infringement”, as said by Fred von Lohmann of the Electronic Frontier Foundation, is a problem that we can’t hope to fix until we understand how humans manage to judge similarity so well when the criteria change from one kind of thing to the next. That means waiting until there are big improvements in AI.


So, what can streamers do?

The music industry will eventually become savvy to the threat posed by the streamer subculture, and acoustic fingerprinting will soon become sophisticated enough to cover live streaming. The streamer is in a precarious situation, and should consider alternatives for background music. Now more than ever we need better sources of music that address the problems of copyright. In Part I of this series I discussed what the streaming platforms are offering, but in an earlier post, we discovered the following gamer preferences that aren’t covered by these:

  • Unlimited music that is quick to produce
  • Music that is free from copyright
  • Music that is adaptive to mood and the situation of the game
  • More control over musical content

Well, what if we flip the Content ID problem on its head, stop worrying about discovering the essence of music with AI, and instead use AI to help produce the music itself? This of course doesn’t solve the problem of existing, infringing content, but it lets us produce our own content without any worries.

A musical AI could help tailor-make songs to the requirements of streamers. The resulting music must suit the style, the mood, and the practical needs of the stream. What’s more, music AIs would offer ways for streamers to collaborate with them, regardless of the users’ musical ability, so they can easily produce their own stuff and avoid all copyright issues.

An AI musical companion could provide the perfect background music. It could provide whatever is required to the level required, and be co-created by the streamers themselves. Copyright and automatic filtering are among the most formidable challenges facing the streaming industry at present and must be addressed at the earliest opportunity.