Using Generative AI to Create a Hit Record?
Introduction
I spend much of my day listening to music, but I have no talent for creating it. I’ve been fascinated by recent advances in generative AI and have worked on many projects (at work and for fun) with large language models (LLMs). By this point, many of us have used or heard of ChatGPT. Maybe you’ve had the opportunity to work with GPT-3.5, GPT-4, or one of the other LLMs. Hoping to find a model that could help me become the next big music producer, I stumbled upon Meta’s (formerly Facebook’s) new model, MusicGen. In this post, we’ll explore MusicGen to see if I can actually create a hit single with it.
So… How does the model work? MusicGen is a family of models with a few different configurations that let us, as musicians (or aspiring musicians), generate new music. The models are split into melody models and non-melody models. The main difference is that melody models also accept a melody as input (as a .wav file). Both model types accept a text description as input (think of a description as a prompt if you are familiar with language models). The generated music is conditioned on that description, so changing the description changes the output.
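To make the melody versus non-melody distinction concrete, here is a minimal sketch using Meta's audiocraft library. It is not the code from the notebook. The helper names (choose_checkpoint, generate_track), the choice of the small checkpoint for the non-melody case, the 8-second duration, and the output file name are all assumptions made for illustration, and the checkpoint names assume a recent audiocraft release.

```python
from typing import Optional


def choose_checkpoint(melody_path: Optional[str]) -> str:
    """Pick a melody-capable checkpoint only when a melody file is supplied.

    Checkpoint names are assumptions based on a recent audiocraft release.
    """
    if melody_path:
        return "facebook/musicgen-melody"
    return "facebook/musicgen-small"


def generate_track(
    description: str,
    melody_path: Optional[str] = None,
    seconds: int = 8,
    out_stem: str = "track",
) -> None:
    """Generate one clip from a description, optionally conditioned on a melody."""
    # Heavy imports live inside the function so the helper above can be
    # used (and tested) without audiocraft installed.
    import torchaudio
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained(choose_checkpoint(melody_path))
    model.set_generation_params(duration=seconds)

    if melody_path:
        # Melody models condition on both the text and the input audio.
        melody, sample_rate = torchaudio.load(melody_path)
        wav = model.generate_with_chroma([description], melody[None], sample_rate)
    else:
        # Non-melody models condition on the text description only.
        wav = model.generate([description])

    # Writes <out_stem>.wav with loudness normalization.
    audio_write(out_stem, wav[0].cpu(), model.sample_rate, strategy="loudness")
```

Calling generate_track("Create a deep house edm track with deep bass and tech house snares.", melody_path="acoustic_melody.wav") would follow the melody path, where acoustic_melody.wav is a hypothetical local file. Leaving melody_path out falls back to text-only generation.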
Setup
Everything in this post was generated with this notebook (linked in the Appendix). If you want to try using generative AI to create music, feel free to give the notebook a shot. It also has some extra features that are not described in this post, so experiment and share your findings!
Results
Let’s not beat around the bush any longer and see if I can quit my day job. I tried the MusicGen melody model conditioned on a few melodies (from samplefocus.com). I first wanted to explore the model by creating an EDM track from a simple melody to see how well it transforms simple melodies, and honestly, the results weren’t that bad. I still think I need to keep my job, but overall, it’s better than what I could create on my own. We’ll also see how the model performs when given more complex melodies.
Acoustic Melody
The first track has a very simple melody, which I hoped would leave the model more room for creativity (guided by the description/prompt).
I applied some prompt engineering magic by adding the following description: “Create a deep house edm track with deep bass and tech house snares.” The model generated the following track.
Underwater Key Melody
The second track had a melody described as an “underwater key melody.” I’ll be honest, I don’t know what that means, but the general melody resonated with me.
This time I used a more complex prompt: “Create an edm song that invokes a sense of freedom. Makes the listener feel like they are transported to the ocean overlooking the waves crash onto the shore. The track should leverage heavy bass and light synths.” The final output is a little hard on the ears.
Conclusion
Progress in the generative AI space never ceases to amaze me, and this new model, MusicGen, is no exception. Generating music is no small task, and creating output that actually sounds like music is very impressive. I think we are still some way from having AI produce music entirely on its own, but this model is a great step toward building more excitement in the generative AI art/music space.
Appendix
- Code: https://colab.research.google.com/drive/1FML7xUwQhd0P8DluUfgneXK-h_OfVZJ0?usp=sharing
- MusicGen website: https://musicgen.com
- MusicGen paper: https://arxiv.org/abs/2306.05284