
AI music generators could be a boon for artists — but also problematic

Stability AI, the company behind Stable Diffusion, is tackling music


Dance Diffusion robot
Image Credits: DALL-E 2 / OpenAI

It was only five years ago that electronic punk band YACHT entered the recording studio with a daunting task: They would train an AI on 14 years of their music, then synthesize the results into the album “Chain Tripping.”

“I’m not interested in being a reactionary,” YACHT member and tech writer Claire L. Evans said in a documentary about the album. “I don’t want to return to my roots and play acoustic guitar because I’m so freaked out about the coming robot apocalypse, but I also don’t want to jump into the trenches and welcome our new robot overlords either.”

But our new robot overlords are making a whole lot of progress in the space of AI music generation. Even though the Grammy-nominated “Chain Tripping” was released in 2019, the technology behind it is already becoming outdated. Now, the startup behind the open source AI image generator Stable Diffusion is pushing us forward again with its next act: making music.

Creating harmony

Harmonai is an organization with financial backing from Stability AI, the London-based startup behind Stable Diffusion. In late September, Harmonai released Dance Diffusion, an algorithm and set of tools that can generate clips of music by training on hundreds of hours of existing songs.

“I started my work on audio diffusion around the same time as I started working with Stability AI,” Zach Evans, who heads development of Dance Diffusion, told TechCrunch in an email interview. “I was brought on to the company due to my development work with [the image-generating algorithm] Disco Diffusion and I quickly decided to pivot to audio research. To facilitate my own learning and research, and make a community that focuses on audio AI, I started Harmonai.”

Dance Diffusion remains in the testing stages — at present, the system can only generate clips a few seconds long. But the early results provide a tantalizing glimpse at what could be the future of music creation, while at the same time raising questions about the potential impact on artists.

Dance Diffusion art
Image Credits: DALL-E 2/OpenAI

The emergence of Dance Diffusion comes several years after OpenAI, the San Francisco-based lab behind DALL-E 2, detailed its grand experiment with music generation, dubbed Jukebox. Given a genre, artist and a snippet of lyrics, Jukebox could generate relatively coherent music complete with vocals. But the songs Jukebox produced lacked larger musical structures like choruses that repeat and often contained nonsense lyrics.

Google’s AudioLM, detailed for the first time earlier this week, shows more promise, with an uncanny ability to generate piano music given a short snippet of playing. But it hasn’t been open sourced.

Dance Diffusion aims to overcome the limitations of previous open source tools by borrowing technology from image generators such as Stable Diffusion. The system is what’s known as a diffusion model, which generates new data (e.g., songs) by learning how to destroy and recover many existing samples of data. As it’s fed the existing samples — say, the entire Smashing Pumpkins discography — the model gets better at recovering all the data it had previously destroyed to create new works.

Kyle Worrall, a Ph.D. student at the University of York in the U.K. studying the musical applications of machine learning, explained the nuances of diffusion systems in an interview with TechCrunch:

“In the training of a diffusion model, training data such as the MAESTRO data set of piano performances is ‘destroyed’ and ‘recovered,’ and the model improves at performing these tasks as it works its way through the training data,” he said via email. “Eventually the trained model can take noise and turn that into music similar to the training data (i.e., piano performances in MAESTRO’s case). Users can then use the trained model to do one of three tasks: Generate new audio, regenerate existing audio that the user chooses or interpolate between two input tracks.”
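The "destroy" and "recover" steps Worrall describes can be illustrated with a small sketch. This is not Harmonai's code: the function names, the sine tone standing in for a training clip, and the square-root blending rule are assumptions chosen for illustration. A real diffusion model trains a neural network to estimate the noise; the sketch below skips that part and simply hands the true noise to the recovery step, which shows why a good noise estimate is enough to get the audio back.

```python
import math
import random


def make_tone(num_samples=1024, freq_hz=440.0, sample_rate=16000):
    """Hypothetical stand-in for a training clip: a pure sine tone in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def destroy(clean, noise, keep):
    """Forward step: blend the clip with Gaussian noise.

    keep=1.0 leaves the clip untouched; keep near 0.0 leaves almost pure noise.
    """
    a, b = math.sqrt(keep), math.sqrt(1.0 - keep)
    return [a * x + b * e for x, e in zip(clean, noise)]


def recover(noisy, noise_estimate, keep):
    """Reverse step: undo destroy() given an estimate of the added noise.

    A trained model would supply noise_estimate; here the caller passes it in.
    """
    a, b = math.sqrt(keep), math.sqrt(1.0 - keep)
    return [(y - b * e) / a for y, e in zip(noisy, noise_estimate)]


def correlation(xs, ys):
    """Pearson correlation, used as a rough score of how much signal survives."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(0)
    clip = make_tone()
    added_noise = [rng.gauss(0.0, 1.0) for _ in clip]

    # The lower `keep` is, the less the noisy clip resembles the original.
    for keep in (0.99, 0.5, 0.01):
        noisy_clip = destroy(clip, added_noise, keep)
        print(f"keep={keep:.2f}  correlation={correlation(clip, noisy_clip):.3f}")

    # With a perfect noise estimate, even a heavily destroyed clip comes back.
    restored = recover(destroy(clip, added_noise, 0.01), added_noise, 0.01)
    worst_error = max(abs(r - c) for r, c in zip(restored, clip))
    print(f"worst recovery error: {worst_error:.2e}")
```

In this toy setup, generating new audio would amount to starting from pure noise and applying the recovery step repeatedly with the model's own noise estimates, rather than with noise that was added to a known clip.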

It’s not the most intuitive idea. But as DALL-E 2, Stable Diffusion and other such systems have shown, the results can be remarkably realistic.

For example, check out this Dance Diffusion model fine-tuned on Daft Punk music:

Or this style transfer of the Pirates of the Caribbean theme to flute:

Or this style transfer of Smash Mouth vocals to the Tetris theme (yes, really):

Or these models, which were fine-tuned on copyright-free dance music:

Artist perspective

Jona Bechtolt of YACHT was impressed by what Dance Diffusion can create.

“Our initial reaction was like, ‘Okay, this is a leap forward from where we were at before with raw audio,’” Bechtolt told TechCrunch.

Unlike popular image-generating systems, Dance Diffusion is somewhat limited in what it can create — at least for the time being. While it can be fine-tuned on a particular artist, genre or even instrument, the system isn’t as general as Jukebox. The handful of Dance Diffusion models available — a hodgepodge from Harmonai and early adopters on the official Discord server, including models fine-tuned with clips from Billy Joel, The Beatles, Daft Punk and musician Jonathan Mann’s Song A Day project — stay within their respective lanes. That is to say, the Jonathan Mann model always generates songs in Mann’s musical style.

And Dance Diffusion-generated music won’t fool anyone today. While the system can “style transfer” songs by applying the style of one artist to a song by another, essentially creating covers, it can’t generate clips longer than a few seconds or lyrics that aren’t gibberish (see the clip below). That’s the result of technical hurdles Harmonai has yet to overcome, says Nicolas Martel, a self-taught game developer and member of the Harmonai Discord.

“The model is only trained on short 1.5-second samples at a time so it can’t learn or reason about long-term structure,” Martel told TechCrunch. “The authors seem to be saying this isn’t a problem, but in my experience — and logically anyway — that hasn’t been very true.”

YACHT’s Evans and Bechtolt are concerned about the ethical implications of AI — they are working artists, after all — but they observe that these “style transfers” are already part of the natural creative process.

Dance Diffusion art
Image Credits: DALL-E 2 / OpenAI

“That’s something that artists are already doing in the studio in a much more informal and sloppy way,” Evans said. “You sit down to write a song and you’re like, I want a Fall bass line and a B-52’s melody, and I want it to sound like it came from London in 1977.”

But Evans isn’t interested in writing the dark, post-punk rendition of “Love Shack.” Rather, she thinks that interesting music comes from experimentation in the studio — even if you take inspiration from the B-52’s, your final product may not bear the signs of those influences.

“In trying to achieve that, you fail,” Evans told TechCrunch. “One of the things that attracted us to machine learning tools and AI art was the ways in which it was failing, because these models aren’t perfect. They’re just guessing at what we want.”

Evans describes artists as “the ultimate beta testers,” using tools outside of the ways in which they were intended to create something new.

“Oftentimes, the output can be really weird and damaged and upsetting, or it can sound really strange and novel, and that failure is delightful,” Evans said.

Ethical consequences

Assuming Dance Diffusion one day reaches the point where it can generate coherent whole songs, it seems inevitable that major ethical and legal issues will come to the fore. They already have, albeit around simpler AI systems. In 2020, Jay-Z’s record label filed copyright strikes against a YouTube channel, Vocal Synthesis, for using AI to create Jay-Z covers of songs like Billy Joel’s “We Didn’t Start the Fire.” After initially removing the videos, YouTube reinstated them, finding the takedown requests were “incomplete.” But deepfaked music still stands on murky legal ground.

Perhaps anticipating legal challenges, OpenAI for its part open sourced Jukebox under a non-commercial license, prohibiting users from selling any music created with the system.

“There is little work into establishing how original the output of generative algorithms are, so the use of generative music in advertisements and other projects still runs the risk of accidentally infringing on copyright and as such damaging the property,” Worrall said. “This area needs to be further researched.”

An academic paper authored by Eric Sunray, now a legal intern at the Music Publishers Association, argues that AI music generators like Dance Diffusion violate music copyright by creating “tapestries of coherent audio from the works they ingest in training, thereby infringing the United States Copyright Act’s reproduction right.” Following the release of Jukebox, critics have also questioned whether training AI models on copyrighted musical material constitutes fair use. Similar concerns have been raised around the training data used in image-, code- and text-generating AI systems, which is often scraped from the web without creators’ knowledge.

Technologists like Mat Dryhurst and Holly Herndon founded Spawning AI, a set of AI tools built for artists, by artists. One of their projects, “Have I Been Trained,” allows users to search for their artwork and see if it has been incorporated into an AI training set without their consent.

“We are showing people what exists within popular datasets used to train AI image systems and are initially offering them tools to opt out or opt in to training,” Herndon told TechCrunch via email. “We are also talking to many of the biggest research organizations to convince them that consensual data is beneficial for everyone.”

Dance Diffusion art
Image Credits: DALL-E 2/OpenAI

But these standards are — and will likely remain — voluntary. Harmonai hasn’t said whether it’ll adopt them.

“To be clear, Dance Diffusion is not a product and it is currently only research,” said Zach Evans of Stability AI. “All of the models that are officially being released as part of Dance Diffusion are trained on public domain data, Creative Commons-licensed data and data contributed by artists in the community. The method here is opt-in only and we look forward to working with artists to scale up our data through further opt-in contributions, and I applaud the work of Holly Herndon and Mat Dryhurst and their new Spawning organization.”

YACHT’s Evans and Bechtolt see parallels between the emergence of AI generated art and other new technologies.

“It’s especially frustrating when we see the same patterns play out across all disciplines,” Evans told TechCrunch. “We’ve seen the way that people being lazy about security and privacy on social media can lead to harassment. When tools and platforms are designed by people who aren’t thinking about the long-term consequences and social effects of their work like that, things happen.”

Jonathan Mann — the same Mann whose music was used to train one of the early Dance Diffusion models — told TechCrunch that he has mixed feelings about generative AI systems. While he believes that Harmonai has been “thoughtful” about the data they’re using for training, others like OpenAI have not been as conscientious.

“Jukebox was trained on thousands of artists without their permission — it’s staggering,” Mann said. “It feels weird to use Jukebox knowing that a lot of folks’ music was used without their permission. We are in uncharted territory.”

From a user perspective, Waxy’s Andy Baio speculates in a blog post that new music generated by an AI system would be considered a derivative work, in which case only the original elements would be protected by copyright. Of course, it’s unclear what might be considered “original” in such music. Using this music commercially is to enter uncharted waters. It’s a simpler matter if generated music is used for purposes protected under fair use, like parody and commentary, but Baio expects that courts would have to make case-by-case judgments.

Dance Diffusion art
Image Credits: DALL-E 2/OpenAI

According to Herndon, copyright law is not structured to adequately regulate AI art-making. Evans also points out that the music industry has been historically more litigious than the visual art world, which is perhaps why Dance Diffusion was explicitly trained on a dataset of copyright-free or voluntarily submitted material, while DALL-E mini will easily spit out a Pikachu if you input the term “Pokémon.”

“I have no illusion that that’s because they thought that was the best thing to do ethically,” Evans said. “It’s because copyright law in music is very strict and more aggressively enforced.”

Creative potential

Gordon Tuomikoski, an arts major at the University of Nebraska-Lincoln who moderates the official Stable Diffusion Discord community, believes that Dance Diffusion has immense artistic potential. He notes that some members of the Harmonai server have created models trained on dubstep “wubs,” kicks and snare drums and backup vocals, which they’ve strung together into original songs.

“As a musician, I definitely see myself using something like Dance Diffusion for samples and loops,” Tuomikoski told TechCrunch via email.

Martel sees Dance Diffusion one day replacing VSTs, the digital standard used to connect synthesizers and effect plugins with recording systems and audio editing software. For example, he says, a model trained on ’70s jazz rock and Canterbury music will intelligently introduce new “textures” in the drums, like subtle drum rolls and “ghost notes,” in the same way that artists like John Marshall might — but without the manual engineering work normally required.

Take this Dance Diffusion model of Senegalese drumming, for instance:

And this model of snares:

And this model of a male choir singing in the key of D across three octaves:

And this model of Mann’s songs fine-tuned with royalty-free dance music:

“Normally, you’d have to lay down notes in a MIDI file and sound-design really hard. Achieving a humanized sound this way is not only very time-consuming but requires a deeply intimate understanding of the instrument you’re sound designing,” Martel said. “With Dance Diffusion, I look forward to feeding the finest ’70s prog rock into AI, an infinite unending orchestra of virtuoso musicians playing Pink Floyd, Soft Machine and Genesis, trillions of new albums in different styles, remixed in new ways by injecting some Aphex Twin and Vaporwave, all performing at the peak of human creativity — all in collaboration with your own personal tastes.”

Mann has greater ambitions. He’s currently using a combination of Jukebox and Dance Diffusion to play around with music generation and plans to release a tool that’ll allow others to do the same. But he hopes to one day use Dance Diffusion — possibly in conjunction with other systems — to create a “digital version” of himself capable of continuing the Song A Day project after he passes away.

“The exact form it’ll take hasn’t quite become clear yet … [but] thanks to folks at Harmonai and some others I’ve met in the Jukebox Discord, over the last few months I feel like we’ve made bigger strides than any time in the last four years,” Mann said. “I have over 5,000 Song A Day songs, complete with their lyrics as well as rich metadata, with attributes ranging from mood, genre, tempo, key, all the way to location and beard (whether or not I had a beard when I wrote the song). My hope is that given all this data, we can create a model that can reliably create new songs as if I had written them myself. A Song A Day, but forever.”

If AI can successfully make new music, where does that leave musicians?

YACHT’s Evans and Bechtolt point out that new technology has upended the art scene before, and the results weren’t as catastrophic as expected. In the 1980s, the U.K. Musicians Union attempted to ban the use of synthesizers, arguing that they would replace musicians and put them out of work.

“With synthesizers, a lot of artists took this new thing and instead of refusing it, they invented techno, hip hop, post punk and new wave music,” Evans said. “It’s just that right now, the upheavals are happening so quickly that we don’t have time to digest and absorb the impact of these tools and make sense of them.”

Still, YACHT worries that AI could eventually challenge work that musicians do in their day jobs, like writing scores for commercials. But like Herndon, they don’t think AI can quite replicate the creative process just yet.

“It is divisive and a fundamental misunderstanding of the function of art to think that AI tools are going to replace the importance of human expression,” Herndon said. “I hope that automated systems will raise important questions about how little we as a society have valued art and journalism on the internet. Rather than speculate about replacement narratives, I prefer to think about this as a fresh opportunity to revalue humans.”
