Try ‘Riffusion,’ an AI model that composes music by visualizing it

Image Credits: Seth Forsgren / Hayk Martiros

AI-generated music is already an innovative enough concept, but Riffusion takes it to another level with a clever approach that produces weird and compelling music using not audio but images of audio.

Sounds strange, is strange. But if it works, it works. And it does work! Kind of.

Diffusion is a machine learning technique for generating images that supercharged the AI world over the last year. DALL-E 2 and Stable Diffusion are the two most high-profile models that work by gradually replacing visual noise with what the AI thinks a prompt ought to look like.

The method has proved powerful in many contexts and is very susceptible to fine-tuning, where you give the mostly trained model a lot of a specific kind of content in order to have it specialize in producing more examples of that content. For instance, you could fine-tune it on watercolors or on photos of cars, and it would prove more capable at reproducing either of those things.

What Seth Forsgren and Hayk Martiros did for their hobby project Riffusion was fine-tune Stable Diffusion on spectrograms.

“Hayk and I play in a little band together, and we started the project simply because we love music and didn’t know if it would be even possible for Stable Diffusion to create a spectrogram image with enough fidelity to convert into audio,” Forsgren told TechCrunch. “At every step along the way we’ve been more and more impressed by what is possible, and one idea leads to the next.”

What are spectrograms, you ask? They’re visual representations of audio that show the amplitude of different frequencies over time. You have probably seen waveforms, which show volume over time and make audio look like a series of hills and valleys; imagine if instead of just total volume, it showed the volume of each frequency, from the low end to the high end.
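To make that concrete, here is a minimal sketch of how a spectrogram can be computed: chop the signal into short overlapping frames and measure the strength of each frequency in each frame. It is an illustration only, not Riffusion's code; the function names, frame size, hop and toy signal are all assumptions chosen to keep the example small, and it uses a slow, naive Fourier transform from the Python standard library rather than an audio library.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_size=64, hop=32):
    """Slice the signal into overlapping frames; one column of magnitudes per frame."""
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        columns.append(dft_magnitudes(frame))
    return columns  # indexed as columns[time_step][frequency_bin]

# Toy signal (hypothetical): a low tone for the first half, a higher tone after.
sample_rate = 1024
low = [math.sin(2 * math.pi * 64 * t / sample_rate) for t in range(512)]
high = [math.sin(2 * math.pi * 256 * t / sample_rate) for t in range(512)]
spec = spectrogram(low + high)

def loudest_bin(column):
    """Index of the frequency bin with the largest magnitude."""
    return max(range(len(column)), key=lambda k: column[k])

first_peak = loudest_bin(spec[0])   # where the energy sits at the start
last_peak = loudest_bin(spec[-1])   # where the energy sits at the end
```

Each bin here covers 16 Hz (1,024 samples per second divided by 64 samples per frame), so the loudest bin moves from the low tone's position to the high tone's as you go left to right. Draw each column as a strip of pixels, brighter for larger magnitudes, and you have the image.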

Here’s part of one I made of a song (“Marconi’s Radio” by Secret Machines, if you’re wondering):

Image Credits: Devin Coldewey

You can see how it gets louder in all frequencies as the song builds, and you can even spot individual notes and instruments if you know what to look for. The process isn’t inherently perfect or lossless by any means, but it is an accurate, systematic representation of the sound. And you can convert it back to sound by doing the same process in reverse.
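A small sketch of the round trip shows both halves of that claim. It assumes a single short frame and a naive Fourier transform written with the standard library; the names and the test signal are made up for illustration. With the full complex output the inverse recovers the samples almost exactly, but an image that stores only magnitudes has thrown away the phase, which is one reason the conversion isn't lossless and why an inverse has to estimate what is missing.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform: one complex value per frequency bin."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(bins):
    """Naive inverse transform; returns the real part of each reconstructed sample."""
    n = len(bins)
    return [sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

# Hypothetical frame: two tones mixed together.
frame = [math.sin(2 * math.pi * 3 * t / 32) + 0.5 * math.cos(2 * math.pi * 7 * t / 32)
         for t in range(32)]

bins = dft(frame)

# Forward then backward with the full complex values: essentially exact.
restored = inverse_dft(bins)
max_error_with_phase = max(abs(a - b) for a, b in zip(frame, restored))

# Keep only magnitudes (what a brightness value can store), then invert.
magnitude_only = [abs(b) for b in bins]
restored_no_phase = inverse_dft(magnitude_only)
max_error_without_phase = max(abs(a - b) for a, b in zip(frame, restored_no_phase))
```

The first error is down at floating-point noise; the second is large, because the sine component comes back as a cosine once its phase is gone. The frequencies and their loudness survive, but the timing within the frame does not.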

Forsgren and Martiros made spectrograms of a bunch of music and tagged the resulting images with the relevant terms, like “blues guitar,” “jazz piano,” “afrobeat,” stuff like that. Feeding the model this collection gave it a good idea of what certain sounds “look like” and how it might re-create or combine them.

Here’s what the diffusion process looks like if you sample it as it’s refining the image:

Image Credits: Seth Forsgren / Hayk Martiros

And indeed the model proved capable of producing spectrograms that, when converted to sound, are a pretty good match for prompts like “funky piano,” “jazzy saxophone,” and so on. Here’s an example:

Image Credits: Seth Forsgren / Hayk Martiros

But of course a square spectrogram (512 x 512 pixels, a standard Stable Diffusion resolution) represents only a short clip; a three-minute song would be a much, much wider rectangle. No one wants to listen to music five seconds at a time, but the limitations of the system they’d created meant they couldn’t just create a spectrogram 512 pixels tall and 10,000 wide.
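For a sense of scale, the arithmetic can be sketched as below. It takes the round numbers above at face value (512 pixels of width for roughly five seconds of audio), so the constants are approximations rather than Riffusion's exact settings, and the function name is made up for illustration.

```python
CLIP_PIXELS = 512   # width of one square spectrogram, per the figures above
CLIP_SECONDS = 5    # rough clip length, per the figures above (an approximation)

def spectrogram_width(song_seconds, clip_pixels=CLIP_PIXELS, clip_seconds=CLIP_SECONDS):
    """Pixel width needed to cover a song at the same pixels-per-second scale."""
    return round(song_seconds * clip_pixels / clip_seconds)

three_minute_width = spectrogram_width(3 * 60)
clips_needed = three_minute_width / CLIP_PIXELS
```

Under those assumptions a three-minute song comes out at 18,432 pixels wide, or 36 square clips laid end to end, which is well outside the image size the model was trained to produce.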

After trying a few things, they took advantage of the fundamental structure of large models like Stable Diffusion, which have a great deal of “latent space.” This is sort of like the no-man’s-land between more well-defined nodes. Like if you had an area of the model representing cats, and another representing dogs, what’s “between” them is latent space that, if you just told the AI to draw, would be some kind of dogcat, or catdog, even though there’s no such thing.

Incidentally, latent space stuff gets a lot weirder than that:

A terrifying AI-generated woman is lurking in the abyss of latent space

No creepy nightmare worlds for the Riffusion project, though. Instead, they found that if you have two prompts, like “church bells” and “electronic beats,” you can kind of step from one to the other a bit at a time and it gradually and surprisingly naturally fades from one to the other, on the beat even:

It’s a strange, interesting sound, though obviously not particularly complex or high-fidelity; remember, they weren’t even sure that diffusion models could do this at all, so the facility with which this one turns bells into beats or typewriter taps into piano and bass is pretty remarkable.
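The stepping itself can be sketched in a few lines. This is a toy illustration under loud assumptions: the three-number "embeddings" are invented stand-ins for the much larger vectors a real model uses, the blend is plain linear interpolation, and nothing here is taken from Riffusion's code. In a real system each intermediate point would be handed to the model to generate one clip.

```python
def lerp(a, b, t):
    """Blend two equal-length vectors: t=0 gives a, t=1 gives b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def interpolation_steps(start, end, num_steps):
    """Evenly spaced points from start to end, inclusive of both ends."""
    return [lerp(start, end, i / (num_steps - 1)) for i in range(num_steps)]

# Hypothetical stand-ins for the two prompts' positions in latent space.
church_bells = [1.0, 0.0, 0.5]
electronic_beats = [0.0, 1.0, -0.5]

path = interpolation_steps(church_bells, electronic_beats, 5)
midpoint = path[2]  # halfway between the two prompts
```

The first point is all bells, the last is all beats, and the ones in between are the dogcat territory described earlier. More steps would mean a slower, smoother fade.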

Producing longer-form clips is possible but still theoretical:

“We haven’t really tried to create a classic 3-minute song with repeating choruses and verses,” Forsgren said. “I think it could be done with some clever tricks such as building a higher level model for song structure, and then using the lower level model for individual clips. Alternatively you could deeply train our model with much larger resolution images of full songs.”

Where does it go from here? Other groups are attempting to create AI-generated music in various ways, from using speech synthesis models to specially trained audio ones like Dance Diffusion.

Riffusion is more of a “wow, look at this” demo than any kind of grand plan to reinvent music, and Forsgren said he and Martiros were just happy to see people engaging with their work, having fun and iterating on it:

“There are many directions we could go from here, and we’re excited to keep learning along the way. It’s been fun to see other people already building their own ideas on top of our code this morning, too. One of the amazing things about the Stable Diffusion community is how fast people are to build on top of things in directions that the original authors can’t predict.”

You can test it out in a live demo at Riffusion.com, but you might have to wait a bit for your clip to render — this got a little more attention than the creators were expecting. The code is all available via the about page, so feel free to run your own as well, if you’ve got the chips for it.
