VALL-E’s quickie voice deepfakes should worry you, if you weren’t worried already

Image Credits: Bryce Durbin/TechCrunch

The emergence in the last week of a particularly effective voice synthesis machine learning model called VALL-E has prompted a new wave of concern over the possibility of deepfake voices made quick and easy — quickfakes, if you will. But VALL-E is more iterative than breakthrough, and the capabilities aren’t so new as you might think. Whether that means you should be more or less worried is up to you.

Voice replication has been a subject of intense research for years, and the results have been good enough to power plenty of startups, like WellSaid, Papercup and Respeecher. The latter is even being used to create authorized voice reproductions of actors like James Earl Jones. Yes: from now on Darth Vader will be AI generated.

VALL-E, posted on GitHub by its creators at Microsoft last week, is a “neural codec language model” that uses a different approach to rendering voices than many before it. Its larger training corpus and some new methods allow it to create “high-quality personalized speech” using just three seconds of audio from a target speaker.

That is to say, all you need is an extremely short clip like the following (all clips from Microsoft’s paper):

To produce a synthetic voice that sounds remarkably similar:

As you can hear, it maintains tone, timbre, a semblance of accent and even the “acoustic environment” (for instance, a voice compressed into a cell phone call). I didn’t bother labeling them because you can easily tell which of the above is which. It’s quite impressive!

So impressive, in fact, that this particular model seems to have pierced the hide of the research community and “gone mainstream.” As I got a drink at my local last night, the bartender emphatically described the new AI menace of voice synthesis. That’s how I know I misjudged the zeitgeist.

But if you look back a bit, as early as 2017 all you needed was a minute of voice to produce a fake version convincing enough that it would pass in casual use. And that was far from the only project.

The improvement we’ve seen in image-generating models like DALL-E 2 and Stable Diffusion, or in language ones like ChatGPT, has been a transformative, qualitative one: A year or two ago this level of detailed, convincing AI-generated content was impossible. The worry (and panic) around these models is understandable and justified.

Contrariwise, the improvement offered by VALL-E is quantitative, not qualitative. Bad actors interested in proliferating fake voice content could have done so long ago, just at greater computational cost, and computing power is not particularly difficult to find these days. State-sponsored actors in particular would have plenty of resources at hand to do the kind of compute jobs necessary to, say, create a fake audio clip of the President saying something damaging on a hot mic.

I chatted with James Betker, an engineer who worked for a while on another text-to-speech system, called Tortoise-TTS.

Betker said that VALL-E is indeed iterative and like other popular models these days gets its strength from its size.

“It’s a large model, like ChatGPT or Stable Diffusion; it has some inherent understanding of how speech is formed by humans. You can then fine-tune Tortoise and other models on specific speakers, and it makes them really, really good. Not ‘kind of sounds like’; good,” he explained.

When you “fine-tune” Stable Diffusion on a particular artist’s work, you’re not retraining the whole enormous model (that takes a lot more power), but you can still vastly improve its capability of replicating that content.

But just because it’s familiar doesn’t mean it should be dismissed, Betker clarified.

“I’m glad it’s getting some traction because I really want people to be talking about this. I actually feel that speech is somewhat sacred, the way our culture thinks about it,” he said, and he actually stopped working on his own model as a result of these concerns. A fake Dali created by DALL-E 2 doesn’t have the same visceral effect for people as hearing something in their own voice, that of a loved one or of someone admired.

VALL-E moves us one step closer to ubiquity, and although it is not the type of model you run on your phone or home computer, that isn’t too far off, Betker speculated. A few years, perhaps, to run something like it yourself; as an example, he sent this clip he’d generated on his own PC using Tortoise-TTS of Samuel L. Jackson, based on audiobook readings of his:

Good, right? And a few years ago you might have been able to accomplish something similar, albeit with greater effort.

This is all just to say that while VALL-E and the three-second quickfake are definitely notable, they’re a single step on a long road researchers have been walking for over a decade.

The threat has existed for years and if anyone cared to replicate your voice, they could easily have done so long ago. That doesn’t make it any less disturbing to think about, and there’s nothing wrong with being creeped out by it. I am too!

But the benefits to malicious actors are dubious. Petty scams that use a passable quickfake based on a wrong number call, for instance, are already super easy because security practices at many companies are already lax. Identity theft doesn’t need to rely on voice replication because there are so many easier paths to money and access.

Meanwhile the benefits are potentially huge — think about people who lose the ability to speak due to an illness or accident. These things happen quickly enough that they don’t have time to record an hour of speech to train a model on (not that this capability is widely available, though it could have been years ago). But with something like VALL-E, all you’d need is a couple clips off someone’s phone of them making a toast at dinner or talking with a friend.

There’s always opportunity for scams and impersonation and all that, although more people are parted from their money and identities in far more prosaic ways, like a simple phone or phishing scam. The potential for this technology is huge, but we should also listen to our collective gut when it says there’s something dangerous here. Just don’t panic. Yet.
