Voicemod tools up with $14.5M to ride the generative AI (sonic)boom

The first thing we ask Voicemod’s CEO and co-founder, Jaime Bosch, when he picks up the phone to talk about a new funding round is not something we’re accustomed to asking, but our question may become the norm in the generative AI future that’s flying fast at us: Is this your real voice?

Bosch’s startup has been fiddling with audio effects for almost a decade, playing in the field of digital signal processing (DSP) — where its early focus was on creating fun ‘sound emoji’ effects and reactions for gamers to spice up their voice chats. And gamers do remain its main user-base (for now). But the audio field is being charged by developments in AI — which Voicemod’s team is hoping will lead to whole new use-cases and many more users for its tools.

So where DSP technology was about applying effects to a person’s (real) voice, developments in artificial intelligence are enabling startups like Voicemod to offer tools to create entirely synthesized (unreal) voices. And even the ability for users to ‘wear’ these voices in real-time — so they can speak with a voice that isn’t theirs. Think of it as the audio equivalent of a Snapchat lens or TikTok’s viral teenage filter or Reface’s celebrity face-swaps.

AI voice can even enable voice-shifting into another person’s (real) voice. And not just for talking about the weather or shooting the shit. But for what’s known as sing-to-sing voice conversion. Meaning you could get to sing in someone else’s voice — supercharging your karaoke game, say, by singing Bohemian Rhapsody as literally the voice of Freddie Mercury. And even switching between Mercury, May and Taylor, for the full mock opera effect if you have enough trained AI models (and microphones) on hand. Mamma-mia! 

Artificial intelligence makes all this possible, even if legal and ethical questions may give pause for thought about rushing to unleash real-time voice-shifting upon a world that still relies heavily upon fixed identities. (Banks pushing customers to record ‘a unique voiceprint’ to use as a password definitely need to sit up and start listening.)

Voicemod acquired another audio effects startup last year, called Voctro Labs, whose technology Bosch says it’s working to blend with its own to create an amped up hybrid platform. The combo has already allowed it to expand what it offers — launching a text-to-song feature last December which lets you turn your own lyrics into a vocal composition using generative AI. He tells us more is on the way — including the aforementioned sing-to-sing feature.

Voctro’s tech may be familiar, as it was involved in the development of a voice clone of musician Holly Herndon which appeared in a viral TED Talk last year, in which her AI voice could be heard duetting in real time with the real voice of another musician, Pher. Which, well, if you haven’t already seen it, is quite the visual-audio spectacle, as well as being a mouthful to explain. It’s also a taster of what Voicemod has coming to a keyboard near you.

“We’re definitely going to launch more products and more ways for people to express themselves with the generative AI technology,” Bosch tells us. “Not all Voctro Labs’ technologies are related to music — but they have a lot of technology related to singing, from this text-to-song technology to sing-to-sing technology in real time. So we have a lot of new projects and new products of upcoming.

“We are going to strengthen our speech-to-speech AI real-time technology, because we are basically merging our technology with their technology. We’re basically creating an hybrid technology that will be better than ours — or there’s a mix of both… [So their sing-to-sing technology will be] combined with our DSP technology — that we could use to do autotune. So we could potentially help artists with their voice and on the tone. And so this is, this is gonna be really, really interesting.”

As well as providing direct-to-consumer/creator audio tools, Voicemod offers its technologies via SDK and APIs for third parties to integrate into their own products, from games and apps to hardware. So it’s set up to distribute its tech across the gamer-creator ecosystem and have demand come find it.

Generative AI-powered disruption in audio of course mirrors (in a non-exact fairground ‘crazy mirror’ kind of a way) developments we’re seeing happen elsewhere: Visually, to graphics and illustration, as a result of deep learning and the advent of prompt-based image generation interfaces (such as DALL-E and Stable Diffusion). Also to the written word, through the large language models that underpin generative AI chatbots like ChatGPT that can produce song lyrics or a whole essay on demand. And, indeed, in the case of musical composition — where Google recently showed off a prompt-based generative AI song composer which can apparently produce arrangements that match the musical vibe you describe (although it said it’s not releasing that particular generative AI model — but surely someone else will).

It’s clear that AI is bending the rules of what it’s possible for a single person to create. And, well, as with freedom, the open concept, this is both thrilling and terrifying. Because it’s what you do with it that counts.

The coming years are going to be all about finding out what people do with such powerful AI tools at their fingertips.

Voicemod is positioning itself to ride this wave by building a toolbox for creators to survive and thrive in a reality-bending future and across a range of use-cases — hence it’s talking in terms of sonic identity and voice avatars for the social metaverse (at the future-gaze-y end) but also just helping you sound your sparkling best on a work Zoom call. So a sort of audio make-up as it were. Apply as needed.

“Now suddenly everyone can become a creator,” predicts Bosch of the generative AI boon. “Everyone can come, basically, with no skill set. Or with no learnings on how to really craft those audios. They will be able to actually create those pieces of music. Songs. And this eventually evolves into, probably, even voices. So the ability to create voices.”

“This could potentially be something really viral for platforms like TikTok, or YouTube Shorts or Instagram… And this could eventually evolve into things like karaoke, for example. And be, I don’t know, part of game consoles, or things like that, for people to use this to entertain. And, if we go a step further — and it’s the technology getting better and better as we think it will be — this could potentially be a professional tool for people who want to create music. Or for people who want to create voices for movies or voices for games characters.

“We have a strong belief in user-generated content, and we are building tools for our users to start creating sounds and creating voices. And we will be putting technology in the hands of the users to create those [sounds]. And, eventually in the future, hopefully, they will go even to a professional level.”

So while synthesizing a whole voice currently still requires a team of sound engineers and designers at the startup, Bosch suggests generative AI will put that power in the hands of the individual, and that it’ll happen soon: “in the near future”.

“I don’t know if we’ll be prompting (now we’re in this wave of everything is done through prompts). I’m not sure if that will be the way or it will be more tools that will have AI technology embedded and we have user experiences that will make things a lot easier,” he adds. “But definitely what I see from generative AI in the audience but also in the management phase is that suddenly everyone can become a creator, which I think is really interesting.”

The birth of AI voice may not sound like amazing news for the employment prospects of sound engineers and designers (albeit, tech advances may simply create new requirements that just shift where their expertise is needed). But Bosch reckons that voice actors, at least, will still have a key role to play — emoting for AI. Since robot voices aren’t good at getting the pitch and intonation, or indeed emotion, right. It’s a voice clone without a soul, basically. (Or as Nick Cave might put it, AI voice lacks ‘its own blood, its own struggle, its own suffering’ — it lacks humanness.)

“I think that you will always need a human factor in your sample with these voices,” suggests Bosch. “You could have the best voice — of even a famous person — but what really comes is the impression. You still need a human to do the cadence on the words. You still need a human to do the rhythm, the tone. So [it’s not just that] I can speak normally and I will sound like a famous person — no, you don’t — you still need to act a little bit. So… I think human factor for expression is key.”

Might generative AI not be able to learn to emote as well, given the right human data-sets, and further dial up its mimicry so as to make us laugh or cry or love or hate on demand too?

“Yeah. Well, we will see,” responds Bosch. “I’m not sure. I mean, as of today, for me AI is a tool to be used by humans. But yeah, we don’t know where this is going to evolve.”

Voicemod is gearing up for whatever phonic craziness lies ahead with a fresh tranche of funding. The 2014-founded startup has been revenue generating for years via pro versions of its tools: its main product, Voicemod for Desktop, has had more than 40 million downloads to date, while Bosch says it has 3.3 million monthly active users. But it’s just closed $14.5 million in expansion funding, following an $8M Series A back in summer 2020. Madrid-based Kfund’s growth fund, Leadwind, led the round, with participation from Minifund (Eros Resmini, former CMO at Discord) and Bitkraft Ventures.

“We’re super excited by what generative AI can do to all creative industries and more specifically audio, especially when it comes to enhancing and augmenting the job that creative people already do,” Jaime Novoa, partner at Kfund, tells TechCrunch. “In the past few months there’s been an explosion in generative AI in general and more specifically in audio but we think this is a phenomenon that’s just starting.

“What many of the cool technologies being launched to market lack are concrete and scalable business models attached to them, and Voicemod differentiates itself from the pack by having built a product used by millions of people on a daily basis and with significant revenue traction. We’re super excited about what Jaime and the rest of the Voicemod team have in the pipeline and what’s to come.”

Voicemod says the extra funds will be used to enhance the development of its real-time AI voice identity capabilities — and dial up its proposition for Gen Z, gamers, content creators, and professionals of all skill levels wanting tools to help them express themselves vocally in digital spaces.

Per Bosch, part of the reason it’s taking more funding now relates to the acquisition of Voctro Labs. Beyond that, he says it’s about making the most of the opportunities sparking off the Cambrian explosion in generative AI tools.

“We are in the middle of tremendous revolution in AI,” he says. “We want to be well funding in order to be able to develop technology but also to be able to deliver technology to users. So I think one of our competitive advantages is that we already have the market and the traction and we basically are able to put this in the hands of the users. And I want to make sure to have enough runway, also due to market conditions, to be able to put all of this in place. So it will be mainly focused… on building the next generation AI technology and putting it in the hands of the users and also building these creation tools for the users to create content.”

The first new tool will be landing next month — with a launch of Voicemod’s desktop product on macOS (currently it’s PC only). The goal is to evolve into a multi-platform product spanning all devices. “We’re also working on a creation tool mobile app that hopefully will see the light towards the beginning of next quarter. And, and yeah, some more stuff to come, hopefully,” Bosch adds.

He also tells us the startup is working on a watermarking technology which it hopes to launch in Q2 this year — to give platforms a way to be able to spot AI-generated voices in the wild.

Such a feature is likely to be a vital tool to counter all the possible negative use-cases (scams, fraud, manipulation, abuse, bullying, trolling etc etc) one could imagine humans coming up with for voice-shifting tools that let you sound exactly like someone you’re not.

“It’s an algorithm to watermark the audio,” explains Bosch. “Moderation is complicated because it really changes depending on the space… on which are the platforms where the audio is used. So we believe that the channel is the one that should own that moderation, and what we are doing is we will be providing this watermarking system in order for them to be able to know if the audio is created via synthetic voice or is created by a real voice.”

“Every single new technology can be used for the good or for the bad,” he adds. “So we are of course putting some technology, some tools, in place to be able to have more control around a misuse of this technology.”

On questions of licensing for training data, IP issues here are currently a grey area — as the law hasn’t caught up with developments in AI (let alone generative AI). That means startups operating in the space have to consider whether to make the most of total legal freedom to do whatever they want (and hope expensive consequences don’t come clanging down on them in short order), or tread more carefully and thoughtfully. (Other startups in the space include the likes of Voice AI, Koe and ElevenLabs.)

Bosch claims Voicemod is taking the latter approach — using (paid) voice actors to build up data-sets to train and hone its AI models. If it wants to make use of some original content he says the team will go to the IP provider and negotiate — and figure out what kind of licensing terms they’d be up for. (The generative AI boom is also a crazy-thrilling time to be an IP lawyer, clearly.)

“We are basically pioneering here,” he adds. “So a lot of things are without laws yet so we were trying to stick to our values, basically, and try to do the right thing. That’s our approach on the data [side]. But yeah, you’re completely right: there’s no ‘legal attachment’ to your voice, as of today… We own our fingerprint. You don’t own, like, whatever the fingerprint of your voice [is]. As of today.

“It sounds a little bit like science fiction but maybe, in the future, we will ‘own’ something related to our voice.”

For the record, Bosch was talking to us with his actual voice. The company’s real-time voice-shifting technology doesn’t yet work over mobile. But he says that’s coming too. So buckle up: The synthesized future is gonna be a screaming wild ride.
