AI

With DeepFloyd, generative AI art gets a text upgrade

Comment

DeepFloyd IF
Image Credits: DeepFloyd

Generative AI is pretty impressive in terms of its fidelity these days, as viral memes like Balenciaga Pope would suggest. The latest systems can conjure up scenescapes from city skylines to cafes, creating images that appear startlingly realistic — at least on first glance.

But one of the longstanding weaknesses of text-to-image AI models is, ironically, text. Even the best models struggle to generate images with legible logos, much less text, calligraphy or fonts.

But that might change.

Last week, DeepFloyd, a research group backed by Stability AI, unveiled DeepFloyd IF, a text-to-image model that can “smartly” integrate text into images. Trained on a dataset of more than a billion images and text, DeepFloyd IF, which requires a GPU with at least 16GB of RAM to run, can create an image from a prompt like “a teddy bear wearing a shirt that reads ‘Deep Floyd’” — optionally in a range of styles.

DeepFloyd IF is available in open source, licensed in a way that prohibits commercial use — for now. The restriction was likely motivated by the current tenuous legal status of generative AI art models. Several commercial model vendors are under fire from artists who allege the vendors are profiting from their work without compensating them by scraping that work from the web without permission.

But NightCafe, the generative art platform, was granted early access to DeepFloyd IF.

NightCafe CEO Angus Russell spoke to TechCrunch about what makes DeepFloyd IF different from other text-to-image models and why it might represent a significant step forward for generative AI.

According to Russell, DeepFloyd IF’s design was heavily inspired by Google’s Imagen model, which was never released publicly. In contrast to models like OpenAI’s DALL-E 2 and Stable Diffusion, DeepFloyd IF uses multiple different processes stacked together in a modular architecture to generate images.

DeepFloyd IF
Image Credits: DeepFloyd

With a typical diffusion model, the model learns how to gradually subtract noise from a starting image made almost entirely of noise, moving it closer step by step to the target prompt. DeepFloyd IF performs diffusion not once but several times, generating a 64x64px image then upscaling the image to 256x256px and finally to 1024x1024px.

Why the need for multiple diffusion steps? DeepFloyd IF works directly with pixels, Russell explained. Diffusion models are for the most part latent diffusion models, which essentially means they work in a lower-dimensional space that represents a lot more pixels but in a less accurate way.

The other key difference between DeepFloyd IF and models such as Stable Diffusion and DALL-E 2 is that the former uses a large language model to understand and represent prompts as a vector, a basic data structure. Due to the size of the large language model embedded in DeepFloyd IF’s architecture, the model is particularly good at understanding complex prompts and even spatial relationships described in prompts (e.g. “a red cube on top of a pink sphere”).

“It’s also very good at generating legible and correctly spelled text in images, and can even understand prompts in multiple languages,” Russell added. “Of these capabilities, the ability to generate legible text in images is perhaps the biggest breakthrough to make DeepFloyd IF stand out from other algorithms.”

Because DeepFloyd IF can pretty capably generate text in images, Russell expects it to unlock a wave of new generative art possibilities — think logo design, web design, posters, billboards and even memes. The model should also be much better at generating things like hands, he says, and — because it can understand prompts in other languages — it might be able to create text in those languages, too.

NightCafe users are excited about DeepFloyd IF largely because of the possibilities that are unlocked by generating text in images,” Russell said. “Stable Diffusion XL was the first open source algorithm to make headway on generating text — it can accurately generate one or two words some of the time — but it’s still not good enough at it for use cases where text is important.”

That’s not to suggest DeepFloyd IF is the holy grail of text-to-image models. Russell notes that the base model doesn’t generate images that are quite as aesthetically pleasing as some diffusion models, although he expects fine-tuning will improve that.

DeepFloyd IF
Image Credits: DeepFloyd

But the bigger question, to me, is to what degree DeepFloyd IF suffers from the same flaws as its generative AI brethren.

A growing body of research has turned up racial, ethnic, gender and other forms of stereotyping in image-generating AI, including Stable Diffusion. Just this month, researchers at AI startup Hugging Face and Leipzig University published a tool demonstrating that models including Stable Diffusion and OpenAI’s DALL-E 2 tend to produce images of people that look white and male, especially when asked to depict people in positions of authority.

The DeepFloyd team, to their credit, note the potential for biases in the fine print accompanying DeepFloyd IF:

Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for. This affects the overall output of the model, as white and western cultures are often set as the default.

Aside from this, DeepFloyd IF, like other open source generative models, could be used for harm, like generating pornographic celebrity deepfakes and graphic depictions of violence. On the official webpage for DeepFloyd IF, the DeepFloyd team says that they used “custom filters” to remove watermarked, “NSFW” and “other inappropriate content” from the training data.

But it’s unclear exactly which content was removed — and how much might’ve been missed. Ultimately, time will tell.

More TechCrunch

Struggling EV startup Fisker has laid off hundreds of employees in a bid to stay alive, as it continues to search for funding, a buyout or prepare for bankruptcy. Workers…

Fisker cuts hundreds of workers in bid to keep EV startup alive

Chinese EV manufacturers face a new challenge in their pursuit of U.S. customers: a new House bill that would limit or ban the introduction of their connected vehicles. The bill,…

Chinese EV makers, and their connected vehicles, targeted by new House bill

With the release of iOS 18 later this year, Apple may again borrow ideas third-party apps. This time it’s Arc that could be among those affected.

Is Apple planning to ‘sherlock’ Arc?

TechCrunch Disrupt 2024 will be in San Francisco on October 28–30, and we’re already excited! This is the startup world’s main event, and it’s where you’ll find the knowledge, tools…

Meet Visa, Mercury, Artisan, Golub Capital and more at TC Disrupt 2024

Featured Article

The women in AI making a difference

As a part of a multi-part series, TechCrunch is highlighting women innovators — from academics to policymakers —in the field of AI.

7 hours ago
The women in AI making a difference

Cadillac may seem a bit too traditional to hang its driving cap on EVs. And yet, that hasn’t stopped the GM brand from rolling out — or at least showing…

The Cadillac Optiq EV starts at $54,000 and is designed to hook young hipsters

Ifeel is being offered as part of an employer’s or insurance provider’s healthcare coverage.

Mental health insurance platform ifeel raises a $20 million Series B

Instead of opening the user’s actual browser or a WebView, Custom Tabs let users remain in their app while browsing.

Google Chrome becomes a ‘picture-in-picture’ app

Sanil Chawla remembers the meetings he had with countless artists in college. Those creatives were looking for one thing: sustainable economic infrastructure that could help them scale rather than drown…

Slingshot raises $2.2 million to provide financial services to artists

A startup called Firefly that’s tackling the thorny and growing issue of cloud asset management with an “infrastructure as code” solution has raised $23 million in funding. That comes on…

Firefly forges on after co-founder murdered by Hamas

Mistral, the French AI startup backed by Microsoft and valued at $6 billion, has released its first generative AI model for coding, dubbed Codestral. Like other code-generating models, Codestral is…

Mistral releases Codestral, its first generative AI model for code

Pinterest announced today that it is evolving its Creator Inclusion Fund to now be called the Pinterest Inclusion Fund. Pinterest teamed up with Shopify’s Build Black and Build Native programs…

Pinterest expands its Creator Fund to allow founders

Alex Taub, a longtime founder with multiple exits under his belt, believes it’s time to disrupt the meme industry. “I have this big thesis that meme tech is going to…

This founder says meme tech is the next big thing

Lux, the startup behind popular pro photography app Halide and others, is venturing into video with its latest app launch. On Wednesday, the company announced Kino, a new video capture app…

Kino is a new iPhone app for videographers from the makers of Halide

DevOps startup Harness has shown itself to be an ambitious company, building a broad platform of services while also dabbling in M&A when it made sense to fill in functionality.…

Harness snags Split.io as it goes all in on feature flags and experiments

Microsoft’s Copilot, a generative AI-powered tool that can generate text as well as answer specific questions, is now available as an in-app chatbot on Telegram, the instant messaging app.  Currently…

Microsoft’s Copilot is now on Telegram

HBO’s new documentary, “MoviePass, MovieCrash,” tells a story that many of us know about: how MoviePass, the subscription-based movie ticketing startup, was a catastrophic failure. After a series of mishaps…

MoviePass co-founders speak their truth in HBO’s new documentary 

The watch features a variety of different 3D games, unlocking more play time the more kids move.

Fitbit’s new kid smartwatch is a little Wiimote, a little Tamagotchi

In the video, a crowd is roaring at a packed summer music festival. As a beat starts playing over the speakers, the performer finally walks onstage: It’s the Joker. Clad…

Discord has become an unlikely center for the generative AI boom

After the Wirecard scandal, Germany’s financial regulator BaFin started to look more closely at young fintech startups that wanted to grow at a rapid pace — it’s better to be…

Germany’s financial regulator ends anti-money laundering cap on N26 signups after $10M fine

Among other things, this includes the ability to trace code from source to binary packages across both platforms, single sign-on support and unified project structures.

JFrog and GitHub team up to closely integrate their source code and binary platforms

The company’s public fund disbursement and e-commerce platform makes accepting school tuition and enabling educational enrichment more accessible. 

Tech startup Odyssey goes on journey to help states implement school choice programs

A new startup called Kinnect aims to help people privately save generational memories, traditions, recipes and more. The company’s app, launched this month, lets people create invite-only spaces where they…

Kinnect’s new app aims to help families record and store generational memories

Spotify has hiked its premium subscription in France by an eye-watering €0.13, in response to a new music-streaming tax.

Spotify hikes subscription price in France by 1.2% to match new music-streaming tax

The European Union has taken the wraps off the structure of the new AI Office, the ecosystem-building and oversight body that’s being established under the bloc’s AI Act. The risk-based…

With the EU AI Act incoming this summer, the bloc lays out its plan for AI governance

Solutions by Text, a company that gives people a way to pay their bills and apply for loans via text messaging, has secured $110 million in new growth funding. Edison…

Bootstrapped for over a decade, this Dallas company just secured $110M to help people pay bills by text

Owners of small- and medium-sized businesses check their bank balances daily to make financial decisions. But it’s entrepreneur Yoseph West’s assertion that there’s typically information and functions missing from bank…

Relay raises $32.2 million to help smaller businesses manage their cash flow

When other firms were investing and raising eye-popping sums, Clean Energy Ventures took a different approach. It appears to be paying off.

How Clean Energy Ventures avoided the pandemic bubble and raised a $305M fund

PwC, the management consulting giant, will become OpenAI’s biggest customer to date, covering 100,000 users.

OpenAI signs 100K PwC workers to ChatGPT’s enterprise tier as PwC becomes its first resale partner

Tech enthusiasts and entrepreneurs, the clock is ticking! With just 72 hours remaining until the early-bird ticket deadline for TechCrunch Disrupt 2024, now is the time to secure your spot…

72 hours left of the Disrupt early-bird sale