OpenAI’s new DALL-E model draws anything — but bigger, better and faster than before

Image Credits: OpenAI

Early last year OpenAI showed off a remarkable new AI model called DALL-E (a combination of WALL-E and Dali), capable of drawing nearly anything and in nearly any style. But the results were rarely something you’d want to hang on the wall. Now DALL-E 2 is out, and it does what its predecessor did much, much better — scarily well, in fact. But the new capabilities come with new restrictions to prevent abuse.

Our original post described DALL-E in detail, but the gist is that it can take quite complex prompts, such as “A bear riding a bicycle through a mall, next to a picture of a cat stealing the Declaration of Independence.” It would gladly comply, and out of hundreds of outputs pick the ones most likely to meet the user’s standards.

DALL-E 2 does the same thing fundamentally, turning a text prompt into a surprisingly accurate image. But it has learned a few new tricks.

First, it’s just plain better at doing the original thing. The images that come out the other end of DALL-E 2 are several times bigger and more detailed. It’s actually faster despite producing more imagery, meaning more variations can be spun out in the handful of seconds a user might be willing to wait.

“A sea otter in the style of Girl with a Pearl Earring” turns out pretty good. Image Credits: OpenAI

Part of that improvement comes from a switch to a diffusion model, a type of image creation that starts with pure noise and refines the image over time, repeatedly making it a little more like the image requested until there’s no noise left at all. But it’s also just a smaller and more efficient model, some of the engineers who worked on it told me.
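To make the “start with noise and refine” idea concrete, here is a toy Python sketch. It is not OpenAI’s code and not a real diffusion model: a real one uses a trained network to estimate what to remove at each step, whereas this sketch cheats by assuming the target image is already known. The function name `toy_denoise`, the step count and the noise scale are all made up for illustration.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Refine pure noise toward `target` over `steps` iterations.

    ASSUMPTION: the known `target` stands in for the trained network a
    real diffusion model would use to decide what to change at each step.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        # Share of fresh noise still allowed; it reaches 0 on the final step.
        remaining = 1.0 - (step + 1) / steps
        image = [
            # Move halfway toward the target, then re-add a shrinking bit of noise.
            pixel + (goal - pixel) * 0.5 + rng.gauss(0.0, 0.1) * remaining
            for pixel, goal in zip(image, target)
        ]
    return image


if __name__ == "__main__":
    wanted = [0.0, 0.25, 0.5, 1.0]  # a four-"pixel" stand-in for an image
    print(toy_denoise(wanted))
```

Each pass leaves the result a little closer to the requested values and a little less noisy, which is the loop the paragraph above describes, just at a scale of four numbers instead of millions of pixels.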

Second, DALL-E 2 does what they call “inpainting,” essentially smart replacement of a given area in an image. Say you have a picture of your place but there are some dirty dishes on the table. Simply select that area and describe what you want instead: “an empty wooden table,” or “a table without dishes on it,” whatever seems logical. In seconds, the model will show you a handful of interpretations of that prompt, and you can pick whatever looks best.
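The “select an area, replace only that area” part can be sketched in a few lines of Python. This is an assumption-laden illustration, not how DALL-E 2 works internally: the hard part, generating new content that fits the scene, is represented here by a ready-made `generated` grid, and the helper name `composite_inpaint` is hypothetical.

```python
def composite_inpaint(original, generated, mask):
    """Return a new image: `generated` where mask is True, `original` elsewhere.

    ASSUMPTION: images are equally sized lists of rows of pixel values, and
    `generated` is a placeholder for whatever a model would produce.
    """
    return [
        [gen if selected else orig
         for orig, gen, selected in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


if __name__ == "__main__":
    photo = [[1, 1, 1],
             [1, 1, 1]]
    proposal = [[9, 9, 9],
                [9, 9, 9]]
    selection = [[False, True, False],
                 [False, True, True]]
    print(composite_inpaint(photo, proposal, selection))
```

Only the selected pixels change; everything outside the selection is carried over untouched.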

You may be familiar with something similar in Photoshop, “Content-Aware Fill.” But that tool is more for filling in a space with more of the same, like if you want to remove a bird from an otherwise clear sky and don’t want to bother with clone stamping. DALL-E 2 is far more capable: it can invent new things, for example a different kind of bird, or a cloud, or in the case of the table, a vase of flowers or a spilled bottle of ketchup. It’s not hard to imagine useful applications for this.

Notably, the model will include things like appropriate lighting and shadows, or choose correct materials, since it’s aware of the rest of the scene. I use “aware” loosely here — no one, not even its creators, knows how DALL-E represents these concepts internally, but what matters for these purposes is that the results suggest that it has some form of understanding.

Examples of teddy bears in an ukiyo-e style and a quaint flower shop. Image Credits: OpenAI

The third new capability is “variations,” which is accurate enough: You give the system an example image and it generates as many variations on it as you like, from very close approximations to impressionistic redos. You can even give it a second image and it will sort of cross-pollinate them, combining the most salient aspects of each. The demo they showed me had DALL-E 2 generating street murals based on an original, and it really did capture the artist’s style for the most part, even if it was probably clear on inspection which was the original.

It’s hard to overstate the quality of these images compared with other generators I’ve seen. Although there are almost always the kinds of “tells” you expect from AI-generated imagery, they’re less obvious and the rest of the image is way better than the best generated by others.

Almost anything

I wrote before that DALL-E 2 can draw “almost anything,” though there’s not really any technical limitation that would prevent the model from convincingly drawing anything you can come up with. But OpenAI is conscious of the risk presented by deepfakes and other misuses of AI-generated imagery and content, and so has added some restrictions for its latest model.

DALL-E 2 runs on a hosted platform for now, an invite-only test environment where developers can try it out in a controlled way. Part of that control is that every prompt they submit to the model is evaluated for violations of a content policy that prohibits, as they put it, “images that are not G-rated.”

That means no: hate, harassment, violence, self-harm, explicit or “shocking” imagery, illegal activities, deception (e.g., fake news reports), political actors or situations, medical or disease-related imagery, or general spam. In fact much of this won’t be possible as violating imagery was excluded from the training set: DALL-E 2 can do a shiba inu in a beret, but it doesn’t even know what a missile strike is.

In addition to prompts being evaluated, the resultant imagery will all (for now) be reviewed by human inspectors. That’s obviously not scalable, but the team told me that this is part of the learning process. They’re not sure exactly how the boundaries should work, which is why they’re keeping the platform small and self-hosted for now.

In time DALL-E 2 will likely be turned into an API that can be called like OpenAI’s other functions, but the team said they want to be sure that’s wise before taking the training wheels off.

You can learn more about DALL-E 2 and test out some semi-interactive examples over at the OpenAI blog post.
