OpenAI’s new tool attempts to explain language models’ behaviors

10:00 AM PDT • May 9, 2023

Image Credits: OpenAI

It’s often said that large language models (LLMs) along the lines of OpenAI’s ChatGPT are a black box, and certainly, there’s some truth to that. Even for data scientists, it’s difficult to know in every case why a model responds the way it does, such as when it invents facts out of whole cloth.

In an effort to peel back the layers of LLMs, OpenAI is developing a tool to automatically identify which parts of an LLM are responsible for which of its behaviors. The engineers behind it stress that it’s in the early stages, but the code to run it is available in open source on GitHub as of this morning.

“We’re trying to [develop ways to] anticipate what the problems with an AI system will be,” William Saunders, the interpretability team manager at OpenAI, told TechCrunch in a phone interview. “We want to really be able to know that we can trust what the model is doing and the answer that it produces.”

To that end, OpenAI’s tool uses a language model (ironically) to figure out the functions of the components of other, architecturally simpler LLMs — specifically OpenAI’s own GPT-2.

OpenAI’s tool attempts to simulate the behaviors of neurons in an LLM. Image Credits: OpenAI

How? First, a quick explainer on LLMs for background. Like the brain, they’re made up of “neurons,” which observe some specific pattern in text to influence what the overall model “says” next. For example, given a prompt about superheroes (e.g. “Which superheroes have the most useful superpowers?”), a “Marvel superhero neuron” might boost the probability the model names specific superheroes from Marvel movies.

OpenAI’s tool exploits this setup to break models down into their individual pieces. First, the tool runs text sequences through the model being evaluated and looks for cases where a particular neuron “activates” frequently. Next, it “shows” GPT-4, OpenAI’s latest text-generating AI model, these highly active neurons and has GPT-4 generate an explanation. To determine how accurate the explanation is, the tool provides GPT-4 with text sequences and has it predict, or simulate, how the neuron would behave. It then compares the behavior of the simulated neuron with the behavior of the actual neuron.

“Using this methodology, we can basically, for every single neuron, come up with some kind of preliminary natural language explanation for what it’s doing and also have a score for how well that explanation matches the actual behavior,” Jeff Wu, who leads the scalable alignment team at OpenAI, said. “We’re using GPT-4 as part of the process to produce explanations of what a neuron is looking for and then score how well those explanations match the reality of what it’s doing.”
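The comparison step described above can be sketched in a few lines of Python. This is only an illustration, not OpenAI’s released code: it assumes the score is a Pearson correlation between real and simulated activations (the article does not say which measure the tool uses), and the function name `correlation_score`, the tokens and the activation values are all made up for the example.

```python
from math import sqrt


def correlation_score(actual, simulated):
    """Pearson correlation between real and simulated neuron activations.

    Assumption: correlation stands in for the tool's scoring step here;
    the article only says the two behaviors are compared.
    """
    if len(actual) != len(simulated) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_s = sum(simulated) / n
    cov = sum((a - mean_a) * (s - mean_s) for a, s in zip(actual, simulated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_a == 0 or var_s == 0:
        # A constant signal carries no pattern to match, so score it as 0.
        return 0.0
    return cov / sqrt(var_a * var_s)


# Hypothetical per-token activations for a "Marvel superhero neuron".
tokens = ["Which", "hero", "is", "Iron", "Man", "?"]
actual = [0.0, 0.6, 0.0, 0.9, 1.0, 0.0]      # what the real neuron did
simulated = [0.0, 0.5, 0.0, 0.8, 1.0, 0.1]   # what the explainer predicted

score = correlation_score(actual, simulated)
print(f"explanation score: {score:.2f}")
```

A score near 1 would mean the simulated neuron tracks the real one closely, while a score near 0 would mean the explanation captures little of the neuron’s actual behavior.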

The researchers were able to generate explanations for all 307,200 neurons in GPT-2, which they compiled in a dataset that’s been released alongside the tool code.

Tools like this could one day be used to improve an LLM’s performance, the researchers say, for example to cut down on bias or toxicity. But they acknowledge that the tool has a long way to go before it’s genuinely useful. It was confident in its explanations for about 1,000 of those neurons, a small fraction of the total.
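To put that fraction in perspective, here is a quick back-of-the-envelope calculation using the two figures quoted above. The count of 1,000 is approximate, so the result is a rough estimate; the variable names are illustrative.

```python
total_neurons = 307_200      # neurons in GPT-2 that received an explanation
confident_neurons = 1_000    # approximate count the tool was confident about

share = confident_neurons / total_neurons
print(f"{share:.2%}")  # prints "0.33%"
```

In other words, the tool was confident about roughly one neuron in every 300.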

A cynical person might argue, too, that the tool is essentially an advertisement for GPT-4, given that it requires GPT-4 to work. Other LLM interpretability tools are less dependent on commercial APIs, like DeepMind’s Tracr, a compiler that translates programs into neural network models.

Wu said that isn’t the case — the fact the tool uses GPT-4 is merely “incidental” — and, on the contrary, shows GPT-4’s weaknesses in this area. He also said it wasn’t created with commercial applications in mind and, in theory, could be adapted to use LLMs besides GPT-4.

“Most of the explanations score quite poorly or don’t explain that much of the behavior of the actual neuron,” Wu said. “A lot of the neurons, for example, are active in a way where it’s very hard to tell what’s going on — like they activate on five or six different things, but there’s no discernible pattern. Sometimes there is a discernible pattern, but GPT-4 is unable to find it.”

That’s to say nothing of more complex, newer and larger models, or models that can browse the web for information. But on that second point, Wu believes that web browsing wouldn’t change the tool’s underlying mechanisms much. It could simply be tweaked, he says, to figure out why neurons decide to make certain search engine queries or access particular websites.

“We hope that this will open up a promising avenue to address interpretability in an automated way that others can build on and contribute to,” Wu said. “The hope is that we really actually have good explanations of not just what neurons are responding to but overall, the behavior of these models — what kinds of circuits they’re computing and how certain neurons affect other neurons.”
