
OpenAI’s new tool attempts to explain language models’ behaviors


OpenAI's logo. Image Credits: OpenAI

It’s often said that large language models (LLMs) along the lines of OpenAI’s ChatGPT are a black box, and certainly, there’s some truth to that. Even for data scientists, it’s difficult to know in every case why a model responds the way it does, such as when it invents facts out of whole cloth.

In an effort to peel back the layers of LLMs, OpenAI is developing a tool to automatically identify which parts of an LLM are responsible for which of its behaviors. The engineers behind it stress that it’s in the early stages, but the code to run it is available in open source on GitHub as of this morning.

“We’re trying to [develop ways to] anticipate what the problems with an AI system will be,” William Saunders, the interpretability team manager at OpenAI, told TechCrunch in a phone interview. “We want to really be able to know that we can trust what the model is doing and the answer that it produces.”

To that end, OpenAI’s tool uses a language model (ironically) to figure out the functions of the components of other, architecturally simpler LLMs — specifically OpenAI’s own GPT-2.

OpenAI’s tool attempts to simulate the behaviors of neurons in an LLM. Image Credits: OpenAI

How? First, a quick explainer on LLMs for background. Like the brain, they’re made up of “neurons,” which observe some specific pattern in text to influence what the overall model “says” next. For example, given a prompt about superheroes (e.g. “Which superheroes have the most useful superpowers?”), a “Marvel superhero neuron” might boost the probability that the model names specific superheroes from Marvel movies.

OpenAI’s tool exploits this setup to break models down into their individual pieces. First, the tool runs text sequences through the model being evaluated and looks for cases where a particular neuron “activates” frequently. Next, it “shows” GPT-4, OpenAI’s latest text-generating AI model, these highly active neurons and has GPT-4 generate an explanation. To determine how accurate the explanation is, the tool gives GPT-4 text sequences and has it predict, or simulate, how the neuron would behave. It then compares the behavior of the simulated neuron with the behavior of the actual neuron.
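To make the final comparison step concrete, here is a minimal sketch of how an explanation could be scored. It assumes the score is a Pearson correlation between the neuron’s real activations and the simulated ones; that choice, the keyword-matching `simulate` function (a hypothetical stand-in for GPT-4’s simulation) and the activation values are all illustrative assumptions, not OpenAI’s published code or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant sequence carries no signal to correlate against.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def simulate(tokens, keywords):
    """Hypothetical simulator: predicts a high activation (10.0) on tokens
    matching the explanation's keywords and 0.0 everywhere else."""
    return [10.0 if t.lower().strip(".,?!") in keywords else 0.0 for t in tokens]


# Invented example: a "Marvel superhero neuron" explained as
# "fires on names of Marvel superheroes".
tokens = ["Which", "superheroes", "like", "Thor", "or", "Hulk", "are", "strongest?"]
actual = [0.1, 0.4, 0.0, 8.7, 0.2, 9.1, 0.0, 0.3]  # made-up activations
explanation_keywords = {"thor", "hulk", "iron", "spider-man"}

simulated = simulate(tokens, explanation_keywords)
score = pearson(actual, simulated)
print(f"explanation score: {score:.3f}")
```

A score near 1 means the explanation predicts the neuron well; a score near 0 means it explains little of the neuron’s behavior, which is the situation Wu describes for most neurons.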

“Using this methodology, we can basically, for every single neuron, come up with some kind of preliminary natural language explanation for what it’s doing and also have a score for how well that explanation matches the actual behavior,” Jeff Wu, who leads the scalable alignment team at OpenAI, said. “We’re using GPT-4 as part of the process to produce explanations of what a neuron is looking for and then score how well those explanations match the reality of what it’s doing.”

The researchers were able to generate explanations for all 307,200 neurons in GPT-2, which they compiled in a dataset that’s been released alongside the tool code.

Tools like this could one day be used to improve an LLM’s performance, the researchers say — for example to cut down on bias or toxicity. But they acknowledge that it has a long way to go before it’s genuinely useful. The tool was confident in its explanations for about 1,000 of those neurons, a small fraction of the total.

A cynical person might argue, too, that the tool is essentially an advertisement for GPT-4, given that it requires GPT-4 to work. Other LLM interpretability tools are less dependent on commercial APIs, like DeepMind’s Tracr, a compiler that translates programs into neural network models.

Wu said that isn’t the case: the fact that the tool uses GPT-4 is merely “incidental,” and, if anything, the tool exposes GPT-4’s weaknesses in this area. He also said it wasn’t created with commercial applications in mind and could, in theory, be adapted to use LLMs other than GPT-4.

The tool identifies neurons activating across layers in the LLM. Image Credits: OpenAI

“Most of the explanations score quite poorly or don’t explain that much of the behavior of the actual neuron,” Wu said. “A lot of the neurons, for example, are active in a way where it’s very hard to tell what’s going on — like they activate on five or six different things, but there’s no discernible pattern. Sometimes there is a discernible pattern, but GPT-4 is unable to find it.”

That’s to say nothing of newer, larger and more complex models, or models that can browse the web for information. On that second point, though, Wu believes web browsing wouldn’t change the tool’s underlying mechanisms much. It could simply be tweaked, he says, to figure out why neurons decide to make certain search engine queries or access particular websites.

“We hope that this will open up a promising avenue to address interpretability in an automated way that others can build on and contribute to,” Wu said. “The hope is that we really actually have good explanations of not just what neurons are responding to but overall, the behavior of these models — what kinds of circuits they’re computing and how certain neurons affect other neurons.”
