Featured Article

The Great Pretender

AI doesn’t know the answer, and it hasn’t learned how to care

Hands holding a mask of anonymity. Polygonal design of interconnected elements.
Image Credits: Ilya Lukichev / Getty / Getty Images

There is a good reason not to trust what today’s AI constructs tell you, and it has nothing to do with the fundamental nature of intelligence or humanity, with Wittgensteinian concepts of language representation, or even disinfo in the dataset. All that matters is that these systems do not distinguish between something that is correct and something that looks correct. Once you understand that the AI considers these things more or less interchangeable, everything makes a lot more sense.

Now, I don’t mean to short-circuit any of the fascinating and wide-ranging discussions about this happening continually across every form of media and conversation. We have everyone from philosophers and linguists to engineers and hackers to bartenders and firefighters questioning and debating what “intelligence” and “language” truly are, and whether something like ChatGPT possesses them.

This is amazing! And I’ve learned a lot already as some of the smartest people in this space enjoy their moment in the sun, while from the mouths of comparative babes come fresh new perspectives.

But at the same time, it’s a lot to sort through over a beer or coffee when someone asks “what about all this GPT stuff, kind of scary how smart AI is getting, right?” Where do you start: with Aristotle, the Mechanical Turk, the perceptron or “Attention Is All You Need”?

During one of these chats I hit on a simple approach that I’ve found helps people get why these systems can be both really cool and also totally untrustworthy, while taking nothing away from their usefulness in some domains or from the amazing conversations being had around them. I thought I’d share it in case you find the perspective useful when talking about this with other curious, skeptical people who nevertheless don’t want to hear about vectors or matrices.

There are only three things to understand, which lead to a natural conclusion:

  1. These models are created by having them observe the relationships between words and sentences and so on in an enormous dataset of text, then build their own internal statistical map of how all these millions and millions of words and concepts are associated and correlated. No one has said, this is a noun, this is a verb, this is a recipe, this is a rhetorical device; but these are things that show up naturally in patterns of usage.
  2. These models are not specifically taught how to answer questions, in contrast to the familiar software that companies like Google and Apple have been calling AI for the last decade. Those are basically Mad Libs with the blanks leading to APIs: Every question is either accounted for or produces a generic response. With large language models the question is just a series of words like any other.
  3. These models have a fundamental expressive quality of “confidence” in their responses. In a simple example of a cat recognition AI, it would go from 0, meaning completely sure that’s not a cat, to 100, meaning absolutely sure that’s a cat. You can tell it to say “yes, it’s a cat” if it’s at a confidence of 85, or 90, whatever produces your preferred response metric.
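The cutoff described in the third point can be sketched in a few lines of Python. This is only an illustration: the function name `label_from_score` and the scores are made up here, and a real detector would compute its score from an image rather than take it as a given.

```python
def label_from_score(score: float, threshold: float = 85.0) -> str:
    """Turn a 0-100 confidence score into a yes/no answer.

    The threshold is a choice made by whoever deploys the model;
    nothing in the score itself says where "sure enough" begins.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    return "yes, it's a cat" if score >= threshold else "not a cat"


# Hypothetical scores, invented for this example.
scores = {"photo_a": 97.0, "photo_b": 88.0, "photo_c": 40.0}

# The same scores give different answers depending on the chosen cutoff.
at_85 = {name: label_from_score(s, 85.0) for name, s in scores.items()}
at_90 = {name: label_from_score(s, 90.0) for name, s in scores.items()}

print(at_85["photo_b"])  # yes, it's a cat
print(at_90["photo_b"])  # not a cat
```

Note that moving the cutoff from 85 to 90 flips the answer for `photo_b` without anything about the picture changing.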

So given what we know about how the model works, here’s the crucial question: What is it confident about? It doesn’t know what a cat or a question is, only statistical relationships found between data nodes in a training set. A minor tweak would have the cat detector equally confident the picture showed a cow, or the sky, or a still life painting. The model can’t be confident in its own “knowledge” because it has no way of actually evaluating the content of the data it has been trained on.

The AI is expressing how sure it is that its answer appears correct to the user.

This is true of the cat detector, and it is true of GPT-4 — the difference is a matter of the length and complexity of the output. The AI cannot distinguish between a right and wrong answer — it only can make a prediction of how likely a series of words is to be accepted as correct. That is why it must be considered the world’s most comprehensively informed bullshitter rather than an authority on any subject. It doesn’t even know it’s bullshitting you — it has been trained to produce a response that statistically resembles a correct answer, and it will say anything to improve that resemblance.

The AI doesn’t know the answer to any question, because it doesn’t understand the question. It doesn’t know what questions are. It doesn’t “know” anything! The answer follows the question because, extrapolating from its statistical analysis, that series of words is the most likely to follow the previous series of words. Whether those words refer to real people, places, etc. is not material; all that matters is that they resemble real ones.
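A toy version of "the most likely next word" makes the point concrete. The sketch below is a bigram counter over a four-sentence corpus invented for this example. It is vastly simpler than a large language model, which looks at far more than the single previous word, so treat it as an analogy rather than a description of how GPT-4 works.

```python
from collections import Counter, defaultdict

# A tiny made-up "training set". France appears twice, so "paris"
# is the word that most often follows "is".
corpus = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid . "
    "the capital of france is paris ."
).split()

# Count which word follows which: the "statistical map".
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1


def most_likely_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def complete(prompt, max_words=5):
    """Extend the prompt one word at a time using only the last word."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = most_likely_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
        if nxt == ".":
            break
    return " ".join(words)


print(complete("the capital of spain is"))
# the capital of spain is paris .
```

The completion reads like a correct answer and is wrong. The counter has no notion of Spain or capitals. It only knows that "paris" is the word it has seen most often after "is".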

It’s the same reason AI can produce a Monet-like painting that isn’t a Monet — all that matters is it has all the characteristics that cause people to identify a piece of artwork as his. Today’s AI approximates factual responses the way it would approximate “Water Lilies.”

Now, I hasten to add that this isn’t an original or groundbreaking concept — it’s basically another way to explain the stochastic parrot, or the undersea octopus. Those problems were identified very early by very smart people and represent a great reason to read commentary on tech matters widely.

But in the context of today’s chatbot systems, I’ve just found that people intuitively get this approach: The models don’t understand facts or concepts, only relationships between words, and their responses are an “artist’s impression” of an answer. Their goal, when you get down to it, is to fill in the blank convincingly, not correctly. This is the reason why their responses fundamentally cannot be trusted.

Of course sometimes, even a lot of the time, the answer is correct! And that isn’t an accident: For many questions, the answer that looks the most correct is the correct answer. That is what makes these models so powerful, and so dangerous. There is so, so much you can extract from a systematic study of millions of words and documents. And unlike recreating “Water Lilies” exactly, there’s a flexibility to language that lets an approximation of a factual response also be factual, but that also lets a totally or partially invented response appear equally factual, or more so. The only thing the AI cares about is that the answer scans right.

This leaves the door open to discussions around whether this is truly knowledge, what if anything the models “understand,” if they have achieved some form of intelligence, what intelligence even is and so on. Bring on the Wittgenstein!

Furthermore, it also leaves open the possibility of using these tools in situations where truth isn’t really a concern. If you want to generate five variants of an opening paragraph to get around writer’s block, an AI might be indispensable. If you want to make up a story about two endangered animals, or write a sonnet about Pokémon, go for it. As long as it is not crucial that the response reflects reality, a large language model is a willing and able partner — and not coincidentally, that’s where people seem to be having the most fun with it.

Where and when AI gets it wrong is very, very difficult to predict because the models are too large and opaque. Imagine a card catalog the size of a continent, organized and updated over a period of a hundred years by robots, from first principles that they came up with on the fly. You think you can just walk in and understand the system? It gives a right answer to a difficult question and a wrong answer to an easy one. Why? Right now that is one question that neither AI nor its creators can answer.

This may well change in the future, perhaps even the near future. Everything is moving so quickly and unpredictably that nothing is certain. But for the present this is a useful mental model to keep in mind: The AI wants you to believe it and will say anything to improve its chances.
