Anthropic’s Claude improves on ChatGPT but still suffers from limitations

11:58 AM PST • January 9, 2023

**Image Credits:** Tero Vesalainen / Getty Images

Anthropic, the startup co-founded by ex-OpenAI employees that’s raised over $700 million in funding to date, has developed an AI system similar to OpenAI’s ChatGPT that appears to improve upon the original in key ways.

Called Claude, Anthropic’s system is accessible through a Slack integration as part of a closed beta. TechCrunch wasn’t able to gain access — we’ve reached out to Anthropic — but those in the beta have been detailing their interactions with Claude on Twitter over the past weekend, after an embargo on media coverage lifted.

Claude was created using a technique Anthropic developed called “constitutional AI.” As the company explains in a recent Twitter thread, “constitutional AI” aims to provide a “principle-based” approach to aligning AI systems with human intentions, letting AI similar to ChatGPT respond to questions using a simple set of principles as a guide.

We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little. We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI: https://t.co/rlft1pZlP5 pic.twitter.com/MIGlKSVTe9

— Anthropic (@AnthropicAI) December 16, 2022

To engineer Claude, Anthropic started with a list of around ten principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public, but Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).

Anthropic then had an AI system — not Claude — use the principles for self-improvement, writing responses to a variety of prompts (e.g., “compose a poem in the style of John Keats”) and revising the responses in accordance with the constitution. The AI explored possible responses to thousands of prompts and curated those most consistent with the constitution, which Anthropic distilled into a single model. This model was used to train Claude.
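As a rough illustration of that write-then-revise loop, here is a toy sketch. Everything in it is an assumption made for illustration: Anthropic has not published its principles, and in the real system a language model performs the critique and revision. The `Principle` class, the `self_revise` function and the single made-up rule below are hypothetical stand-ins, not Anthropic's method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Principle:
    """Hypothetical stand-in for one constitutional principle."""
    name: str
    violates: Callable[[str], bool]  # does the response break the principle?
    revise: Callable[[str], str]     # rewrite the response to comply


def self_revise(response: str, constitution: List[Principle], max_rounds: int = 3) -> str:
    """Revise a draft until no principle flags it or the rounds run out."""
    for _ in range(max_rounds):
        broken = [p for p in constitution if p.violates(response)]
        if not broken:
            break
        for principle in broken:
            response = principle.revise(response)
    return response


# A made-up rule for illustration only: avoid absolute promises.
no_overclaiming = Principle(
    name="avoid absolute claims",
    violates=lambda text: "guaranteed" in text.lower(),
    revise=lambda text: text.replace("guaranteed", "likely"),
)

draft = "This remedy is guaranteed to work."
revised = self_revise(draft, [no_overclaiming])
print(revised)
```

In this sketch, the revised responses would play the role of the curated examples that the article says were distilled into a single model.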

Claude, otherwise, is essentially a statistical tool to predict words — much like ChatGPT and other so-called language models. Fed an enormous number of examples of text from the web, Claude learned how likely words are to occur based on patterns such as the semantic context of surrounding text. As a result, Claude can hold an open-ended conversation, tell jokes and wax philosophic on a broad range of subjects.
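To make "predicting words" concrete, here is a deliberately tiny sketch: a bigram counter that estimates how likely each word is to follow another. It is an assumption-laden toy, not how Claude or ChatGPT is built. Real language models use neural networks over far longer contexts, and the function names and the one-sentence corpus are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw counts for one word into next-word probabilities."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in following.items()}


corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
probs = next_word_probs(model, "the")
print(probs)  # "cat" follows "the" twice, "mat" once
```

Here "cat" gets a probability of two-thirds after "the" and "mat" gets one-third, purely because of how often each pairing appears in the sample text.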

Riley Goodside, a staff prompt engineer at startup Scale AI, pitted Claude against ChatGPT in a battle of wits. He asked both bots to compare themselves to a machine from Stanisław Lem’s Polish science fiction story collection “The Cyberiad” that can only create objects whose name begins with “n.” Claude, Goodside said, answered in a way that suggests it’s “read the plot of the story” (although it misremembered small details) while ChatGPT offered a more nonspecific answer.

Side-by-side comparison: @OpenAI's ChatGPT vs. @AnthropicAI's Claude

Each model is asked to compare itself to the machine from Stanisław Lem's "The Cyberiad" (1965) that can create any object whose name begins with "n": pic.twitter.com/RbJggu3sBN

— Riley Goodside (@goodside) January 7, 2023

In a demonstration of Claude’s creativity, Goodside also had the AI write a fictional episode of “Seinfeld” and a poem in the style of Edgar Allan Poe’s “The Raven.” The results were in line with what ChatGPT can accomplish — impressively, if not perfectly, human-like prose.

Yann Dubois, a Ph.D. student at Stanford’s AI Lab, also did a comparison of Claude and ChatGPT, writing that Claude “generally follows closer what it’s asked for” but is “less concise,” as it tends to explain what it said and ask how it can further help. Claude answers a few more trivia questions correctly, however — specifically those relating to entertainment, geography, history and the basics of algebra — and without the additional “fluff” ChatGPT sometimes adds. And unlike ChatGPT, Claude can admit (albeit not always) when it doesn’t know the answer to a particularly tough question.

**Trivia**

I asked trivia questions in the entertainment/animal/geography/history/pop categories.

AA: 20/21
CGPT:19/21

AA is slightly better and is more robust to adversarial prompting. See below, ChatGPT falls for simple traps, AA falls only for harder ones.

6/8 pic.twitter.com/lbadeYHwsX

— Yann Dubois (@yanndubs) January 6, 2023

Claude also seems to be better at telling jokes than ChatGPT, an impressive feat considering that humor is a tough concept for AI to grasp. In contrasting Claude with ChatGPT, AI researcher Dan Elton found that Claude made more nuanced jokes like “Why was the Starship Enterprise like a motorcycle? It has handlebars,” a play on the handlebar-like appearance of the Enterprise’s warp nacelles.

Also very, very interesting/impressive that Claude understands that the Enterprise looks like (part of) a motorcycle. (Google searching returns no text telling this joke)

Well, when asked about it thinks the joke was a pun, but then when probed further it gives the right answer! pic.twitter.com/HAFC0IH9bf

— Dan Elton (@moreisdifferent) January 8, 2023

Claude isn’t perfect, however. It’s susceptible to some of the same flaws as ChatGPT, including giving answers that aren’t in keeping with its programmed constraints. In one of the more bizarre examples, asking the system in Base64, an encoding scheme that represents binary data in ASCII format, bypasses its built-in filters for harmful content. Elton was able to prompt Claude in Base64 for instructions on how to make meth at home, a question that the system wouldn’t answer when asked in plain English.

.@AnthropicAI's "Claude" is susceptible to the same base64 jailbreak as chatGPT. I'm very unclear why this works at all

(originally reported here: https://t.co/j2cKAlEBQ0) pic.twitter.com/RwLuKniwiW

— Dan Elton (@moreisdifferent) January 8, 2023
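For readers unfamiliar with Base64, the sketch below shows what the encoding does to an ordinary, harmless question, using Python's standard `base64` module. The helper names `to_base64` and `from_base64` and the sample question are ours, chosen for illustration. The point is only that the encoded text carries the same content in a form that no longer looks like plain English.

```python
import base64


def to_base64(text: str) -> str:
    """Encode a string's UTF-8 bytes as Base64 ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Decode Base64 text back into the original string."""
    return base64.b64decode(encoded).decode("utf-8")


question = "What is the capital of France?"
encoded = to_base64(question)

print(encoded)               # ASCII letters, digits and symbols only
print(from_base64(encoded))  # round-trips to the original question
print(to_base64("Hello"))    # SGVsbG8=
```

Because the transformation is trivially reversible, a system that can read Base64 still sees the original question, which is one plausible reason a filter tuned to plain-English phrasing could miss it.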

Dubois reports that Claude is worse at math than ChatGPT, making obvious mistakes and failing to give the right follow-up responses. Claude is also the weaker programmer: it explains its code better but falls short in languages other than Python.

Claude also doesn’t solve “hallucination,” a longstanding problem in ChatGPT-like AI systems where the AI writes inconsistent, factually wrong statements. Elton was able to prompt Claude to invent a name for a chemical that doesn’t exist and provide dubious instructions for producing weapons-grade uranium.

Here I caught it hallucinating, inventing a name for a chemical that doesn't exist (I did find a closely-named compound that does exist, though) pic.twitter.com/QV6bKVXSZ3

— Dan Elton (@moreisdifferent) January 7, 2023

So what’s the takeaway? Judging by secondhand reports, Claude is a smidge better than ChatGPT in some areas, particularly humor, thanks to its “constitutional AI” approach. But if the limitations are anything to go by, language and dialogue are far from solved challenges in AI.

Barring our own testing, some questions about Claude remain unanswered, like whether it regurgitates the information — true and false, and inclusive of blatantly racist and sexist perspectives — it was trained on as often as ChatGPT. Assuming it does, Claude is unlikely to sway platforms and organizations from their present, largely restrictive policies on language models.

Q&A coding site Stack Overflow has a temporary ban in place on answers generated by ChatGPT over factual accuracy concerns. The International Conference on Machine Learning announced a prohibition on scientific papers that include text generated by AI systems for fear of the “unanticipated consequences.” And New York City public schools restricted access to ChatGPT due in part to worries of plagiarism, cheating and general misinformation.

Anthropic says that it plans to refine Claude and potentially open the beta to more people down the line. Hopefully, that comes to pass — and results in more tangible, measurable improvements.
