Anthropic claims its new AI chatbot models beat OpenAI’s GPT-4

AI startup Anthropic, backed by Google and hundreds of millions in venture capital (and perhaps soon hundreds of millions more), today announced the latest version of its GenAI tech, Claude. And the company claims that the AI chatbot beats OpenAI’s GPT-4 in terms of performance.

Claude 3, as Anthropic’s new GenAI is called, is a family of models — Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus, Opus being the most powerful. All show “increased capabilities” in analysis and forecasting, Anthropic claims, as well as enhanced performance on specific benchmarks versus models like ChatGPT and GPT-4 and Google’s Gemini 1.0 Ultra (but not Gemini 1.5 Pro).

Notably, Claude 3 is Anthropic’s first multimodal GenAI, meaning that it can analyze text as well as images — similar to some flavors of GPT-4 and Gemini. Claude 3 can process photos, charts, graphs and technical diagrams, drawing from PDFs, slideshows and other document types.

In a step one better than some GenAI rivals, Claude 3 can analyze multiple images in a single request (up to a maximum of 20). This allows it to compare and contrast images, notes Anthropic.

But there are limits to Claude 3’s image processing.

Anthropic has disabled the models from identifying people — no doubt wary of the ethical and legal implications. And the company admits that Claude 3 is prone to making mistakes with “low-quality” images (under 200 pixels) and struggles with tasks involving spatial reasoning (e.g. reading an analog clock face) and object counting (Claude 3 can’t give exact counts of objects in images).
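
For developers who want to catch these limits before sending a request, a pre-flight check might look something like the sketch below. It is illustrative only: `ImageInfo` and `check_images` are hypothetical names rather than part of any Anthropic SDK, and reading "under 200 pixels" as "shorter side under 200 pixels" is an assumption, since Anthropic's wording as quoted here doesn't say which dimension it means.

```python
from dataclasses import dataclass

MAX_IMAGES_PER_REQUEST = 20  # per-request maximum cited above
LOW_QUALITY_PIXELS = 200     # "low-quality" threshold cited above; applying it
                             # to the shorter side is an assumption


@dataclass
class ImageInfo:
    """Hypothetical container for an image's name and pixel dimensions."""
    name: str
    width: int
    height: int


def check_images(images: list[ImageInfo]) -> list[str]:
    """Return human-readable warnings for a batch of images."""
    warnings = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        warnings.append(
            f"{len(images)} images exceeds the {MAX_IMAGES_PER_REQUEST}-image maximum"
        )
    for img in images:
        if min(img.width, img.height) < LOW_QUALITY_PIXELS:
            warnings.append(
                f"{img.name} is under {LOW_QUALITY_PIXELS} pixels on its shorter side"
            )
    return warnings
```

An empty list means the batch fits within the limits described here. It says nothing about whether a given task, such as counting objects or reading a clock face, will go well.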

Claude 3 also won’t generate artwork. The models are strictly image-analyzing — at least for now.

Whether fielding text or images, Anthropic says that customers can generally expect Claude 3 to better follow multi-step instructions, produce structured output in formats like JSON and converse in languages other than English compared to its predecessors. Claude 3 should also refuse to answer questions less often thanks to a “more nuanced understanding of requests,” Anthropic says. And soon, the models will cite the source of their answers to questions so users can verify them.

“Claude 3 tends to generate more expressive and engaging responses,” Anthropic writes in a support article. “[It’s] easier to prompt and steer compared to our legacy models. Users should find that they can achieve the desired results with shorter and more concise prompts.”

Some of those improvements stem from Claude 3’s expanded context.

A model’s context, or context window, refers to input data (e.g. text) that the model considers before generating output. Models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic — often in problematic ways. As an added upside, large-context models can better grasp the narrative flow of data they take in and generate more contextually rich responses (hypothetically, at least).

Anthropic says that Claude 3 will initially support a 200,000-token context window, equivalent to about 150,000 words, with select customers getting up to a 1-million-token context window (~700,000 words). That’s on par with Google’s newest GenAI model, the above-mentioned Gemini 1.5 Pro, which also offers up to a million-token context window.
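
To put those figures in perspective, the 200,000-token and 150,000-word numbers imply roughly 0.75 words per token. The snippet below is a back-of-the-envelope sketch built on that ratio. The function name `estimate_words` is made up for illustration, and the ratio is only an approximation: actual token counts depend on the tokenizer and the text, which is likely why the million-token figure above is rounded down to about 700,000 words.

```python
# Ratio implied by the figures above: 150,000 words / 200,000 tokens.
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75, an approximation only


def estimate_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Roughly estimate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


standard_window = estimate_words(200_000)    # the initial context window
extended_window = estimate_words(1_000_000)  # the window for select customers
```

At 0.75 words per token, the standard window works out to 150,000 words and the extended one to 750,000, a little above the article's rounder estimate.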

Now, just because Claude 3 is an upgrade over what came before it doesn’t mean it’s perfect.

In a technical whitepaper, Anthropic admits that Claude 3 isn’t immune from the issues plaguing other GenAI models, namely bias and hallucinations (i.e. making stuff up). Unlike some GenAI models, Claude 3 can’t search the web; the models can only answer questions using data from before August 2023. And while Claude is multilingual, it’s less fluent in certain “low-resource” languages than in English.

But Anthropic is promising frequent updates to Claude 3 in the months to come.

“We don’t believe that model intelligence is anywhere near its limits, and we plan to release [enhancements] to the Claude 3 model family over the next few months,” the company writes in a blog post.

Opus and Sonnet are available now on the web and via Anthropic’s dev console and API, Amazon’s Bedrock platform and Google’s Vertex AI. Haiku will follow later this year.

Here’s the pricing breakdown:

  • Opus: $15 per million input tokens, $75 per million output tokens
  • Sonnet: $3 per million input tokens, $15 per million output tokens
  • Haiku: $0.25 per million input tokens, $1.25 per million output tokens
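
For a sense of what those rates mean per request, here is a small cost estimator. It is a sketch that uses only the list prices above. The `PRICES` table and `estimate_cost` function are illustrative names, not part of Anthropic's API, and the arithmetic ignores any discounts or platform markups.

```python
# USD per million tokens (input, output), taken from the list above.
PRICES = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example: fill the 200,000-token context window and get a 1,000-token reply.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 200_000, 1_000):.5f}")
```

By this math, a request that fills the 200,000-token window and returns a 1,000-token answer costs about $3.08 on Opus, about $0.62 on Sonnet and about five cents on Haiku.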

So that’s Claude 3. But what’s the 30,000-foot view of all this?

Well, as we’ve reported previously, Anthropic’s ambition is to create a next-gen algorithm for “AI self-teaching.” Such an algorithm could be used to build virtual assistants that can answer emails, perform research and generate art, books and more — some of which we’ve already gotten a taste of with the likes of GPT-4 and other large language models.

Anthropic hints at this in the aforementioned blog post, saying that it plans to add features to Claude 3 that enhance its out-of-the-gate capabilities by allowing Claude to interact with other systems, code “interactively” and deliver “advanced agentic capabilities.”

That last bit calls to mind OpenAI’s reported ambitions to build a software agent to automate complex tasks, like transferring data from a document to a spreadsheet or automatically filling out expense reports and entering them in accounting software. OpenAI already offers an API that allows developers to build “agent-like experiences” into their apps, and Anthropic, it seems, is intent on delivering functionality that’s comparable.

Could we see an image generator from Anthropic next? It’d surprise me, frankly. Image generators are the subject of much controversy these days, mainly for copyright- and bias-related reasons. Google was recently forced to disable its image generator after it injected diversity into pictures with a farcical disregard for historical context. And a number of image generator vendors are in legal battles with artists who accuse them of profiting off of their work by training GenAI on that work without providing compensation or even credit.

I’m curious to see the evolution of Anthropic’s technique for training GenAI, “constitutional AI,” which the company claims makes the behavior of its GenAI easier to understand, more predictable and simpler to adjust as needed. Constitutional AI aims to provide a way to align AI with human intentions, having models respond to questions and perform tasks using a simple set of guiding principles. For example, for Claude 3, Anthropic said that it added a principle — informed by crowdsourced feedback — that instructs the models to be understanding of and accessible to people with disabilities.

Whatever Anthropic’s endgame, it’s in it for the long haul. According to a pitch deck leaked in May of last year, the company aims to raise as much as $5 billion over the next 12 months or so — which might just be the baseline it needs to remain competitive with OpenAI. (Training models isn’t cheap, after all.) It’s well on its way, with $2 billion and $4 billion in committed capital and pledges from Google and Amazon, respectively, and well over a billion combined from other backers.
