Apps

While anticipation builds for GPT-4, OpenAI quietly releases GPT-3.5

Comment

OpenAI's logo
Image Credits: OpenAI

Released two years ago, OpenAI’s remarkably capable, if flawed, GPT-3 was perhaps the first to demonstrate that AI can write convincingly — if not perfectly — like a human. The successor to GPT-3, most likely called GPT-4, is expected to be unveiled in the near future, perhaps as soon as 2023. But in the meantime, OpenAI has quietly rolled out a series of AI models based on “GPT-3.5,” a previously-unannounced, improved version of GPT-3.

GPT-3.5 broke cover on Wednesday with ChatGPT, a fine-tuned version of GPT-3.5 that’s essentially a general-purpose chatbot. Debuted in a public demo yesterday afternoon, ChatGPT can engage with a range of topics, including programming, TV scripts and scientific concepts.

According to OpenAI, GPT-3.5 was trained on a blend of text and code published prior to Q4 2021. Like GPT-3 and other text-generating AI, GPT-3.5 learned the relationships between sentences, words and parts of words by ingesting huge amounts of content from the web, including hundreds of thousands of Wikipedia entries, social media posts and news articles.

Rather than release the fully trained GPT-3.5, OpenAI used it to create several systems fine-tuned for specific tasks — each available through the OpenAI API. One — text-davinci-003 — can handle more complex instructions than models built on GPT-3, according to the lab, and is measurably better at both long-form and “high-quality” writing.

According to OpenAI data scientist Jan Leike, text-davinci-003 is similar but not identical to InstructGPT, a family of GPT-3-based models released by OpenAI earlier this year that are less likely to generate problematic (e.g., toxic and highly biased) text while more closely aligning with a user’s intent. Text-davinci-003 — and by extension GPT-3.5 — “scores higher on human preference ratings” while suffering from “less severe” limitations, Leike said in a tweet.

That appears to be the case anecdotally. Data scientists at Pepper Content, a content marketing platform, report that text-davinci-003 “performs better in understanding the ‘context’ behind a request and then using that to produce better content” while “hallucinating” less than GPT-3-based models. (Where it concerns text-generating AI, hallucination refers to an AI writing inconsistent, factually incorrect statements.)

In a test on OpenAI’s Playground website, which provides a UI frontend for the models, the Pepper Content team fed several prompts to text-davinci-003 and a model based on GPT-3 (text-davinci-002). Given “What is the philosophy behind WeWork?,” the GPT-3.5-based text-davinci-003 generated this:

WeWork is a global workspace provider that believes people are the most important asset in any organization. The philosophy of WeWork is to create a collaborative environment that enables people to work together in a flexible and efficient way. The company encourages collaboration and productivity, while providing a comfortable and inspiring space.

WeWork is also committed to being a socially responsible organization, by finding ways to reduce its environmental impact, by providing meaningful work experiences, and by promoting diversity and inclusion. WeWork also strives to create meaningful experiences for its members, through its unique community-based programming, events and activities. The company believes that when people work together in an inspiring and collaborative environment, they can achieve more and create meaningful change.

It’s not perfect — note the excess commas and repetitiveness. But the copy’s certainly more engaging than what the GPT-3-based text-davinci-002 produced:

At WeWork, we believe in openness, collaboration, and community. We have reimagined what a workspace can be by bringing together a global community of creators, entrepreneurs, and startups — anyone looking to build something meaningful and transform the world. We are proud to be your neighbour, coworker, and friend.

GPT-3.5 is also better at generating blog posts, it seems. Here’s what the Pepper Content team got when they prompted text-davinci-003 to write a post about picking a sofa:

If you are looking to give your green living room a fresh look, adding the the best sofa for the space is a great way to start. However, there is more to choosing the perfect sofa than just finding a piece that looks good. You want to find a piece that is comfortable and fits your green living room décor.

In this blog, we will discuss how to pick sofa far your green living room. We will look at the different types of sofas available, the design factors to consider when making your selection, and how to choose a sofa that fits your style and budget. We will also provide tips on how to maintain and care for your new sofa. By the end of this blog, you will have all the information you need to pick the perfect for your green living room.

Again, it isn’t foolproof. GPT-3.5 oddly added the bit about a “green living room.” But also again, GPT-3 is more basic and less grammatically correct in its generation:

Sofa is one of the most basic requirements in a living room. It’s not just a piece of furniture but an important part of the décor of your living room. So, what should be the criteria while picking a sofa? If you are wondering about this then stay with me as I discuss the different aspects of the sofa would help you in picking the best one for yourself.

Experiments beyond Pepper Content’s suggest that GPT-3.5 tends to be much more sophisticated and thorough in its responses than GPT-3. For example, when YouTube channel All About AI prompted text-davinci-003 to write a history about AI, the model’s output mentioned key luminaries in the field, including Alan Turing and Arthur Samuelson, while text-davinci-002”s did not. All About AI also found that text-davinci-003 tended to have a more nuanced understanding of instructions, for instance providing details such as a title, description, outline, introduction and recap when asked to create a video script.

That’s no accident — a hallmark feature of text-davinci-003/GPT-3.5’s outputs is verboseness. (This writer can sympathize.) In an analysis, scientists at startup Scale AI found text-davinci-003/GPT-3.5 generates outputs roughly 65% longer than text-davinci-002/GPT-3 with identical prompts.

Perhaps less useful for most potential users but nonetheless entertaining, text-davinci-003/GPT-3.5 is superior at composing songs, limericks and rhyming poetry than its predecessor. Ars Technica reports that commenters on Y Combinator’s Hacker News forum used text-davinci-003 to write a poem explaining Albert Einstein’s theory of relativity and then re-write the poem in the style of John Keats. See:

If you want to understand Einstein’s thought
It’s not that hard if you give it a shot
General Relativity is the name of the game
Where space and time cannot remain the same
Mass affects the curvature of space
Which affects the flow of time’s race
An object’s motion will be affected
By the distortion that is detected
The closer you are to a large mass
The slower time will seem to pass
The farther away you may be
Time will speed up for you to see

The Scale AI team even found that text-davinci-003/GPT-3.5 has a notion of meters like iambic pentameter. See:

O gentle steeds, that bear me swift and

sure

Through fields of green and pathways so

obscure,

My heart doth swell with pride to be with

you

As on we ride the world a-fresh to view

The wind doth whistle through our hair so

free

And stirs a passion deep inside of me.

My soul doth lift, my spirits soar on high,

To ride with you, my truest friend, am I

Your strength and grace, your courage and

your fire,

Inspire us both to go beyond our sire.

No earthly bonds can hold us, only fate,

To gallop on, our wond’rous course create

Relatedly, GPT-3.5 is wittier than GPT-3 — at least from a subjective standpoint. Asking text-davinci-002/GPT-3 to “tell a joke” usually yields this:

Why did the chicken cross the road? To get to the other side.

Text-davinci-003/GPT-3.5 has cleverer responses:

Q: What did the fish say when it hit the wall? A: Dam!

Q: What did one ocean say to the other ocean? A: Nothing, they just waved.

Scale AI had the model explain Python code in the style of Eminem, a feat which text-davinci-002/GPT-3 simply couldn’t accomplish:

Yo, so I’m loopin’ through this list

With each item that I find

I’m gonna print out every letter in each one

of them

Dog, Cat, Banana, Apple, I’m gonna get’em

all with this rhyme

So why is GPT-3.5 better than GPT-3 in these particular areas? We can’t know the exact answer without additional details from OpenAI, which aren’t forthcoming; an OpenAI spokesperson declined a request for comment. But it’s safe to assume that GPT-3.5’s training approach had something to do with it. Like InstructGPT, GPT-3.5 was trained with the help of human trainers who ranked and rated the way early versions of the model responded to prompts. This information was then fed back into the system, which tuned its answers to match the trainers’ preferences.

Of course, this doesn’t make GPT-3.5 immune to the pitfalls to which all modern language models succumb. Because GPT-3.5 merely relies on statistical regularities in its training data rather than a human-like understanding of the world, it’s still prone to, in Leike’s words, “mak[ing] stuff up a bunch.” It also has limited knowledge of the world after 2021 because its training data is more sparse after that year. And the model’s safeguards against toxic output can be circumvented.

Still, GPT-3.5 and its derivative models demonstrate that GPT-4 — whenever it arrives — won’t necessarily need a huge number of parameters to best the most capable text-generating systems today. (Parameters are the parts of the model learned from historical training data and essentially define the skill of the model on a problem.) While some have predicted that GPT-4 will contain over 100 trillion parameters — nearly 600 times as many as GPT-3 — others argue that emerging techniques in language processing, like those seen in GPT-3.5 and InstructGPT, will make such a jump unnecessary.

One of those techniques could involve browsing the web for greater context, a la Meta’s ill-fated BlenderBot 3.0 chatbot. John Shulman, a research scientist and co-founder of OpenAI, told MIT Tech Review in a recent interview that OpenAI is continuing work on a language model it announced late last year, WebGPT, that can go and look up information on the web (via Bing) and give sources for its answers. At least one Twitter user appears to have found evidence of the feature undergoing testing for ChatGPT.

OpenAI has another reason to pursue lower-parameter models as it continues to evolve GPT-3: huge costs. A 2020 study from AI21 Labs pegged the expenses for developing a text-generating model with only 1.5 billion parameters at as much as $1.6 million. OpenAI has raised over $1 billion to date from Microsoft and other backers, and it’s reportedly in talks to raise more. But all investors, no matter how big, expect to see returns eventually.

More TechCrunch

Featured Article

I’m rooting for Melinda French Gates to fix tech’s broken ‘brilliant jerk’ culture

Women in tech still face a shocking level of mistreatment at work. Melinda French Gates is one of the few working to change that.

2 hours ago
I’m rooting for Melinda French Gates to fix tech’s  broken ‘brilliant jerk’ culture

Blue Origin has successfully completed its NS-25 mission, resuming crewed flights for the first time in nearly two years. The mission brought six tourist crew members to the edge of…

Blue Origin successfully launches its first crewed mission since 2022

Creative Artists Agency (CAA), one of the top entertainment and sports talent agencies, is hoping to be at the forefront of AI protection services for celebrities in Hollywood. With many…

Hollywood agency CAA aims to help stars manage their own AI likenesses

Expedia says Rathi Murthy and Sreenivas Rachamadugu, respectively its CTO and senior vice president of core services product & engineering, are no longer employed at the travel booking company. In…

Expedia says two execs dismissed after ‘violation of company policy’

Welcome back to TechCrunch’s Week in Review. This week had two major events from OpenAI and Google. OpenAI’s spring update event saw the reveal of its new model, GPT-4o, which…

OpenAI and Google lay out their competing AI visions

When Jeffrey Wang posted to X asking if anyone wanted to go in on an order of fancy-but-affordable office nap pods, he didn’t expect the post to go viral.

With AI startups booming, nap pods and Silicon Valley hustle culture are back

OpenAI’s Superalignment team, responsible for developing ways to govern and steer “superintelligent” AI systems, was promised 20% of the company’s compute resources, according to a person from that team. But…

OpenAI created a team to control ‘superintelligent’ AI — then let it wither, source says

A new crop of early-stage startups — along with some recent VC investments — illustrates a niche emerging in the autonomous vehicle technology sector. Unlike the companies bringing robotaxis to…

VCs and the military are fueling self-driving startups that don’t need roads

When the founders of Sagetap, Sahil Khanna and Kevin Hughes, started working at early-stage enterprise software startups, they were surprised to find that the companies they worked at were trying…

Deal Dive: Sagetap looks to bring enterprise software sales into the 21st century

Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of recent stories in the world…

This Week in AI: OpenAI moves away from safety

After Apple loosened its App Store guidelines to permit game emulators, the retro game emulator Delta — an app 10 years in the making — hit the top of the…

Adobe comes after indie game emulator Delta for copying its logo

Meta is once again taking on its competitors by developing a feature that borrows concepts from others — in this case, BeReal and Snapchat. The company is developing a feature…

Meta’s latest experiment borrows from BeReal’s and Snapchat’s core ideas

Welcome to Startups Weekly! We’ve been drowning in AI news this week, with Google’s I/O setting the pace. And Elon Musk rages against the machine.

Startups Weekly: It’s the dawning of the age of AI — plus,  Musk is raging against the machine

IndieBio’s Bay Area incubator is about to debut its 15th cohort of biotech startups. We took special note of a few, which were making some major, bordering on ludicrous, claims…

IndieBio’s SF incubator lineup is making some wild biotech promises

YouTube TV has announced that its multiview feature for watching four streams at once is now available on Android phones and tablets. The Android launch comes two months after YouTube…

YouTube TV’s ‘multiview’ feature is now available on Android phones and tablets

Featured Article

Two Santa Cruz students uncover security bug that could let millions do their laundry for free

CSC ServiceWorks provides laundry machines to thousands of residential homes and universities, but the company ignored requests to fix a security bug.

2 days ago
Two Santa Cruz students uncover security bug that could let millions do their laundry for free

TechCrunch Disrupt 2024 is just around the corner, and the buzz is palpable. But what if we told you there’s a chance for you to not just attend, but also…

Harness the TechCrunch Effect: Host a Side Event at Disrupt 2024

Decks are all about telling a compelling story and Goodcarbon does a good job on that front. But there’s important information missing too.

Pitch Deck Teardown: Goodcarbon’s $5.5M seed deck

Slack is making it difficult for its customers if they want the company to stop using its data for model training.

Slack under attack over sneaky AI training policy

A Texas-based company that provides health insurance and benefit plans disclosed a data breach affecting almost 2.5 million people, some of whom had their Social Security number stolen. WebTPA said…

Healthcare company WebTPA discloses breach affecting 2.5 million people

Featured Article

Microsoft dodges UK antitrust scrutiny over its Mistral AI stake

Microsoft won’t be facing antitrust scrutiny in the U.K. over its recent investment into French AI startup Mistral AI.

2 days ago
Microsoft dodges UK antitrust scrutiny over its Mistral AI stake

Ember has partnered with HSBC in the U.K. so that the bank’s business customers can access Ember’s services from their online accounts.

Embedded finance is still trendy as accounting automation startup Ember partners with HSBC UK

Kudos uses AI to figure out consumer spending habits so it can then provide more personalized financial advice, like maximizing rewards and utilizing credit effectively.

Kudos lands $10M for an AI smart wallet that picks the best credit card for purchases

The EU’s warning comes after Microsoft failed to respond to a legally binding request for information that focused on its generative AI tools.

EU warns Microsoft it could be fined billions over missing GenAI risk info

The prospects for troubled banking-as-a-service startup Synapse have gone from bad to worse this week after a United States Trustee filed an emergency motion on Wednesday.  The trustee is asking…

A US Trustee wants troubled fintech Synapse to be liquidated via Chapter 7 bankruptcy, cites ‘gross mismanagement’

U.K.-based Seraphim Space is spinning up its 13th accelerator program, with nine participating companies working on a range of tech from propulsion to in-space manufacturing and space situational awareness. The…

Seraphim’s latest space accelerator welcomes nine companies

OpenAI has reached a deal with Reddit to use the social news site’s data for training AI models. In a blog post on OpenAI’s press relations site, the company said…

OpenAI inks deal to train AI on Reddit data

X users will now be able to discover posts from new Communities that are trending directly from an Explore tab within the section.

X pushes more users to Communities

For Mark Zuckerberg’s 40th birthday, his wife got him a photoshoot. Zuckerberg gives the camera a sly smile as he sits amid a carefully crafted re-creation of his childhood bedroom.…

Mark Zuckerberg’s makeover: Midlife crisis or carefully crafted rebrand?

Strava announced a slew of features, including AI to weed out leaderboard cheats, a new ‘family’ subscription plan, dark mode and more.

Strava taps AI to weed out leaderboard cheats, unveils ‘family’ plan, dark mode and more