Active learning is the future of generative AI: Here’s how to leverage it

5:00 AM PST • February 28, 2023

Digital generated image of silhouette of male head with multicoloured gears inside on white background. **Image Credits:** Andriy Onufriyenko / Getty Images

Eric Landau

Contributor

Before Eric Landau co-founded Encord, he spent nearly a decade at DRW, where he was lead quantitative researcher on a global equity delta one desk and put thousands of models into production. He holds an S.M. in Applied Physics from Harvard University, an M.S. in Electrical Engineering and a B.S. in Physics from Stanford University.

What is active learning?

Active learning makes training a supervised model an iterative process. The model trains on an initial subset of labeled data from a large dataset. Then, it tries to make predictions on the rest of the unlabeled data based on what it has learned. ML engineers evaluate how certain the model is of its predictions and, by using a variety of acquisition functions, can estimate the performance benefit of annotating any given unlabeled sample.
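To make the idea of an acquisition function concrete, here is a minimal sketch of three common uncertainty-based scores. It is an illustration rather than the method of any particular tool: the function names, the `budget` parameter and the sample probabilities are all assumptions, and uncertainty is only a proxy for the performance benefit of labeling a sample.

```python
import math
from typing import Callable, List, Sequence

Probs = Sequence[float]  # class probabilities predicted for one unlabeled sample


def least_confidence(probs: Probs) -> float:
    """1 minus the top class probability; higher means less certain."""
    return 1.0 - max(probs)


def margin(probs: Probs) -> float:
    """Negative gap between the two most likely classes; higher means less certain."""
    top, second = sorted(probs, reverse=True)[:2]
    return -(top - second)


def entropy(probs: Probs) -> float:
    """Shannon entropy in nats; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rank_for_annotation(
    predictions: Sequence[Probs],
    acquisition: Callable[[Probs], float] = entropy,
    budget: int = 2,
) -> List[int]:
    """Return the indices of the `budget` samples with the highest score."""
    scores = [acquisition(p) for p in predictions]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]


# Hypothetical model outputs for four unlabeled samples (three classes each).
unlabeled_predictions = [
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # unsure
    [0.55, 0.44, 0.01],  # torn between two classes
    [0.34, 0.33, 0.33],  # nearly uniform
]
picked = rank_for_annotation(unlabeled_predictions, entropy, budget=2)
print(picked)
```

With these made-up probabilities, all three scores select the same two samples, the nearly uniform one and the unsure one, and send those to annotators first.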

By expressing uncertainty in its predictions, the model is deciding for itself what additional data will be most useful for its training. In doing so, it asks annotators to provide more examples of only that specific type of data so that it can train more intensively on that subset during its next round of training. Think of it like quizzing a student to figure out where their knowledge gap is. Once you know what problems they are missing, you can provide them with textbooks, presentations and other materials so that they can target their learning to better understand that particular aspect of the subject.

With active learning, training a model moves from being a linear process to a circular one with a strong feedback loop.

Why sophisticated companies should be ready to leverage active learning

Active learning is fundamental for closing the prototype-production gap and increasing model reliability.

It’s a common mistake to think of AI systems as static pieces of software, but these systems must be constantly learning and evolving. If not, they make the same mistakes repeatedly, or, when they’re released in the wild, they encounter new scenarios, make new mistakes and don’t have an opportunity to learn from them. They need to have the ability to learn over time, making corrections based on previous mistakes as a human would. Otherwise, models will have issues of reliability and micro robustness, and AI systems will not work in perpetuity.

Most companies using deep learning to solve real-world problems will need to incorporate active learning into their stack. If they don’t, they’ll lag behind their competitors. Their models won’t respond to or learn from the shifting landscape of possible scenarios.

However, incorporating active learning is easier said than done. For years, a lack of tooling and infrastructure made it difficult to facilitate active learning. Out of necessity, companies that began taking steps to improve their models’ performance with respect to the data have had to take a Frankenstein approach, cobbling together external tools and building tools in-house.

As a result, they don’t have an integrated, comprehensive system for model training. Instead, they have modular block-like processes that can’t talk to each other. They need a flexible system made up of decomposable components in which the processes communicate with one another as they go along the pipeline and create an iterative feedback loop.

The best ways to leverage active learning

Some companies, however, have implemented active learning to great effect, and we can learn from them. Companies that have yet to put active learning in place can also do a few things to prepare for and make the most of this methodology.

The gold standard for active learning is a stack built as a fully iterative pipeline. Every component is run with the aim of optimizing the performance of the downstream model: data selection, annotation, review, training and validation are done with an integrated logic rather than as disconnected units.

Counterintuitively, the best systems also have the most human interaction. They fully embrace the human-in-the-loop nature of iterative model improvement by opening up entry points for human supervision within each subprocess while also maintaining optionality for completely automated flows when things are working.
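One way to picture an entry point for human supervision that still allows a fully automated flow is a confidence gate on model-generated labels. The sketch below is an assumption-laden illustration, not a description of any specific product: the `Prediction` type, the `auto_accept_at` threshold and the `human_review_all` switch are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Prediction:
    sample_id: str
    label: str
    confidence: float  # the model's top-class probability, in [0, 1]


def route(
    predictions: List[Prediction],
    auto_accept_at: float = 0.95,
    human_review_all: bool = False,
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (auto_accepted, needs_human_review)."""
    if human_review_all:
        # Supervisors can switch the automated path off entirely.
        return [], list(predictions)
    auto = [p for p in predictions if p.confidence >= auto_accept_at]
    review = [p for p in predictions if p.confidence < auto_accept_at]
    return auto, review


# Hypothetical batch of model-generated labels.
batch = [
    Prediction("img-001", "car", 0.99),
    Prediction("img-002", "truck", 0.62),
    Prediction("img-003", "car", 0.96),
]
auto, review = route(batch)
print([p.sample_id for p in auto], [p.sample_id for p in review])
```

Raising or lowering the threshold shifts work between reviewers and the automated path, and the override keeps a human entry point available when the model starts to misbehave.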

The most sophisticated companies therefore have stacks that are iterative, granular, inspectable, automatable and coherent.

Companies seeking to build neural networks that take advantage of active learning should build their stacks with the future in mind. These ML teams should project the types of problems they’ll have and understand the issues they’re likely to encounter when attempting to run their models in the wild. What edge cases will they encounter? In what unreasonable way is the model likely to behave?

If ML teams don’t think through these scenarios, models will inevitably make mistakes in a way that a human never would. Those errors can be quite embarrassing for companies, and models should be heavily penalized for them because they’re so misaligned with human behavior and intuition.

Fortunately, for companies just entering the game, there’s now plenty of know-how to be gained from companies that have broken through the production barrier. With more and more companies putting models into production, ML teams can more easily anticipate problems by studying their predecessors, as they will likely face similar problems when moving from proof of concept to production.

Another way to troubleshoot problems before they occur is to think about what a working model looks like beyond its performance metric scores. By thinking about how that model should operate in the wild and the sorts of data and scenarios it will encounter, ML teams will better understand the kinds of issues that might arise once it’s in the production stage.

Lastly, companies should make themselves aware of and understand the tools available to support an active learning and training data pipeline. Five or six years ago, companies had to build infrastructure internally and combine these in-house tools with imperfect external ones. Nowadays, every company should think before they build something internally. New tooling is being developed rapidly, and it’s likely that there’s already a tool that will save time and money while requiring no internal resourcing to maintain it.

Active learning is still in its very early days. However, every month, more companies are expressing an interest in taking advantage of this methodology. The most sophisticated ones will put the infrastructure, tooling and planning in place to harness its power.
