The best way to start an AI project? Don’t think about the models

Gentoo penguin diving with perfect form
Image Credits: David Merron Photography / Getty Images

Eran Shlomo

Contributor

Eran Shlomo is the co-founder and CEO of Dataloop.

Did you know that 85% of all AI projects fail to reach the production or operation stage? Why is this the case?

It’s very common for businesses to come up with creative ideas to use AI to improve customer experience or simplify workflows. The barrier to success for these projects often resides in the time and resources it takes to get them into development and then into production. But, as we’ve seen with OpenAI’s new ChatGPT, AI can be as entertaining as it can be problematic.

With so many projects failing, or worse, being inaccurate, chances are that many of these companies are making the same mistakes. The following are some tips that will optimize your chances of success.

Start off on the right foot

AI development often suffers from poor planning, weak project management and engineering problems. Most business leaders today learn about AI from the media, which often describes the value of AI as magic or as something that can be put into production with just a few sprinkles.

They believe implementing AI can help lower costs, improve margins and boost revenue. With competitors already in motion, this creates AI “FOMO,” and executives are pushed to act quickly without a clear understanding of the overall impact, plan, cost and resources involved in creating a successful and accurate AI project.

With little understanding of the engineering environment, the logical first step seems to be hiring data scientists to map and plan the challenges that the team may face. However, these data scientists usually have no domain knowledge.

A data scientist entering a new organization with the goal of automating and improving the business will usually try to manually collect enough data to first prove there is value in creating AI. Once a successful proof of concept is made, the team often hits a wall regarding its data management. The organization may not collect, store or manage the data in a way that is “AI friendly.”

For example, a factory that wishes to embed smart fault inspection on a production assembly line will be able to demo the AI project pretty fast by using a single camera on a machine for a few minutes. However, for the project to get put into production and be used daily, it must transition the single-camera demo to 500 cameras operating 24/7. This will require many months or even years to bring the value the AI provides in the demo across the finish line.

Executives should, of course, have in mind a clear idea of the problem they want to solve as well as a business case. But the AI core team should include at least three personas, all of which will be equally important for the success of the project: data scientist, data engineer and domain expert.

A data-centric kickoff

While it may sound strange, the first AI project that is successful within an organization should not involve algorithms or fancy AI models. AI is essentially an effort to automate knowledge. Like all automation efforts, it’s good to start by showing the value of a few examples in a manual, slow and non-scalable manner.

In the kickoff, the data engineer will create a few cases using data and the domain expert will transform these case studies into examples. This step exposes the core of the AI process. The business will have raw data and a goal, and then they will want to get an example, which is the output data as it is ideally imagined.

This process is called data development, and the approach is data-centric by nature since no modeling is involved. This approach has many advantages over the model-first approach, such as:

  • It requires less investment.
  • It utilizes the organization’s strengths (process, data, domain expertise).
  • It is faster.
  • It lowers risk.

Once a few examples are completed manually, the business can start planning the AI’s path to production.

Mapping out the plan

The manual phase outlined in the previous section will expose many of the data challenges the AI project is about to face even before the first line of code is written. These problems may include:

  • Incorrect data.
  • Missing data.
  • Data collection challenges.
  • Corner cases and decision boundary issues.
  • The data structure: inputs and outputs (schema).

More importantly, outlining these challenges will enable the data scientists to plan better. They, along with algorithm developers, will now be able to provide the best estimation of the effort it will take, the complexity and the cost of the project. These insights will be based upon real data examples rather than a high-level description of the business and the potential value AI can deliver.
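As a rough illustration, the data problems listed above can surface in a simple audit of the manually built examples. The sketch below is only an assumption about what such an audit might look like: the field names in `REQUIRED_FIELDS` and the `audit_examples` helper are hypothetical and would be replaced by whatever the team's own cases contain.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical required fields for one manually built example.
REQUIRED_FIELDS = ("image_path", "camera_id", "label")


def audit_examples(examples: List[dict]) -> Dict[str, int]:
    """Count how often each required field is missing or empty."""
    missing = Counter()
    for example in examples:
        for field_name in REQUIRED_FIELDS:
            # Treat absent keys, None and empty strings as missing data.
            if example.get(field_name) in (None, ""):
                missing[field_name] += 1
    return dict(missing)
```

Running this over a few dozen hand-built examples gives the data scientists a concrete count of gaps per field to plan around, rather than a guess.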

When this process is executed properly, you will have a solid AI plan with minimal effort since many of the challenges and requirements have already been predicted. Having the correct organizational processes, team and data sources in place will help you reach the data flywheel effect as fast as possible.

Companies should then focus on building a tech stack with the right components for fast, accurate and cost-effective AI development: A data management system, data applications, data pipelines, model management and workforce management.

Planning for the data flywheel effect

Building a data-centric system means planning from day one for the data flywheel effect, the fundamental principle behind the most powerful AI applications. AI systems are essentially learning systems, so businesses must expect that the AI system will have to learn and adapt in a fast and cost-effective manner as they gain more users, customers, deployment environments and use cases.

Planning for the data flywheel effect is essentially planning a learning system rather than an AI model that works properly at a single point in time. Most of the data that the AI needs to perform at its best ability is not available to the development team. This data will flow in over time during production through users, customers, machines and databases. This creates a “chicken or egg” problem: Businesses need production data to deliver a functional model, but the model needs to exist in order to go to production:

machine learning data flywheel concept
Image Credits: Dataloop

This problem is solved by starting with a case that is solved correctly in the lab. However, the lab model won’t work with production data until it goes through a calibration period for every deployment. During this period, the model outputs are tested and the model is quickly retrained on the production data. This process is repeated with each new deployment to gradually improve the speed and accuracy of the AI model.

The data flywheel effect is essentially a methodology to iteratively reduce model overfitting and teach the model to properly solve one deployment at a time, learning faster and becoming more accurate with each deployment.
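To make the calibration cycle concrete, here is a deliberately tiny sketch. It assumes a toy “model” that is nothing more than a lookup table of examples seen so far; `train`, `predict` and `calibrate` are hypothetical names, and a real system would retrain an actual model instead.

```python
from typing import Dict, List, Tuple

Case = Tuple[str, str]  # (input, label a domain expert would assign)


def train(examples: List[Case]) -> Dict[str, str]:
    """Toy stand-in for training: memorize every example seen so far."""
    return dict(examples)


def predict(model: Dict[str, str], x: str, default: str = "ok") -> str:
    """Return the memorized label, or a default for unseen inputs."""
    return model.get(x, default)


def calibrate(training_set: List[Case], deployment: List[Case]) -> int:
    """Run one calibration cycle on a deployment.

    Wrong results are appended to training_set so the next cycle
    learns from them. Returns the number of errors found.
    """
    model = train(training_set)
    errors = [(x, y) for x, y in deployment if predict(model, x) != y]
    training_set.extend(errors)
    return len(errors)
```

Calling `calibrate` a second time on the same deployment should return fewer errors than the first call, which is the flywheel in miniature: each cycle feeds its mistakes into the next one.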

What does it take to close the loop?

In essence, the data flywheel effect is the automation of manual work done by the initial team, where cases are collected, automation (model) is run, data is validated and wrong results are added to the next model learning cycle. This is called data training.

Let us break it down into its components:

The case

Businesses need to define a clear schema for a single case: the data types, properties, metadata, files and expected output structure. Since changing the schema along the way is very expensive, the team should give this area significant attention early on.
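A minimal sketch of what such a schema could look like, using the factory inspection example from earlier in the article. Every field name, the label set and the `validate_case` helper are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set agreed on with the domain expert.
ALLOWED_LABELS = {"ok", "scratch", "dent"}


@dataclass(frozen=True)
class InspectionCase:
    """One case: input file, metadata and the expected output."""
    case_id: str
    image_path: str                        # file attached to the case
    camera_id: str                         # metadata: capturing camera
    captured_at: str                       # metadata: ISO 8601 timestamp
    expected_label: Optional[str] = None   # output set by the domain expert
    schema_version: int = 1                # bump when the schema changes


def validate_case(case: InspectionCase) -> List[str]:
    """Return a list of schema problems; an empty list means valid."""
    problems = []
    if not case.case_id:
        problems.append("case_id is empty")
    if not case.image_path.lower().endswith((".png", ".jpg", ".jpeg")):
        problems.append("image_path is not a supported image file")
    if case.expected_label is not None and case.expected_label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {case.expected_label}")
    return problems
```

Carrying an explicit `schema_version` is one way to make later schema changes visible instead of silent, which matters given how costly those changes are.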

Data collection

It is very common for a single data case to contain data pieces from many different sources. The data collection process is essentially an ETL (extract, transform, load) process, where each data case is collected and prepared for processing, either by the domain expert or by the model.
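The sketch below shows the extract, transform and load steps for a case assembled from two sources. The in-memory `CAMERA_FRAMES` and `MACHINE_REGISTRY` are hypothetical stand-ins for real systems such as a camera feed and a factory database.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for two separate source systems.
CAMERA_FRAMES = [
    {"frame_id": "f1", "machine": "M-7", "path": "frames/f1.jpg"},
    {"frame_id": "f2", "machine": "M-9", "path": "frames/f2.jpg"},
]
MACHINE_REGISTRY = {
    "M-7": {"line": "assembly-A", "product": "bracket"},
    # "M-9" is deliberately absent to show an incomplete case.
}


def extract() -> List[dict]:
    """Extract: pull raw frames from the camera source."""
    return list(CAMERA_FRAMES)


def transform(frame: dict, registry: Dict[str, dict]) -> Optional[dict]:
    """Transform: join a frame with machine metadata into one case."""
    machine_info = registry.get(frame["machine"])
    if machine_info is None:
        return None  # cannot build a complete case
    return {
        "case_id": frame["frame_id"],
        "image_path": frame["path"],
        "line": machine_info["line"],
        "product": machine_info["product"],
    }


def run_etl(store: List[dict]) -> List[str]:
    """Load complete cases into store; return IDs of skipped frames."""
    skipped = []
    for frame in extract():
        case = transform(frame, MACHINE_REGISTRY)
        if case is None:
            skipped.append(frame["frame_id"])
        else:
            store.append(case)  # Load: hand the case to the store
    return skipped
```

Returning the skipped IDs rather than dropping them quietly keeps data collection gaps visible to the team.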

The inference

The next step will be running the model on the prepared case and getting the results.

The validation

This is where the domain experts work manually to validate the accuracy of the case and the model results using the proper tools. Once a wrong model result is detected, the case is delivered to the training datasets, where it will be included in the next model learning cycle.
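In code, this validation step amounts to comparing each model result with the expert’s verdict and routing disagreements to the training data. The function below is a sketch under that assumption; the tuple layout and the name `route_validated_cases` are hypothetical.

```python
from typing import Dict, List, Tuple

# (case_id, model_output, expert_label)
Review = Tuple[str, str, str]


def route_validated_cases(
    reviews: List[Review],
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Split reviewed cases into accepted IDs and new training examples."""
    accepted: List[str] = []
    training_queue: List[Dict[str, str]] = []
    for case_id, model_output, expert_label in reviews:
        if model_output == expert_label:
            accepted.append(case_id)
        else:
            # The expert's label becomes the ground truth for retraining.
            training_queue.append({"case_id": case_id, "label": expert_label})
    return accepted, training_queue
```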

The training

Once enough new examples have been collected, a new training cycle produces a new model. The training result should be evaluated against all past data, ensuring the model is both improving on the new cases and not performing worse on past cases. Once it passes evaluation, the model is deployed and the data loop is closed.
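One way to express that evaluation is as a gate that compares the new model with the current one on both sets of cases. This is a sketch that assumes plain accuracy as the metric and treats models as simple callables; `passes_evaluation` is a hypothetical name, and a real project would likely use richer metrics.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]      # (input, expected output)
Model = Callable[[str], str]   # any callable mapping input to output


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples the model gets right (0.0 if none given)."""
    if not examples:
        return 0.0
    correct = sum(1 for x, expected in examples if model(x) == expected)
    return correct / len(examples)


def passes_evaluation(
    new_model: Model,
    old_model: Model,
    past_cases: List[Example],
    new_cases: List[Example],
) -> bool:
    """Approve only if the new model improves on the new cases
    and does not regress on the past cases."""
    no_regression = accuracy(new_model, past_cases) >= accuracy(old_model, past_cases)
    improved = accuracy(new_model, new_cases) > accuracy(old_model, new_cases)
    return no_regression and improved
```

A model that learns the new cases but forgets the old ones fails this gate, so it is never deployed.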

Allow humans and ML to work together

The above flow shows us that the future of AI is not about replacing humans but creating a continued synergy between the two. The AI provides automation, speed and low costs while the domain experts act as moderators and constantly guide the AI to a correct result in a constantly changing environment.

Human input is needed in the system to provide real-time validations so that the system can continuously learn from and adapt to new scenarios. Incorporating human-in-the-loop (HITL) models into the workflow will produce reliable, cost-effective throughput of training data by minimizing errors right from the start.

The majority of the errors in data annotation and validation occur in the definition and ontology phases, where the schema changes quite often. Too often, companies start with AI technologies while skipping the business processes and the structure of the data itself, which have a much greater impact on the end results.

What this means for your business

AI has the power to touch every industry. It can already be seen playing an essential role in retail, agriculture and autonomous vehicles. Businesses are in a race to come up with creative new AI solutions, but many don’t have the basics in place for getting AI into production.

Implementing the correct plan from the start will save you time and money while setting up your AI project for success. Data-centric planning starts on day one, and the best way to ensure you are on the correct AI development path is to start your AI project without thinking about the models.
