The best way to start an AI project? Don’t think about the models

Gentoo penguin diving with perfect form
Image Credits: David Merron Photography / Getty Images

Eran Shlomo

Contributor

Eran Shlomo is the co-founder and CEO of Dataloop.

Did you know that 85% of all AI projects fail to reach the production or operation stage? Why is this the case?

It’s very common for businesses to come up with creative ideas to use AI to improve customer experience or simplify workflows. The barrier to success for these projects often resides in the time and resources it takes to get them into development and then into production. But, as we’ve seen with OpenAI’s new ChatGPT, AI can be as entertaining as it can be problematic.

With so many projects failing, or worse, producing inaccurate results, chances are that many of these companies are making the same mistakes. The following tips can improve your chances of success.

Start off on the right foot

AI development often suffers from poor planning, weak project management and engineering problems. Most business leaders today learn about AI from the media, which often portrays it as magic, or as something that can be put into production with little effort.

They believe implementing AI can help lower costs, improve margins and boost revenue. With competitors already moving, this creates AI “FOMO,” and executives feel pushed to act quickly despite lacking a clear understanding of the overall impact, plan, cost and resources involved in creating a successful and accurate AI project.

With little understanding of the engineering environment, the seemingly logical first step is to hire data scientists to map and plan the challenges the team may face. However, these data scientists usually have no domain knowledge.

A data scientist entering a new organization with the goal of automating and improving the business will usually try to manually collect enough data to first prove there is value in creating AI. Once a successful proof of concept is built, the team often hits a wall with data management: the organization may not collect, store or manage its data in a way that is “AI friendly.”

For example, a factory that wants to embed smart fault inspection on a production assembly line can demo the AI project quickly using a single camera on one machine for a few minutes. However, to put the project into production for daily use, the team must scale that single-camera demo to 500 cameras operating 24/7. Carrying the value shown in the demo across the finish line can take many months or even years.

Executives should, of course, have a clear idea of the problem they want to solve, as well as a business case. But the AI core team should include at least three personas, all equally important to the project’s success: data scientist, data engineer and domain expert.

A data-centric kickoff

While it may sound strange, the first AI project that is successful within an organization should not involve algorithms or fancy AI models. AI is essentially an effort to automate knowledge. Like all automation efforts, it’s good to start by showing the value of a few examples in a manual, slow and non-scalable manner.

In the kickoff, the data engineer assembles a few cases from real data, and the domain expert turns these cases into examples. This step exposes the core of the AI process: the business has raw data and a goal, and it wants an example, which is the output data as it would ideally look.

This process is called data development, and the approach is data-centric by nature since no modeling is involved. It has several advantages over the model-first approach:

  • It requires less investment.
  • It utilizes the organization’s strengths (process, data, domain expertise).
  • It is faster.
  • It lowers risk.

Once a few examples are completed manually, the business can start planning the AI’s path to production.

Mapping out the plan

The manual phase outlined in the previous section will expose many of the data challenges the AI project is about to face even before the first line of code is written. These problems may include:

  • Incorrect data.
  • Missing data.
  • Data collection challenges.
  • Corner cases and decision boundary issues.
  • The data structure: inputs and outputs (schema).
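Some of these problems (missing values, wrong types, fields nobody agreed on) can be flagged mechanically as soon as the first manual examples exist. The sketch below is a minimal illustration, assuming each case is a plain Python dict; the field names in `REQUIRED_FIELDS` and the function `find_data_problems` are hypothetical, not part of any particular tool. Corner cases and decision-boundary issues still need a domain expert.

```python
from typing import Any

# Hypothetical required fields for one case; a real schema comes from the team.
REQUIRED_FIELDS = {"case_id": str, "image_path": str, "label": str}


def find_data_problems(case: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a single case (empty if none)."""
    problems = []
    for field_name, expected_type in REQUIRED_FIELDS.items():
        value = case.get(field_name)
        if value is None or value == "":
            # Missing data: the field is absent or empty.
            problems.append(f"missing: {field_name}")
        elif not isinstance(value, expected_type):
            # Incorrect data: the value does not match the agreed type.
            problems.append(
                f"wrong type: {field_name} is {type(value).__name__}, "
                f"expected {expected_type.__name__}"
            )
    # Fields outside the agreed schema often signal an unsettled schema.
    for field_name in sorted(case.keys() - REQUIRED_FIELDS.keys()):
        problems.append(f"unexpected field: {field_name}")
    return problems
```

For example, `find_data_problems({"case_id": "c1", "image_path": "", "label": 3, "camera": "A"})` reports the empty image path, the integer label and the unexpected `camera` field.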

More importantly, outlining these challenges enables the data scientists to plan better. Together with algorithm developers, they can now give a much better estimate of the project’s effort, complexity and cost, based on real data examples rather than a high-level description of the business and the potential value AI can deliver.

When this process is executed properly, you will have a solid AI plan with minimal effort, since many of the challenges and requirements have already been identified. Having the right organizational processes, team and data sources in place will help you reach the data flywheel effect as fast as possible.

Companies should then focus on building a tech stack with the right components for fast, accurate and cost-effective AI development: A data management system, data applications, data pipelines, model management and workforce management.

Planning for the data flywheel effect

Building a data-centric system means planning from day one for the data flywheel effect, the fundamental principle behind the most powerful AI applications. AI systems are essentially learning systems, so businesses must expect that the AI system will have to learn and adapt in a fast and cost-effective manner as they gain more users, customers, deployment environments and use cases.

Planning for the data flywheel effect is essentially planning a learning system rather than an AI model that works properly at a single point in time. Most of the data the AI needs to perform at its best is not available to the development team at the start; it flows in over time during production through users, customers, machines and databases. This creates a “chicken or egg” problem: Businesses need production data to deliver a functional model, but the model needs to exist in order to go to production:

machine learning data flywheel concept
Image Credits: Dataloop

The way out of this loop is to start from a case that is solved correctly in the lab. The lab model won’t work well on production data until it goes through a calibration period at each deployment, during which its outputs are tested and it is quickly retrained on that deployment’s production data. Repeating this process with each new deployment gradually improves the speed and accuracy of the AI model.

The data flywheel effect is essentially a methodology for iteratively reducing the model’s overfitting, teaching it to properly solve one deployment at a time while it learns faster and becomes more accurate with each deployment.

What does it take to close the loop?

In essence, the data flywheel effect is the automation of the manual work done by the initial team: cases are collected, the automation (the model) is run, the data is validated and wrong results are fed into the next model learning cycle. This is called data training.

Let us break it down into its components:

The case

Businesses need to define a clear schema for a single case: its data types, properties, metadata, files and expected output structure. Since changing the schema along the way is very expensive, the team should give it significant attention early on.
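To make this concrete, here is one way a case schema could be pinned down in code for a factory fault-inspection project like the one described earlier. It is a sketch under assumptions: the class names, fields and the `SCHEMA_VERSION` constant are hypothetical, chosen to show data types, properties, metadata, files and the expected output structure side by side.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical version tag; bump it deliberately, since schema changes are costly.
SCHEMA_VERSION = 1


@dataclass(frozen=True)
class Defect:
    """Expected output structure: one labeled defect region (hypothetical)."""
    kind: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class InspectionCase:
    """Schema for a single case in a hypothetical fault-inspection project."""
    case_id: str
    image_file: str    # file: path or URI of the camera frame
    camera_id: str     # property: which camera produced the frame
    captured_at: str   # metadata: ISO 8601 timestamp
    metadata: dict[str, str] = field(default_factory=dict)
    # Expected output; None means the case has not been labeled yet.
    expected_defects: Optional[list[Defect]] = None
    schema_version: int = SCHEMA_VERSION

    def is_labeled(self) -> bool:
        """An empty list is a valid label (no defects); None is not."""
        return self.expected_defects is not None
```

Carrying an explicit schema version with every case is one design choice that makes a later schema change visible rather than silent.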

Data collection

It is very common for a single data case to contain data pieces from many different sources. The data collection process is essentially an ETL (extract, transform, load) process, where each data case is collected and prepared for processing, either by the domain expert or by the model.
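As a rough illustration of that ETL step, the function below joins records from two hypothetical sources (camera frames and a machine log) into prepared cases. The source layouts, field names and `collect_cases` itself are assumptions for the example, and the "load" step is reduced to returning a list.

```python
from collections.abc import Iterable


def collect_cases(
    frames: Iterable[dict], machine_log: Iterable[dict]
) -> tuple[list[dict], list[str]]:
    """Join camera frames with machine-log records into prepared cases.

    frames: {"frame_id", "machine_id", "path"} dicts (hypothetical source 1)
    machine_log: {"machine_id", "speed_rpm", "shift"} dicts (hypothetical source 2)
    Returns (prepared cases, ids of frames that could not be joined).
    """
    # Extract: index the second source by its join key.
    log_by_machine = {rec["machine_id"]: rec for rec in machine_log}
    cases: list[dict] = []
    skipped: list[str] = []
    for frame in frames:
        rec = log_by_machine.get(frame["machine_id"])
        if rec is None:
            # Missing data from one source: set the case aside instead of guessing.
            skipped.append(frame["frame_id"])
            continue
        # Transform: merge both sources into one flat case and normalize types.
        cases.append({
            "case_id": frame["frame_id"],
            "image_path": frame["path"],
            "machine_id": frame["machine_id"],
            "speed_rpm": float(rec["speed_rpm"]),
            "shift": rec["shift"].strip().lower(),
        })
    # Load: here the cases are simply returned; a real pipeline would write
    # them to whatever store the team uses.
    return cases, skipped
```

Setting aside cases whose sources don't line up, rather than filling in guesses, keeps incorrect data away from both the domain expert and the model.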

The inference

The next step is to run the model on the prepared case and collect its results.

The validation

This is where domain experts manually validate the accuracy of the case and the model’s results using the proper tools. When a wrong model result is detected, the case is added to the training dataset so that it is included in the next model learning cycle.
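The routing described here can be sketched in a few lines. This assumes a single-label classification task where the model output and the expert's verdict are plain strings; `ValidationQueue` and its `review` method are hypothetical names, not a specific product's API.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationQueue:
    """Hypothetical router: wrong model results feed the next training cycle."""
    training_set: list[dict] = field(default_factory=list)
    accepted: list[dict] = field(default_factory=list)

    def review(self, case: dict, model_output: str, expert_label: str) -> bool:
        """Record the domain expert's verdict; return True if the model was right."""
        if model_output == expert_label:
            self.accepted.append(case)
            return True
        # Keep the expert's corrected label so the case can be trained on.
        self.training_set.append({**case, "label": expert_label})
        return False
```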

The training

Once enough new examples have been collected, a new training cycle produces a new model. The new model should be evaluated against all past data to ensure it improves on the new cases without performing worse on past cases. Once it passes evaluation, the model is deployed and the data loop is closed.
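The evaluation step amounts to a deployment gate. A minimal sketch, assuming models are plain callables and that accuracy is the metric (the article does not name one); `accuracy`, `passes_gate` and the `tolerance` parameter are illustrative choices:

```python
from collections.abc import Callable, Sequence

# An example is (input, expected label); the model maps an input to a label.
Example = tuple[object, str]
Model = Callable[[object], str]


def accuracy(model: Model, data: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    if not data:
        raise ValueError("cannot evaluate on an empty dataset")
    return sum(model(x) == y for x, y in data) / len(data)


def passes_gate(
    old_model: Model,
    new_model: Model,
    new_cases: Sequence[Example],
    past_cases: Sequence[Example],
    tolerance: float = 0.0,
) -> bool:
    """Deploy only if the new model beats the old one on the new cases and
    does not get worse (beyond `tolerance`) on the past cases."""
    improves_new = accuracy(new_model, new_cases) > accuracy(old_model, new_cases)
    no_regression = (
        accuracy(new_model, past_cases) >= accuracy(old_model, past_cases) - tolerance
    )
    return improves_new and no_regression
```

Setting `tolerance` above zero would allow a small, deliberate trade-off on past cases; at the default of zero, any regression blocks deployment.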

Allow humans and ML to work together

This flow shows that the future of AI is not about replacing humans but about a continued synergy between the two. The AI provides automation, speed and low cost, while the domain experts act as moderators, continually guiding the AI toward correct results in a changing environment.

Human input is needed to provide real-time validation so the system can continuously learn from and adapt to new scenarios. Incorporating human-in-the-loop (HITL) models into the workflow produces a reliable, cost-effective flow of training data by minimizing errors right from the start.

Most errors in data annotation and validation occur in the definition and ontology phases, when the schema is still changing often. Too often, companies start with AI technologies and skip the business processes and the structure of the data itself, which have a much greater impact on the end results.

What this means for your business

AI has the power to touch every industry, and it already plays an essential role in retail, agriculture and autonomous vehicles. Businesses are racing to come up with creative new AI solutions, but many don’t yet have the basics of getting AI into production in place.

Implementing the correct plan from the start will save you time and money while setting up your AI project for success. Data-centric planning starts on day one, and the best way to ensure you are on the correct AI development path is to start your AI project without thinking about the models.
