We need to talk about AI and access to publicly funded data-sets

8:00 AM PDT • July 9, 2016

**Image Credits:** Maya2008 / Shutterstock

For more than a decade the company formerly known as Google, latterly rebranded Alphabet to illustrate the full breadth of its A-to-Z business ambitions, has engineered an annually increasing, revenue-generating empire which last year pulled in ~$75 billion. And it’s done this mostly by mining user data for ad-targeting intel.

Slice it and dice it how you like but Google’s business engine needs data like the human body needs oxygen. Most of its products are thus designed to remove friction to accessing more user data; whether it’s free search, free email, free cloud storage, free document editing tools, free messaging apps, a fuzzy social network that no one loves but which is somehow still hanging around, free maps, a mobile OS platform that OEMs can load onto smartphone hardware without paying a license fee… Most of what Google builds it opens to all comers to keep the data pouring in. The bits and bytes must flow.

The trade-off for consumers handing over data is of course access to a particular Google service without any upfront cost. Or getting to buy a cheaper piece of hardware than they might otherwise be able to. Or the convenience of using a dominant digital service. Of course they are ‘paying’ with their data, but few will think of it that way. It’s an abstract idea for starters, and a personal cost that’s far harder to quantify given how unclear it is what Google really does with the data it gathers and processes in its algorithmic black boxes.

Google certainly isn’t spelling that out. Rather it makes noises about the benefits of it knowing more about you (savvier virtual assistants, more powerful photo search and so on). And without explicit knowledge of what the trade-off entails — coupled with noisy PR about the convenience of data-powered services — most consumers will simply shrug and carry on handing over the keys to their lives. This is the momentum that fuels Mountain View’s ad-targeting empire. The more it knows about you, the richer it bets it can get.

You can dislike Google’s business model but you can also argue that consumers do (in general) have a choice about whether to use its services. Albeit in markets where the company has a de facto monopoly there may be doubt about how much choice people really have. Not least if the company is found to have been abusing a dominant position by demoting alternatives to its services in its search results (Google is facing just such antitrust claims in Europe, where it has a hugely dominant market share in search, for example).

Another caveat is that Google has worked to join up more personal data dots, undermining how much control users have over how they share data with the centralizing Alphabet entity — by, for example, consolidating the privacy policies of multiple products to enable it to flesh out its understanding of each user by cross-referencing their usage of different services. That collapsing of prior partitions between products has also caused Google headaches with European data protection regulators. And contributed to a caricature of it as a vampire octopus with masses of tentacles all maneuvering to feed data back into a single, hungry maw.

But if you think Google has a controversial reputation at this point in its business evolution, buckle up because things are really stepping up a gear.

The Google/Alphabet octopus, via its artificially intelligent DeepMind tentacle, is being granted access to public healthcare data. Lots and lots of healthcare data. Now personal data doesn’t really get more sensitive than people’s medical records. And these highly sensitive bits and bytes are now being sucked towards Google’s algorithmic core — albeit indirectly, via the DeepMind division, which so far this year has two publicly announced data-sharing collaborations with the UK’s National Health Service (NHS).

The public data in question is tied to the two specific projects. But the most recent of these collaborations, with Moorfields Eye Hospital NHS Trust in London, entails DeepMind applying machine learning to the data. Which is a key development. Because, as New Scientist noted this week, Google will be keeping any AI models DeepMind is able to build off of this public data-set. The trained models are effectively its payment in this trade — given it’s not charging the NHS for its services.

So yes, this is another Google freebie. And the cash-strapped, publicly (under)funded NHS has obviously leapt at the chance of a free-at-the-point-of-use high tech partner who might, in time, help improve healthcare outcomes for patients. So it’s granting the commercial giant access to patients’ data.

And while we are told the first NHS DeepMind collaboration, announced back in February with the Royal Free Hospital Trust in London, does not currently involve any AI component, the five-year strategic partnership between the pair does include a wide-ranging memorandum of understanding in which DeepMind states its hope to also conduct machine learning research on Royal Free data-sets. So advancing AI is the clear objective for DeepMind’s NHS engagement, as you’d expect. It is a machine learning specialist. And its learning algorithms need the lifeblood of data in order to develop and thrive.

Now we’re all, as individuals, used to getting Google freebies in exchange for sharing some of our data. But the thing is, the data trade-off here — with the publicly funded NHS — is a rather different beast. Because the people whose personal data is being pumped into Google-owned databanks are not being asked for their individual consent to the exchange.

Patient consent has not been sought in either of the current NHS collaborations. In the Moorfields project, where the data is being anonymized (or pseudonymized), NHS information governance rules allow for data to be shared for medical research purposes without obtaining patient consent (although NHS patients can opt out of supplying their data to all research projects) — so long as the Health Research Authority clears the project. And DeepMind has applied for that clearance in this case.

In the first collaboration, with the Royal Free, where DeepMind is helping co-design an app to detect acute kidney injury, the patient data being supplied is not anonymized or pseudonymized. In fact full patient medical records are being shared with the company — likely millions of people’s medical records, given it’s getting real-time data across the Trust’s three hospitals, along with five years’ worth of historical inpatient data.

In that case patient consent has not been sought because the Royal Free argues consent can be implied as it claims the app is for “direct patient care”, rather than being a medical research project (or another classification, such as indirect patient care). There has been controversy over that definition — with health data privacy groups disputing the classification of the project and questioning why DeepMind has been handed access to so much identifiable patient data. Regulators have also stepped in after the fact to take a look at the project’s parameters.

Whatever the upshot of those complaints, it’s fair to say NHS rules on information governance are not an exact science, and do involve interpretation by individual NHS Trusts. There is no definitive set of NHS data-sharing commandments that could be cited to denounce the scope of the arrangement outright. The best we have is a series of principles developed by the NHS’ national data guardian, Fiona Caldicott. And, perhaps, our public sense of right and wrong.

But what is absolutely crystal clear is that millions of NHS patients’ medical histories are being traded with DeepMind in exchange for some free services. And none of these people have been asked if they agree with the specific trade.

No one has been asked if they think it’s a fair exchange.

The NHS, which launched in 1948, is a free-at-the-point-of-use public healthcare service for all UK residents — currently that’s around 65 million people. It’s a vast repository of medical data so it’s not at all hard to see why Google is interested. Here lies data of unprecedented value. And not for the relatively crude business of profiling consumers via their digital likes and dislikes; but for far more valuable matters, both in societal and business terms. There could be considerable future revenue-generating opportunities if DeepMind’s AI models end up being able to automate and/or improve complex diagnostic and healthcare challenges, for example. And if the models prove effective they could end up positively impacting healthcare outcomes — although we don’t know exactly who would benefit at this point because we don’t know what pricing structure Google might impose on any commercial application of its AI models.

One thing is clear: large data-sets are the lifeblood of robust machine learning algorithms. In the Moorfields case, DeepMind is getting around a million eye scans to train its machine learning models. And while those eye scans will technically be handed back at the end of the project, any diagnostic intelligence they end up generating will remain in Google’s hands.

The company admits as much in a research outline of the project, though it steers the focus away from these trained algorithms and back to the original data-set (whose value the algorithms will now have absorbed and implicitly contain):

> The algorithms developed during the study will not be destroyed. Google DeepMind Health knows of no way to recreate the patient images transferred from the algorithms developed. No patient identifiable data will be included in the algorithms.

DeepMind says it will be publishing “results” of the Moorfields research in academic literature. But it does not say it will be open sourcing any AI models it is able to train off of the publicly funded data.

Which means that data might well end up fueling the future profits of one of the world’s wealthiest technology companies. Instead of that value remaining in the hands of the public, whose data it is.

And not just that — early access to large amounts of valuable taxpayer-funded data could potentially lock in massive commercial advantage for Google in healthcare. Which is perhaps the single most important sector there is, given it affects everyone on the planet. If you don’t think Google has designs on becoming the world’s medic, why do you think it’s doing things like this?

Google will argue that the potential social benefits of algorithmically improved healthcare outcomes are worth this trade-off of giving it advantageous access to the locked medicine cabinet where the really powerful data is kept.

But that sidesteps the wider point: if valuable public data-sets can create really powerful benefits, shouldn’t that value remain in public hands?

Or shouldn’t we at least be asking if we have a public duty to disseminate the value of publicly funded data as widely as possible?

And are we, as a society, comfortable with the trade-off of a few free services — and some feel-good but fuzzy talk of future social good — for prematurely privatizing what could be our core IP?

Shouldn’t we, as the data creators, as the patients, at least be asked if we are comfortable with the terms of the trade?

Fiona Caldicott, the UK’s national data guardian, happened to publish her third review of how patient data is handled within the NHS just this week — and she urged a more extensive dialogue with the public about how their data is used. And a proper informed choice to opt in or out.

The old rules about information governance — which still talk in terms of shredding pieces of paper as a viable way to control access to data — have certainly not kept up with big data and machine learning. Stable doors and bolting horses spring to mind when you combine these old school data access rules with the learning and evolving character of advanced AI.

Access to data-sets is undoubtedly the core competitive advantage for AI builders because really good data is hard to come by and/or expensive to create. And that’s why Google is pushing so hard and fast to embed itself into the NHS.

You can’t blame the company for this healthcare data-grab. It’s just doing what successful commercial enterprises do: figuring out what the future looks like and plotting the fastest route to get there.

What’s less clear is why governments and public bodies find it so hard to see the value locked up in the publicly funded data-sets they control.

Or rather why they fail to come up with effective structures to support maintaining public ownership of public assets; to distribute benefits equally, rather than disproportionately rewarding the single, best-resourced, fastest-moving commercial entity that happens to have the slickest sales pitch. It’s almost as if the public sector is being encouraged to privatize yet another public resource… ahem

Inject a little more structured forward-thinking and public healthcare data could, for example, be contributed (with consent) to machine learning research departments in domestic universities so that AI models can be developed and tested ‘in house’, as it were, with public parents.

Instead we have the opposite prospect: public data assets stripped of their value by the commercial sector. And with zero guarantees that the algorithms of the future will be free at the point of use. Of course Google is going to aim to turn a profit on any healthcare AI models DeepMind creates. It’s not in the business of only giving away freebies.

So the really pressing question — roundly ignored by web consumers going about their daily Googling but perhaps moving into clearer focus, here and now, as commercial thirst to accelerate AI advancements is encouraging public sector bodies to over-hastily ink wide-ranging data-sharing arrangements — is what is the true cost of free?

And if we’ve inked the contracts before we even know the answer to that question, won’t it be too late for us to haggle over the price?

Even DeepMind talks publicly about the need for new models of information governance and ethics to be put in place to properly oversee the coupling of AI with data…

https://twitter.com/jedgar/status/751185868430409732

So we, the public, really need to get our act together and demand a debate about who should own the value locked up in our data. And preferably do so before we’ve handed over any more sets of keys.
