We need to talk about AI and access to publicly funded data-sets

Image Credits: Maya2008 / Shutterstock

For more than a decade the company formerly known as Google, latterly rebranded Alphabet to illustrate the full breadth of its A to Z business ambitions, has engineered a revenue-generating empire that has grown every year and last year pulled in ~$75 billion. And it’s done this mostly by mining user data for ad-targeting intel.

Slice it and dice it how you like but Google’s business engine needs data like the human body needs oxygen. Most of its products are thus designed to remove friction to accessing more user data; whether it’s free search, free email, free cloud storage, free document editing tools, free messaging apps, a fuzzy social network that no one loves but which is somehow still hanging around, free maps, a mobile OS platform that OEMs can load onto smartphone hardware without paying a license fee… Most of what Google builds it opens to all comers to keep the data pouring in. The bits and bytes must flow.

The trade-off for consumers handing over data is of course access to a particular Google service without any upfront cost. Or getting to buy a cheaper piece of hardware than they might otherwise be able to. Or the convenience of using a dominant digital service. Of course they are ‘paying’ with their data, but few will think of it that way. It’s an abstract idea for starters, and a personal cost that’s far harder to quantify given how unclear it is what Google really does with the data it gathers and processes in its algorithmic black boxes.

Google certainly isn’t spelling that out. Rather it makes noises about the benefits of it knowing more about you (savvier virtual assistants, more powerful photo search and so on). And without explicit knowledge of what the trade-off entails — coupled with noisy PR about the convenience of data-powered services — most consumers will simply shrug and carry on handing over the keys to their lives. This is the momentum that fuels Mountain View’s ad-targeting empire. The more it knows about you, the richer it bets it can get.

You can dislike Google’s business model but you can also argue that consumers do (in general) have a choice about whether to use its services. Albeit in markets where the company has a de facto monopoly there may be doubt about how much choice people really have. Not least if the company is found to have been abusing a dominant position by demoting alternatives to its services in its search results (Google is facing just such antitrust claims in Europe, where it has a hugely dominant market share in search, for example).

Another caveat is that Google has worked to join up more personal data dots, undermining how much control users have over how they share data with the centralizing Alphabet entity — by, for example, consolidating the privacy policies of multiple products to enable it to flesh out its understanding of each user by cross-referencing their usage of different services. That collapsing of prior partitions between products has also caused Google headaches with European data protection regulators. And contributed to a caricature of it as a vampire octopus with masses of tentacles all maneuvering to feed data back into a single, hungry maw.

But if you think Google has a controversial reputation at this point in its business evolution, buckle up because things are really stepping up a gear.

The Google/Alphabet octopus, via its artificially intelligent DeepMind tentacle, is being granted access to public healthcare data. Lots and lots of healthcare data. Now personal data doesn’t really get more sensitive than people’s medical records. And these highly sensitive bits and bytes are now being sucked towards Google’s algorithmic core — albeit indirectly, via the DeepMind division, which so far this year has two publicly announced data-sharing collaborations with the UK’s National Health Service (NHS).

The public data in question is tied to the two specific projects. But the most recent of these collaborations, with Moorfields Eye Hospital NHS Trust in London, entails DeepMind applying machine learning to the data. Which is a key development. Because, as New Scientist noted this week, Google will be keeping any AI models DeepMind is able to build off of this public data-set. The trained models are effectively its payment in this trade — given it’s not charging the NHS for its services.

So yes, this is another Google freebie. And the cash-strapped, publicly (under)funded NHS has obviously leapt at the chance of a free-at-the-point-of-use high tech partner who might, in time, help improve healthcare outcomes for patients. So it’s granting the commercial giant access to patients’ data.

And while we are told the first NHS DeepMind collaboration, announced back in February with the Royal Free Hospital Trust in London, does not currently involve any AI component, the five-year strategic partnership between the pair does include a wide-ranging memorandum of understanding in which DeepMind states its hope to also conduct machine learning research on Royal Free data-sets. So advancing AI is the clear objective for DeepMind’s NHS engagement, as you’d expect. It is a machine learning specialist. And its learning algorithms need the lifeblood of data in order to develop and thrive.

Now we’re all, as individuals, used to getting Google freebies in exchange for sharing some of our data. But the thing is, the data trade-off here, with the publicly funded NHS, is a rather different beast. Because the people whose personal data is being pumped into Google-owned databanks are not being asked for their individual consent to the exchange.

Patient consent has not been sought in either of the current NHS collaborations. In the Moorfields project, where the data is being anonymized (or pseudonymized), NHS information governance rules allow for data to be shared for medical research purposes without obtaining patient consent (although NHS patients can opt out of supplying their data to all research projects), so long as the relevant Health Research Authority clears the project. And DeepMind has applied for that clearance in this case.

In the first collaboration, with the Royal Free, where DeepMind is helping co-design an app to detect acute kidney injury, the patient data being supplied is not anonymized or pseudonymized. In fact full patient medical records are being shared with the company — likely millions of people’s medical records, given it’s getting real-time data across the Trust’s three hospitals, along with five years’ worth of historical inpatient data.

In that case patient consent has not been sought because the Royal Free argues consent can be implied as it claims the app is for “direct patient care”, rather than being a medical research project (or another classification, such as indirect patient care). There has been controversy over that definition — with health data privacy groups disputing the classification of the project and questioning why DeepMind has been handed access to so much identifiable patient data. Regulators have also stepped in after the fact to take a look at the project’s parameters.

Whatever the upshot of those complaints, it’s fair to say NHS rules on information governance are not an exact science, and do involve interpretation by individual NHS Trusts. There is no definitive set of NHS data-sharing commandments to point to in order to definitively denounce the scope of the arrangement. The best we have is a series of principles developed by the NHS’ national data guardian, Fiona Caldicott. And, perhaps, our public sense of right and wrong.

But what is absolutely crystal clear is that millions of NHS patients’ medical histories are being traded with DeepMind in exchange for some free services. And none of these people have been asked if they agree with the specific trade.

No one has been asked if they think it’s a fair exchange.

The NHS, which launched in 1948, is a free-at-the-point-of-use public healthcare service for all UK residents, currently around 65 million people. It’s a vast repository of medical data so it’s not at all hard to see why Google is interested. Here lies data of unprecedented value. And not for the relatively crude business of profiling consumers via their digital likes and dislikes; but for far more valuable matters, both in societal and business terms. There could be considerable future revenue-generating opportunities if DeepMind’s AI models end up being able to automate and/or improve complex diagnostic and healthcare challenges, for example. And if the models prove effective they could end up positively impacting healthcare outcomes, although we don’t know exactly who would benefit at this point because we don’t know what pricing structure Google might impose on any commercial application of its AI models.

One thing is clear: large data-sets are the lifeblood of robust machine learning algorithms. In the Moorfields case, DeepMind is getting around a million eye scans to train its machine learning models. And while those eye scans will technically be handed back at the end of the project, any diagnostic intelligence they end up generating will remain in Google’s hands.

The company admits as much in a research outline of the project, though it steers the focus away from these trained algorithms and back to the original data-set (whose value the algorithms will now have absorbed and implicitly contain):

“The algorithms developed during the study will not be destroyed. Google DeepMind Health knows of no way to recreate the patient images transferred from the algorithms developed. No patient identifiable data will be included in the algorithms.”

DeepMind says it will be publishing “results” of the Moorfields research in academic literature. But it does not say it will be open sourcing any AI models it is able to train off of the publicly funded data.

Which means that data might well end up fueling the future profits of one of the world’s wealthiest technology companies. Instead of that value remaining in the hands of the public, whose data it is.

And not just that: early access to large amounts of valuable taxpayer-funded data could potentially lock in massive commercial advantage for Google in healthcare. Which is perhaps the single most important sector there is, given it affects everyone on the planet. If you don’t think Google has designs on becoming the world’s medic, why do you think it’s doing things like this?

Google will argue that the potential social benefits of algorithmically improved healthcare outcomes are worth this trade-off of giving it advantageous access to the locked medicine cabinet where the really powerful data is kept.

But that detracts from the wider point: if valuable public data-sets can create really powerful benefits, shouldn’t that value remain in public hands?

Or shouldn’t we at least be asking if we have a public duty to disseminate the value of publicly funded data as widely as possible?

And are we, as a society, comfortable with the trade-off of a few free services (and some feel-good but fuzzy talk of future social good) for prematurely privatizing what could be our core IP?

Shouldn’t we, as the data creators, as the patients, at least be asked if we are comfortable with the terms of the trade?

Fiona Caldicott, the UK’s national data guardian, happened to publish her third review of how patient data is handled within the NHS just this week, and she urged a more extensive dialogue with the public about how their data is used. And a proper informed choice to opt in or out.

The old rules about information governance, which still talk in terms of shredding pieces of paper as a viable way to control access to data, have certainly not kept up with big data and machine learning. Stable doors and bolting horses spring to mind when you combine these old-school data access rules with the learning and evolving character of advanced AI.

Access to data-sets is undoubtedly the core competitive advantage for AI builders because really good data is hard to come by and/or expensive to create. And that’s why Google is pushing so hard and fast to embed itself into the NHS.

You can’t blame the company for this healthcare data-grab. It’s just doing what successful commercial enterprises do: figuring out what the future looks like and plotting the fastest route to get there.

What’s less clear is why governments and public bodies find it so hard to see the value locked up in the publicly funded data-sets they control.

Or rather why they fail to come up with effective structures to support maintaining public ownership of public assets; to distribute benefits equally, rather than disproportionately rewarding the single, best-resourced, fastest-moving commercial entity that happens to have the slickest sales pitch. It’s almost as if the public sector is being encouraged to privatize yet another public resource… ahem.

Inject a little more structured forward-thinking and public healthcare data could, for example, be contributed (with consent) to machine learning research departments in domestic universities so that AI models can be developed and tested ‘in house’, as it were, with public parents.

Instead we have the opposite prospect: public data assets stripped of their value by the commercial sector. And with zero guarantees that the algorithms of the future will be free at the point of use. Of course Google is going to aim to turn a profit on any healthcare AI models DeepMind creates. It’s not in the business of only giving away freebies.

So the really pressing question — roundly ignored by web consumers going about their daily Googling but perhaps moving into clearer focus, here and now, as commercial thirst to accelerate AI advancements is encouraging public sector bodies to over-hastily ink wide-ranging data-sharing arrangements — is what is the true cost of free?

And if we’ve inked the contracts before we even know the answer to that question, won’t it be too late for us to haggle over the price?

Even DeepMind talks publicly about the need for new models of information governance and ethics to be put in place to properly oversee the coupling of AI with data…

https://twitter.com/jedgar/status/751185868430409732

So we, the public, really need to get our act together and demand a debate about who should own the value locked up in our data. And preferably do so before we’ve handed over any more sets of keys.
