
AI is keeping GitHub chief legal officer Shelley McKinley busy

TechCrunch chats with GitHub’s legal beagle about the EU’s AI Act and developer concerns around Copilot and ownership


GitHub Chief Legal Officer Shelley McKinley at GitHub Universe 2023
Image Credits: GitHub

GitHub’s chief legal officer, Shelley McKinley, has plenty on her plate, what with legal wrangles around its Copilot pair programmer, as well as the Artificial Intelligence Act (AI Act), which was voted through the European Parliament this week as “the world’s first comprehensive AI law.”

Three years in the making, the EU AI Act first reared its head back in 2021 via proposals designed to address the growing reach of AI into our everyday lives. The new legal framework is set to govern AI applications based on their perceived risks, with different rules and stipulations, depending on the application and use case.

GitHub, which Microsoft bought for $7.5 billion in 2018, has emerged as one of the most vocal naysayers around one very specific element of the regulations: muddy wording on how the rules might create legal liability for open source software developers.

McKinley joined Microsoft in 2005 and served in various legal roles, including supporting hardware businesses such as Xbox and HoloLens, as well as general counsel positions based in Munich and Amsterdam, before landing in the chief legal officer hot seat at GitHub almost three years ago.

“I moved over to GitHub in 2021 to take on this role, which is a little bit different to some chief legal officer roles — this is multidisciplinary,” McKinley told TechCrunch. “So I’ve got standard legal things like commercial contracts, product, and HR issues. And then I have accessibility, so [that means] driving our accessibility mission, which means all developers can use our tools and services to create stuff.”

McKinley is also tasked with overseeing environmental sustainability, which ladders directly up to Microsoft’s own sustainability goals. And then there are issues related to trust and safety, which covers things like moderating content to ensure that “GitHub remains a welcoming, safe, positive place for developers,” as McKinley puts it.

But there’s no ignoring the fact that McKinley’s role has become increasingly intertwined with the world of AI.

Ahead of the EU AI Act getting the green light this week, TechCrunch caught up with McKinley in London.

GitHub chief legal officer Shelley McKinley
Image Credits: GitHub

Two worlds collide

For the unfamiliar, GitHub is a platform that enables collaborative software development, allowing users to host, manage, and share code “repositories” (a location where project-specific files are kept) with anyone, anywhere in the world. Companies can pay to make their repositories private for internal projects, but GitHub’s success and scale have been driven by open source software development carried out collaboratively in a public setting.

In the six years since the Microsoft acquisition, much has changed in the technological landscape. AI wasn’t exactly novel in 2018, and its growing impact was becoming more evident across society — but with the advent of ChatGPT, DALL-E, and the rest, AI has arrived firmly in the mainstream consciousness.

“I would say that AI is taking up [a lot of] my time — that includes things like ‘how do we develop and ship AI products?’ and ‘how do we engage in the AI discussions that are going on from a policy perspective?’ as well as ‘how do we think about AI as it comes onto our platform?’” McKinley said.

The advance of AI has also been heavily dependent on open source, with collaboration and shared data pivotal to some of the most preeminent AI systems today — this is perhaps best exemplified by the generative AI poster child OpenAI, which began with a strong open source foundation before abandoning those roots for a more proprietary play (this pivot is also one of the reasons Elon Musk is currently suing OpenAI).


As well-meaning as Europe’s incoming AI regulations might be, critics argued that they would have significant unintended consequences for the open source community, which in turn could hamper the progress of AI. This argument has been central to GitHub’s lobbying efforts.

“Regulators, policymakers, lawyers … are not technologists,” McKinley said. “And one of the most important things that I’ve personally been involved with over the past year is going out and helping to educate people on how the products work. People just need a better understanding of what’s going on so that they can think about these issues and come to the right conclusions in terms of how to implement regulation.”

At the heart of the concerns was that the regulations would create legal liability for open source “general purpose AI systems,” which are built on models capable of handling a multitude of different tasks. If open source AI developers were to be held liable for issues arising further downstream (i.e., at the application level), they might be less inclined to contribute — and in the process, more power and control would be bestowed upon the Big Tech firms developing proprietary systems.

Open source software development by its very nature is distributed, and GitHub — with its 100 million-plus developers globally — needs developers to be incentivized to continue contributing to what many tout as the fourth industrial revolution. And this is why GitHub has been so vociferous about the AI Act, lobbying for exemptions for developers working on open source, general-purpose AI technology.

“GitHub is the home for open source, we are the steward of the world’s largest open source community,” McKinley said. “We want to be the home for all developers, we want to accelerate human progress through developer collaboration. And so for us, it’s mission critical — it’s not just a ‘fun to have’ or ‘nice to have’ — it’s core to what we do as a company as a platform.”


As things transpired, the text of the AI Act now includes some exemptions for AI models and systems released under free and open source licenses, though notable exceptions apply where prohibited (“unacceptable” risk) or high-risk AI systems are at play. So in effect, developers behind open source, general-purpose AI models don’t have to provide the same level of documentation and guarantees to EU regulators, though it’s not yet clear which proprietary and open source models will fall under its “high-risk” categorization.

But those intricacies aside, McKinley reckons that GitHub’s hard lobbying work has mostly paid off, with regulators placing less focus on software “componentry” (the individual elements of a system that open source developers are more likely to create) and more on what’s happening at the compiled application level.

“That is a direct result of the work that we’ve been doing to help educate policymakers on these topics,” McKinley said. “What we’ve been able to help people understand is the componentry aspect of it — there’s open source components being developed all the time, that are being put out for free and that [already] have a lot of transparency around them — as do the open source AI models. But how do we think about responsibly allocating the liability? That’s really not on the upstream developers; it’s just really downstream commercial products. So I think that’s a really big win for innovation, and a big win for open source developers.”


Enter Copilot

With the rollout of its AI-enabled pair-programming tool Copilot three years back, GitHub set the stage for a generative AI revolution that looks set to upend just about every industry, including software development. Copilot suggests lines or functions as the software developer types, a little like how Gmail’s Smart Compose speeds up email writing by suggesting the next chunk of text in a message.

However, Copilot has upset a substantial segment of the developer community, including those at the not-for-profit Software Freedom Conservancy, who called for all open source software developers to ditch GitHub in the wake of Copilot’s commercial launch in 2022. The problem? Copilot is a proprietary, paid-for service that capitalizes on the hard work of the open source community. Moreover, Copilot was developed in cahoots with OpenAI (before the ChatGPT craze), leaning substantively on OpenAI Codex, a model that was itself trained on a massive amount of public source code and natural language text.

GitHub Copilot
Image Credits: GitHub

Copilot ultimately raises key questions around who authored a piece of software: if it’s merely regurgitating code written by another developer, then shouldn’t that developer get credit for it? Software Freedom Conservancy’s Bradley M. Kuhn wrote a substantial piece on precisely that matter, called “If Software Is My Copilot, Who Programmed My Software?”

There’s a misconception that “open source” software is a free-for-all, and that anyone can simply take code produced under an open source license and do as they please with it. But while different open source licenses have different restrictions, most share one notable stipulation: Developers reusing code written by someone else need to include the correct attribution. It’s difficult to do that if you don’t know who (if anyone) wrote the code that Copilot is serving you.

The Copilot kerfuffle also highlights some of the difficulties in simply understanding what generative AI is. Large language models, such as those used in tools like ChatGPT or Copilot, are trained on vast swathes of data. Much like a human software developer who learns by poring over previous code, Copilot is bound to sometimes produce output that is similar (or even identical) to what has been produced elsewhere. And when a suggestion does match public code, the match “frequently” applies to “dozens, if not hundreds” of repositories.

“This is generative AI, it’s not a copy-and-paste machine,” McKinley said. “The one time that Copilot might output code that matches publicly available code, generally, is if it’s a very, very common way of doing something. That said, we hear that people have concerns about these things — we’re trying to take a responsible approach, to ensure that we’re meeting the needs of our community in terms of developers [that] are really excited about this tool. But we’re listening to developers’ feedback too.”

At the tail end of 2022, several U.S. software developers sued GitHub, Microsoft, and OpenAI, alleging that Copilot violates copyright law and calling it “unprecedented open-source software piracy.” In the intervening months, the three companies managed to get various facets of the case thrown out, but the lawsuit rolls on, with the plaintiffs recently filing an amended complaint around GitHub’s alleged breach of contract with its developers.

The legal skirmish wasn’t exactly a surprise, McKinley notes. “We definitely heard from the community — we all saw the things that were out there, in terms of concerns [that] were raised,” she said.

With that in mind, GitHub made some efforts to allay concerns over the way Copilot might “borrow” code written by other developers. For instance, it introduced a “duplication detection” feature. It’s turned off by default, but once activated, Copilot will block code completion suggestions of more than 150 characters that match publicly available code. And last August, GitHub debuted a new code-referencing feature (still in beta) that allows developers to follow the breadcrumbs and see where a suggested code snippet comes from. Armed with this information, they can follow the letter of the law as it pertains to licensing requirements and attribution, and even use the entire library the code snippet was taken from.
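To make the threshold idea concrete, here is a minimal Python sketch of how a length-gated public-code filter could work in principle. It is not GitHub’s implementation: the names `normalize`, `should_block` and `MIN_MATCH_CHARS`, the whitespace normalization, and the naive substring search over a small in-memory corpus are all illustrative assumptions. Only the 150-character figure comes from GitHub’s description of the feature.

```python
from __future__ import annotations

# Hypothetical sketch only: not GitHub's actual duplication-detection code.
# The 150-character threshold mirrors the figure GitHub has cited; everything
# else (names, normalization, substring matching) is an assumption.
MIN_MATCH_CHARS = 150


def normalize(code: str) -> str:
    """Collapse all runs of whitespace so formatting differences don't hide a match."""
    return " ".join(code.split())


def should_block(suggestion: str, public_corpus: list[str]) -> bool:
    """Return True if a suggestion is longer than the threshold and appears in public code."""
    candidate = normalize(suggestion)
    if len(candidate) <= MIN_MATCH_CHARS:
        # Short completions (such as a common one-liner) are always let through.
        return False
    # Naive linear scan; assumed here purely for illustration.
    return any(candidate in normalize(source) for source in public_corpus)


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n"
    corpus = [snippet * 10]
    print(should_block(snippet, corpus))       # False: 27 characters, below the threshold
    print(should_block(snippet * 10, corpus))  # True: 279 characters, found in the corpus
```

A service operating at GitHub’s scale would presumably rely on an index over public code rather than a linear scan; the sketch only shows where a length threshold sits in the blocking decision, and why short, common snippets slip through by design.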

Copilot Code Match
Image Credits: GitHub

But it’s difficult to assess the scale of the problem that developers have voiced concerns about. GitHub has previously said that its duplication detection feature would trigger “less than 1%” of the time when activated. Even then, matches usually occur when a file is nearly empty and offers little local context to work with, which makes Copilot more likely to suggest code that matches code written elsewhere.

“There are a lot of opinions out there — there are more than 100 million developers on our platform,” McKinley said. “And there are a lot of opinions between all of the developers, in terms of what they’re concerned about. So we are trying to react to feedback to the community, proactively take measures that we think help make Copilot a great product and experience for developers.”

What next?

The EU AI Act’s passage is just the beginning: we now know that it’s definitely happening, and in what form. But it will still be at least another couple of years before most of its obligations take effect, similar to how companies had to prepare for GDPR in the data privacy realm.

“I think [technical] standards are going to play a big role in all of this,” McKinley said. “We need to think about how we can get harmonized standards that companies can then comply with. Using GDPR as an example, there are all kinds of different privacy standards that people designed to harmonize that. And we know that as the AI Act goes to implementation, there will be different interests, all trying to figure out how to implement it. So we want to make sure that we’re giving a voice to developers and open source developers in those discussions.”

On top of that, more regulations are on the horizon. President Biden recently issued an executive order aimed at setting standards for AI safety and security, which offers a glimpse into how Europe and the U.S. might ultimately differ on regulation, even if they do share a similar “risk-based” approach.

“I would say the EU AI Act is a ‘fundamental rights base,’ as you would expect in Europe,” McKinley said. “And the U.S. side is very cybersecurity, deep-fakes — that kind of lens. But in many ways, they come together to focus on what are risky scenarios — and I think taking a risk-based approach is something that we are in favor of — it’s the right way to think about it.”
