
How engineers fought the CAP theorem in the global war on latency

CockroachDB EC-1 Part 2: Technical design


Image Credits: Nigel Sussman

CockroachDB was intended to be a global database from the beginning. The founders of Cockroach Labs wanted to ensure that data written in one location would be viewable immediately in another location 10,000 miles away. The use case was simple, but the work needed to make it happen was herculean.

The company is betting the farm that it can solve one of the biggest challenges for web-scale applications. The approach it’s taking is clever, but it’s a bit complicated, particularly for the non-technical reader. Given its history and engineering talent, the company appears to be pulling it off and making a big impact on the database market, which makes its technology well worth understanding. In short, there’s value in digging into the details.

In part 1 of this EC-1, I provided a general overview and a look at the origins of Cockroach Labs. In this installment, I’m going to cover the technical details of the technology with an eye to the non-technical reader. I’m going to describe the CockroachDB technology through three questions:

  1. What makes reading and writing data over a global geography so hard?
  2. How does CockroachDB address the problem?
  3. What does it all mean for those using CockroachDB?

What makes reading and writing data over a global geography so hard?

Spencer Kimball, CEO and co-founder of Cockroach Labs, describes the situation this way:

There’s lots of other stuff you need to consider when building global applications, particularly around data management. Take, for example, the question and answer website Quora. Let’s say you live in Australia. You have an account and you store the particulars of your Quora user identity on a database partition in Australia.

But when you post a question, you actually don’t want that data to just be posted in Australia. You want that data to be posted everywhere so that all the answers to all the questions are the same for everybody, anywhere. You don’t want to have a situation where you answer a question in Sydney and then you can see it in Hong Kong, but you can’t see it in the EU. When that’s the case, you end up getting different answers depending where you are. That’s a huge problem.

Reading and writing data over a global geography is challenging for pretty much the same reason that it’s faster to get a pizza delivered from across the street than from across the city. The essential constraints of time and space apply. Whether it’s digital data or a pepperoni pizza, the further away you are from the source, the longer stuff takes to get to you.

The essential objective of data architecture is bringing data as close as possible to the consuming party. One approach is just to have an offline copy on your own device. This is why Amazon downloads an e-book onto your Kindle. There’s nothing between you and the data — no network, no firewall, nothing. You can’t get much closer to the data than holding it in your hand.

While a single download works for data that doesn’t change often, like an e-book, rapidly changing data such as the price of a stock requires a technology far more sophisticated. This is where databases come into play.

A database is intended to be a central location that handles massive amounts of reading and writing of data that occurs quickly, often at intervals of a fraction of a second. While a Kindle book typically has only one author, a database is designed to allow millions, maybe billions, of “authors” to write data to it and then read that new data once it’s written. It’s a very powerful technology, but it’s not magical. The constraints of time and space are always present.

Image Credits: Andriy Onufriyenko / Getty Images

Let’s go back to the Quora example Kimball describes above. A New York-based user writing the answer to a question to a database hosted in NYC is going to see that answer well before users in Sydney. Depending on the condition of the network, that lag could be anywhere from half a second to a few seconds. As Kimball says, “That’s a huge problem.”
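The physics alone guarantees some lag. As a rough back-of-the-envelope estimate (approximations, not measurements): New York and Sydney are about 16,000 km apart, and light in optical fiber travels at roughly 200,000 km per second, so a one-way trip takes at least 16,000 / 200,000 = 0.08 seconds, and a round trip about 0.16 seconds. Real-world routing, congestion and the multiple round trips needed to replicate and acknowledge a write push the delay well past that theoretical floor.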

Global applications address this problem by setting up database servers all over the world. When a user makes a change in one nearby database server, it’s replicated to all the other database servers around the planet. Then when it comes time to read the data, users access the database server closest to them.

Figure 1. Under the right conditions, replicating data across the globe can allow users to read the latest write to a database. Image Credits: Bob Reselman with Bryce Durbin

Replication makes it so users don’t have to take a trip across the planet to read data, but it creates another problem: Synchronization. Again, let’s look at the Quora scenario, which is depicted in Figure 1.

Let’s say a user in NYC asks a question, “Who is the greatest guitarist of all time?” That user uploads the question to the NYC database server, and users in NYC will see the question immediately. Users in Sydney won’t see it for a second or two. Thus, the database server in Sydney is out of sync with the database server in NYC. Sydney will eventually get the new data as part of the worldwide replication process, but there will be lag. Once all the database servers catch up, if there’s no more activity against the guitarist question, things are in sync.

But then, somebody in Paris provides an answer, “Eric Clapton is the greatest guitarist of all time.” The user in Paris answers the question by making a write to a database server in Western Europe. At that point, once again, all the other replicating database servers are out of sync. People in Paris know the Eric Clapton answer, but users in NYC don’t, and neither do the users in Sydney. It’s going to take time for the replication to propagate so all the database servers know about Eric Clapton.

In a way, we’re back to where we started: The immutable law of time and space that says the closer you are to the data, the faster you can read it. Only in this case, instead of the distance being defined as the distance between user and database server, it’s the distance between the database servers themselves.

This problem of data inconsistency has been around for a long time, and in fact, it has a formal definition called the CAP theorem.

According to the CAP theorem (which stands for consistency, availability and partition tolerance), when you spread data out over many database servers, or partitions (P), and the network connecting them can fail or lag, you must make a sacrifice. You can wait until every copy agrees and get the most consistent data (C), or you can get an immediate answer (A) that may be stale. What you can’t have is the most current data, everywhere, immediately.

This problem is what makes reading and writing data over a global geography hard, and it’s what CockroachDB is attempting to solve.

How does CockroachDB address the problem?

Cockroach Labs staff. Image Credits: Cockroach Labs

CockroachDB addresses the problem of writing and reading data as fast as possible on a global basis by placing geographic optimization front and center in its database design while also automating a lot of the detailed work around performance. According to Andy Woods, group product manager at Cockroach Labs, “This is about making it easy for you to have reliable, low-latency applications on a global scale.”

To this end, Cockroach Labs introduces a feature in database design called multiregion database architecture. Let’s dive into the details.

Understanding multiregion databases

To handle the scale of most datasets, database technologies often segment data using a logical strategy called sharding, in which certain data is placed on certain database servers.

Traditionally, shards are based on some standardized feature of the dataset. For instance, the full range of zip codes could be subdivided into 10 categories based on the first digit of the zip code. Each shard would be stored on a distinct computer, hard drive or even a data center, according to the given sharding logic.

The benefit of sharding is that it makes retrieving data faster. Whenever a user executes a query against data in a database, the database engine scours the database to retrieve and assemble the data according to the criteria of the query. For example, a query such as “give me all the users who live in zip codes between 50000 and 50009” would be directed to the database server that hosts that particular range of zip codes.
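In SQL, that query might look something like the statement below (the users table and its columns are hypothetical, used purely for illustration):

  -- "users" is a hypothetical table used only for illustration
  SELECT first_name, last_name, zip_code
  FROM users
  WHERE zip_code BETWEEN '50000' AND '50009';

In a sharded system, a routing layer would send this query only to the server holding the shard for zip codes starting with 5, rather than asking every server to scan its data.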

In a relational database, this means going through each table’s rows to find the data of interest. This can take a lot of time in a large database that might have millions of rows of data. However, when the rows are grouped close together on a computer’s hard disk, or even within the same regional data center, according to a logical rule, the amount of time to execute the query decreases, in many cases dramatically, as depicted in Figure 2. Sharding data alleviates some of the burden imposed by the CAP theorem.

Figure 2. Sharding data according to region makes data retrieval faster. Image Credits: Bob Reselman with Bryce Durbin

The downside of sharding is that it takes a lot of technical know-how and ongoing administration to get it to work reliably. Even something as simple as adding a new column to a table or changing the sharding logic is a risky, time-consuming task that requires a good deal of technical expertise. One wrong move and the company’s database can be in a world of pain.

Most sharding logic is based on attributes of the data itself, but there is a different approach: grouping data according to physical geography, based on where a user is located. In a multiregion database, data is segmented and spread out across a cluster of physical database servers according to a policy defined by geography. CockroachDB uses a multiregion database architecture to make much of the pain of sharding go away.

With CockroachDB, a database engineer declares a logical database to be multiregion. Then the logical database, or even individual tables within it, can be assigned to one or more regions around the planet.

Figure 3 illustrates a CockroachDB database named MyDB that’s bound to database servers in the US EAST and US WEST regions. Notice that three tables in the MyDB database are bound to the US WEST region and one table is bound to the US EAST region. This is an example of using the multiregion feature to segment tables in a CockroachDB to specific geographic locations.

Figure 3. CockroachDB’s multiregion feature abstracts away data partitioning and replication according to region. Image Credits: Bob Reselman with Bryce Durbin

Using CockroachDB’s multiregion feature to segment data according to geographic proximity fulfills Cockroach Labs’ primary directive: To get data as close to the user as possible.
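As a rough sketch of what this looks like in practice, the statements below show how a database and one of its tables might be assigned to regions using CockroachDB’s SQL dialect (the database, table and region names here are examples; the regions available depend on how the cluster is deployed):

  -- "mydb", "customers" and the region names are examples
  ALTER DATABASE mydb SET PRIMARY REGION "us-east1";
  ALTER DATABASE mydb ADD REGION "us-west1";
  ALTER TABLE customers SET LOCALITY REGIONAL BY TABLE IN "us-west1";

Once the regions are declared, CockroachDB decides where the underlying data lives and how it replicates; the engineer works at the level of databases and tables rather than individual servers.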

Understanding row-level partitioning

There’s more to this. Cockroach Labs takes the concept of geographical data segmentation even further by providing the ability to assign individual rows of a table to a region. This is called row-level geopartitioning, and in the world of database geekiness, it’s a very, very big deal. So much so, in fact, that Cockroach Labs offers it only to paying enterprise customers.

Take a look at Figure 4, which shows a simple table of customers listed by last name, first name, city, state and zip code.

Figure 4. CockroachDB allows developers to assign rows in a table to specific geographic regions. Image Credits: Bob Reselman with Bryce Durbin

Notice that the first two rows of the table have customers with zip codes that correspond to the Western United States, while the last row has a customer with a zip code that corresponds to the East Coast. When CockroachDB’s row-level partitioning feature is enabled, it automatically puts the data for the West Coast customers in the US WEST region and the East Coast customer’s data in the US EAST region. The benefit is that applications running on the West Coast will access West Coast customer data lightning fast.
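In CockroachDB’s SQL dialect, turning this behavior on for a table is, at least conceptually, a single statement (the table name is an example, and this glosses over the multiregion setup the cluster needs first):

  -- "customers" is an example table name
  ALTER TABLE customers SET LOCALITY REGIONAL BY ROW;

From that point on, each row has a home region, tracked in a hidden region column the database adds to the table, and the row’s data is stored in that region. Applications can set the region for a row explicitly or let the database infer it from where the write originates.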

In a way, the company is trying to defy the essential principle of the CAP theorem. Remember, in a partitioned database, you can have consistent data eventually or inconsistent data immediately, but you can’t have consistent data now. CockroachDB sidesteps the theorem by focusing on the “P”: partitioning the data so that most users can have data that is both consistent and available most, if not all, of the time.

Row-level geopartitioning is a very big deal. It’s very hard to pull off. But it is well worth the effort because it provides the benefits of fine-grained sharding without the huge operational headaches. CockroachDB aims to do all of that work for you.

What does it all mean for those using CockroachDB?

CockroachDB has a lot to offer companies that need to manage data on a global scale. The data management chores that come with partitioning, whether at the database, table or row level, are handled automatically by the CockroachDB server engine. It’s a powerful addition to database architecture.

But it does come with a learning curve. Developers will need to take some time to learn the enhancements that Cockroach Labs implemented in its version of SQL, the standard query language used to program databases.

CockroachDB has another feature that provides greater control over fault tolerance, called a Survival Goal. Fault tolerance is the ability of a database to recover from system failure. When you have a worldwide database infrastructure that can span hundreds, if not thousands, of physical database servers, losing a server due to machine or network failure is commonplace.

A Survival Goal is a configuration setting by which a CockroachDB database administrator (DBA) declares how the logical database will replicate data across a cluster of database servers. Survival Goals loom large in Cockroach Labs’ approach to its database architecture. Developers and DBAs will need to take the time to learn how to work with Survival Goals, as it is a concept baked into the CockroachDB way of doing things.
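As with the multiregion settings sketched earlier, a Survival Goal is declared with a single statement in CockroachDB’s SQL dialect (the database name is an example):

  -- "mydb" is an example database name
  ALTER DATABASE mydb SURVIVE ZONE FAILURE;    -- the default: tolerate losing a zone within a region
  ALTER DATABASE mydb SURVIVE REGION FAILURE;  -- tolerate losing an entire region

The tradeoff is explicit: surviving a regional outage means writes must be confirmed in more than one region, which adds latency.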

Finally, there is the real-world question of commitment. Deciding on a database is a big decision; once adopted, a database typically stays with a company for years, if not decades. Choosing CockroachDB can be a career-enhancing or a career-ending event for those making the decision, depending on the success of the adoption. Careful planning about how the database fits with the architecture of an application is critical.

While it can be used for any application, CockroachDB is best suited for enterprises that want their users across the world to have lightning-fast access to global data in an accurate, reliable manner. There are simpler databases for applications with fewer requirements and less need for performance.

All that said, CockroachDB offers something unique to very valuable customers, and that makes its business incredibly interesting, which is the topic of the next part of this EC-1.

“Developers, as you know, do not like to pay for things”


CockroachDB EC-1 Table of Contents

Also check out other EC-1s on Extra Crunch.

