For operators, architects and developers alike, data domiciling can be one tough technical nemesis.
Privacy regulations like GDPR create strict requirements where data can live in the world. To meet these requirements companies traditionally had to run separate databases in different geographic regions. This results in a huge operational overhead and reliability also takes a hit, since it can be hard to guarantee region survival while complying with data domiciling restrictions.
Even after distributed cloud databases came along, data domiciling requirements would force developers to create multiple logically equivalent copies of a database and add complicated logic to their applications to route traffic to the appropriate database. When CockroachDB 21.1 introduced simplified multi-region capabilities, we made it possible to deploy an application across multiple regions in just three steps. (We also made it possible to do data homing at the table or even row level.) Unfortunately, for users looking to survive a region failure, there was no easy way to constrain data to a set of regions for domiciling purposes.
So we built in functionality to make sure this never happens: new database capability that grants your application the power to leap multiple regions in a single bound (with zero query changes) while making sure your domiciled data never strays outside its designated borders.
Is it a bird? Is it a plane? Is it…a supersonic cockroach in a purple cape? (Not yet, but we should definitely add that to the product roadmap). No! It’s CockroachDB 22.1’s new Super Regions.
What are super regions?
CockroachDB 22.1 introduces super regions (now in preview) to make it easier to keep data in specific geographic regions and help comply with data domiciling regulations. Super regions group multiple cloud regions into a larger geographical area — a handy feature for a potentially massively distributed database with nodes in locations that do not all match your data domiciling requirements.
Super regions enforce data placement to an overarching region. For example, you create a super region called “Europe,” which includes Ireland (eu-west-1), Frankfurt (eu-central-1), and Paris (eu-west-3). Doing this allows you to group cloud provider regions together into a larger geographic region, and ensure data never leaves the super region — even during a region failure. In our example super region “Europe,” Ireland can fail but your data still won’t leave the EU.
The great thing about super regions is now you get to have both data domiciling and region survivability. Previously, you had to pick one or the other, or you created a bunch of custom code that needed to be applied across each table and database in hope of getting both.
Now all that logic is built straight into CockroachDB.
How do super regions work?
Since 21.1, it has been possible to simply add something like a country code into a table and instruct CockroachDB to filter accordingly to dictate where data should physically reside.
Geo partitioning without super regions: In this sample architecture you have three nodes in the UK (eur-north-1), three in the US (us-east-1) and three in Australia (ap-southeast-2). The problem: with data domiciling, if you lose that UK region, one of two things must happen: either you lose access to everything in Europe, or you maintain availability by having data homed outside of a given region (and by doing so, you’d have no control over which region it gets placed in).
In an application using this sample architecture, you could try to comply with domiciling requirements or you could maintain availability of your data — but not both:
Geo partitioning now, with super regions: Super Regions lets us now have one node across three regions within the same continent. This gives regional resiliency because you can lose a single cloud region and still operate within the borders of that same continent, or even — depending how you configure your super region — in the general vicinity of the lost region. And it happens in a single line of SQL.
After applying super regions to our sample architecture, we are still spread globally across the US, Australia, and the EU. But now we have made a super region called “europe.” Instead of three nodes in Ireland, we now have one in Ireland (eur-north-1), one in Stockholm (eu-west-2), and one in Milan (eu-south-1).
Question: Do I want to share that one node across each of those separate regions within that continent, or should I actually have three nodes in each of those regions?
Answer: Ah, the “single nodes in multiple regions” vs “multiple nodes in single regions” conundrum. Ideally you want both: multiple nodes in multiple regions.
The ideal pattern for both performance and availability is still three nodes per region, so you can lose a node in the region without needing to perform reads from a remote region. It comes down to a cost benefit analysis: whether you want to pay for additional nodes to essentially guarantee node resiliency within a region — or whether you are happy with the latency involved and the performance of having one single node in multiple regions.
CockroachDB’s super regions capability makes life seriously easier by removing the requirement around granular placement of data in every single one of those regions. You can just say, I want these three regions to be within this super region “europe,” and then you pin data to “europe.” After that, CockroachDB does all the balancing automatically under the hood while still keeping you super resilient. The only thing you need to do is make a few schema changes — everything else is handled in the database.
Surviving regional failure while complying with data domicile requirements has never been easier than it is now with CockroachDB 22.1 and super regions.
Super regions use cases
The main benefit super regions deliver is resiliency within close proximity of a given region. Use cases that most benefit from super regions are:
Multinational businesses: Data domiciling regulations for multi-regional and multinational businesses that need to satisfy government regulations like GDPR that create strict rules on where data can live in the world. These localization regulations dictate how the data of a nation’s residents must be collected, cleaned, processed and stored, and set strict rules about transfers of that data to other nations.
Heavily regulated industries: Data domiciling is not just a problem for companies crossing borders. There are entire industries, like banking and online gambling/betting operators, that also face stringent data domiciling requirements. In the US, for example, sports betting in the US must comply with The Wire Act, which effectively mandates that all betting data must remain within the state lines of the location where the bet was placed. The Wire Act requires compliance with the patchwork of state laws allowing, limiting or outright banning online betting. (Really: have you ever tried to place your usual DraftKings wagers while on vacation in Hawaii? Can’t be done).
By the way: Super regions pair well with another new CockroachDB 22.1 capability: row-level TTL (now in preview). Time-to-live lets you set a lifespan for your data at the row level, and when that lifespan expires the data will be automatically deleted — no more writing custom application code for deletion logic. It’s now built straight into CockroachDB.
TTL is particularly useful for data domiciling use cases, since the same regulations requiring data locality typically have additional data management requirements. For example, one customer experience insights platform with users all around the globe is using CockroachDB 22.1’s TTL to delete non-anonymized data after three months, in order to comply with GDPR.
Get started with super regions
CockroachDB’s new super regions capability is part of the distributed database’s already powerful multi-region capabilities. If you are ready to try it in your application, here are resources to get you up and running.
Please note that super regions are still in preview. Also, because the capability is part of CockroachDB’s abstraction layer, super regions are available in CockroachDB Self-hosted enterprise and CockroachDB Dedicated (get started for free).
Resources
Multi-Region Capabilities Overview https://www.cockroachlabs.com/docs/stable/multiregion-overview
Tutorial & Demo: Low-Latency Reads and Writes in a Multi-Region Cluster
https://www.cockroachlabs.com/docs/stable/demo-low-latency-multi-region-deployment
Data Domiciling with CockroachDB