Product
Rubbing control theory on the Go scheduler
For multi-tenant mixed-workload systems like CockroachDB, performance predictability and isolation are critical. Most forms of shared infrastructure approximate these properties, be it through physical isolation within data centers, virtualized resource limits, drastic over-provisioning, and more. For CockroachDB it’s not just about protecting latencies across workload/tenant boundaries, it’s also about isolation from the system’s internal/elastic work like LSM compactions, MVCC garbage collection, and backups, and also from user-initiated bulk work like changefeed backfills. For ill-considered reasons this is something they let me work on. Here we’ll describe generally applicable techniques we applied under the umbrella of admission control, how we arrived at them, and why they were effective. We’ll use control theory, study CPU scheduler latencies, build forms of cooperative scheduling, and patch the Go runtime. We hope for it to be relevant to most systems builders (and aspiring ones!), even if the problems motivating the work were found in this oddly-named database.
Irfan Sharif
December 15, 2022
System
Living without atomic clocks: Where CockroachDB and Spanner diverge
The design of CockroachDB is based on Google’s Spanner data storage system. One of the most surprising and inspired facets of Spanner is its use of atomic clocks and GPS clocks to give participating nodes really accurate wall time synchronization. The designers of Spanner call this “TrueTime”, and it provides a tight bound on clock offset between any two nodes in the system. This lets them do pretty nifty things! We’ll elaborate on a few of these below, but chief among them is their ability to leverage tightly synchronized clocks to provide a high level of external consistency (we’ll explain what this is). If someone knows even a little about Spanner, one of the first questions they have is: “You can’t be using atomic clocks if you’re building an open source database; so how the heck does CockroachDB work?”
Irfan Sharif
January 27, 2022
Engineering
From interns, with love: CockroachDB internship projects
While not exactly envious of our current crop of interns (because, you know, the whole work from home thing), I’ll admit I find myself reminiscing back to when I was one myself. I’m still surprised the engineering team let me anywhere near the stuff they did. When I first interned four years ago, we had just declared a code yellow to focus our energy towards stabilizing CRDB. Having joined the newly-formed distributed query execution team, but now with its focus directed elsewhere, what this meant for me was free rein to flesh out distributed hash and merge joins, a few aggregation primitives (think SUM, COUNT, DISTINCT, etc.), and some sorting algorithms.
Irfan Sharif
January 21, 2021