Blog
Engineering
SQLsmith: Randomized SQL testing in CockroachDB
Randomized testing is a way for programmers to automate the discovery of interesting test cases that would be difficult or overly time-consuming to come up with by hand. CockroachDB uses randomized testing in many parts of its code. I previously wrote about generating random, valid SQL. Since then we’ve added an improved SQL generator to our suite called SQLsmith, inspired by a C compiler tester called Csmith. It improves on the previous tool by generating type- and column-aware SQL that usually passes semantic checking and tests the execution logic of the database. In just a few months it has found over 40 new bugs that the previous tool was unable to produce. Here I’ll discuss the evolution of our randomized SQL testing, how the new SQLsmith tool works, and some thoughts on the future of targeted randomized testing.
Matt Jibson
June 27, 2019
Product
Query plan caching in CockroachDB
Since the 2.1 release, CockroachDB has had a cost-based optimizer. Rewriting a big component of an existing system is always challenging. In particular, we really wanted to avoid regressing on any workloads, including simple transactional queries where the existing planner was already generating the best plan. A cost-based optimizer inherently does more work and thus involves longer planning times. To mitigate this, we worked on caching and reusing optimizer state across multiple instances of the same query. For now, we chose a conservative path: only cache state from which we can still generate the best plan (the one we would have generated without caching), making caching invisible to the user. We’ll start with an overview of the stages of the query planning process; then we’ll go over the methods clients can use to issue queries against CockroachDB, and finally we’ll discuss the caching work we have done to speed up query planning.
Radu Berinde
June 20, 2019
Product
Vectorizing the merge joiner in CockroachDB
Everybody loves a fast query. So how can we make the best use of the existing information to make joins on sorted data faster? The answer lies in vectorizing the merge join operator. Today we’ll be looking into what a merge joiner is (or what it used to be), followed by what vectorization means and how it changes the problem, and ending with how we decided to make the merge join operator faster and what this means for your queries.
George Utsin
June 18, 2019
Performance
Automatic table statistics in CockroachDB
Last year, we rebuilt our cost-based optimizer from scratch for CockroachDB’s 2.1 release. We’ve been continuing to improve the optimizer since then, and we’ve added a number of new features for the CockroachDB 19.1 release. One of the new features is automatic collection of table statistics. Automatic statistics enables the optimizer to make better decisions when choosing query plans.
Rebecca Taft
May 9, 2019
Product
Introducing CockroachDB 19.1
It’s been a little over four years since we started our mission to deliver an enterprise-ready distributed SQL database. Today, we’re excited to release CockroachDB 19.1. With this release, we enhanced distributed SQL capabilities, and expanded upon enterprise-grade features for security and data integrations. 19.1 continues to solve the challenge of complex, distributed SQL while meeting all the “enterprise” requirements that are expected of a database. Here’s Nate Stewart, our VP of Product, with a quick intro on what you can expect in CockroachDB 19.1. And for a deeper tutorial with Nate, register for our CockroachDB 19.1 webinar.
Performance
Why are my Go executable files so large?
This blog post was originally published on the author's personal blog. Overview: I built some tooling to extract details about the contents of a Go executable file, and a small D3 application to visualize this information interactively as zoomable tree maps. Here’s a static screenshot of how the app illustrates the size of the compiled code, in this example for a group of modules in CockroachDB:
Raphael Kena Poss
April 18, 2019
GDPR & Data Regulations
Where is data regulatory compliance worth the cost?
In 2016, LinkedIn chose not to comply with Russia's requirement for data to be stored locally. As a result, they were kindly blocked from doing business in the country. Facebook and Twitter, on the other hand, both decided that compliance in Russia is worth the effort. Neither has fully met Russia's requirements but they have shown enough progress to avoid being blocked.
Dan Kelly
March 19, 2019
System
The future of data protection law
GDPR went into effect less than a year ago. And still, the era of conducting global business with limited legislative obstructions already feels like some free-spirited, faraway past. Right now the global landscape of data protection law is littered with obstacles and exceptions. GDPR has been the loudest but there are plenty of other regions and countries with regulations in place. Even within Europe, countries like Germany and Switzerland have their own unique protection regulations. Russia and China have very draconian laws, and they're changing quickly. There are around 120 countries now with data protection laws in place.
Spencer Kimball
February 26, 2019
Product
Why we're switching to calendar versioning
One small step for Cockroach Labs, one giant leap for our release numbering. Since our initial launch, Cockroach Labs has used semantic versioning in our release cycle guidelines. Two years, one major release, and n-patch fixes later, we're making the switch to Calendar Versioning. This means subscribers to our release notes will see quite the jump in today's version numbering, from last week's 2.1.5 to today's 19.1 beta.
Peter Mattis
February 25, 2019