
300-Node Clusters Now Supported in CockroachDB

Last edited on February 12, 2026



    As AI-driven and agentic applications push data platforms into new territory, data architects are increasingly forced to choose between correctness, simplicity, and scale. Today, we're removing that tradeoff: we're announcing support for 300-node clusters, running 2.2M tpmC against 1.2PB of data, in CockroachDB v25.4.4 and beyond. On CockroachDB Cloud, we're also announcing support for 64 vCPU per node. It's important that our testing stay ahead of customer deployments, and this milestone does just that. We believe these tests represent the largest node-scale testing completed by any distributed SQL vendor.

    And we’re just getting started.

    From Resilience to Scale

    Along with resilience, scale is a cornerstone of CockroachDB's value proposition. Last year, we made major progress in defining and testing resilience with our Performance Under Adversity benchmark. Resilience testing helps people understand the value of distributed SQL because it demonstrates the promise of active-active resilience: zero-downtime, continued operations even through a failure as severe as the loss of an entire region. Beyond its value for messaging, the benchmark became a north star for our engineering teams, who kept improving resilience once it was quantified. For example, we saw strong latency improvements, particularly between v25.1 and v25.2, where we dropped p90 latency by 90%, from ~20ms to ~2ms.

    We expect similar results from our engineering teams now that we're measuring scale, which is why it's so important to define scale in a meaningful way. There are many ways to measure it, including ones we'll explore in the future. For now, we're starting with cluster size and data per cluster, so we can give our customers clear recommendations and drive engineering with clear measures of success.

    Scale Testing

    To validate this level of scale, Cockroach Labs conducted one of its most extensive test cycles to date on the latest 25.4 release. The cycle started with 4 million TPC-C warehouses, with 500K warehouses active at any point in time. The test ran for five days while layering on real-world operational stress, including continuous backups, change data capture (CDC), online schema changes, disk stalls, network partitions, and node restarts. At peak, it achieved 2.2M tpmC and 610K QPS with 9ms p90 latency at 90% CPU utilization; at 40-75% CPU utilization, it sustained 820K tpmC. (A minimal sketch of how this kind of throughput and latency measurement works follows the list below.) Each scenario was executed three times:

    • once to identify bottlenecks

    • once to confirm improvements

    • a final certification run to validate end-to-end support for the 300-node, 1PB configuration.
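    To make the throughput and latency figures above concrete, here is a minimal, illustrative sketch of how queries per second and p90 latency can be measured against a CockroachDB cluster over the standard Postgres wire protocol. This is not the harness Cockroach Labs used; the connection string and the query are hypothetical stand-ins for the real TPC-C transaction mix.

    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "sort"
        "sync"
        "time"

        _ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
    )

    func main() {
        // Hypothetical DSN; point it at any node in the cluster.
        db, err := sql.Open("postgres", "postgresql://root@localhost:26257/tpcc?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        const workers, queriesPerWorker = 32, 200
        var (
            mu        sync.Mutex
            latencies []time.Duration
            wg        sync.WaitGroup
        )

        start := time.Now()
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func(warehouseID int) {
                defer wg.Done()
                for i := 0; i < queriesPerWorker; i++ {
                    t := time.Now()
                    // A simple point read standing in for a full TPC-C transaction.
                    var n int
                    if err := db.QueryRow(
                        `SELECT count(*) FROM warehouse WHERE w_id = $1`, warehouseID,
                    ).Scan(&n); err != nil {
                        log.Printf("query failed: %v", err)
                        continue
                    }
                    mu.Lock()
                    latencies = append(latencies, time.Since(t))
                    mu.Unlock()
                }
            }(w + 1)
        }
        wg.Wait()

        elapsed := time.Since(start)
        sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
        p90 := latencies[len(latencies)*90/100]
        fmt.Printf("QPS: %.0f, p90 latency: %v\n", float64(len(latencies))/elapsed.Seconds(), p90)
    }

    Scaled out to hundreds of client machines running the full TPC-C transaction mix, this is the shape of measurement behind the QPS and p90 numbers reported here.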

    Some of the highlights include:

    • ~610K QPS at peak. Compared to the 17K QPS from the Performance Under Adversity benchmark on a 9-node cluster, that is roughly a 36x throughput increase for a roughly 33x increase in node count, showing that CockroachDB scales near-linearly with cluster size.

    • A 25.4 run with the same amount of imported data took 30% less storage space than an equivalent run on 25.2, thanks to the value separation and enhanced compression introduced in 25.4.

    • Imports for this run on 25.4 were 2× faster than on 25.1, meaning faster migrations and faster time-to-value for customers moving to CockroachDB.

    • The test lived up to CockroachDB's "Performance Under Adversity" promise during chaos testing (disk stalls, node restarts, network partitions), backup/restore, CDC, and online schema changes, maintaining performance consistent with the 820K tpmC, 225K QPS baseline.

    • ADD COLUMN across 120 billion rows completed without regression, proving schema agility for evolving businesses at massive scale.

    • A 330TB backup completed in 2 hours and 40 minutes with no impact on foreground traffic.

    • Six concurrent changefeeds stayed caught up with no impact on foreground traffic. (A sketch of how these operational statements are issued in SQL follows this list.)
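    For readers less familiar with these operations, the sketch below shows how the operational stress layered onto the test (backups, changefeeds, and online schema changes) is issued as ordinary SQL over the same Postgres-wire connection applications use. It is illustrative only: the DSN, table name, and storage and sink URIs are hypothetical placeholders, not the configuration used in the test.

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol
    )

    func main() {
        // Hypothetical DSN; point it at any node in the cluster.
        db, err := sql.Open("postgres", "postgresql://root@localhost:26257/tpcc?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        stmts := []string{
            // Full-cluster backup to external storage; runs as a background job.
            `BACKUP INTO 's3://example-bucket/backups?AUTH=implicit'`,
            // Changefeed (CDC) streaming row changes from a table to a Kafka sink.
            `CREATE CHANGEFEED FOR TABLE order_line INTO 'kafka://kafka.example.com:9092'`,
            // Online schema change; the column backfill runs without blocking foreground traffic.
            `ALTER TABLE order_line ADD COLUMN audit_note STRING`,
        }
        for _, s := range stmts {
            if _, err := db.Exec(s); err != nil {
                log.Printf("statement failed: %v", err)
            }
        }
    }

    Because each of these statements runs as a background job, the foreground workload keeps serving traffic while they execute, which is what the no-impact results above are measuring.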

    Scale tests of this sort are non-trivial, expensive, and important. Our customers are already running clusters of nearly this size, and we want to be in a position to confidently recommend best practices for deploying and operating such large clusters.

    CockroachDB Cloud

    Along with this release, we’re also announcing support for 64 vCPU nodes cloud-wide. All customers will be able to self-serve and select these larger instance types if desired.

    Future Plans

    Scale is about more than the technical ability to run clusters of this size. It also includes observability, supportability, and support for multi-tenant workloads.

    Near-term testing goals include 500-node clusters, 5PB of data per cluster, new QPS benchmarks across workload types, validation on 64 vCPU hosts, and a 400-billion-row, 200TB schema change. Over time, we will also introduce new measurement vectors—including regions supported, user and service account density, and tenants per cluster—to provide customers with clear, validated guidance for deploying CockroachDB at massive scale.

    These measurements will give our customers clear guidelines on scaling their CockroachDB infrastructure to meet their data demands as AI and agentic applications expand in production.

    Try CockroachDB Today

    Spin up your first CockroachDB Cloud cluster in minutes. Start with $400 in free credits. Or get a free 30-day trial of CockroachDB Enterprise on self-hosted environments.
