
Logging for Detection and Response: How We Build Security Signals at Cockroach Labs

Last edited on January 15, 2026



    Modern applications rely on CockroachDB for workloads where resilience, correctness, and availability are absolutely critical. Whether it’s being deployed for payment systems, identity provider systems, or transactional systems, one thing is certain: If the underlying database platform of these applications isn’t secure, nothing built on top of it can be truly secure, either.

    For Cockroach Labs, security logging is not about collecting “as much as possible” and sending it to a Security Information and Event Management (SIEM) system. It is about designing our security logging architecture intentionally, with the aim of capturing the right signals, enabling a proactive approach to event detection, and freeing our team to investigate only meaningful security events.

    This post gives a high-level view of how Cockroach Labs’ security teams designed logging specifically for Detection and Response (D&R), how we prioritize log sources, and how we apply a Detection-as-Code philosophy. The result of this rigor: Detection rules go through review and testing as thoroughly as software features do.

    Why doesn’t every log make the starting lineup?

    When we designed security logging for the distributed SQL database CockroachDB and its supporting cloud services, we started with this question:

    What behaviors must we be able to see as they happen, and what evidence must we be able to reconstruct later, so that we can reliably detect, understand, and prevent meaningful security events?

    That work is guided by these principles:

    • Logging exists to enable detection and incident response, not just to satisfy a checkbox.

    • We design what we log so that we can see important behaviors as they happen, and reliably reconstruct events later.

    • We collect logs that answer concrete questions about who did what, where, when, and how in a way that can be correlated across the environments where CockroachDB runs. 

    • We accept that some logs are more valuable than others and prioritize those that give the most security value for their cost.

    In practice, this meant we did not just turn on every log source and hope it would be useful later. Instead, we designed a logging architecture where each source has a clear purpose in the D&R pipeline, giving us sharper signals, less noise, and faster answers when we do need to investigate something.

    Does every log get a seat at the table?

    Not all logs are equal. Some are critical for spotting and understanding malicious activity, while others are more useful for narrow troubleshooting or audit use cases.

    Internally, we use a risk-based approach to decide where to invest. For each log source, we explicitly look at both its proactive value, meaning what it lets us see and act on early, and its reactive value, meaning how it helps us investigate and learn from events after the fact. We determine these values based on several factors, including:

    • Impact on customer data and availability: Does this log help us see actions that could materially affect the confidentiality or integrity of customer data, or the availability of the services that process it, either as an early warning or as evidence during and after an incident?

    • Usefulness for detection: Does it map well to the kinds of attacks and misconfigurations we care about, including earlier-stage behaviors we want to detect before they become full incidents?

    • Usefulness for investigation: Does it carry enough context, such as who did what, where it happened, and which system was involved, to reconstruct what happened and understand scope?

    • Correlation potential: Can it be cleanly joined with other sources using stable identifiers so we can link related activity together for both proactive detections and incident investigations?

    • Timeliness and reliability: Can we receive it quickly and consistently enough to support near-real-time detection, and rely on it to be complete when we need to look back?

    • Compliance relevance: Some log types, for example certain audit trails, are mandatory for standards such as PCI-DSS or SOC 2. Those are non-negotiable.

    Compliance-related logs that are required by standards are always collected. Beyond that, this lens helps us prioritize where we build deep integrations, enrichment, and detection rules, and where it is sufficient to retain logs primarily for forensics.
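    To make this concrete, here is a minimal sketch of how such a rubric could be expressed in code. The factor names mirror the criteria above, but the weights, ratings, and threshold are purely illustrative assumptions, not Cockroach Labs’ actual scoring model.

```python
from dataclasses import dataclass

# Illustrative rubric only: factor names mirror the criteria above, but
# the weights, ratings, and threshold are hypothetical assumptions.
WEIGHTS = {
    "customer_impact": 3.0,
    "detection_value": 2.5,
    "investigation_value": 2.0,
    "correlation_potential": 1.5,
    "timeliness": 1.0,
}

@dataclass
class LogSource:
    name: str
    scores: dict[str, int]  # each factor rated 0-5
    compliance_required: bool = False

def priority(source: LogSource) -> float:
    """Weighted sum of the per-factor ratings."""
    return sum(WEIGHTS[f] * source.scores.get(f, 0) for f in WEIGHTS)

sources = [
    LogSource("cloud_audit", {"customer_impact": 5, "detection_value": 5,
                              "investigation_value": 5, "correlation_potential": 4,
                              "timeliness": 4}, compliance_required=True),
    LogSource("app_debug", {"customer_impact": 1, "detection_value": 1,
                            "investigation_value": 3, "correlation_potential": 2,
                            "timeliness": 2}),
]

for s in sorted(sources, key=priority, reverse=True):
    # Compliance-required sources are always collected regardless of score.
    if s.compliance_required or priority(s) >= 30.0:
        tier = "collect + build detections and enrichment"
    else:
        tier = "retain primarily for forensics"
    print(f"{s.name}: score={priority(s):.1f} -> {tier}")
```

    The output of an exercise like this is essentially a prioritized backlog: which sources deserve deep integrations and detection rules, and which are kept mainly as evidence for investigations.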




    What are the security signals we care about?

    Without going into exhaustive detail, it is useful to look at the kinds of events that tend to generate important signals in a distributed SQL database environment. Two categories in particular stand out when we think about potential unauthorized access, changes to data integrity, and intentional disruption of availability.

    1. Cloud audit activity tied to powerful changes

    Cloud audit events can describe who changed what in the infrastructure around a database deployment in ways that a malicious actor could abuse. Examples include:

    • Modifying network paths or firewall rules in a way that exposes previously internal services

    • Changing or disabling security controls such as encryption settings, key usage, or access policies

    • Deleting or reconfiguring critical components in a way that could disrupt clusters or data access

    These kinds of log events help investigators answer, “Did someone with the right (or wrong) permissions change the environment in a way that could impact the security or availability of customer data?”
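    As a simplified illustration of what a detection over this kind of event could look like, the sketch below flags a firewall change that opens a service to the entire internet. The normalized event schema and field names are hypothetical, not our production format or rule logic.

```python
# Hypothetical normalized cloud audit event; the schema and field
# names are illustrative, not a real provider's format.
event = {
    "source": "cloud_audit",
    "actor": "deploy-svc@example.iam",
    "action": "firewall.rule.update",
    "resource": "projects/prod/firewalls/allow-db",
    # 26257 is CockroachDB's default SQL port.
    "details": {"source_ranges": ["0.0.0.0/0"], "ports": ["26257"]},
}

def rule_public_exposure(event: dict) -> bool:
    """Fire when a firewall change opens a service to the entire internet."""
    if event.get("action") != "firewall.rule.update":
        return False
    ranges = event.get("details", {}).get("source_ranges", [])
    return "0.0.0.0/0" in ranges

if rule_public_exposure(event):
    print(f"ALERT: {event['actor']} exposed {event['resource']} to the internet")
```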

    2. Identity and privilege behavior that enables those changes

    Identity and access logs show how people and services authenticate and gain privileges before they perform sensitive actions. They can reveal, for example:

    • Unusual sign-ins to privileged accounts or roles, especially from new locations or devices.

    • Rapid privilege changes (such as granting broad admin rights) followed by high-risk configuration changes.

    • Access patterns that do not match normal operational behavior for a given identity.

    These log events are critical for answering, “Was this disruptive or risky change performed by the right identity, in the right way, or does it look more like an attacker preparing or executing an action?”
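    The sketch below illustrates the kind of correlation this enables: pairing a broad privilege grant with a high-risk action by the same identity shortly afterwards. Again, the event schema, action names, and time window are illustrative assumptions only.

```python
from datetime import datetime, timedelta

# Hypothetical normalized events; actors, actions, and timestamps are illustrative.
events = [
    {"ts": datetime(2026, 1, 15, 9, 0), "actor": "alice",
     "action": "iam.role.grant", "details": {"role": "admin"}},
    {"ts": datetime(2026, 1, 15, 9, 7), "actor": "alice",
     "action": "kms.key.disable", "details": {"key": "db-encryption"}},
]

HIGH_RISK_ACTIONS = {"kms.key.disable", "firewall.rule.update", "cluster.delete"}
WINDOW = timedelta(minutes=30)  # illustrative correlation window

def rapid_escalation(events: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each broad privilege grant with any high-risk action taken
    by the same actor shortly afterwards."""
    grants = [e for e in events if e["action"] == "iam.role.grant"]
    hits = []
    for grant in grants:
        for e in events:
            elapsed = e["ts"] - grant["ts"]
            if (e["actor"] == grant["actor"]
                    and e["action"] in HIGH_RISK_ACTIONS
                    and timedelta(0) < elapsed <= WINDOW):
                hits.append((grant, e))
    return hits

for grant, action in rapid_escalation(events):
    print(f"ALERT: {grant['actor']} was granted {grant['details']['role']} "
          f"and then performed {action['action']} within {WINDOW}")
```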

    In our Detection-as-Code workflow, we treat such signals as building blocks: structured events that can be combined, reviewed, and tested like software, without publicly exposing the exact rule logic or full coverage map.

    Detection-as-Code: Shipping Rules Like Features

    Logs by themselves are just data points. The context around them – such as who did what, where, and under which conditions – is what turns them into useful evidence. Detection rules then codify how we use that evidence to distinguish benign behavior from suspicious or malicious activity.

    Cockroach Labs leverages a Detection-as-Code (DaC) framework, where we treat detection rules like software features. They live in a version-controlled repository, go through review, and ship through an automated pipeline.

    1. Start from an attack scenario

    We begin with the security problem, not the rule syntax. We identify one or more attack vectors or misuse scenarios we care about (for example, risky privilege changes or disruptive configuration actions) and turn that into a candidate detection. A security engineer adds the rule logic, metadata (severity, category, owner), and example events, then opens a pull request in the rules repository.
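    As an illustration, a rule file in such a repository might pair the match logic with its metadata, roughly like the hypothetical sketch below. This layout is a generic example, not our actual repository format.

```python
# risky_privilege_grant.py -- hypothetical rule file layout; the metadata
# fields echo the ones described above (severity, category, owner), but
# this is a generic illustration, not the actual repository format.

RULE_ID = "identity.risky_privilege_grant"
SEVERITY = "high"
CATEGORY = "identity"
OWNER = "security-engineering"

BROAD_ROLES = {"admin", "owner", "editor"}  # illustrative role names

def rule(event: dict) -> bool:
    """Match grants of broad administrative roles."""
    return (event.get("action") == "iam.role.grant"
            and event.get("details", {}).get("role") in BROAD_ROLES)

def title(event: dict) -> str:
    """Human-readable alert title shown to responders."""
    return (f"{event.get('actor')} granted "
            f"{event.get('details', {}).get('role')} on "
            f"{event.get('resource', 'unknown resource')}")
```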

    2. Validate, test, and review

    A CI/CD pipeline validates that the rule compiles, follows our template, and passes basic checks. Sample events that should and should not match are run through the rule to confirm behavior. Reviewers then look at the change for security value and overlap: Is it tied to a real risk, scoped appropriately, and not a duplicate of an existing detection?
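    A minimal sketch of the kind of test the pipeline could run, assuming the illustrative rule module from the previous step, might look like this (runnable with pytest):

```python
# test_risky_privilege_grant.py -- illustrative CI test for the rule
# sketched above; runnable with pytest.
from risky_privilege_grant import rule

SHOULD_MATCH = {"action": "iam.role.grant", "actor": "alice",
                "details": {"role": "admin"}}
SHOULD_NOT_MATCH = {"action": "iam.role.grant", "actor": "bob",
                    "details": {"role": "viewer"}}

def test_matches_broad_grant():
    assert rule(SHOULD_MATCH)

def test_ignores_narrow_grant():
    assert not rule(SHOULD_NOT_MATCH)
```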

    3. Deploy, monitor, and refine

    Approved detection rules are deployed through the same pipeline, often starting in a lower-urgency or monitor-only mode. We watch alert volume and accuracy in real environments and tune or refine the rule through the same pull request process. The goal is that detection rules in production are reliable, understandable, and clearly owned.
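    One illustrative way to express a monitor-only rollout in code is a simple mode flag that routes matches to a tuning log instead of a pager. The flag name and routing below are hypothetical, not our deployment mechanism.

```python
# Illustrative rollout gating only: the flag and routing are hypothetical.
MODE = "monitor"  # new rules start here; flipped to "enforce" via the same PR process

def page_oncall(alert: dict) -> None:
    print(f"[page] {alert['title']}")  # stand-in for a real paging integration

def dispatch(alert: dict) -> None:
    # Monitor-only matches are recorded for volume/accuracy tuning
    # rather than paging a responder.
    if MODE == "monitor":
        print(f"[tuning-only] {alert['title']}")
    else:
        page_oncall(alert)

dispatch({"title": "alice granted admin on projects/prod"})
```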

    4. Reuse for threat hunting

    The same detection rules can also be run against historical log data to look for past instances of the same pattern, so improvements to a rule benefit both real-time alerting and retrospective threat hunting.
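    Conceptually, that replay can be as simple as streaming an archived log export through the same match function, as in this hypothetical sketch (the file name and JSON-lines format are assumptions):

```python
import json

# Hypothetical replay of the same rule logic over an archived export,
# one JSON event per line; the file name is an assumption.
from risky_privilege_grant import rule, title

def hunt(archive_path: str) -> None:
    """Print any historical events that the current rule would match."""
    with open(archive_path) as f:
        for line in f:
            event = json.loads(line)
            if rule(event):
                print(f"historical hit: {title(event)} at {event.get('ts', 'unknown time')}")

hunt("cloud_audit_2025.jsonl")
```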

    Taken together, our DaC approach keeps detection rules grounded in concrete threats. This ensures that when an alert fires it reflects logic we understand, have tested, and intentionally shipped.

    When the Pager Goes Off

    All of this work, from how we choose what to log to how we design detection rules, is aimed at reducing the likelihood and impact of attacks by catching suspicious activity earlier and more accurately.

    When an alert fires, responders typically see:

    • A detection rule that started from a potential attack scenario or risk-motivated need and has been reviewed and tested before deployment.

    • Log events enriched with identity, resource, and environment context.

    • Links to related activity from key security-relevant log sources, such as cloud audit and identity.

    • A runbook that outlines the usual investigative and containment steps.

    That reduces the time spent figuring out what the alert is telling us and increases the time spent confirming impact, containing the issue, and learning from it. Logs give us the context and evidence we need to drive our detection rules. In turn, those rules produce reusable signals we can compose into more advanced coverage for real attack scenarios without flooding responders with noise.

    What do these security protocols mean for CockroachDB users?

    Here are the key points for CockroachDB users to know:

    • Our logging is built with a purpose. We choose and shape log data so that when something looks suspicious, these events are directly useful to the detection rules and investigations that follow, rather than just filling storage.

    • We treat detection rules as code, with review, testing, and controlled rollout, rather than as unmanaged rules that no one owns.

    • The overall approach is designed to work consistently across the various cloud environments where CockroachDB runs, and to evolve as the product and threat landscape change.

    Protecting your workloads is a responsibility that Cockroach Labs takes seriously every day. We design detection rules from real attack scenarios and risk-motivated needs, investing engineering effort into keeping signals sharp and noise low. By guiding our detection work with a DaC philosophy, CockroachDB’s logging is resilient, performant, and as consistent as possible across the major clouds. 

    Ready to learn how Cockroach Labs builds security signals you can trust at scale? Talk to an expert.


    Munir Jaber is a Staff Security Engineer at Cockroach Labs, working across cloud and infrastructure security, application security, and detection and response. He led the initial design and build-out of the company’s security detection and response program, establishing Detection-as-Code as a core practice, and continues to work across the company on broader security engineering efforts.