CockroachDB has lots of customers who’ve switched from Oracle and other legacy SQL databases, saving both money and time. Generally speaking, though, we can’t share their names or specific details about their internal processes and workloads.
So in this article we’ll be doing something a bit different. We wanted to tell the story of what it’s like for a finance company to transition from Oracle to CockroachDB. The names and quotes in this narrative are fictional, but the story itself is real. It comes from the things we’ve heard working with real-world customers in the finance industry – including Fortune 50 banks – who are moving transactional workloads off of Oracle and onto CockroachDB.
Day 1000
Laura Nash usually dreaded board meetings. She’d gotten into this career to build things, but apparently she’d been too good at that. Now, she mused, a big part of her job had somehow become translation. Code into English. Budgets into features. Architecture into business outcomes.
But today she had reason to smile.
“Ladies and gentlemen of the board,” she began, “I’ve got good news. We’ve reached phase four of our MYP, and all of our tier 0 applications are off of Oracle. In another couple of years, we anticipate being able to leave Oracle entirely, but in the interim, let me share with you some of the results we’ve seen so far.”
“The move to distributed multi-region architecture in both the application and database layers has significantly increased our operational resilience, and over the past year we’ve seen five-nines uptime across all of our tier 0 applications. As you can see,” she said, gesturing towards the chart displayed on a screen at the front of the room, “this has led to a significant increase in customer satisfaction, and consequently, revenue.”
“We have also significantly reduced the risk of running afoul of regulations related to data use and storage,” she continued. “Our new system allows us to quickly geo-locate data wherever data domiciling regulations are passed. And we can roll out new regions quickly too, if need be.”
“Deploying to more regions has increased our infrastructure overhead, but labor costs associated with data management are actually down more than sixty percent because our new database solution automates most of the manual ops work we had to do previously. That has allowed us to increase feature velocity, and you’ll hear from Patrick in a few minutes about some of the exciting things that we’re doing on the product side with all of the time that we’re saving on DB ops and related development tasks.”
“This move towards localized data storage has also reduced latency for most of our customers, and that’s another factor driving the increased customer satisfaction we’re seeing,” she said.
“So, how did we get here? Next slide, please.”
Day 0
Shaktar was nervous when Dr. Nash caught his eye and gestured for him to step into a small conference room. The end of the New York office’s weekly technical staff all-hands was a common time for impromptu catch-ups, but the CTO didn’t generally grab Shaktar unless she had something significant to ask of him and his team. Or unless there was a major problem.
Had there been? Shaktar racked his brain trying to remember the last time the customer support chat had gone offline. Other than the regularly-scheduled maintenance downtime, nothing came to mind. If something was broken, he didn’t know about it. Which might make it even worse.
But as he shuffled into the room, he saw that Dr. Nash was smiling. “I have good news,” she said. “At least, I think it’s good news. The strategic MYP we’ve been talking about for years is here, and your team is going to run the pilot program for moving our transactional database workloads off of Oracle.”
“Finally!” said Shaktar.
“Right?” Laura smiled even more broadly. “Listen, I chose the chatbot because I know this move is something you’ve been pushing for. Well, and because it’s important but not tier 0, so I can sell it to the board as an ideal test case. I know you’ve done some testing of Oracle alternatives, so what do you recommend?”
“Yeah, I’ve looked at a few different distributed SQL databases,” said Shaktar. “If we’ve got the budget for it, I’d like to try benchmarking a few different ones before we actually make a commitment. Based on features, though, I’m most excited about CockroachDB.”
Dr. Nash laughed. “You’ve mentioned that before. Well, I guess you’ve mentioned a few different databases before, but that name is hard to forget! What’s got you excited about it?”
Shaktar had been making this pitch for several years, and knew it by heart at this point. Dr. Nash probably knew it, too, but without the go-ahead to move off of Oracle, there hadn’t been much she could actually do with the information. Until now.
“First,” Shaktar said, “it uses a multi-active system so all nodes can handle reads and writes, and they’re totally interchangeable–”
“Wait,” said Dr. Nash. “I’ve heard of active-passive and active-active, but what’s multi-active?”
“Oh, right. Yeah, that’s just a term they use. Basically it’s like active-active in that any database node can serve reads and writes, but the database also automatically maintains consistency between all those nodes.”
“How?”
“It uses the Raft consensus protocol. Raft organizes replicas of data into groups of followers and a leader, and then ensures that the followers remain consistent with the leader. If a node with a leader goes down, a follower on a different node is elected leader to maintain consistency and availability. As long as there are enough nodes to maintain consensus, the database won’t go offline.”
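A quick aside on what “enough nodes to maintain consensus” means in practice: Raft needs a majority of a group’s replicas to agree. The sketch below is only the majority arithmetic, not CockroachDB’s implementation, and the function names are hypothetical, chosen for illustration.

```python
def quorum_size(replicas: int) -> int:
    """Smallest majority of a Raft group with `replicas` voting members."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority can still be formed."""
    return replicas - quorum_size(replicas)


def has_quorum(replicas: int, live: int) -> bool:
    """True if the live replicas can still elect a leader and commit writes."""
    return live >= quorum_size(replicas)
```

With three replicas the group survives the loss of one; with five, the loss of two. That is why replica counts are usually odd: a fourth replica raises the quorum to three without letting the group survive any additional failure.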
“So this is a resilience play? I think I can sell that upstairs,” said Dr. Nash. “I know the board has been reading more about operational resilience and they want to see us move in that direction.”
“Yeah, it would definitely increase our resilience significantly and reduce RTO and RPO to zero, or damn close to it,” said Shaktar. “Even when other parts of the application are down, the support bot would stay online thanks to CockroachDB, which I expect would boost our NPS scores. And we can make it as resilient as we want, it can handle multi-region and even multi-cloud right out of the box. And it’ll slot right into our existing data pipelines because it comes with built-in changefeeds, too.”
“Well,” said Dr. Nash, “I would certainly love to stop paying for GoldenGate. And all of the other line items that get tacked onto our Oracle bill. How does the multi-region work? Compliance is a big part of the MYP, and we’ll definitely need to add regions for some workloads to ensure we’re keeping up with the local data domiciling.”
“From what I can tell, it’s pretty easy. Deploying a cluster across multiple regions just requires a couple lines of SQL, and tables can have regional flags down to the row level. So like, with our chatbot database, we can keep a single records table but still ensure that a user’s specific chat records are stored in their region.”
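For readers curious what those “couple lines of SQL” might look like, here is a rough sketch that only builds the statements as strings. The statement forms (SET PRIMARY REGION, ADD REGION, SET LOCALITY REGIONAL BY ROW) follow CockroachDB’s multi-region SQL as we understand it, so check them against the documentation for your version. The database, table, and region names are hypothetical, and the helper itself is an illustration rather than part of any CockroachDB client.

```python
from typing import List


def multi_region_statements(
    database: str,
    table: str,
    primary_region: str,
    extra_regions: List[str],
) -> List[str]:
    """Build the SQL for a multi-region database with one row-level regional table.

    Assumes the region names match localities the cluster's nodes were started
    with, and that the session is already using `database` when the ALTER TABLE
    statement runs.
    """
    statements = [
        f'ALTER DATABASE {database} SET PRIMARY REGION "{primary_region}";'
    ]
    statements += [
        f'ALTER DATABASE {database} ADD REGION "{region}";'
        for region in extra_regions
    ]
    # REGIONAL BY ROW keeps one logical table while homing each row in a region.
    statements.append(f"ALTER TABLE {table} SET LOCALITY REGIONAL BY ROW;")
    return statements


# Hypothetical names for the chatbot workload in the story.
example = multi_region_statements(
    "support", "chat_records", "us-east-1", ["us-west-2", "eu-west-1"]
)
```

Each string in `example` could then be run through any Postgres-compatible driver or SQL shell.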
“And we can self-host this if we want to? I know your workload is on AWS, but we still have a few we’d want to keep on-prem for now, if this all works out.”
“Yeah, you can run it on-prem or on any of the clouds, or hybrid- or multi-cloud. Basically, wherever we want it to be, it’ll be there.”
“OK,” said Dr. Nash, “but ultimately our charge with this MYP is to move all of our mission-critical workloads off of Oracle. Can CockroachDB handle the scale of something like our IAM workload, or payments?”
“Well, that’s why I want to do the benchmarking, but I’m pretty sure it will. That’s the kind of workload it’s built for, and I know it’s already in production for a variety of workloads at at least one other Fortune 50 bank.”
“Well, that sounds good enough to get started. Try to keep the benchmarking cost reasonable, but feel free to go ahead with it. Let me know who wins, and then we’ll start figuring out how to migrate your workload.”
Day 180
Darren spotted Shaktar across the cafeteria, and made his way over. The enterprise architect had been looking for a way to grab a minute with the head of the chatbot team all morning.
“Shaktar,” he said, absentmindedly shoveling salad onto a plate-less tray. “Enjoying some salad, eh?”
Shaktar looked at Darren’s tray and chuckled. “Darren my friend, you are terrible at small talk. What can I do for you?” He gestured toward an empty spot at a nearby table.
“So,” Darren said, sitting down and entirely ignoring his tray of salad, “how did you guys fare in the outage yesterday? I know you have a cluster on us-east-1a, just like we do, but I took a look at the telemetry data and I didn’t see much. How much data did you actually lose failing over to the cluster on the west?”
“None.”
“Yeah, that’s what I thought. So the Cockroach thing is actually working, huh?”
“I mean, so far so good,” said Shaktar. “We actually haven’t had a second of downtime since we finished the migration, although we did have a few performance hiccups early on.”
Darren leaned in. “And how was the migration?”
“Honestly? Not that bad.” Shaktar shrugged. “A migration is a migration, it’s never going to be totally painless. But CockroachDB has a migration tool for schema that helped us translate the Oracle schema into compatible schema for CockroachDB, and then the actual data we just moved over with AWS DMS. It’s Postgres-compatible, so it’s not really all that different.”
“What about the integrations, like with Kafka and stuff. How much refactoring did you have to do?”
“Not as much as you’d think,” Shaktar said. “There was some work, of course, but on the application side, you can just treat CockroachDB like a single-instance Postgres database, so it’s pretty straightforward. And on the other side, it comes with changefeeds built in, so it wasn’t too difficult to get that plugged into Kafka. Again, CockroachDB is Postgres compatible, so it’ll work fine with most of the tools we use, I’d imagine. We haven’t run into any issues yet, at least.”
“Wait,” Darren said, “this is a distributed database, right? So what did you have to do to set that distribution up? How much time did it take you to build the data routing layer?”
“None. CockroachDB handles data routing automatically. And quite efficiently, as long as you follow their best practices.”
“And those performance hiccups you mentioned?”
This was beginning to feel a bit like an interrogation.
“Yeah, that’s what I meant about best practices. Initially we had a few queries that were unintentionally creating hotspots and slowing things down. The way it works with CockroachDB is a little different because every node can handle reads and writes, so it takes a little adjustment. But Cockroach has some good telemetry tools that helped us identify and fix those pretty quickly, and the Cockroach people were super helpful when we reached out with a tricky one.”
Darren frowned. “My god. Don’t call them ‘cockroach people.’”
“Hey, I didn’t name the thing!” Shaktar laughed. “Folks from a couple other teams have asked me about this, though. You’re actually the second person to ask me today after that AZ outage yesterday. I think I’m going to talk to Laura about setting up a lunch-and-learn about this, maybe next week.”
“Perfect. Shoot me an invite for that once it’s on the calendar,” Darren said, getting up to leave.
Shaktar briefly considered calling after him that he’d forgotten his tray of salad, but then just smiled and shook his head instead: architects.
Day 365
Laura Nash knocked twice on the wall of Darren’s office and stepped in. “Hey Darren, how’s the migration going?”
Darren shrugged. “Could be worse! Shaktar wasn’t kidding about CockroachDB. We’re still ironing out the kinks a bit but so far I’m pretty impressed.”
“That’s great. So, I know it’s a bit early, but what do you think? Is CockroachDB ready for primetime?”
Darren looked up. “For what workloads?”
“Well, obviously we’re going to wait and see how things go with your team here. And we’re not changing anything else until after the holidays. But after that, if things are still looking good, we were thinking about moving over metadata, and then maybe IAM.”
“So you really do mean primetime.”
“Well yeah,” Dr. Nash said. “Ultimately we want to move everything transactional off of Oracle, even the deposits and payments workloads. And obviously we need to be sure it works at scale first, but it’s been working for Shaktar’s chatbot. And, cards on the table, the price-for-performance we’re getting there is crazy. It’s like 20x what we were getting with Oracle. So if you’re seeing similar numbers, I want to put my foot on the gas a little bit.”
“Well,” Darren said, “like I said, it’s early days. But so far I think what we’re seeing is pretty comparable to what I’ve heard from Shaktar. The integration platform is a spikier workload, generally, so we’re still fine-tuning the scaleup/down parameters, but we’re already spending less than we did on Oracle for the performance we’re getting. And I expect that to get a lot better as we wrap up the migration.”
“Oh?”
“Yeah, I was talking to Shaktar the other day, and he said that ultimately one of the biggest advantages for him was just the amount of time that his team has gotten back. Once everything’s configured properly, the ops workload is so much less that the whole team has way more time to work on other things.” Darren paused. “At least, that’s what Shaktar says. We’ll see how it goes here. But I certainly wouldn’t mind my guys having more time to spend on features rather than ops.”
“Perfect,” said Dr. Nash. “Just keep me posted. The exec team has a meeting in a few weeks and if you’re still bullish on it, I want to get those metadata and IAM migrations cemented on the calendar once the holidays are over.”
“You got it,” Darren said. “Do you really think we’ll get off of Oracle, completely?”
“I think we actually might,” Dr. Nash said. “For the first time in 20 years, I’m actually seeing light at the end of the tunnel.”
If you relate to this story and are interested in making the switch, get in touch with Cockroach Labs experts here, or check out this guide on how to migrate from Oracle to CockroachDB.