Always On at Nexo: Assuring Resilience and Scale with CockroachDB

Last edited on January 22, 2025

    When crypto prices rise, the fortunes of this dynamic sector’s service providers also increase. However, industry upswings have a hazard: Rapid growth can mean outgrowing your database. 

    That’s the dilemma that was facing Nexo, a premier digital assets wealth platform that empowers its clients to grow, manage, and preserve their crypto holdings. As the digital assets market matured and client expectations escalated, Nexo discovered how limited the reliability and scalability of their legacy database had become. 

    With crypto competition intensifying, Nexo’s engineering team saw they were hitting the wall. To realize their full potential and assure an always-on experience for their customers, they knew they had to make a change. The criteria for leveling up their database were clear:

    • increase platform availability

    • provide horizontal scalability for read and write traffic through native sharding capabilities 

    • assure zero downtime during server version updates and maintenance operations 

    The answer was CockroachDB, the distributed SQL database created to enable global scale and unmatched resilience.

    Growing Nexo: $320B+ in transactions & credit issued

    Nexo has evolved into a major player in digital assets since its 2018 launch, building a reputation for driving customer wealth with tailored solutions that create long-term value. Backed by 24/7 client care, just a few of the Nexo platform offerings include flexible and fixed-term savings, crypto-backed loans, advanced trading tools, and liquidity solutions through the first debit/credit crypto card. 

    Their innovative approach is backed by Nexo’s deep industry expertise, a sustainable business model, robust infrastructure, security, and global licensing. The proof is in the payout of $1 billion in interest, with $320 billion in transactions and collateralized credit issued to its users since 2019 – all executed with a steadfast commitment to compliance and security.

    Modernizing a monolith

    Powering this progress behind the scenes is deep data infrastructure. A resilient database is a critical component as Nexo strives for seamless daily operations and a high level of customer satisfaction.  

    “We store users’ transactions, so reliability and consistency are essential for the products we offer,” says Nikolay Dobromirov, Engineering Director for Nexo. “Consistency in this scenario cannot be sacrificed.”

    As Nexo’s user base expanded, however, its original legacy relational database became overwhelmed, setting off a domino effect of issues that hindered every corner of the company. “A lot of the workload was hitting the primary server in the cluster – a lot of the time,” Dobromirov says. “As reliability and consistency started degrading, our non-functional requirements and commitments for availability, latency and operational efficiency – both for adding new features and maintaining existing ones – started to suffer.”

    Before they migrated to CockroachDB, Nexo had been using a managed replication setup of MySQL in AWS RDS for all user financial data. “This choice was made due to the presence of transactions and the relationships SQL provides,” explains Dobromirov, “as well as the data types to work with financial data arithmetic directly in the database, so aggregations and similar workloads could be performed directly on the database.”

    Nexo’s monolithic system was responsible for everything – not just user transactions. This would eventually cause Nexo to experience database issues that affected more than just the ledger. 

    “The ledger system has a small set of long tables reaching into the billions of records and terabytes of storage,” Dobromirov says. “Having millions of users earning crypto rewards on over 50 different assets daily was causing an explosion in the transactions Nexo needed to handle in our database.” 

    “We are an extensive NoSQL user for non-transactional workloads,” he continues. “However some of those were also related to the user transactions to keep custom metadata per transaction, and allow for an extendable schema to keep such information that is often vendor- or domain-specific.”

    “Over time, the system grew to such a size that it was operationally impossible to work with. Times for backup restores were getting prohibitively slow to consider any reasonable SLA. Meanwhile, time to run BI and data-intensive queries reached a point where dedicated BI replicas were designated for reporting purposes only, and moved outside the user-facing part of the cluster.”

    To make matters worse, user-facing queries were degrading in latency and critical back-office sub-systems were reaching time limits with little to no wiggle room left – or stopped entirely. Nexo’s costs were only increasing, and the workload was reaching the hardware limits of their AWS RDS primary server. 

    “The exhaustion was in CPU and Write I/O operations,” recalls Dobromirov. “We were running six replicas in the cluster to bear the read workloads and allow for spare capacity to handle user traffic spikes. The vertical size of the cluster we were using was the second-to-last option available in RDS at the time: It was giving us a run-rate of 6 to 12 months, based on the growth rates we were experiencing, before we hit the upper limit of the available AWS hardware.”

    “Alter tables were impossible to run on the server with no downtime,” Dobromirov adds, “and the needed downtime for such operations was reaching multiple hours of runtime for adding a column. In the end we stopped and prohibited schema modifications, unless it was unavoidable.”

    The zero downtime solution: CockroachDB

    Clearly, this could not go on if Nexo aimed to keep building its client base – or even remain operational. Dobromirov and his colleagues mapped out a new system built around the next-level database they needed.

    “We decided to resolve most of these issues with a migration of the internal ledger to a dedicated microservice that would encapsulate the users’ transactions and all the related complexities,” he says. “The legacy monolith would integrate with that new microservice to be functionally compatible with the old system and decouple from the actual storage layer underneath. There would be a per-user live migration, with no downtime for our end users.” 
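
The per-user cutover Dobromirov describes can be pictured with a small routing sketch. This is only an illustration under stated assumptions: the class, its fields, and the in-memory dictionaries standing in for the legacy storage and the new ledger microservice are all hypothetical, since the article does not show Nexo's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerRouter:
    """Hypothetical router: sends each user's ledger calls to one backend."""

    legacy: dict = field(default_factory=dict)   # stand-in for the monolith's storage
    service: dict = field(default_factory=dict)  # stand-in for the new ledger microservice
    migrated: set = field(default_factory=set)   # users already moved to the new service

    def _backend(self, user_id: str) -> dict:
        # Migrated users are served by the new service, everyone else by legacy.
        return self.service if user_id in self.migrated else self.legacy

    def record(self, user_id: str, amount: str) -> None:
        self._backend(user_id).setdefault(user_id, []).append(amount)

    def history(self, user_id: str) -> list:
        return list(self._backend(user_id).get(user_id, []))

    def migrate_user(self, user_id: str) -> None:
        # Copy this user's rows, then flip the flag; other users are untouched.
        self.service[user_id] = list(self.legacy.get(user_id, []))
        self.migrated.add(user_id)


router = LedgerRouter()
router.record("alice", "10.00")   # written to the legacy store
router.migrate_user("alice")      # cut over a single user
router.record("alice", "2.50")    # written to the new service
print(router.history("alice"))
```

Because callers only see `record` and `history`, the storage layer underneath can change one user at a time. A real migration would also have to guard against writes arriving while a user's rows are being copied, which this sketch leaves out.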

    The Nexo engineering team had firm requirements for the new system’s database: It had to deliver high availability, provide horizontal scalability for read and write traffic through native sharding capabilities, and experience zero downtime during server version updates and maintenance operations.

    An initial test run with Amazon Keyspaces was a dead end, as the Apache Cassandra-compatible NoSQL service proved too slow for Nexo’s OLTP workload. Fortunately, Nexo’s next research round led them in the right direction, to CockroachDB.

    “We found CockroachDB and implemented a PoC with it in a matter of one sprint for the new system,” says Dobromirov. “The new microservices architecture we were aiming for had PostgreSQL as part of the recommended stack for simple workloads. With that in mind, integration with the application to CockroachDB was fairly straightforward on binary level through the PostgreSQL client bindings for Java. We had connection pooling working out of the box and all Java 17 (at the time) Hibernate primitives as well.”

    Just a few of the key advantages that Nexo experienced on switching to CockroachDB include: 

    • Support for live cluster updates with zero downtime, based on rolling restarts of the nodes in the cluster 

    • Support for native sharding of each table based on its primary key that is transparent on the application level

    • Support for horizontal and vertical scaling – in a matter of a couple of clicks in the cloud UI

    • Decimals with fixed precision – critical for working with financial data

    • Availability of a fully managed Infrastructure as a Service (IaaS) similar to AWS RDS

    • Write queries running consistently within Nexo’s constraints

    • Native support for semi-structured data models in a transactional context
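
To see why fixed-precision decimals matter for financial data, here is a minimal sketch using Python's standard `decimal` module as a stand-in for a SQL `DECIMAL` column. The balance, daily rate, and eight-decimal scale are illustrative assumptions, not figures from Nexo.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floats cannot represent 0.1 exactly, so repeated addition drifts.
float_total = 0.1 + 0.1 + 0.1

# Decimal keeps exact base-10 digits, much like a SQL DECIMAL column.
decimal_total = Decimal("0.10") + Decimal("0.10") + Decimal("0.10")

# Hypothetical reward calculation, rounded to 8 decimal places
# (the scale is an illustrative choice, not Nexo's actual setting).
balance = Decimal("1.00000000")
daily_rate = Decimal("0.0001")
reward = (balance * daily_rate).quantize(
    Decimal("0.00000001"), rounding=ROUND_HALF_EVEN
)

print(float_total == 0.3)                # the float sum is not exactly 0.3
print(decimal_total == Decimal("0.30"))  # the decimal sum is exact
print(reward)
```

With millions of small reward entries per day, rounding drift of the kind the float sum shows would accumulate, which is why exact decimal types are the usual choice for ledgers.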

    Resilience and stability from Day One

    Once Nexo had committed to CockroachDB, their data infrastructure progressed quickly. They finished the implementation of the new microservice in six months, and successfully migrated all of their users in three months with zero downtime or user impact. 

    Today, CockroachDB is a key component in Nexo’s continued growth, solving the myriad problems of their legacy database while achieving serious new efficiencies. “After stress testing, finishing the migration and reaching normal operational load, the new CockroachDB production cluster was handling the same users’ transactions data as the old solution without issues,” Dobromirov reports. 

    Meanwhile, Nexo’s system was now running with a significantly reduced hardware capacity. “We managed to shrink from a managed AWS RDS MySQL cluster with eight nodes (one main, six read replicas, one analytics), 16 vCPUs and 128GB RAM each to a CockroachDB Cloud cluster running three nodes, 16 vCPUs and 32GB RAM,” he says. “Even with the hardware reduced by over four times, the write capacity was increased by about 50% and the writes latency was consistent and stable at around 1 ms – about a 2X improvement. During the migration the new solution was handling millions of transactions ingested per hour, with stable operations in OLTP workloads during that time.”
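
The cluster sizes in the quote can be totaled as a rough check. The sketch below assumes the stated per-node specs apply to every node in each cluster; the article does not say which measure the "over four times" figure is based on, so the ratios here are only one way to read it.

```python
# Figures taken from the quote; applying them per node is an assumption.
old_nodes, old_vcpus_per_node, old_ram_gb_per_node = 8, 16, 128
new_nodes, new_vcpus_per_node, new_ram_gb_per_node = 3, 16, 32

old_total_vcpus = old_nodes * old_vcpus_per_node    # 8 * 16
new_total_vcpus = new_nodes * new_vcpus_per_node    # 3 * 16
old_total_ram_gb = old_nodes * old_ram_gb_per_node  # 8 * 128
new_total_ram_gb = new_nodes * new_ram_gb_per_node  # 3 * 32

vcpu_ratio = old_total_vcpus / new_total_vcpus
ram_ratio = old_total_ram_gb / new_total_ram_gb

print(f"vCPUs: {old_total_vcpus} -> {new_total_vcpus} ({vcpu_ratio:.2f}x fewer)")
print(f"RAM:   {old_total_ram_gb} GB -> {new_total_ram_gb} GB ({ram_ratio:.2f}x less)")
```

Under that assumption, total vCPUs drop from 128 to 48 and total RAM from 1,024 GB to 96 GB, so the quoted "over four times" sits between the two ratios.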

    CockroachDB’s famed resilience and stability have been on full display from Day One at Nexo. “We did not experience DB-related outages in the last year for infrastructure reasons, thanks to the always-on implementation of the database,” Dobromirov says. “With CockroachDB, we moved to a stable, always-on database that will auto-update without affecting the system.” 

    “This lowers the planning and operational complexities around updates and maintenance operations,” he continues. “The new sharded system provides read and write horizontal scaling without the main-to-replica latencies and data inconsistency issues we had experienced before – all while reducing the operational costs for our database workload by 25-30%.”

    “The best tool for the job”

    With their database modernization executed to perfection, Nexo can accelerate its plans for digital asset leadership with complete confidence. 

    “We strive to use the best tool for the job, with the aim to standardize the tech stack around a small set of core technologies and train the whole development team on them – CockroachDB has currently found a place in that set,” Dobromirov concludes. “Nexo’s CockroachDB journey has just begun.” 

    Ready to learn how distributed SQL can bring unmatched resilience to your platform? Visit here to talk to an expert.
