After choosing CockroachDB as your next-generation storage system, you may wonder, “How do we move our organization’s data into CockroachDB?”
Migrating to a new database also includes migrating platforms and supporting services, application code, and your database (i.e., Schema, Queries, Data). No single tool is the solution for an entire application migration. This blog post focuses on database migrations and the tooling that enables it.
Historically, database migrations have been challenging. When migrating a database, there are many factors to consider, including managing downtime, schema compatibility, sharding, export methods, import methods, etc. Once you've created a migration plan, there's still work to do to determine the best tooling to leverage as part of the migration (or roll your own, if necessary).
The uncertainty around what to do next can quickly dash the excitement of discovering a database that provides consistency, geo-partitioning/multi-region capabilities, scalability, and resiliency. At Cockroach Labs, we're constantly exploring how we can make database migrations easier.
We focused our efforts on addressing the most common data migration tooling challenges:
Difficulty of setup and use
Lack of observability and ability to troubleshoot
Inability to recover from transient failure states
Lengthy migration times
In the following sections, we’ll cover these problems in more depth and discuss how MOLT Fetch, a user-friendly CockroachDB solution, can solve these challenges.
With these considerations in mind, we built MOLT Fetch to provide an opinionated and easy-to-use in-house solution for moving customers’ most critical workload data to CockroachDB. However, the landscape for our data movement story was not always so apparent. It’s worth revisiting the past to see how our data movement strategy has continued to evolve.
A history of migration tooling
Over the years, we’ve witnessed a range of strategies to tackle the data movement challenge that reflect the variety of data migration strategies, each with their pros and cons. Migration strategies are typically driven by customer requirements (both technical and business). Among these requirements, probably the most significant are:
How much downtime is allowed
How much data needs to be migrated
How similar the source and destination database schemas are.
Simple technologies can work for simple requirements. But customers increasingly need more sophisticated data migration technologies to meet their requirements. For these more demanding migrations, we found that existing technologies fell short in critical areas. These include observability, debuggability, restartability, data transformation, flexibility, and support for low downtime.
Some customers opted for custom solutions, utilizing the existing EXPORT to CSV and IMPORT INTO CockroachDB features. Others wrote batches of SELECT queries and executed a series of BULK INSERTS into their target database. While these methods may seem straightforward, they are only suitable for smaller data migrations. Moreover, each customer had to devise their own logic to address the same issue, highlighting the limitations of these approaches.
Additionally, there are complexities around type conversions, data formatting, and target schema preparation. These all add complexity and more surface area for bugs to fester. We asked ourselves, “How do we make this process more repeatable and flexible?”
Cockroach Labs’ R&D Migrations team extensively researched and evaluated third-party tools, ultimately identifying compatible options from various vendors, such as AWS DMS, Oracle Golden Gate, and Striim. This shift towards third-party tools eliminated the need for custom solutions and paved the way for a more repeatable approach. As a result, many of our customers have successfully migrated large-volume workloads to CockroachDB. Using third-party tools remains a viable option, especially for customers familiar with the tooling or those with strong relationships with the tooling’s vendor.
As we shepherded more customers onto supported third-party tooling, a key issue began to manifest for more complex migrations: the lack of observability and debuggability. Most, if not all, of these tools are closed-source and need explicit support for CockroachDB, making troubleshooting difficult. We sought explicit support for CockroachDB in these third-party tools and achieved this with several vendors. However, since many of these tools are closed-source, contributions can only be made by owners of the third-party tooling. This leads to additional administrative overhead to coordinate with other companies who may need to consider prioritizing CockroachDB support.
Furthermore, because troubleshooting is hampered by the closed-source nature of other solutions, providing a solution or fix proved to be non-deterministic in time since it would depend on the priorities and timelines of the vendor:
In the worst case, a customer would have to switch away entirely from specific third-party tooling.
In the best case, CockroachDB could fix the behavior on the database side to accommodate the third-party tooling, but the priorities of the CockroachDB database teams constrain this.
Last, many third-party tools don’t have a clear workflow for recovering from failures or continuing from a checkpoint during a migration, necessitating that a user restart an entire migration – even if it just failed during the last 10%-- which leads to a frustrating user experience and wasted time re-running a migration.
These considerations and challenges from our past experiences with data movement tooling were the spark for us to build MOLT Fetch.
MOLT Fetch: a data-movement tool tailor-made for CockroachDB
MOLT Fetch is the CockroachDB-focused data movement tool. Using MOLT Fetch, developers can bulk export (i.e., one-time migration) and stream (i.e., incremental migration) the source database to the target CockroachDB database.
Compared to the third-party migration tools mentioned above, MOLT Fetch is designed to provide easier configuration and better observability, debuggability, and compatibility with CockroachDB. Because Cockroach Labs owns MOLT Fetch, we can provide agile and nimble support for our customers' needs.
How MOLT Fetch simplifies data migrations
The logic of MOLT Fetch is elegantly simple – it is a standalone binary that:
First establishes a snapshot of the source database
Pulls the data out of the source
Stores the data in configurable intermediate storage, and
Finally pushes the data into the CockroachDB target database
Once this initial data load of the snapshot is finished, data changes made after the snapshot can optionally be streamed to the target database via the replication functionality. Eventually, the target database will catch up with the source database, leading to guaranteed consistency.
MOLT Fetch prioritizes easing the pain of a migrations journey. Benefits include:
Unique compatibility with CockroachDB: You never need to worry about data type inconsistency or rough edges from tools not purpose-built for CockroachDB.
Auto schema creation: There is no need to create the schema on the target database manually – MOLT Fetch can detect the target table's schema and auto-create the CockroachDB compatible schema.
Export with parallelism: If you have a table with hundreds of millions of rows that you want to speed up, you can leverage the always-on-by-default sharding feature to drastically reduce data export duration.
Continuation token: In the case where a lengthy import process fails in the middle, the user can resume from a checkpoint rather than restart the entire process.
Exception logs: If any error happens during the initial load, you can always view it in the exceptions table.
An easy-to-use data migration tool
We all know the pain of being overwhelmed by the massive number of settings when getting started with a migration tool, especially when we just want to test a simple data load. Even though MOLT Fetch enables granular customizations, we've implemented sane defaults, which make MOLT Fetch manageable to use.
The bare minimum you need to provide to MOLT Fetch is the connection URL to both the source and target databases, the storage option, and a table filter. Voila! MOLT Fetch will auto-detect the source dialect, create schemas, and handle data movement with the default settings.
Here is an example of moving data from a PostgreSQL source to CockroachDB using GCP as an intermediate data store:
./molt fetch --source 'postgres://postgres:postgres@127.0.0.1:5432/defaultdb?sslmode=disable' --target 'postgres://root@localhost:26257/defaultdb?sslmode=disable' --bucket-path 'gs://demo-bucket' –table-handling='drop-on-target-and-recreate' --table-filter 'mytable'
With MOLT Fetch you have the power to customize the advanced settings, enabling you to fully exploit its versatility for your specific use cases.
For the export phase, MOLT Fetch currently supports PostgreSQL, MySQL, and CockroachDB as the source database dialects (with more dialects support coming soon!). MOLT Fetch allows you to stage data on local file systems or standard object storage technology (e.g., AWS S3, GCP GCS), or in-memory as the intermediate storage.
For the data import phase, MOLT Fetch can leverage the two options provided by CockroachDB. The user can choose to ingest data with the fast IMPORT mode, which takes the target table offline, or the COPY mode, which keeps the target table online.
You can find these configurations and more in the MOLT Fetch documentation.
Your unique data migration plan
We understand that no migration is standard – each one is a complex and unique project. That's why Cockroach Labs is dedicated to providing tooling to make a seamless migration journey to CockroachDB easy. We're here to help you succeed!
Try MOLT Fetch today for your next migration to CockroachDB.
Ryan Luu is Member of Technical Staff, Cockroach Labs, Andy Woods is Director, Product Management, Cockroach Labs, and Jane Xing is Member of Technical Staff, Cockroach Labs.