RocksDB Backup and Restore

This document explains how to perform backup and restore in RocksDB, including snapshot-based and file-based (checkpoint) approaches, their use cases, and examples for syncing replicas.

1. Overview of Backup Methods

RocksDB provides two primary backup methods:

Snapshot-based Backup
- Exports the SST files that make up a consistent point-in-time view of the database. In this document, "snapshot" refers to this exported file set rather than the in-memory read view returned by DB::GetSnapshot().
- Typically implemented by traversing column families and exporting the SSTs of each one.
- Useful for replication, incremental backup, and fast restores.

File-based Backup (Checkpoint)
- Uses the RocksDB Checkpoint API to create a copy of the database files.
- Produces a consistent directory snapshot of the entire database.
- Easier to use than manual SST export, but typically larger and less flexible for selective restore.

2. File-based Backup Using Checkpoints

RocksDB provides rocksdb::Checkpoint for creating backups at the file level.

2.1 Creating a Checkpoint

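#include <memory>
#include <rocksdb/db.h>
#include <rocksdb/utilities/checkpoint.h>

// `db` is an already-open rocksdb::DB*.
rocksdb::Checkpoint* raw_checkpoint = nullptr;
rocksdb::Status s = rocksdb::Checkpoint::Create(db, &raw_checkpoint);
if (!s.ok()) {
    // handle error (s.ToString() describes it) and return
}
// Take ownership so the Checkpoint object is freed on every path.
std::unique_ptr<rocksdb::Checkpoint> checkpoint(raw_checkpoint);

// The target directory must not exist yet; CreateCheckpoint creates it.
s = checkpoint->CreateCheckpoint("/path/to/backup_dir");
if (!s.ok()) {
    // handle error
}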

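The checkpoint directory contains the files needed to open the database, including:
- MANIFEST
- WAL (.log) files
- All SST files for all column families

The checkpoint is consistent as of a single point in time. When the checkpoint directory is on the same filesystem as the database, SST files are hard-linked rather than copied, so creation is fast and uses little extra space.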

2.2 Restoring from Checkpoint

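Copy the checkpoint directory to a new location if the backup itself should stay untouched.
Open a RocksDB instance on the restored path (the example opens the checkpoint path directly):

rocksdb::DB* new_db = nullptr;
rocksdb::Options options;  // use the same options as the source database
rocksdb::Status s = rocksdb::DB::Open(options, "/path/to/backup_dir", &new_db);
if (!s.ok()) {
    // handle error
}

This overload opens only the default column family. If the database has other column families, DB::Open fails unless you use the overload that takes a descriptor for every column family; DB::ListColumnFamilies returns their names. Once the database is open, all CFs and data are as they were at checkpoint creation.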
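
The copy step can be sketched with std::filesystem. This is a minimal illustration, not part of the RocksDB API: the function name CopyCheckpointDir and its error-string return convention are assumptions made for the example. Note that a recursive copy produces real files at the destination even where the checkpoint itself holds hard links.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Recursively copies a checkpoint directory to `dst` (hypothetical helper).
// Returns an empty string on success, or an error message on failure.
// Refuses to overwrite an existing destination so that files from an older
// copy are never mixed with fresh ones.
std::string CopyCheckpointDir(const fs::path& src, const fs::path& dst) {
    std::error_code ec;
    if (!fs::is_directory(src, ec)) {
        return "source is not a directory: " + src.string();
    }
    if (fs::exists(dst, ec)) {
        return "destination already exists: " + dst.string();
    }
    std::error_code copy_ec;
    fs::copy(src, dst, fs::copy_options::recursive, copy_ec);
    if (copy_ec) {
        return "copy failed: " + copy_ec.message();
    }
    return "";
}
```

The parent of the destination must already exist. After a successful copy, the destination path can be passed to DB::Open in place of the checkpoint path.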

3. Backup and Restore with Snapshots

3.1 Creating Backup via Snapshots

As described in the overview, a snapshot backup exports the SSTs of each column family.
This is useful for incremental backup and replication.
It can also back up specific CFs selectively.

3.2 Restoring Backup via Snapshots

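// cf_handles maps column family names to the handles returned by DB::Open;
// rocksdb::DB has no lookup-by-name method, so the caller keeps this map.
rocksdb::IngestExternalFileOptions ingest_options;
for (const auto& cf_name : snapshot_order) {
    rocksdb::ColumnFamilyHandle* cf_handle = cf_handles.at(cf_name);
    rocksdb::Status s = db->IngestExternalFile(cf_handle, sst_files_for_cf[cf_name], ingest_options);
    if (!s.ok()) break;  // stop rather than continue out of order; report s.ToString()
}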

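Ingest files in the order in which they were exported: by default, when key ranges overlap, data from a later ingested file takes precedence over earlier data.
Multiple SST backups can be merged into an existing DB this way.
IngestExternalFile is designed for SST files written by rocksdb::SstFileWriter; whether files copied directly out of a live database are accepted depends on the RocksDB version and ingest options, so test this path before relying on it.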
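
When several backups are merged, the ingestion order has to be derived before any file is ingested. The sketch below shows one way to do that with the standard library only. The types BackupBatch and IngestStep and the per-batch sequence number are assumptions introduced for illustration; they are not RocksDB types.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One exported backup batch (hypothetical structure).
struct BackupBatch {
    std::uint64_t sequence;  // position of this batch in the export history
    // Column families in export order, each with its SST file paths.
    std::vector<std::pair<std::string, std::vector<std::string>>> cf_files;
};

// One planned IngestExternalFile call: a column family and its files.
struct IngestStep {
    std::string cf_name;
    std::vector<std::string> files;
};

// Flattens batches into ingestion order: oldest batch first, and within a
// batch the original column family order.
std::vector<IngestStep> BuildIngestPlan(std::vector<BackupBatch> batches) {
    std::stable_sort(batches.begin(), batches.end(),
                     [](const BackupBatch& a, const BackupBatch& b) {
                         return a.sequence < b.sequence;
                     });
    std::vector<IngestStep> plan;
    for (const auto& batch : batches) {
        for (const auto& [cf_name, files] : batch.cf_files) {
            plan.push_back({cf_name, files});
        }
    }
    return plan;
}
```

Each IngestStep then corresponds to one IngestExternalFile call on the handle for cf_name, executed in plan order.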

4. Catching Up a Replica Using Snapshots

For replication or secondary nodes catching up:

Take a snapshot of the primary node (or upstream DB).
Transfer the SSTs to the replica node.
Ingest SSTs in the same column family order.
Resume normal replication using WALs or transaction logs.

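// Example: applying a snapshot to a follower.
// follower_cf_handles maps column family names to the handles returned when
// follower_db was opened; rocksdb::DB has no lookup-by-name method.
rocksdb::IngestExternalFileOptions ingest_options;
for (const auto& cf_name : snapshot_order) {
    rocksdb::ColumnFamilyHandle* cf_handle = follower_cf_handles.at(cf_name);
    rocksdb::Status s = follower_db->IngestExternalFile(cf_handle, sst_files_for_cf[cf_name], ingest_options);
    if (!s.ok()) break;  // do not resume replication on a partial snapshot
}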

After ingestion, the replica reflects the primary's state as of the snapshot point; replaying later WAL or transaction-log entries brings it up to date.
This is useful for fast bootstrap and for catching up lagging replicas.

5. Comparison of Backup Methods

| Feature | Snapshot Backup | File-based Checkpoint |
| --- | --- | --- |
| Granularity | Column-family level, can be selective | Whole DB only |
| Size | Smaller if incremental | Typically larger |
| Restore flexibility | Can merge into existing DB | Must restore whole DB |
| Consistency | Point-in-time per SST | Atomic point-in-time for entire DB |
| Use cases | Replication, incremental backup, selective restore | Full backup, disaster recovery, migration |
| Complexity | Higher, requires tracking SSTs and CF order | Lower, just copy directory |

6. Best Practices

Use snapshots for:
- Replication
- Incremental backup
- Selective CF restore

Use checkpoints for:
- Full database backup
- Disaster recovery
- Moving or migrating the database

Track backup metadata

For snapshots, maintain:
- CF names
- SST file list
- Ingestion order
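
One simple way to keep this metadata is a small text manifest stored next to the exported files. The sketch below is an assumption about how such a manifest could look, not a RocksDB format: the struct CfBackupEntry, the function names, and the tab-separated layout are all invented for illustration. Line order in the manifest doubles as the ingestion order.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Metadata for one exported column family (hypothetical structure).
// Entries are kept in ingestion order.
struct CfBackupEntry {
    std::string cf_name;
    std::vector<std::string> sst_files;
};

// Serializes entries as one line per column family:
//   <cf_name>\t<file1>\t<file2>...
// Assumes names and paths contain no tabs or newlines.
std::string SerializeManifest(const std::vector<CfBackupEntry>& entries) {
    std::ostringstream out;
    for (const auto& entry : entries) {
        out << entry.cf_name;
        for (const auto& file : entry.sst_files) {
            out << '\t' << file;
        }
        out << '\n';
    }
    return out.str();
}

// Parses the format written by SerializeManifest, preserving line order.
std::vector<CfBackupEntry> ParseManifest(const std::string& text) {
    std::vector<CfBackupEntry> entries;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        CfBackupEntry entry;
        std::getline(fields, entry.cf_name, '\t');
        std::string file;
        while (std::getline(fields, file, '\t')) {
            entry.sst_files.push_back(file);
        }
        entries.push_back(entry);
    }
    return entries;
}
```

A restore job can parse the manifest first and refuse to start if a listed SST file is missing from the backup directory.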

Apply backups in order

Critical for consistency, especially when restoring multiple CFs.

Verify backups

After every restore, scan the restored data with iterators, and check the iterator status when the scan ends, to confirm the data is readable and complete.
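
A verification pass can compare a full scan of the source against a full scan of the restored database. The sketch below is written as a template so that it compiles without RocksDB headers; it assumes the iterator type offers the rocksdb::Iterator-style calls Valid(), Next(), key(), and value(), with key() and value() returning objects that have ToString(). ScanDiff, CompareScans, Text, and VectorIter are names invented for this example, and VectorIter is only an in-memory stand-in for exercising the comparison.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Result of comparing two key/value scans.
struct ScanDiff {
    bool equal = true;
    std::size_t entries_compared = 0;
    std::string first_mismatch_key;  // set only when equal is false
};

// Walks two iterators in lockstep and reports the first difference.
// Both iterators must already be positioned at the first entry
// (for rocksdb::Iterator, after SeekToFirst()).
template <typename IterT>
ScanDiff CompareScans(IterT& source, IterT& restored) {
    ScanDiff diff;
    while (source.Valid() && restored.Valid()) {
        const std::string key = source.key().ToString();
        if (key != restored.key().ToString() ||
            source.value().ToString() != restored.value().ToString()) {
            diff.equal = false;
            diff.first_mismatch_key = key;
            return diff;
        }
        ++diff.entries_compared;
        source.Next();
        restored.Next();
    }
    if (source.Valid() || restored.Valid()) {
        // One side has entries the other lacks.
        diff.equal = false;
        diff.first_mismatch_key = source.Valid() ? source.key().ToString()
                                                 : restored.key().ToString();
    }
    return diff;
}

// Minimal in-memory stand-in used to exercise CompareScans without RocksDB.
struct Text {
    std::string s;
    std::string ToString() const { return s; }
};

class VectorIter {
 public:
    explicit VectorIter(std::vector<std::pair<std::string, std::string>> rows)
        : rows_(std::move(rows)) {}
    bool Valid() const { return pos_ < rows_.size(); }
    void Next() { ++pos_; }
    Text key() const { return {rows_[pos_].first}; }
    Text value() const { return {rows_[pos_].second}; }

 private:
    std::vector<std::pair<std::string, std::string>> rows_;
    std::size_t pos_ = 0;
};
```

With real databases, the same function would be called once per column family with two rocksdb::Iterator objects, followed by a check of each iterator's status() to distinguish a clean end of data from a read error.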
