RocksDB Overview
RocksDB is a high-performance, embeddable key-value store designed for fast storage devices, large datasets, and efficient range queries. It is widely used in distributed systems for both OLTP and analytical workloads.
1. Column Family Support
- RocksDB supports multiple Column Families (CFs) for logical separation of data.
- Each CF has its own memtable and SST files; all CFs in a DB share a single WAL.
- Recommendation: Minimize CFs where possible; each CF adds memory and compaction overhead.
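The per-CF layout above can be sketched in pure Python as a toy model (no RocksDB dependency; class and method names here are illustrative, not RocksDB's API):

```python
# Toy model of column families: each CF gets its own memtable,
# but all CFs append to one shared write-ahead log.
class ToyDB:
    def __init__(self, column_families):
        self.wal = []                                        # single shared WAL
        self.memtables = {cf: {} for cf in column_families}  # one memtable per CF

    def put(self, cf, key, value):
        self.wal.append((cf, key, value))   # log first for durability
        self.memtables[cf][key] = value     # then apply to the CF's memtable

    def get(self, cf, key):
        return self.memtables[cf].get(key)

db = ToyDB(["default", "metadata"])
db.put("default", b"user:1", b"alice")
db.put("metadata", b"user:1", b"created=2024")
# The same key lives independently in each CF, while both
# writes land in the one shared WAL.
```

This is why extra CFs are not free: every CF adds a memtable (memory) and its own SST set (compaction work), while the WAL stays shared.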
2. Handling Large Data Volumes
- RocksDB is optimized for datasets in the terabyte range.
- Data is organized in an LSM-tree structure:
  - Memtables buffer writes in memory.
  - SSTables are immutable, sorted files on disk.
  - Compactions merge SST files to maintain read efficiency.
- Efficient for workloads with heavy writes and sequential or prefix-based scans.
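The write path above can be illustrated with a minimal LSM sketch (a toy model, not RocksDB's implementation): writes buffer in a memtable, full memtables flush to immutable sorted runs, and compaction merges runs so reads touch fewer files.

```python
# Minimal LSM-tree sketch: memtable -> flush -> sorted runs -> compaction.
class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.ssts = []                    # list of sorted runs, newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # "SSTs" are immutable and sorted by key.
        self.ssts.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.ssts:             # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs into one; newer values shadow older ones.
        merged = {}
        for run in reversed(self.ssts):   # apply oldest first
            merged.update(dict(run))
        self.ssts = [sorted(merged.items())]

lsm = ToyLSM()
lsm.put("a", 1); lsm.put("b", 2)   # second put triggers a flush
lsm.put("a", 3); lsm.put("c", 4)   # second flush; old "a" is now shadowed
lsm.compact()                      # a single sorted run remains
```

Real compaction is incremental and level-based rather than a full merge, but the shadowing rule (newest version wins) is the same.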
3. Snapshot and Backup Strategies
- Snapshot-based backup
  - Lightweight, point-in-time view of the DB.
  - Used for replication and catching up follower nodes.
  - Minimal I/O overhead.
- File-based backup (Checkpoint / BackupEngine)
  - Durable copy of SST and WAL files.
  - Useful for disaster recovery or migration.
- Recommendation: Use snapshots for replication and file-based backup for full persistence. Minimize CFs to reduce backup complexity.
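Why snapshots are nearly free can be sketched with sequence numbers, the mechanism RocksDB snapshots are built on (a toy model; names are illustrative): a snapshot is just a remembered sequence number, and reads through it only see versions written at or before that point.

```python
# Sketch of point-in-time snapshots via sequence numbers.
class VersionedStore:
    def __init__(self):
        self.seq = 0
        self.versions = {}                # key -> [(seq, value), ...]

    def put(self, key, value):
        self.seq += 1
        self.versions.setdefault(key, []).append((self.seq, value))

    def snapshot(self):
        return self.seq                   # no data is copied

    def get(self, key, snapshot=None):
        upper = self.seq if snapshot is None else snapshot
        # Newest version visible at the snapshot wins.
        for seq, value in reversed(self.versions.get(key, [])):
            if seq <= upper:
                return value
        return None

store = VersionedStore()
store.put("k", "v1")
snap = store.snapshot()                   # lightweight point-in-time view
store.put("k", "v2")                      # invisible through `snap`
```

A file-based backup, by contrast, physically copies SST and WAL files, which is why it is durable but I/O-heavy.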
4. Key and Prefix Design
- Fixed-length prefix keys improve range scan performance.
- Design keys so frequently scanned ranges share a common prefix.
- Example:
  - Prefix: RegionID + EntityType
  - Suffix: Timestamp or unique ID
- Avoid variable-length prefixes in hot paths, as they reduce prefix indexing efficiency.
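The RegionID + EntityType + Timestamp layout above can be sketched as a fixed-width binary encoding (field widths here are illustrative assumptions): because the prefix has a fixed length, all rows for one (region, entity type) pair sort contiguously and a prefix scan is a seek plus a sequential read.

```python
import bisect

def make_key(region_id, entity_type, ts):
    # 4-byte region + 2-byte entity type (fixed-length prefix),
    # then an 8-byte timestamp suffix. Big-endian keeps byte
    # order identical to numeric order.
    return (region_id.to_bytes(4, "big")
            + entity_type.to_bytes(2, "big")
            + ts.to_bytes(8, "big"))

def prefix_scan(sorted_keys, region_id, entity_type):
    prefix = region_id.to_bytes(4, "big") + entity_type.to_bytes(2, "big")
    start = bisect.bisect_left(sorted_keys, prefix)   # seek to the range start
    out = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break                         # left the prefix range: stop scanning
        out.append(key)
    return out

keys = sorted([
    make_key(1, 7, 100), make_key(1, 7, 200),
    make_key(1, 8, 100), make_key(2, 7, 100),
])
hits = prefix_scan(keys, 1, 7)            # only the (region=1, type=7) rows
```

With variable-length prefixes the range boundaries are no longer byte-aligned, which is what defeats prefix bloom filters and makes hot-path scans slower.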
5. Typical Performance Metrics (SSD-based)
These numbers are empirical references from medium-to-large deployments, with RocksDB tuned for batch writes and prefix scans:
| Metric | Typical Value (SSD/NVMe) | Notes |
|---|---|---|
| Write throughput (random writes) | 50k–200k ops/sec per DB instance (16–32 MB memtable) | Depends on write batch size |
| Write amplification | 2–5x | With tuned compaction and CF count ~1–2 |
| Read throughput (point lookup) | 100k–500k ops/sec | Using 8–16 GB block cache |
| Read throughput (prefix scan) | 200–800 MB/sec | With fixed prefix keys |
| Compaction I/O | 100–400 MB/sec | Tuned via level0_file_num_compaction_trigger and max_bytes_for_level_base |
| Latency (write) | ~0.5–2 ms | Depends on WAL sync policy |
| Latency (read, cached) | <0.1 ms | Cached in block cache |
| SST file size | 64 MB (default) | Tunable via target_file_size_base |
| Max DB size | ~10–100 TB+ | Depends on hardware and LSM tuning |
Notes on tuning:
- SSD/NVMe storage is essential for predictable write performance.
- Prefix-based fixed keys allow memtable prefix bloom and reduce disk seeks.
- Backup and snapshot strategy affect I/O; prefer incremental snapshots for high-frequency backups.
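As a rough illustration of the 2–5x write-amplification figure in the table, write amplification is total bytes physically written divided by bytes of user data; the numbers below are hypothetical, assuming a single flush-and-compact cycle:

```python
# Hypothetical back-of-envelope estimate of write amplification.
user_bytes = 100 * 2**20            # 100 MB of application writes

wal_bytes = user_bytes              # every write also hits the WAL
flush_bytes = user_bytes            # memtable flushed to L0
compaction_bytes = 2 * user_bytes   # assumed rewrite during L0 -> L1 compaction

total_written = wal_bytes + flush_bytes + compaction_bytes
write_amp = total_written / user_bytes   # 4.0, inside the 2-5x range above
```

Deeper LSM trees add a rewrite per level crossed, which is why compaction tuning (level sizes, trigger thresholds) directly controls this figure.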