
Key-Value Storage Overview

Key-Value (KV) storage is the foundation of many database and storage systems. The choice of KV engine significantly impacts performance, scalability, and operational complexity.

1. Types of KV Storage

KV stores can be roughly classified into:

  1. In-memory KV stores
  • Examples: std::unordered_map, folly::F14, custom hash tables.

  • Characteristics:

  • Extremely low latency (nanoseconds to microseconds).

  • No persistence unless combined with snapshotting or WAL.

  • No ordering guarantees unless explicitly added.

  • Use cases: caching, fast indexing, ephemeral state.

  2. Embedded on-disk KV stores (LSM-tree / B+Tree based)
  • Examples: RocksDB, LevelDB, LMDB.

  • Characteristics:

  • Persist data on disk.

  • Support large data volumes beyond RAM.

  • Provide ordered keys (prefix/range scans).

  • Support snapshots, backups, and transactions (to varying degrees).

  3. Distributed KV stores
  • Examples: TiKV, RocksDB+Raft, Cassandra (KV-like API).

  • Characteristics:

  • Scale horizontally across nodes.

  • Handle replication, failover, and consistency.

  • Often built on embedded KV engines internally.


2. Focused Comparison: RocksDB, LevelDB, LMDB

| Feature / Engine | RocksDB | LevelDB | LMDB |
|---|---|---|---|
| Storage type | LSM-tree | LSM-tree | B+Tree (memory-mapped) |
| Max DB size | 10–100 TB+ | ~10 TB | ~16 TB (64-bit OS) |
| Write throughput | 50k–200k ops/sec (SSD) | 30k–100k ops/sec | 5k–50k ops/sec (disk-backed) |
| Read throughput | 100k–500k point lookups/sec | 50k–200k point lookups/sec | 50k–200k point lookups/sec |
| Range scan throughput | 200–800 MB/sec (prefix-optimized) | 100–400 MB/sec | 100–400 MB/sec |
| Memory usage | Configurable memtable + block cache | Memtable + block cache | Entire DB mapped into address space |
| Persistence & durability | WAL + SST files | WAL + SST files | Memory-mapped file + sync |
| Transactions | Single-key atomicity; batch writes | Single-key atomicity; batch writes | ACID, MVCC |
| Column family support | Yes | No | No |
| Best suited for | Large-scale, write-heavy workloads; prefix scans; multiple CFs | Small–medium embedded DBs | Read-heavy workloads; ACID compliance; memory-mapped scenarios |
| Notes | Heavily tunable; suited to aggressive LSM tuning | Simpler, less configurable; limited compaction tuning | Very low read latency; writes stall if filesystem sync is slow |

Throughput and size figures are rough orders of magnitude; actual numbers depend heavily on hardware, value sizes, and tuning.

3. Key Operational Guidelines

For production deployment, KV engines have different operational considerations:

RocksDB

  • SST file management:

  • RocksDB writes LSM-tree SST files per column family.

  • Frequent compaction can produce many small SSTs; excessive SSTs can hurt read performance and increase disk usage.

  • Avoid having too many column families if not necessary; prefer fewer CFs and use key prefix for logical separation.

  • Backup & snapshot:

  • Prefer snapshot-based backups for incremental replication or quick point-in-time copies.

  • Full file-level backups (checkpointing SSTs) should be limited to avoid high I/O load.

  • Key design for range scan:

  • Fixed-length prefix keys improve prefix scan throughput.

  • Consider partitioning large datasets by key range to simplify compaction and backup.

  • I/O and memory tuning:

  • Adjust memtable size, block cache, compaction threads to match SSD I/O capabilities.

  • Monitor SST size distribution and write amplification.

LMDB

  • Single-file simplicity:

  • LMDB uses a single memory-mapped data file per environment (which may contain several named databases).

  • Simple to manage operationally; no compaction is required.

  • Read-heavy workload:

  • Optimal for mostly-read workloads; multiple readers can access concurrently without locking.

  • Write considerations:

  • Single-writer limitation means concurrent writers serialize, and a slow disk sync stalls the writer; readers are unaffected thanks to MVCC, but write latency can spike during bursts.

  • Avoid placing multiple logical databases in the same LMDB environment if high write concurrency is expected, since each environment permits only one write transaction at a time.

General Ops Notes

  • For small-scale KV usage, LMDB is simpler and less operationally demanding.
  • For large-scale or write-intensive workloads, RocksDB offers higher flexibility and tunability but requires monitoring of SSTs, CFs, and compaction cycles.
  • Use snapshots and incremental backups to limit I/O impact and maintain consistent backups without pausing writes.

This covers the KV layer end to end: storage types, engine characteristics, indicative benchmark figures, usage scenarios, and operational guidance for RocksDB and LMDB.

