RocksDB Configuration and Initialization
RocksDB is a high-performance key-value store widely used in Kumo for local metadata, control-plane state, and small structured data. This document describes the configuration modules, native initialization workflow, column family considerations, and the impact of key configuration options.
1. Configuration Overview
RocksDB exposes several layers of configuration. Understanding them is crucial for predictable performance:
| Layer | Scope | Description |
|---|---|---|
| DB Options | Entire database | Global settings: parallelism, background jobs, WAL behavior, direct I/O, statistics. |
| ColumnFamilyOptions (CF Options) | Per column family | Compaction style, write buffer size, merge operators, compression, Bloom filters, Blob files. |
| TableOptions / BlockBasedTableOptions | Per CF, per table | SSTable layout: block size, cache policy, index/filter type, partitioned index filters. |
| WriteOptions | Per-write | WAL sync, WAL bypass (disableWAL), and the resulting durability guarantees. |
| ReadOptions | Per-read | Snapshots, cache usage, prefix seeks. |
| TransactionDBOptions | Transaction-enabled DB | Lock timeouts, custom mutex factories. |
Typical tuning targets:
- Write throughput → WriteBufferSize, MaxWriteBufferNumber, Level0FileNumCompactionTrigger
- Read latency → BlockCache, FilterPolicy, BloomFilter, PartitionedIndexFilters
- Disk usage → Compression, TargetFileSizeBase, MaxBytesForLevelBase
- Transactional guarantees → TransactionDBOptions, lock timeout
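To make the disk-usage knobs more concrete: under level-style compaction with static level sizing, the target size of level n is MaxBytesForLevelBase multiplied by the level size multiplier raised to the power n - 1. The sketch below is plain arithmetic rather than a RocksDB call. The helper name LevelTargetBytes is hypothetical, and the formula does not describe databases that enable level_compaction_dynamic_level_bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a RocksDB API): target size of level `level`
// (level >= 1) under static level sizing, i.e. base * multiplier^(level - 1).
std::uint64_t LevelTargetBytes(std::uint64_t max_bytes_for_level_base,
                               std::uint64_t multiplier, int level) {
    assert(level >= 1);
    std::uint64_t target = max_bytes_for_level_base;
    for (int i = 1; i < level; ++i) {
        target *= multiplier;
    }
    return target;
}
```

Assuming a base of 256 MiB and a multiplier of 10, this gives 256 MiB for L1, 2.5 GiB for L2, and 25 GiB for L3, which shows how quickly a larger base inflates the lower levels.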
2. Native RocksDB Initialization
The standard sequence to initialize a RocksDB instance is:
2.1 Prepare DB Options
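#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/statistics.h>
rocksdb::Options db_options;
db_options.create_if_missing = true;
db_options.create_missing_column_families = true; // required to create "meta" and "data" on first open (see 2.4)
db_options.IncreaseParallelism(8); // sizes the background thread pools
db_options.max_background_jobs = 8; // flushes + compactions; max_background_compactions is deprecated
db_options.use_direct_reads = false;
db_options.use_direct_io_for_flush_and_compaction = false;
db_options.statistics = rocksdb::CreateDBStatistics();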
2.2 Prepare Column Family Options
Each column family may have specific options:
rocksdb::ColumnFamilyOptions cf_options;
cf_options.OptimizeLevelStyleCompaction();
cf_options.write_buffer_size = 128 * 1024 * 1024; // 128MB
cf_options.max_write_buffer_number = 4;
cf_options.min_write_buffer_number_to_merge = 2;
cf_options.level0_file_num_compaction_trigger = 4;
cf_options.compression = rocksdb::kLZ4Compression;
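As a rough sanity check on these values, worst-case memtable memory can be estimated as write_buffer_size multiplied by max_write_buffer_number, per column family. The sketch below is plain arithmetic, not a RocksDB API. The helper name WorstCaseMemtableBytes is hypothetical, and it assumes every column family uses the same options, as the three column families opened in section 2.4 do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a RocksDB API): upper bound on memtable memory
// when every column family has filled all of its write buffers.
std::uint64_t WorstCaseMemtableBytes(std::uint64_t write_buffer_size,
                                     std::uint64_t max_write_buffer_number,
                                     std::uint64_t num_column_families) {
    assert(max_write_buffer_number >= 1 && num_column_families >= 1);
    return write_buffer_size * max_write_buffer_number * num_column_families;
}
```

With 128 MiB buffers, 4 buffers per column family, and 3 column families, the bound is 1.5 GiB, which has to fit alongside the block cache in the process memory budget.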
2.3 Prepare Table Options
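#include <rocksdb/cache.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/table.h>
rocksdb::BlockBasedTableOptions table_options;
table_options.block_size = 64 * 1024; // 64 KB
table_options.block_cache = rocksdb::NewLRUCache(8ULL * 1024 * 1024 * 1024); // 8 GB, shared by every CF that uses these table options
table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); // 10 bits per key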
Assign table options to CF:
cf_options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_options)
);
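To get a feel for what 10 bits per key buys, the textbook Bloom filter formula gives a false-positive rate of roughly 1%. The sketch below uses that generic formula with a rounded optimal hash count. It is not RocksDB's actual filter implementation, which uses cache-friendly variants with slightly different rates, and the function name EstimateBloomFalsePositiveRate is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Textbook estimate for a standard Bloom filter:
//   k   = round(bits_per_key * ln 2)   (number of hash probes, at least 1)
//   FPR = (1 - e^(-k / bits_per_key))^k
// Treat the result as a ballpark figure, not as RocksDB's measured rate.
double EstimateBloomFalsePositiveRate(double bits_per_key) {
    assert(bits_per_key > 0.0);
    const double k = std::max(1.0, std::round(bits_per_key * std::log(2.0)));
    return std::pow(1.0 - std::exp(-k / bits_per_key), k);
}
```

Under this model, halving the budget to 5 bits per key raises the estimate to roughly 9%, so the memory saved by a smaller filter is paid for with many more wasted SST reads on missing keys.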
2.4 Open the Database
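rocksdb::DB* db = nullptr;
std::vector<std::string> cf_names = {"default", "meta", "data"};
std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;
for (const auto& name : cf_names) {
    cf_descs.emplace_back(name, cf_options);
}
std::vector<rocksdb::ColumnFamilyHandle*> cf_handles;
// Creating "meta" and "data" on first open requires create_missing_column_families = true.
// After a successful open, cf_handles[i] corresponds to cf_names[i].
rocksdb::Status s = rocksdb::DB::Open(db_options, "/data/kumo/rocksdb", cf_descs, &cf_handles, &db);
if (!s.ok()) {
    // handle error, e.g. log s.ToString() and abort startup
}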
2.5 Write and Read Options
rocksdb::WriteOptions write_options;
write_options.sync = true; // ensures WAL durability
rocksdb::ReadOptions read_options;
read_options.fill_cache = true; // utilize block cache
2.6 TransactionDB Initialization (Optional)
#include <rocksdb/utilities/transaction_db.h>
rocksdb::TransactionDBOptions txn_options;
txn_options.transaction_lock_timeout = 20000; // ms a transaction waits for a key lock
txn_options.default_lock_timeout = 30000; // ms a write issued outside a transaction waits for a key lock
rocksdb::TransactionDB* txn_db = nullptr;
rocksdb::Status s_txn = rocksdb::TransactionDB::Open(
    db_options, txn_options, "/data/kumo/rocksdb_txn", &txn_db
);
if (!s_txn.ok()) {
    // handle error
}
3. Column Family Considerations
- Default CF: Always required. Do not delete or rename.
- Multiple CFs: Useful to separate metadata, OLAP data, and write-heavy data.
- Option isolation: Each CF can have distinct compaction, compression, and table options.
- CF handles: Every handle returned by DB::Open must be destroyed before the DB object is deleted, otherwise resources leak.
for (auto* handle : cf_handles) {
    db->DestroyColumnFamilyHandle(handle);
}
delete db;
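Tip: Create and open all column families before issuing any writes. When an existing database is reopened in read-write mode, every column family it contains must be listed in DB::Open, otherwise the open fails.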
4. Configuration Impact and Best Practices
- WriteOptions.sync = true ensures durability but reduces write throughput, because each write waits for the WAL to be synced to disk.
- Block cache size trades memory usage against read latency.
- Write buffer size and count control memtable flush frequency and, through it, compaction load.
- Compression per level trades disk space against CPU cost on reads and compactions.
- Level0 triggers determine when compaction starts and when writes are slowed or stopped, so they should leave headroom for burst writes.
- TransactionDB lock timeouts must be tuned to the workload's concurrency and contention.
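The Level0 behavior can be pictured as three thresholds on the number of L0 files. The sketch below is a simplified model, not RocksDB code: the type and function names are hypothetical, the threshold values are assumed to match RocksDB's defaults, and real stall decisions also take other conditions (such as pending compaction bytes and memtable counts) into account.

```cpp
#include <cassert>

enum class L0State { kNormal, kCompactionScheduled, kWritesSlowed, kWritesStopped };

// Each field mirrors a RocksDB column family option of the same purpose.
struct L0Thresholds {
    int compaction_trigger = 4;  // level0_file_num_compaction_trigger
    int slowdown_trigger = 20;   // level0_slowdown_writes_trigger
    int stop_trigger = 36;       // level0_stop_writes_trigger
};

// Simplified model of how the L0 file count maps to write behavior.
L0State ClassifyL0(int l0_file_count, const L0Thresholds& t = L0Thresholds()) {
    assert(t.compaction_trigger <= t.slowdown_trigger &&
           t.slowdown_trigger <= t.stop_trigger);
    if (l0_file_count >= t.stop_trigger) return L0State::kWritesStopped;
    if (l0_file_count >= t.slowdown_trigger) return L0State::kWritesSlowed;
    if (l0_file_count >= t.compaction_trigger) return L0State::kCompactionScheduled;
    return L0State::kNormal;
}
```

The gap between the compaction trigger and the slowdown trigger is the headroom available to absorb a burst before writers are throttled.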
Guidelines:
- Avoid excessive small SST files → tune Level0FileNumCompactionTrigger and TargetFileSizeBase.
- Use column-family separation to isolate workloads.
- For scan-heavy OLAP workloads, consider disabling Bloom filters to reduce memory and CPU overhead; they mainly speed up point lookups and do little for range scans.
- Monitor pending compaction bytes to prevent write stalls.
5. Summary
RocksDB initialization involves:
- Configuring DB-level options (parallelism, WAL, direct I/O).
- Configuring Column Family options (compaction, write buffers, compression).
- Configuring Table options (block size, cache, Bloom filters).
- Opening the database and all required column families.
- Optionally enabling TransactionDB for multi-key transactional semantics.
Careful tuning of these options is essential for achieving high throughput, predictable latency, and efficient resource usage in Kumo's key-value storage scenarios.