RocksDB Configuration and Initialization

RocksDB is a high-performance key-value store widely used in Kumo for local metadata, control-plane state, and small structured data. This document describes the configuration modules, native initialization workflow, column family considerations, and the impact of key configuration options.


1. Configuration Overview

RocksDB exposes several layers of configuration. Understanding them is crucial for predictable performance:

Layer | Scope | Description
DB Options | Entire database | Global settings: parallelism, background jobs, WAL behavior, direct I/O, statistics.
ColumnFamilyOptions (CF Options) | Per column family | Compaction style, write buffer size, merge operators, compression, Bloom filters, Blob files.
TableOptions / BlockBasedTableOptions | Per CF, per table | SSTable layout: block size, cache policy, index/filter type, partitioned index filters.
WriteOptions | Per-write | WAL sync, asynchronous flush, durability guarantees.
ReadOptions | Per-read | Snapshots, cache usage, prefix seeks.
TransactionDBOptions | Transaction-enabled DB | Lock timeouts, custom mutex factories.

Typical tuning targets:

  • Write throughput: WriteBufferSize, MaxWriteBufferNumber, Level0FileNumCompactionTrigger
  • Read latency: BlockCache, FilterPolicy, BloomFilter, PartitionedIndexFilters
  • Disk usage: Compression, TargetFileSizeBase, MaxBytesForLevelBase
  • Transactional guarantees: TransactionDBOptions, lock timeout
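
To make the disk-usage knobs concrete, the following sketch computes the target size of each level under leveled compaction. It assumes dynamic level sizing is off, so that level n targets base × multiplier^(n-1). The function name LevelTargetBytes is hypothetical, the multiplier is simplified to an integer, and the 256 MiB base and multiplier of 10 used in the note below are illustrative values, not settings taken from Kumo.

```cpp
#include <cassert>
#include <cstdint>

// Target size of level `level` (1-based) under leveled compaction:
//   base_bytes * multiplier^(level - 1)
// base_bytes stands in for max_bytes_for_level_base and multiplier for
// max_bytes_for_level_multiplier (simplified to an integer here).
std::uint64_t LevelTargetBytes(std::uint64_t base_bytes,
                               std::uint64_t multiplier,
                               int level) {
  assert(level >= 1);
  std::uint64_t target = base_bytes;
  for (int i = 1; i < level; ++i) {
    target *= multiplier;
  }
  return target;
}
```

With a 256 MiB base and a multiplier of 10, L1 targets 256 MiB, L2 targets 2.5 GiB, and L3 targets 25 GiB, which shows how quickly a larger MaxBytesForLevelBase moves data into the upper levels.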

2. Native RocksDB Initialization

The standard sequence to initialize a RocksDB instance is:

2.1 Prepare DB Options

#include <rocksdb/db.h>
#include <rocksdb/options.h>

rocksdb::Options db_options;
db_options.create_if_missing = true;
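db_options.IncreaseParallelism(8); // sizes the background thread pools and sets max_background_jobs
db_options.max_background_compactions = 8; // deprecated in recent releases; max_background_jobs governs flushes and compactions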
db_options.use_direct_reads = false;
db_options.use_direct_io_for_flush_and_compaction = false;
db_options.statistics = rocksdb::CreateDBStatistics();

2.2 Prepare Column Family Options

Each column family may have specific options:

rocksdb::ColumnFamilyOptions cf_options;
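// Applies preset values for write buffers, the L0 trigger, and per-level
// compression; the explicit assignments below override those presets.
cf_options.OptimizeLevelStyleCompaction();
cf_options.write_buffer_size = 128 * 1024 * 1024; // 128 MB per memtable
cf_options.max_write_buffer_number = 4; // memtables (active + immutable) per CF
cf_options.min_write_buffer_number_to_merge = 2; // merge 2 memtables per flush
cf_options.level0_file_num_compaction_trigger = 4; // L0 files that start compaction
// compression_per_level, when non-empty, takes precedence over `compression`,
// so clear the preset to apply LZ4 at every level.
cf_options.compression_per_level.clear();
cf_options.compression = rocksdb::kLZ4Compression;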
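
The write buffer settings above translate directly into a worst-case memory footprint. The sketch below does that arithmetic; MemtableBudgetBytes is a hypothetical helper, and it assumes every column family shares the same options and that no global cap (such as db_write_buffer_size or a WriteBufferManager) is configured.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case memtable memory when every column family fills all of its
// write buffers: write_buffer_size * max_write_buffer_number * CF count.
std::uint64_t MemtableBudgetBytes(std::uint64_t write_buffer_size,
                                  int max_write_buffer_number,
                                  int num_column_families) {
  assert(max_write_buffer_number >= 1 && num_column_families >= 1);
  return write_buffer_size *
         static_cast<std::uint64_t>(max_write_buffer_number) *
         static_cast<std::uint64_t>(num_column_families);
}
```

For 128 MB buffers, 4 buffers per column family, and three column families, the bound is 1.5 GiB, which has to fit alongside the block cache in the process's memory budget.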

2.3 Prepare Table Options

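#include <rocksdb/cache.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/table.h>

rocksdb::BlockBasedTableOptions table_options;
table_options.block_size = 64 * 1024; // 64 KB
table_options.block_cache = rocksdb::NewLRUCache(8ULL * 1024 * 1024 * 1024); // 8 GB
table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false)); // 10 bits per key, full (non-block-based) filter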

Assign table options to CF:

cf_options.table_factory.reset(
    rocksdb::NewBlockBasedTableFactory(table_options)
);
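
To judge what 10 bits per key buys, the sketch below evaluates the textbook false-positive rate of a Bloom filter that uses the optimal number of hash functions. This is the classical formula, not RocksDB's exact filter implementation, whose measured rate can differ somewhat; BloomFalsePositiveRate is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>

// Theoretical false-positive rate of a standard Bloom filter with the
// optimal number of hash functions for the given bits per key:
//   k = bits_per_key * ln 2,  FPR = 2^(-k) = exp(-bits_per_key * (ln 2)^2)
double BloomFalsePositiveRate(double bits_per_key) {
  assert(bits_per_key > 0.0);
  const double ln2 = std::log(2.0);
  return std::exp(-bits_per_key * ln2 * ln2);
}
```

At 10 bits per key the formula gives a rate of a little under 1%, so roughly one in a hundred lookups for an absent key still reads a data block. Each additional bit per key lowers the rate by about 38%.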

2.4 Open the Database

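#include <string>
#include <vector>

rocksdb::DB* db = nullptr;
std::vector<std::string> cf_names = {"default", "meta", "data"};
std::vector<rocksdb::ColumnFamilyDescriptor> cf_descs;

// This example gives every column family the same options.
for (const auto& name : cf_names) {
    cf_descs.emplace_back(name, cf_options);
}

// On success, cf_handles[i] corresponds to cf_descs[i].
std::vector<rocksdb::ColumnFamilyHandle*> cf_handles;
rocksdb::Status s = rocksdb::DB::Open(db_options, "/data/kumo/rocksdb", cf_descs, &cf_handles, &db);
if (!s.ok()) {
    // handle error, e.g. log s.ToString() and abort startup
}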
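
Opening an existing database in read-write mode requires listing every column family it already contains. A minimal sketch of a pre-open check follows; it assumes the caller obtained the list of existing names separately (for example from rocksdb::DB::ListColumnFamilies), and MissingColumnFamilies is a hypothetical helper, not part of RocksDB.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the names in `existing` (column families already on disk) that are
// absent from `requested` (names about to be passed to DB::Open).
std::vector<std::string> MissingColumnFamilies(
    const std::vector<std::string>& existing,
    const std::vector<std::string>& requested) {
  assert(!requested.empty());  // "default" must always be listed
  std::vector<std::string> missing;
  for (const auto& name : existing) {
    if (std::find(requested.begin(), requested.end(), name) ==
        requested.end()) {
      missing.push_back(name);
    }
  }
  return missing;
}
```

If the result is non-empty, add descriptors for the missing names before calling DB::Open rather than relying on the open call's error message.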

2.5 Write and Read Options

rocksdb::WriteOptions write_options;
write_options.sync = true; // sync the WAL to disk before the write returns

rocksdb::ReadOptions read_options;
read_options.fill_cache = true; // utilize block cache
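
As a rough way to reason about the cost of sync = true, the sketch below gives a back-of-envelope upper bound for a single writer thread, assuming each Write() call waits for exactly one WAL sync and that the sync dominates the call's latency. MaxSyncedWritesPerSecond is a hypothetical helper and the latencies are placeholders to be measured on the target device.

```cpp
#include <cassert>

// Upper bound on keys written per second by one thread when every Write()
// call waits for one WAL sync. keys_per_batch models the number of keys
// grouped into a single WriteBatch.
double MaxSyncedWritesPerSecond(double fsync_latency_ms, int keys_per_batch) {
  assert(fsync_latency_ms > 0.0 && keys_per_batch >= 1);
  return keys_per_batch * 1000.0 / fsync_latency_ms;
}
```

With a 2 ms sync, single-key writes top out near 500 per second, while batches of 100 keys reach about 50,000 keys per second, which is why batching is the usual remedy when durable writes are required.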

2.6 TransactionDB Initialization (Optional)

#include <rocksdb/utilities/transaction_db.h>

rocksdb::TransactionDBOptions txn_options;
txn_options.transaction_lock_timeout = 20000; // ms a transaction waits for a key lock
txn_options.default_lock_timeout = 30000; // ms for writes issued outside a transaction

rocksdb::TransactionDB* txn_db = nullptr;
rocksdb::Status s_txn = rocksdb::TransactionDB::Open(
    db_options, txn_options, "/data/kumo/rocksdb_txn", &txn_db
);
if (!s_txn.ok()) {
    // handle error
}

3. Column Family Considerations

  • Default CF: Always required. Do not delete or rename.
  • Multiple CFs: Useful to separate metadata, OLAP data, and write-heavy data.
  • Option isolation: Each CF can have distinct compaction, compression, and table options.
  • CF handles: Destroy every handle before deleting the DB object to avoid memory leaks:
for (auto* handle : cf_handles) {
    db->DestroyColumnFamilyHandle(handle);
}
delete db;

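Tip: When reopening an existing database in read-write mode, list every column family it contains in the descriptors passed to DB::Open; the call fails if one is omitted. Create any new column families before the first writes so that each one starts with its intended options.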


4. Configuration Impact and Best Practices

  • WriteOptions.sync = true makes each write wait until the WAL is synced to disk, which protects against machine crashes but reduces throughput.
  • Block cache size balances memory usage vs read latency.
  • Write buffer size and number control memtable flush frequency and compaction load.
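  • Compression per level trades disk space against CPU time spent compressing during flush and compaction and decompressing on reads.
  • The Level0 compaction trigger controls when L0-to-L1 compaction starts; keeping it well below the Level0 slowdown and stop triggers leaves headroom for write bursts before stalls begin.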
  • TransactionDB lock timeouts must be tuned based on workload concurrency.

Guidelines:

  • Avoid excessive small SST files → tune Level0FileNumCompactionTrigger and TargetFileSizeBase.
  • Use column-family separation to isolate workloads.
  • For scan-heavy (OLAP) workloads, Bloom filters mainly benefit point lookups, so consider disabling them on those column families to save memory and CPU.
  • Monitor pending compaction bytes to prevent write stalls.
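
The monitoring guideline above can be sketched as a simple classifier over two metrics: the L0 file count and the estimated pending compaction bytes. In a live system these would be read from DB properties such as "rocksdb.num-files-at-level0" and "rocksdb.estimate-pending-compaction-bytes". The type and function names below are hypothetical, the threshold values are assumptions that should be checked against the options actually in use, and the model is simplified (it ignores memtable-count stalls).

```cpp
#include <cassert>
#include <cstdint>

enum class WritePressure { kNormal, kCompactionDue, kSlowdown, kStop };

// Assumed thresholds; replace with the values configured for the CF.
struct StallThresholds {
  int l0_compaction_trigger = 4;  // level0_file_num_compaction_trigger
  int l0_slowdown_trigger = 20;   // level0_slowdown_writes_trigger
  int l0_stop_trigger = 36;       // level0_stop_writes_trigger
  std::uint64_t soft_pending_bytes = 64ULL << 30;   // soft pending-bytes limit
  std::uint64_t hard_pending_bytes = 256ULL << 30;  // hard pending-bytes limit
};

// Maps the two metrics to the most severe condition they indicate.
WritePressure Classify(const StallThresholds& t, int l0_files,
                       std::uint64_t pending_compaction_bytes) {
  assert(l0_files >= 0);
  if (l0_files >= t.l0_stop_trigger ||
      pending_compaction_bytes >= t.hard_pending_bytes) {
    return WritePressure::kStop;
  }
  if (l0_files >= t.l0_slowdown_trigger ||
      pending_compaction_bytes >= t.soft_pending_bytes) {
    return WritePressure::kSlowdown;
  }
  if (l0_files >= t.l0_compaction_trigger) {
    return WritePressure::kCompactionDue;
  }
  return WritePressure::kNormal;
}
```

Alerting when the result reaches kSlowdown gives operators time to add compaction capacity or throttle writers before writes stop entirely.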

5. Summary

RocksDB initialization involves:

  1. Configuring DB-level options (parallelism, WAL, direct I/O).
  2. Configuring Column Family options (compaction, write buffers, compression).
  3. Configuring Table options (block size, cache, Bloom filters).
  4. Opening the database and all required column families.
  5. Optionally enabling TransactionDB for multi-key transactional semantics.

Careful tuning of these options is essential for achieving high throughput, predictable latency, and efficient resource usage in Kumo's key-value storage scenarios.