
Cloud Storage Overview for Kumo Stack

Kumo Stack supports integration with several cloud and distributed storage backends for storing KV backups, SST files, snapshots, and other large datasets. This page focuses on practical integration and operational guidance rather than vendor recommendations.


1. Supported Cloud Storage Backends

| Backend | Integration method | Notes |
|---|---|---|
| AWS S3 | aws-sdk-cpp / aws-crt-cpp | Full support for object storage, multipart upload, and concurrent writes. Recommended for RocksDB SST backups. |
| Microsoft Azure Blob Storage | azure-core-cpp / azure-storage-blobs-cpp | Use the C++ SDK to upload and download KV snapshots as blobs. Azure's Event Hubs, IoT, and Key Vault services are reached through separate SDK libraries. |
| Google Cloud Storage | google-cloud-cpp (storage-grpc) | gRPC-based object storage interface. Suitable for SST and snapshot storage. |
| HDFS | libhdfs3 | Native C++ client library for HDFS. Preferred in enterprise Hadoop environments; supports direct file upload and download. |

All integrations use native C/C++ SDKs to ensure high performance, predictable behavior, and easy operational control.
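The table lists multipart upload as an S3 feature. When a large SST file or snapshot is uploaded in parts, it is split into byte ranges, and S3 constrains that split: every part except the last must be at least 5 MiB, and one upload may contain at most 10,000 parts. The sketch below plans part boundaries under those two limits only (it does not enforce the 5 GiB per-part maximum). `Part` and `PlanParts` are hypothetical helpers, not types from aws-sdk-cpp or aws-crt-cpp; the actual upload calls are left to the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one contiguous byte range of the source file.
struct Part {
    int number;            // S3 part numbers start at 1
    std::uint64_t offset;  // byte offset into the file
    std::uint64_t size;    // number of bytes in this part
};

constexpr std::uint64_t kMinPartSize = 5ull * 1024 * 1024;  // S3 minimum for non-final parts (5 MiB)
constexpr std::uint64_t kMaxParts = 10000;                  // S3 limit on parts per upload

// Splits `file_size` bytes into parts of roughly `preferred_size` bytes,
// growing the part size when needed so the upload stays within kMaxParts.
std::vector<Part> PlanParts(std::uint64_t file_size, std::uint64_t preferred_size) {
    std::uint64_t part_size = std::max(preferred_size, kMinPartSize);
    // Smallest part size that keeps the part count <= kMaxParts (ceiling division).
    std::uint64_t needed = (file_size + kMaxParts - 1) / kMaxParts;
    part_size = std::max(part_size, needed);

    std::vector<Part> parts;
    int number = 1;
    for (std::uint64_t offset = 0; offset < file_size; offset += part_size) {
        std::uint64_t size = std::min(part_size, file_size - offset);  // last part may be short
        parts.push_back({number++, offset, size});
    }
    assert(parts.size() <= kMaxParts);
    return parts;
}
```

For a 12 MiB file with a 5 MiB preferred size this yields parts of 5, 5, and 2 MiB. An empty file yields no parts; very small files are simpler to send with a single PUT.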


2. Integration Patterns

  • KV Backup / RocksDB SST Files

    • Most backends support uploading SST files directly.
    • Organizing the directory structure by environment and date is recommended.
    • Keep the number of files in any single folder small; large flat folders are slow to list and harder to operate on.
  • Snapshots

    • Use platform SDKs to stream snapshot files to storage.
    • Prefer a single snapshot file per operation to simplify restores.
  • Concurrency

    • All backends support multi-threaded uploads for large datasets.
    • Tune SDK settings such as thread pools and part sizes to match throughput needs.
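A minimal sketch of the directory guidance above, assuming a key layout of `<env>/<YYYY-MM-DD>/shard-<NN>/<file>`: the environment and date give the top-level organization, and a hash of the file name picks one of a bounded number of shard prefixes, so no single folder collects every SST file of a large backup. The layout, `BackupKey`, and the use of 64-bit FNV-1a as the hash are illustrative assumptions, not a format Kumo Stack defines.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a: a small, deterministic, non-cryptographic hash.
std::uint64_t Fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;                  // FNV 64-bit prime
    }
    return h;
}

// Hypothetical key layout: <env>/<YYYY-MM-DD>/shard-<NN>/<file_name>.
// `num_shards` bounds how many sibling prefixes one backup date uses.
std::string BackupKey(const std::string& env, int year, int month, int day,
                      const std::string& file_name, unsigned num_shards) {
    assert(num_shards > 0 && num_shards <= 100);  // shard ids are printed with two digits
    unsigned shard = static_cast<unsigned>(Fnv1a64(file_name) % num_shards);
    char middle[64];
    std::snprintf(middle, sizeof(middle), "%04d-%02d-%02d/shard-%02u/",
                  year, month, day, shard);
    return env + "/" + middle + file_name;
}
```

Because the shard depends only on the file name, a restore can recompute where any SST file lives without listing the whole date prefix, as long as the shard count stays fixed for that backup.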

3. Operational Recommendations

  • Use native SDKs (aws-sdk-cpp, azure-storage-blobs-cpp, google-cloud-cpp, libhdfs3) rather than external CLIs or FUSE mounts for production workloads.
  • Organize directories so that no single folder accumulates too many files, which otherwise becomes an operational bottleneck.
  • For HDFS, use single large files rather than many small files to reduce NameNode overhead.
  • Validate restores in a staging environment before relying on them in production.
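One way to validate a restore is to record a checksum when the backup is taken and compare it with a checksum of the restored file. The sketch below assumes CRC-32 (IEEE polynomial, computed bitwise and streamed in 1 MiB chunks so large snapshots never have to fit in memory); `Crc32Update`, `FileCrc32`, and `VerifyRestore` are hypothetical helpers. CRC-32 only catches accidental corruption, so a cryptographic hash or the checksums the storage service already records may be preferable in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Updates a running CRC-32 (reflected IEEE polynomial 0xEDB88320), one bit at a time.
std::uint32_t Crc32Update(std::uint32_t crc, const char* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

// Streams a file through CRC-32 in fixed-size chunks. Returns false on read errors.
bool FileCrc32(const std::string& path, std::uint32_t& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::uint32_t crc = 0xFFFFFFFFu;   // standard initial value
    std::vector<char> buf(1 << 20);    // 1 MiB read buffer
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        crc = Crc32Update(crc, buf.data(), static_cast<std::size_t>(in.gcount()));
    }
    if (in.bad()) return false;
    out = crc ^ 0xFFFFFFFFu;           // standard final XOR
    return true;
}

// Compares a restored file against the checksum recorded at backup time.
bool VerifyRestore(const std::string& restored_path, std::uint32_t expected) {
    std::uint32_t actual = 0;
    return FileCrc32(restored_path, actual) && actual == expected;
}
```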

4. Use Case Summary

| Use case | Recommended storage | Notes |
|---|---|---|
| RocksDB SST backup | AWS S3, Azure Blob, GCS | Use multipart or gRPC upload; keep column families (CFs) to a minimum. |
| Snapshot streaming | All backends | Single snapshot per upload; verify consistency. |
| Long-term retention | HDFS, AWS S3 | Organize by date and environment; keep directory depth minimal. |
| High-concurrency writes | AWS S3, GCS | Multi-threaded uploads; tune SDK thread pools. |
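For the high-concurrency writes row, the main knob is how many uploads are in flight at once. The sketch below bounds that number with a fixed set of worker threads that claim part indices from a shared atomic counter. `UploadPartFn` stands in for whatever SDK call actually sends a part, and both it and `UploadPartsConcurrently` are hypothetical; higher-level utilities such as the TransferManager in aws-sdk-cpp manage their own threads and would usually be preferred over hand-rolled ones.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical callback: uploads part `index` and returns true on success.
using UploadPartFn = std::function<bool(std::size_t index)>;

// Uploads `num_parts` parts with at most `max_in_flight` worker threads.
// Returns true only if every part reported success.
bool UploadPartsConcurrently(std::size_t num_parts, unsigned max_in_flight,
                             const UploadPartFn& upload_part) {
    assert(max_in_flight > 0);
    std::atomic<std::size_t> next{0};
    std::atomic<bool> ok{true};
    auto worker = [&] {
        // Each worker claims the next unclaimed part until none remain.
        for (std::size_t i = next.fetch_add(1); i < num_parts; i = next.fetch_add(1)) {
            if (!upload_part(i)) ok = false;
        }
    };
    unsigned threads = static_cast<unsigned>(std::min<std::size_t>(max_in_flight, num_parts));
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return ok;
}
```

`upload_part` is called from several threads at once, so it must be thread-safe. This sketch keeps going after a failed part; a real uploader would retry failed parts and, for S3, abort the multipart upload if they keep failing.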

5. Summary

Kumo Stack currently supports AWS S3, Azure Blob Storage, Google Cloud Storage, and HDFS through native C/C++ SDKs, with a focus on operational simplicity and maintainable backup strategies.

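  • AWS S3: most mature integration; supports multipart uploads.
  • Azure Blob: blob upload and download through azure-storage-blobs-cpp; Event Hubs and Key Vault are available through separate Azure SDK libraries.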
  • GCS: gRPC-based interface.
  • HDFS: native C++ file operations with libhdfs3.

Integration patterns are consistent across backends: each SST file is uploaded individually, directories are organized by environment and date, and column families are kept to a minimum, which keeps performance high and operations manageable.