HNSWLIB (Hierarchical Navigable Small World Library) Overview

Repository: https://github.com/nmslib/hnswlib
Language: C++ (header-only core, Python bindings available)

Purpose & Use Cases

HNSWLIB is a lightweight, header-only C++ library that implements the Hierarchical Navigable Small World (HNSW) algorithm. It grew out of NMSLIB and focuses on raw query speed and minimal overhead. It supports incremental vector insertion (unlike purely offline index builders), but it is an in-process library rather than a database: it has no write buffering, replication, or distributed writes, so it is NOT designed for high-concurrency real-time write workloads. It is suitable for:

Approximate nearest neighbor (ANN) search over large vector datasets (millions of vectors and beyond, provided the whole index fits in RAM).
Scenarios with low-to-medium frequency incremental insertion (e.g., hourly/daily batch updates, not real-time single-vector writes).
Resource-constrained environments (edge/embedded systems) due to its header-only, dependency-free C++ core.
HNSW hyperparameter tuning (M, ef_construction, ef) for research/prototyping (no extra NMSLIB complexity).

Typical applications:

Semi-dynamic embedding retrieval (e.g., product embedding libraries updated daily).
High-speed batch feature matching for image/video analytics.
Edge-side vector search (e.g., on-device face recognition with occasional feature updates).

Algorithms Supported

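Hierarchical Navigable Small World (HNSW) is the only approximate index structure (no VP-tree or hybrid alternatives). A brute-force index is also included (`hnswlib::BruteforceSearch` in C++, `hnswlib.BFIndex` in Python), mainly useful as an exact baseline for recall checks.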

Configurable hyperparameters:
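M: Maximum number of bi-directional links created per node (higher = better recall on hard or high-dimensional data, larger index, slower build).
ef_construction: Size of the candidate list used while building the index (higher = better graph quality and recall, slower build).
ef: Size of the candidate list used at query time; it should be at least k (higher = better recall, slower query).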
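
To make the memory side of the M trade-off concrete, the sketch below applies a commonly quoted rule of thumb for Float32 indexes: roughly `dim * 4` bytes for the vector plus about `M * 2 * 4` bytes for base-layer links per element. The function name `estimate_index_bytes` is illustrative, not part of HNSWLIB, and the formula is an approximation that ignores upper layers, labels, and allocator overhead.

```python
def estimate_index_bytes(num_elements: int, dim: int, M: int) -> int:
    """Rough lower bound on index memory for float32 vectors.

    Assumed rule of thumb (not an exact accounting):
      - each vector takes dim * 4 bytes
      - each element reserves about 2 * M four-byte link slots on the base layer
    Upper layers, labels, and allocator overhead are ignored.
    """
    vector_bytes = dim * 4
    link_bytes = M * 2 * 4
    return num_elements * (vector_bytes + link_bytes)


if __name__ == "__main__":
    # Compare a few M values for one million 128-dimensional vectors.
    for M in (8, 16, 48):
        gib = estimate_index_bytes(1_000_000, 128, M) / 2**30
        print(f"M={M:>2}: ~{gib:.2f} GiB")
```

Measured usage will be somewhat higher than this estimate; it is mainly useful for checking whether a planned index can fit in RAM at all.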

Core Technical Specifications

1. Supported Metric Spaces (Official Implementation Only)

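The Python bindings expose exactly 3 spaces. The C++ core ships `hnswlib::L2Space` and `hnswlib::InnerProductSpace`; other distances (L1, L∞, Hamming, Jaccard, edit distance) are not built in, although C++ users can implement `hnswlib::SpaceInterface` themselves. All three spaces return a distance (smaller is closer), and inner product is not a true metric:

Metric Type	Full Name	C++ Identifier	Python Parameter	Use Case
L2	Squared Euclidean Distance	`hnswlib::L2Space`	`'l2'`	General numerical vectors (image embeddings).
Cosine	Cosine Distance (1 - cosine similarity)	`hnswlib::InnerProductSpace` on normalized vectors	`'cosine'`	Text embeddings (the Python bindings normalize vectors automatically).
Inner Product	1 - Dot Product	`hnswlib::InnerProductSpace`	`'ip'`	Normalized vector similarity (equivalent to cosine for unit vectors).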
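
The three spaces can be written out in a few lines. This is a plain-Python sketch of the distance definitions given in the HNSWLIB README (squared L2, 1 minus dot product, 1 minus cosine similarity); the function names are illustrative and are not HNSWLIB APIs.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product 'distance': 1 - dot(a, b). Not a true metric."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 - dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norms


def normalize(v: Sequence[float]) -> list:
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]
```

For unit-length vectors, `ip_distance` and `cosine_distance` agree, which is why the inner-product space can stand in for cosine once vectors are normalized.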

2. Supported Data Types (Official Implementation Only)

HNSWLIB is built around 32-bit floating-point vectors. The Python bindings store and search in Float32 only; the C++ core additionally ships an integer L2 space. There is NO Float16 (FP16) or binary type support, and the bundled spaces do not provide a double-precision index:

Data Type	Precision	C++ Type	Python Binding Mapping	Support Status	Key Note
Float32	32-bit	`float`	`numpy.float32`	Fully supported (optimal performance)	Native storage type; distance kernels are SIMD optimized.
Float64 (Double)	64-bit	-	`numpy.float64`	Not stored natively	Cast to Float32 before insertion; the index itself holds Float32 values.
Float16 (FP16)	16-bit	-	`numpy.float16`	Not supported	Convert to Float32 before insertion; the index then uses Float32 memory, so there are no FP16 savings.
Integer/Binary	-	`unsigned char` (via `hnswlib::L2SpaceI`)	-	C++ only, L2 only	No Python mapping and no binary/Hamming space.
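
Whatever the input type, a value stored as Float32 is rounded to about seven significant decimal digits. The sketch below uses only the standard library to show the size of that rounding; `to_float32` and `max_cast_error` are illustrative helpers, not HNSWLIB functions. With NumPy the equivalent cast is `array.astype(numpy.float32)`.

```python
import struct
from typing import Sequence


def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def max_cast_error(vector: Sequence[float]) -> float:
    """Largest absolute change caused by storing the vector as float32."""
    return max(abs(x - to_float32(x)) for x in vector)


if __name__ == "__main__":
    embedding = [0.1, -0.25, 0.333333333333, 0.5]
    print("max rounding error:", max_cast_error(embedding))
```

For typical embeddings with components of magnitude below 1, the rounding error is far smaller than the distance differences that decide neighbor ranking.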

3. Dynamic Data Operations (Insert/Delete/Modify)

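HNSWLIB supports incremental insertion, soft deletion, and updates by re-adding an existing label. It does not physically remove vectors from a built index:

Operation	Support Status	Critical Limitations
Incremental Insertion	Supported (non-real-time optimized)	- Insert latency grows with index size, M, and ef_construction; figures such as 0.1ms/vector for small indexes and 10ms+/vector for 100M+ vectors are indicative only and depend on hardware and parameters. - `add_items` accepts batches and can insert with multiple threads (`num_threads`), but there is no asynchronous or buffered write path. - Capacity is fixed by `max_elements` at `init_index` and must be grown with `resize_index`. - Sustainable write throughput is well below that of dedicated real-time engines (the ~500 vs 10k+ QPS comparison is a rough, workload-dependent estimate).
Deletion	Soft delete only	`mark_deleted(label)` excludes an element from query results but keeps it in the graph, so memory is not reclaimed; `unmark_deleted(label)` reverses it. An index created with `allow_replace_deleted=True` can reuse deleted slots via `add_items(..., replace_deleted=True)`. Tracking deletions outside the index and filtering after the query is an alternative, but it bloats the index and degrades performance as deletions accumulate.
Modification	Supported via re-insertion	Calling `add_items` with an existing label overwrites that element's vector and updates its links; there is no separate update API, and the cost is comparable to an insertion.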
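
Where deletions are tracked outside the index and filtered after the query, the caller has to request more than k neighbors so that about k survive the filter. The following is a minimal sketch of that pattern; `overfetch_k` and `filter_deleted` are hypothetical helpers, and the over-fetch formula is a simple heuristic that assumes deleted items are spread evenly through the results.

```python
import math
from typing import Hashable, List, Sequence, Set, Tuple


def overfetch_k(k: int, deleted_fraction: float) -> int:
    """Heuristic: how many neighbors to request so about k remain after filtering."""
    if not 0.0 <= deleted_fraction < 1.0:
        raise ValueError("deleted_fraction must be in [0, 1)")
    return math.ceil(k / (1.0 - deleted_fraction))


def filter_deleted(
    labels: Sequence[Hashable],
    distances: Sequence[float],
    deleted: Set[Hashable],
    k: int,
) -> List[Tuple[Hashable, float]]:
    """Drop deleted labels from one query's results and keep the first k.

    labels and distances are parallel sequences, already sorted by
    ascending distance (the order in which a k-NN query returns them).
    """
    kept = [(l, d) for l, d in zip(labels, distances) if l not in deleted]
    return kept[:k]
```

The over-fetch grows without bound as the deleted fraction approaches 1, which is the practical reason a heavily deleted index is eventually rebuilt.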

Characteristics

Feature	Description
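Incremental Updates	Incremental insertion supported (non-real-time); soft deletion and update-by-reinsertion supported; no physical removal.
Query Speed	Among the fastest HNSW implementations in public ANN benchmarks, and generally described as faster than NMSLIB's HNSW because of its lower overhead; distance kernels are SIMD optimized for Float32.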
Index Type	Pure graph-based (HNSW only; no tree/hybrid structures).
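Scalability	In-memory index, so practical size is bounded by RAM; an index can be saved to and loaded from disk (`save_index`/`load_index`) but is not queried from disk. Memory footprint is reported to be roughly 30-50% lower than NMSLIB's, depending on M and dimensionality.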
Language Bindings	C++ (full feature set); Python (only core features – no custom distance functions).

Notes

HNSWLIB supports incremental insertion but is NOT a real-time engine – it lacks write buffers, async flushing, and distributed write support.
For semi-dynamic workloads (hourly/daily updates), batch insertion (1k+ vectors) is recommended to minimize latency.
Float32 is the native type: convert Float64 embeddings to Float32 before insertion, which costs no meaningful precision for most embeddings.
No built-in Jaccard, Hamming, or edit-distance spaces; use NMSLIB if one of these is required.
Real-time alternative: Pair HNSWLIB (static historical data) with Milvus/pgvector (real-time hot data) for dynamic workloads.
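
The batch-insertion recommendation above can be sketched as a small helper that slices the data and calls `add_items(data, ids)` once per batch. `add_items` is the insertion method of the Python bindings' `Index`; the helper names `chunks` and `insert_in_batches` are illustrative, and the default batch size of 1000 simply mirrors the "1k+" guidance rather than a tuned value. The helper accepts any object with a compatible `add_items` method.

```python
from typing import Any, Iterator, Sequence, Tuple


def chunks(
    vectors: Sequence[Any], labels: Sequence[int], batch_size: int
) -> Iterator[Tuple[Sequence[Any], Sequence[int]]]:
    """Yield aligned (vectors, labels) slices of at most batch_size items."""
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels must have the same length")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(vectors), batch_size):
        end = start + batch_size
        yield vectors[start:end], labels[start:end]


def insert_in_batches(
    index: Any, vectors: Sequence[Any], labels: Sequence[int], batch_size: int = 1000
) -> int:
    """Insert all vectors through index.add_items, one call per batch.

    Returns the number of add_items calls made.
    """
    calls = 0
    for batch_vectors, batch_labels in chunks(vectors, labels, batch_size):
        index.add_items(batch_vectors, batch_labels)
        calls += 1
    return calls
```

In practice `vectors` would be a NumPy Float32 array of shape (n, dim), and the index must have enough capacity (`max_elements`) for the new items before insertion starts.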
