NGT (Neighborhood Graph and Tree) Overview

Repository: https://github.com/yahoojapan/NGT
Language: C++ (Python bindings available)

Purpose & Use Cases

NGT is a vector similarity search engine designed for high-dimensional data. It is suitable for:

Approximate nearest neighbor (ANN) search over large datasets.
Scenarios where batch index building is acceptable.
Systems where insertion is occasional, rather than fully real-time dynamic updates.

Typical applications include:

Recommendation systems with offline embedding indexing.
Image or text feature retrieval for search or ranking pipelines.
Large-scale analytics where high-dimensional vectors need similarity queries.

Algorithms Supported

NSW (Navigable Small World graph) for fast approximate nearest neighbor search.
NGT-tree (combines graph and tree structures for indexing).
Supports metric spaces (detailed below) such as L2 (Euclidean) and cosine similarity.

Core Technical Specifications

1. Supported Metric Spaces

NGT supports a comprehensive set of metric and non-metric similarity measures, covering most common use cases for high-dimensional vector comparison:

Metric Type	Full Name	Use Case Scenarios
L2	Euclidean Distance	General numerical vector similarity (e.g., image feature vectors, dense embeddings).
Cosine	Cosine Similarity/Distance	Text embedding similarity (e.g., BERT embeddings), direction-based vector comparison.
L1	Manhattan Distance (City Block)	Sparse vector scenarios, feature difference aggregation (e.g., tabular data vectors).
L∞ (Chebyshev)	Chebyshev Distance	Maximizing feature difference (e.g., outlier detection in high-dimensional data).
Angle	Angular Distance	Pure direction similarity (ignores vector magnitude, alternative to cosine).
Hamming	Hamming Distance	Binary vector comparison (e.g., bitwise feature vectors, hash-based retrieval).
Jaccard	Jaccard Similarity/Distance	Set-based vector similarity (e.g., sparse boolean feature vectors).

Note: For non-metric spaces (e.g., Jaccard), NGT maintains approximate nearest neighbor search performance but does not guarantee strict metric axioms.

2. Supported Data Types

NGT optimizes for numerical vector data, with explicit support for the following scalar types (consistent across C++ and Python bindings, including FP16/float16 support):

Data Type	Precision	C++ Type	Python Binding Mapping	Use Case
Float16 (FP16)	16-bit	`float16_t`/`half`	`numpy.float16`	Memory-efficient high-dimensional vectors (e.g., large-scale embedding datasets, GPU-generated embeddings).
Float32	32-bit	`float`	`numpy.float32`	Default (balance of precision/speed/memory).
Float64 (Double)	64-bit	`double`	`numpy.float64`	High-precision scenarios (e.g., scientific computing).
Int32	32-bit	`int32_t`	`numpy.int32`	Discrete feature vectors (e.g., count-based embeddings).
Int64	64-bit	`int64_t`	`numpy.int64`	Large discrete value vectors.
Binary	Bit-level	`uint8_t` (packed)	`numpy.uint8` (bitpacked)	Hamming distance scenarios (space-efficient binary vectors).

Key Notes:

Float16 support: NGT natively supports 16-bit floating-point vectors (FP16/half-precision) in both C++ core and Python bindings. This is particularly valuable for reducing memory footprint in large-scale embedding systems (e.g., millions/billions of high-dim vectors) and aligning with GPU-generated embeddings (common in deep learning workflows).

Python bindings: numpy.float16 arrays are directly compatible with NGT’s insert/build_index APIs; no manual type conversion is required (NGT handles FP16 internally for indexing/querying).

Precision tradeoff: Float16 reduces memory usage by 50% vs Float32, with minimal precision loss for most embedding similarity search scenarios (recommended for production when precision requirements allow).

Binary vectors: Require explicit bitpacking via ngtpy utilities for optimal performance with Hamming distance.

3. Real-Time Data Insertion/Deletion Support

NGT’s support for dynamic data updates (insertion/deletion) is partial and non-real-time optimized, with clear limitations for high-frequency dynamic workloads:

Operation	Support Level	Details & Constraints
Incremental Insertion	Partial support (semi-real-time)	- Small-batch incremental insertion is allowed (e.g., hundreds/thousands of vectors at a time). - Inserted vectors require a `build_index`/`optimize` call to be fully integrated into the search graph (otherwise query performance degrades). - Not designed for high-frequency single-vector insertions (e.g., >1000 inserts/sec) – high latency and index fragmentation will occur.
Real-Time Deletion	Limited support (non-real-time)	- Deletion of vectors is possible via `remove` API, but requires a full index rebuild/optimization to clean up deleted entries from the graph structure. - Deleted vectors are marked as "invalid" in the index (not immediately purged), leading to gradual memory bloat. - No native support for low-latency (sub-second) deletion of vectors in production indexes.
Vector Modification	No direct support	- To modify a vector, users must delete the original entry and re-insert the updated vector (inherits all deletion/insertion constraints above).

Critical Note: NGT is not optimized for real-time dynamic workloads (e.g., user-generated content embeddings requiring immediate indexing, or high-turnover datasets). For use cases needing sub-second insertion/deletion with consistent query performance, consider complementary tools (e.g., combining NGT for batch indexing with a lightweight in-memory ANN for real-time vectors) or alternative engines (e.g., FAISS with IVF indexes, or Milvus for fully dynamic workloads).

Characteristics

Feature	Description
Incremental Updates	Partial incremental insertion supported; real-time deletion/modification not fully supported.
Query Speed	High, especially on pre-built indexes; optimized for ANN search (FP16 queries are as fast as FP32 with lower memory overhead).
Index Type	Graph-based and tree hybrid.
Scalability	Handles millions/billions of vectors efficiently with disk-backed storage (FP16 enables larger datasets on the same hardware).
Language Bindings	C++ and Python.

Notes

NGT is more suitable for offline or semi-online workloads, where indexes can be built or updated in batches.
Not designed for high-frequency real-time insertion/deletion scenarios.
For binary vectors (Hamming distance), NGT uses optimized bitwise operations to reduce memory overhead and accelerate queries.
When using non-L2 metrics (e.g., cosine), NGT normalizes vectors during index building by default (configurable via normalize parameter); normalization is compatible with all float types (FP16/FP32/FP64).
Float16 is recommended for large-scale embedding deployments (e.g., >10M vectors with dim ≥ 128) to balance memory, speed, and precision.
For dynamic workloads: If real-time updates are required, NGT is best used in a "hybrid" architecture (batch-indexed NGT for historical data + in-memory ANN for real-time vectors) or with scheduled batch reindexing (e.g., hourly/daily) to mitigate deletion/insertion limitations.

Purpose & Use Cases​

Algorithms Supported​

Core Technical Specifications​

1. Supported Metric Spaces​

2. Supported Data Types​

3. Real-Time Data Insertion/Deletion Support​

Characteristics​

Notes​