NGT (Neighborhood Graph and Tree) Overview
- Repository: https://github.com/yahoojapan/NGT
- Language: C++ (Python bindings available)
Purpose & Use Cases
NGT is a vector similarity search engine designed for high-dimensional data. It is suitable for:
- Approximate nearest neighbor (ANN) search over large datasets.
- Scenarios where batch index building is acceptable.
- Systems where insertion is occasional, rather than fully real-time dynamic updates.
Typical applications include:
- Recommendation systems with offline embedding indexing.
- Image or text feature retrieval for search or ranking pipelines.
- Large-scale analytics where high-dimensional vectors need similarity queries.
Algorithms Supported
- NSW (Navigable Small World graph) for fast approximate nearest neighbor search.
- NGT-tree (combines graph and tree structures for indexing).
- Supports metric spaces (detailed below) such as L2 (Euclidean) and cosine similarity.
Core Technical Specifications
1. Supported Metric Spaces
NGT supports a comprehensive set of metric and non-metric similarity measures, covering most common use cases for high-dimensional vector comparison:
| Metric Type | Full Name | Use Case Scenarios |
|---|---|---|
| L2 | Euclidean Distance | General numerical vector similarity (e.g., image feature vectors, dense embeddings). |
| Cosine | Cosine Similarity/Distance | Text embedding similarity (e.g., BERT embeddings), direction-based vector comparison. |
| L1 | Manhattan Distance (City Block) | Sparse vector scenarios, feature difference aggregation (e.g., tabular data vectors). |
| L∞ (Chebyshev) | Chebyshev Distance | Maximizing feature difference (e.g., outlier detection in high-dimensional data). |
| Angle | Angular Distance | Pure direction similarity (ignores vector magnitude, alternative to cosine). |
| Hamming | Hamming Distance | Binary vector comparison (e.g., bitwise feature vectors, hash-based retrieval). |
| Jaccard | Jaccard Similarity/Distance | Set-based vector similarity (e.g., sparse boolean feature vectors). |
Note: For non-metric spaces (e.g., Jaccard), NGT maintains approximate nearest neighbor search performance but does not guarantee strict metric axioms.
2. Supported Data Types
NGT optimizes for numerical vector data, with explicit support for the following scalar types (consistent across C++ and Python bindings, including FP16/float16 support):
| Data Type | Precision | C++ Type | Python Binding Mapping | Use Case |
|---|---|---|---|---|
| Float16 (FP16) | 16-bit | float16_t/half | numpy.float16 | Memory-efficient high-dimensional vectors (e.g., large-scale embedding datasets, GPU-generated embeddings). |
| Float32 | 32-bit | float | numpy.float32 | Default (balance of precision/speed/memory). |
| Float64 (Double) | 64-bit | double | numpy.float64 | High-precision scenarios (e.g., scientific computing). |
| Int32 | 32-bit | int32_t | numpy.int32 | Discrete feature vectors (e.g., count-based embeddings). |
| Int64 | 64-bit | int64_t | numpy.int64 | Large discrete value vectors. |
| Binary | Bit-level | uint8_t (packed) | numpy.uint8 (bitpacked) | Hamming distance scenarios (space-efficient binary vectors). |
Key Notes:
- Float16 support: NGT natively supports 16-bit floating-point vectors (FP16/half-precision) in both C++ core and Python bindings. This is particularly valuable for reducing memory footprint in large-scale embedding systems (e.g., millions/billions of high-dim vectors) and aligning with GPU-generated embeddings (common in deep learning workflows).
- Python bindings:
numpy.float16arrays are directly compatible with NGT’sinsert/build_indexAPIs; no manual type conversion is required (NGT handles FP16 internally for indexing/querying).- Precision tradeoff: Float16 reduces memory usage by 50% vs Float32, with minimal precision loss for most embedding similarity search scenarios (recommended for production when precision requirements allow).
- Binary vectors: Require explicit bitpacking via
ngtpyutilities for optimal performance with Hamming distance.
3. Real-Time Data Insertion/Deletion Support
NGT’s support for dynamic data updates (insertion/deletion) is partial and non-real-time optimized, with clear limitations for high-frequency dynamic workloads:
| Operation | Support Level | Details & Constraints |
|---|---|---|
| Incremental Insertion | Partial support (semi-real-time) | - Small-batch incremental insertion is allowed (e.g., hundreds/thousands of vectors at a time). - Inserted vectors require a build_index/optimize call to be fully integrated into the search graph (otherwise query performance degrades).- Not designed for high-frequency single-vector insertions (e.g., >1000 inserts/sec) – high latency and index fragmentation will occur. |
| Real-Time Deletion | Limited support (non-real-time) | - Deletion of vectors is possible via remove API, but requires a full index rebuild/optimization to clean up deleted entries from the graph structure.- Deleted vectors are marked as "invalid" in the index (not immediately purged), leading to gradual memory bloat. - No native support for low-latency (sub-second) deletion of vectors in production indexes. |
| Vector Modification | No direct support | - To modify a vector, users must delete the original entry and re-insert the updated vector (inherits all deletion/insertion constraints above). |
Critical Note: NGT is not optimized for real-time dynamic workloads (e.g., user-generated content embeddings requiring immediate indexing, or high-turnover datasets). For use cases needing sub-second insertion/deletion with consistent query performance, consider complementary tools (e.g., combining NGT for batch indexing with a lightweight in-memory ANN for real-time vectors) or alternative engines (e.g., FAISS with IVF indexes, or Milvus for fully dynamic workloads).
Characteristics
| Feature | Description |
|---|---|
| Incremental Updates | Partial incremental insertion supported; real-time deletion/modification not fully supported. |
| Query Speed | High, especially on pre-built indexes; optimized for ANN search (FP16 queries are as fast as FP32 with lower memory overhead). |
| Index Type | Graph-based and tree hybrid. |
| Scalability | Handles millions/billions of vectors efficiently with disk-backed storage (FP16 enables larger datasets on the same hardware). |
| Language Bindings | C++ and Python. |
Notes
- NGT is more suitable for offline or semi-online workloads, where indexes can be built or updated in batches.
- Not designed for high-frequency real-time insertion/deletion scenarios.
- For binary vectors (Hamming distance), NGT uses optimized bitwise operations to reduce memory overhead and accelerate queries.
- When using non-L2 metrics (e.g., cosine), NGT normalizes vectors during index building by default (configurable via
normalizeparameter); normalization is compatible with all float types (FP16/FP32/FP64). - Float16 is recommended for large-scale embedding deployments (e.g., >10M vectors with dim ≥ 128) to balance memory, speed, and precision.
- For dynamic workloads: If real-time updates are required, NGT is best used in a "hybrid" architecture (batch-indexed NGT for historical data + in-memory ANN for real-time vectors) or with scheduled batch reindexing (e.g., hourly/daily) to mitigate deletion/insertion limitations.