跳到主要内容

NGT (Neighborhood Graph and Tree) Overview

Purpose & Use Cases

NGT is a vector similarity search engine designed for high-dimensional data. It is suitable for:

  • Approximate nearest neighbor (ANN) search over large datasets.
  • Scenarios where batch index building is acceptable.
  • Systems where insertion is occasional, rather than fully real-time dynamic updates.

Typical applications include:

  • Recommendation systems with offline embedding indexing.
  • Image or text feature retrieval for search or ranking pipelines.
  • Large-scale analytics where high-dimensional vectors need similarity queries.

Algorithms Supported

  • NSW (Navigable Small World graph) for fast approximate nearest neighbor search.
  • NGT-tree (combines graph and tree structures for indexing).
  • Supports metric spaces (detailed below) such as L2 (Euclidean) and cosine similarity.

Core Technical Specifications

1. Supported Metric Spaces

NGT supports a comprehensive set of metric and non-metric similarity measures, covering most common use cases for high-dimensional vector comparison:

Metric TypeFull NameUse Case Scenarios
L2Euclidean DistanceGeneral numerical vector similarity (e.g., image feature vectors, dense embeddings).
CosineCosine Similarity/DistanceText embedding similarity (e.g., BERT embeddings), direction-based vector comparison.
L1Manhattan Distance (City Block)Sparse vector scenarios, feature difference aggregation (e.g., tabular data vectors).
L∞ (Chebyshev)Chebyshev DistanceMaximizing feature difference (e.g., outlier detection in high-dimensional data).
AngleAngular DistancePure direction similarity (ignores vector magnitude, alternative to cosine).
HammingHamming DistanceBinary vector comparison (e.g., bitwise feature vectors, hash-based retrieval).
JaccardJaccard Similarity/DistanceSet-based vector similarity (e.g., sparse boolean feature vectors).

Note: For non-metric spaces (e.g., Jaccard), NGT maintains approximate nearest neighbor search performance but does not guarantee strict metric axioms.

2. Supported Data Types

NGT optimizes for numerical vector data, with explicit support for the following scalar types (consistent across C++ and Python bindings, including FP16/float16 support):

Data TypePrecisionC++ TypePython Binding MappingUse Case
Float16 (FP16)16-bitfloat16_t/halfnumpy.float16Memory-efficient high-dimensional vectors (e.g., large-scale embedding datasets, GPU-generated embeddings).
Float3232-bitfloatnumpy.float32Default (balance of precision/speed/memory).
Float64 (Double)64-bitdoublenumpy.float64High-precision scenarios (e.g., scientific computing).
Int3232-bitint32_tnumpy.int32Discrete feature vectors (e.g., count-based embeddings).
Int6464-bitint64_tnumpy.int64Large discrete value vectors.
BinaryBit-leveluint8_t (packed)numpy.uint8 (bitpacked)Hamming distance scenarios (space-efficient binary vectors).

Key Notes:

  1. Float16 support: NGT natively supports 16-bit floating-point vectors (FP16/half-precision) in both C++ core and Python bindings. This is particularly valuable for reducing memory footprint in large-scale embedding systems (e.g., millions/billions of high-dim vectors) and aligning with GPU-generated embeddings (common in deep learning workflows).
  2. Python bindings: numpy.float16 arrays are directly compatible with NGT’s insert/build_index APIs; no manual type conversion is required (NGT handles FP16 internally for indexing/querying).
  3. Precision tradeoff: Float16 reduces memory usage by 50% vs Float32, with minimal precision loss for most embedding similarity search scenarios (recommended for production when precision requirements allow).
  4. Binary vectors: Require explicit bitpacking via ngtpy utilities for optimal performance with Hamming distance.

3. Real-Time Data Insertion/Deletion Support

NGT’s support for dynamic data updates (insertion/deletion) is partial and non-real-time optimized, with clear limitations for high-frequency dynamic workloads:

OperationSupport LevelDetails & Constraints
Incremental InsertionPartial support (semi-real-time)- Small-batch incremental insertion is allowed (e.g., hundreds/thousands of vectors at a time).
- Inserted vectors require a build_index/optimize call to be fully integrated into the search graph (otherwise query performance degrades).
- Not designed for high-frequency single-vector insertions (e.g., >1000 inserts/sec) – high latency and index fragmentation will occur.
Real-Time DeletionLimited support (non-real-time)- Deletion of vectors is possible via remove API, but requires a full index rebuild/optimization to clean up deleted entries from the graph structure.
- Deleted vectors are marked as "invalid" in the index (not immediately purged), leading to gradual memory bloat.
- No native support for low-latency (sub-second) deletion of vectors in production indexes.
Vector ModificationNo direct support- To modify a vector, users must delete the original entry and re-insert the updated vector (inherits all deletion/insertion constraints above).

Critical Note: NGT is not optimized for real-time dynamic workloads (e.g., user-generated content embeddings requiring immediate indexing, or high-turnover datasets). For use cases needing sub-second insertion/deletion with consistent query performance, consider complementary tools (e.g., combining NGT for batch indexing with a lightweight in-memory ANN for real-time vectors) or alternative engines (e.g., FAISS with IVF indexes, or Milvus for fully dynamic workloads).

Characteristics

FeatureDescription
Incremental UpdatesPartial incremental insertion supported; real-time deletion/modification not fully supported.
Query SpeedHigh, especially on pre-built indexes; optimized for ANN search (FP16 queries are as fast as FP32 with lower memory overhead).
Index TypeGraph-based and tree hybrid.
ScalabilityHandles millions/billions of vectors efficiently with disk-backed storage (FP16 enables larger datasets on the same hardware).
Language BindingsC++ and Python.

Notes

  • NGT is more suitable for offline or semi-online workloads, where indexes can be built or updated in batches.
  • Not designed for high-frequency real-time insertion/deletion scenarios.
  • For binary vectors (Hamming distance), NGT uses optimized bitwise operations to reduce memory overhead and accelerate queries.
  • When using non-L2 metrics (e.g., cosine), NGT normalizes vectors during index building by default (configurable via normalize parameter); normalization is compatible with all float types (FP16/FP32/FP64).
  • Float16 is recommended for large-scale embedding deployments (e.g., >10M vectors with dim ≥ 128) to balance memory, speed, and precision.
  • For dynamic workloads: If real-time updates are required, NGT is best used in a "hybrid" architecture (batch-indexed NGT for historical data + in-memory ANN for real-time vectors) or with scheduled batch reindexing (e.g., hourly/daily) to mitigate deletion/insertion limitations.