SPTAG (Space Partition Tree And Graph) Overview
- Repository: https://github.com/microsoft/SPTAG
- Language: C++ (Python/RESTful API bindings available)
Purpose & Use Cases
SPTAG is an open-source vector similarity search library developed by Microsoft, focused on balanced performance between recall rate and query speed for high-dimensional vector datasets (10s to 1000s of dimensions). It integrates multiple index structures to adapt to different data scales and query requirements, and is oriented to offline batch indexing + semi-real-time query scenarios (limited dynamic update capabilities). It is suitable for:
- Medium-to-large scale (10M–1B) high-dimensional vector approximate nearest neighbor (ANN) search.
- Scenarios requiring adjustable recall-speed tradeoffs (e.g., semantic search with flexible precision requirements).
- Enterprise-level AI applications (text/image embedding retrieval, recommendation system feature matching).
Typical applications include:
- NLP (semantic search, text embedding similarity matching).
- Computer vision (image feature retrieval, small-scale face recognition).
- Recommendation systems (user/item embedding matching for cold start scenarios).
Algorithms Supported
SPTAG provides a variety of index algorithms with different architectural designs, covering tree-based, graph-based and hybrid types:
| Algorithm | Key Characteristics | Use Case |
|---|---|---|
KDTree | Space-partitioning tree index – fast build speed, suitable for low-dimensional vectors (<50 dims). | Low-dimensional vector search (e.g., simple feature vectors). |
BKT (Balanced K-D Tree) | Optimized KDTree variant – better balance of build/query speed for medium dimensions (50–200 dims). | Medium-dimensional static vector datasets. |
Annoy | Tree-based index (inspired by Spotify Annoy) – high query speed, moderate memory usage. | Real-time query scenarios with medium recall requirements. |
HNSW (Hierarchical Navigable Small Worlds) | Graph-based index – high recall rate, optimal for high-dimensional vectors (>200 dims). | High-dimensional vector search (e.g., BERT embeddings, image deep features). |
BruteForce | Exact nearest neighbor search – no approximation, baseline for recall comparison. | Small datasets (<10M vectors) with strict recall requirements. |
Core Technical Specifications
1. Supported Metric Spaces
SPTAG supports mainstream metric spaces for numerical vectors, with no native non-metric space support:
| Metric Type | Full Name | Supported Indexes | Use Case |
|---|---|---|---|
| L2 | Euclidean Distance | All indexes | General numerical vectors (image/video embeddings, low-dimensional features). |
| Cosine | Cosine Similarity/Distance | All indexes | Text embeddings (direction-based similarity, e.g., BERT/LLM outputs). |
| Inner Product (IP) | Dot Product | All indexes (via normalization) | Normalized vector similarity (equivalent to cosine). |
Note: Non-metric spaces (Jaccard/Hamming/L1/L∞) are not supported; custom preprocessing for these spaces is unmaintained and not recommended for production.
2. Supported Data Types
SPTAG is optimized for 32-bit floating-point vectors, with limited support for other numerical types:
| Data Type | Precision | C++ Type | Python Binding Mapping | Support Status | Use Case |
|---|---|---|---|---|---|
| Float32 | 32-bit | float | numpy.float32 | Fully supported (optimal performance) | Default – core optimized type for all indexes. |
| Float64 (Double) | 64-bit | double | numpy.float64 | Partial (2–3x slower) | High-precision scientific computing scenarios only. |
| Int32 | 32-bit | int32_t | numpy.int32 | Partial (only L2 distance) | Discrete feature vectors (e.g., count-based features). |
| Float16 (FP16)/Int8/Binary | - | - | - | Not supported | No native handling; manual conversion to Float32 has no performance/memory benefits. |
3. Dynamic Data Operations (Insert/Delete/Modify)
SPTAG’s dynamic update capabilities are limited and non-real-time optimized:
| Operation | Support Level | Constraints |
|---|---|---|
| Incremental Insertion | Partial support (HNSW/Annoy only) | - Batch insertion recommended (single-vector insertion latency: ~1ms/vector); - Max sustainable QPS: ~500; - Index build time increases significantly after frequent inserts. |
| Real-Time Deletion | Not supported | No native deletion API; "soft delete" (filter results by ID post-query) leads to index bloat and query performance degradation. |
| Vector Modification | Not supported | Must re-insert updated vectors (no in-place modification), inheriting insertion latency constraints. |
Characteristics
| Feature | Description |
|---|---|
| Incremental Updates | Partial batch insertion (HNSW/Annoy); no deletion/modification; non-real-time optimized. |
| Query Speed | HNSW > Annoy > BKT/KDTree; 5–20ms/query (100M 768-dim vectors), 0.5–5ms/query (10M vectors). |
| Index Type | Hybrid (tree-based: KDTree/BKT/Annoy; graph-based: HNSW; brute-force). |
| Scalability | Handles up to 1B vectors (single-node); no distributed multi-node support. |
| Language Bindings | C++ (full feature set), Python (core features), RESTful API (deployment-oriented). |
| Non-Metric Support | No native support (requires custom preprocessing). |
| GPU Support | Not supported |
Notes
- SPTAG’s core advantage is rich index types – select KDTree for low dimensions, HNSW for high dimensions to balance recall and speed.
- For high-speed query scenarios (
<100Mvectors) requiring sub-millisecond latency, FAISS HNSW/GPU is more recommended. - Dynamic updates are only suitable for low-frequency batch insertion (hourly/daily); avoid using it for real-time single-vector write scenarios (e.g., real-time user behavior embedding).
- Limitations: No distributed support; Python bindings lack partial HNSW tuning parameters; no non-metric space support; community maintenance is slow (last major update in 2023).
- Best Practices: Use HNSW index for high-dimensional static vectors (10M–1B); pair with Redis for hot data caching to improve query speed; use batch insertion for incremental updates.