FAISS (Facebook AI Similarity Search) Overview
- Repository: https://github.com/facebookresearch/faiss
- Language: C++ (Python bindings available; GPU/CUDA support for accelerated operations)
Purpose & Use Cases
FAISS is a highly optimized, industrial-grade vector similarity search library designed for scalable approximate nearest neighbor (ANN) and exact nearest neighbor search over high-dimensional data. It balances speed, memory efficiency, and flexibility, with native support for both offline/static and semi-real-time dynamic workloads (via incremental indexes). It is suitable for:
- Large-scale (billions of vectors) high-dimensional similarity search (10s to 1000s of dimensions).
- Both offline batch indexing and semi-real-time incremental insertion/deletion (via specific index types).
- Production-grade deployments (CPU/GPU acceleration, distributed support) for AI/ML pipelines.
Typical applications include:
- Real-time/recommendation systems (user/item embedding retrieval, real-time feature matching).
- Computer vision (image feature retrieval, face recognition).
- NLP (text embedding similarity search, semantic search).
- Large-scale scientific computing (high-dimensional data clustering/analytics).
Algorithms Supported
FAISS provides a comprehensive suite of index algorithms (each optimized for a different tradeoff of speed, memory, recall, and update support), grouped by core architecture:
1. Brute-force (Exact Search)
| Algorithm | Key Characteristics | Use Case |
|---|---|---|
| IndexFlatL2 / IndexFlatIP | Exact L2 (Euclidean) or inner-product search (no approximation; the recall baseline). | Small datasets (<1M vectors), exact-recall requirements. |
2. Quantization-Based Indexes (Memory-Optimized)
| Algorithm | Key Characteristics | Use Case |
|---|---|---|
| IndexIVFFlat | Inverted file (IVF) partitioning + uncompressed (flat) storage; vectors are stored exactly, so accuracy loss comes only from probing a subset of cells. | Medium-scale datasets (1M–100M vectors), moderate-to-high recall. |
| IndexIVFPQ | IVF + Product Quantization (PQ); strong memory compression with some recall loss. | Large-scale datasets (100M+ vectors), memory-constrained environments. |
| IndexIVFScalarQuantizer | IVF + Scalar Quantization (SQ); lightweight compression (e.g., 4x at 8 bits) with low recall loss. | Low-memory deployments, edge/embedded systems. |
| IndexPQ | Pure Product Quantization (exhaustive scan over compact codes, no IVF pruning). | Very memory-constrained settings with an acceptable recall tradeoff. |
3. Graph-Based Indexes (High-Speed ANN)
| Algorithm | Key Characteristics | Use Case |
|---|---|---|
| IndexHNSWFlat | HNSW (Hierarchical Navigable Small World) graph over uncompressed vectors; among the fastest ANN structures for high-dimensional data, at the cost of higher memory. | Low-latency query scenarios (e.g., real-time retrieval). |
| IVF with HNSW coarse quantizer | IVF whose coarse quantizer is itself an HNSW index (e.g., the factory string IVF65536_HNSW32,Flat); combines IVF scalability with fast cell assignment. | 100M+ vectors with low query latency requirements. |
4. Dynamic/Incremental Indexes (Semi-Real-Time)
| Algorithm | Key Characteristics | Use Case |
|---|---|---|
| IndexIDMap | Wrapper that maps user-supplied 64-bit IDs onto any index, enabling add_with_ids (and remove_ids where the wrapped index supports removal). | Application-level ID management, periodic batch updates. |
| IndexIDMap2 | IndexIDMap plus ID-to-vector reconstruction (reconstruct by ID), useful when deleted or updated vectors must be read back. | Incremental insert/delete/update workflows. |
| IndexIVFScalarQuantizer (IDs built in) | IVF indexes accept user IDs natively via add_with_ids; SQ keeps memory low. | High-throughput insertion in near-real-time systems. |
5. GPU-Accelerated Indexes
The core algorithms (Flat/IVF/PQ) have GPU variants (e.g., GpuIndexFlatL2, GpuIndexIVFPQ) optimized for CUDA-enabled GPUs; FAISS's own benchmarks report roughly 5–10x speedups over CPU, workload-dependent. HNSW has no classic GPU implementation, though GPU indexes can serve similar low-latency roles.
Core Technical Specifications
1. Supported Metric Spaces (Official Implementation)
FAISS targets metric spaces over dense numerical vectors. L2 and inner product are supported across all indexes; Hamming distance is covered by the separate binary index family (IndexBinaryFlat/IndexBinaryIVF/IndexBinaryHNSW), while set-based metrics such as Jaccard and edit distances such as Levenshtein have no general index support:
| Metric Type | Full Name | Supported Indexes | Use Case |
|---|---|---|---|
| L2 | Euclidean Distance | All indexes | General numerical vectors (image embeddings). |
| Inner Product (IP) | Dot Product | All indexes | Text embeddings (equivalent to cosine for normalized vectors). |
| Cosine Similarity | Cosine Distance (normalized IP) | All indexes (via vector normalization) | Direction-based vector comparison (NLP embeddings). |
Note: Hamming distance is natively supported via the IndexBinary* family (bit-packed uint8 vectors); Jaccard and Levenshtein have no native index support and require external preprocessing or a different library.
2. Supported Data Types (Official Implementation)
FAISS has extensive, optimized data type support (CPU/GPU variants differ slightly):
| Data Type | Precision | C++ Type | Python Binding Mapping | CPU Support | GPU Support | Use Case |
|---|---|---|---|---|---|---|
| Float32 | 32-bit | float | numpy.float32 | Full | Full | Default; the native input/query type for all indexes. |
| Float64 (Double) | 64-bit | double | numpy.float64 | No (downcast required) | No | Not accepted by the index API; convert inputs with astype("float32"). |
| Float16 (FP16) | 16-bit | uint16_t (storage) | numpy.float16 (converted) | Storage via ScalarQuantizer QT_fp16 | Storage/compute options | Halves memory vs Float32 with minimal recall loss. |
| Int8/UInt8 | 8-bit | int8_t/uint8_t | via SQ codecs | Storage via SQ (QT_8bit) | Storage via SQ | Edge devices, ultra-low memory usage. |
| Binary | Bit-packed | uint8_t (packed) | numpy.uint8 (bit-packed) | IndexBinaryFlat/IVF/HNSW | GpuIndexBinaryFlat | Hamming-distance search over binary codes. |
Key Notes:
- FP16: on GPU, FP16 storage halves memory vs Float32 and can improve throughput; on CPU, FP16 is available as a storage format via ScalarQuantizer (QT_fp16) — inputs and queries remain Float32 either way.
- 8-bit integers: Supported via scalar quantization (SQ) – reduces memory by 4x vs Float32 with minimal recall loss.
3. Real-Time Data Insertion/Deletion Support
FAISS's dynamic capabilities vary by index type; it offers solid support for semi-real-time workloads, but it is not a transactional vector database:
| Operation | Support Level | Indexes with Best Support | Constraints |
|---|---|---|---|
| Incremental Insertion | Full support (after training, where the index requires it) | IVF-based, IndexIDMap/IndexIDMap2, Flat | - IVF/Flat inserts are cheap appends; - HNSW inserts are costlier (graph linking); - Throughput depends heavily on index type and hardware. |
| Deletion | Supported via remove_ids on Flat/IVF-based indexes | IndexIDMap2, IVF-based | - remove_ids physically removes entries but scans the whole index (O(ntotal)); - HNSW indexes do not support removal; - Batch deletions or rebuild periodically for high-churn workloads. |
| Vector Modification | Indirect (delete + reinsert) | IndexIDMap2, IVF-based | Inherits the cost of remove_ids plus an insert. |
Critical Note: FAISS's dynamic support is production-proven, but for high-churn workloads pair IVF indexes with periodic batch rebuilds (hourly/daily) to keep the inverted lists compact.
Characteristics
| Feature | Description |
|---|---|
| Incremental Updates | Insertion on all indexes (after training where required); deletion via remove_ids on Flat/IVF-based indexes (not HNSW); IDMap wrappers add user-ID management. |
| Query Speed | Industry-leading; GPU indexes are typically 5–10x faster than CPU, and CPU HNSW is comparable to NMSLIB/hnswlib HNSW. |
| Index Type | Hybrid (brute-force, quantization-based, graph-based, IVF-based). |
| Scalability | Billions of vectors on a single node; on-disk inverted lists (OnDiskInvertedLists) reduce RAM pressure; multi-node scale is achieved by sharding indexes across machines on top of FAISS. |
| Language Bindings | C++ (full feature set), Python (mature), C#/Java (community-maintained). |
| GPU Support | Native CUDA implementations of the core index families (Flat/IVF/PQ), with multi-GPU sharding/replication. |
| Non-Metric Support | No native support (requires custom preprocessing). |
Notes
- FAISS is the industry standard for large-scale vector search, balancing offline batch indexing with semi-real-time incremental workloads; single-algorithm libraries such as NMSLIB/hnswlib/NGT cover a narrower design space.
- GPU acceleration is a key differentiator – critical for low-latency (sub-millisecond) query scenarios (e.g., real-time recommendation).
- Quantization tradeoffs: PQ delivers maximum memory compression (16–64x) but minor recall loss; SQ is lightweight (4x compression) with near-zero recall loss.
- FAISS itself is a single-node library; multi-node deployments for beyond-RAM scale are typically built by sharding indexes across machines or by using vector databases that embed FAISS.
- Limitations: no native support for set-based metrics such as Jaccard (Hamming is covered by the IndexBinary family); HNSW indexes use more memory than IVF-PQ.
- Real-time best practices: use GpuIndexIVFPQ with ID mapping for high-throughput insertion/deletion; an external cache (e.g., Redis) can serve hot vectors at ultra-low latency.