Posting List Implementations (C++ / Industrial)

Overview

A posting list (or inverted list) is a fundamental data structure in large‑scale retrieval and search systems. It stores the list of document IDs (and optionally positions) where a given term appears. An efficient posting list engine must balance:

memory footprint
scan and random access performance
intersection / skip efficiency
compression / unpacking cost

In practical systems, posting lists are implemented as either uncompressed arrays or compressed integer lists with optional block/skip indexing.

1) Uncompressed Posting List

Overview

An uncompressed posting list is a sorted array of uint32_t document IDs (or positions). It is conceptually simple and provides fast access without decode overhead.

Characteristics

Space: largest (no compression)
Scan performance: best for sequential / random access
Skip support: trivial
Serialization: direct binary dump

Typical Use Cases

Short posting lists
Memory‑rich systems
Scenarios where CPU decode cost must be minimal

Simple C++ Sketch

std::vector<uint32_t> postlist = {3, 7, 15, 29, 102};

// Sequential scan
for (uint32_t doc : postlist) {
    process(doc);
}

// Random access
uint32_t doc = postlist[2];

2) Block / Skip Structures

Overview

Block or skip indexing partitions a posting list into fixed or variable sized blocks, and stores skip pointers (or summary metadata) to accelerate traversal and intersection. Many modern retrieval engines combine block indexing with compressed posting formats for optimal performance.

Characteristics

Skip support: allows jumping past irrelevant blocks
Intersection speed: improved for long lists
Extra metadata: cost proportional to block count

Typical Use Cases

Long posting lists
High term intersection frequencies
Real‑time query processing

3) Delta + Compressed Integer Lists

Principle

Delta encoding exploits the fact that sorted document IDs often have small differences between adjacent values. For a posting list:

original: [d1, d2, d3, ...]
delta:    [d1, d2 - d1, d3 - d2, ...]

Delta values tend to be small and more compressible. A second stage integer codec then compresses these deltas.

Common Industrial Codecs

Industry uses several established integer compression codecs:

Codec Category	Example Algorithms
Variable‑byte	VByte
Bit‑packing	PFor (Frame of Reference)
SIMD‑friendly	SIMD‑FOR / SIMD‑PFor
Simple integer	Simple9 / Simple16
Legacy gamma	Elias Gamma / Delta / Rice

Example Libraries

These are real, industrially used libraries:

FastPForLib

GitHub: https://github.com/lemire/FastPFor
Implements:
Frame of Reference (FOR)
Patched Frame of Reference (PFor)
SIMD‑PFor / SIMD‑FOR variants
Variable‑byte, Simple9, Simple16, etc.
Widely used in inverted index systems for posting list compression.

Encoding / Decoding Pattern (Pseudo)

#include "common/headers.hpp"
#include "FastPFor/headers.hpp"

// assume posting list in 'postings'
auto deltas = delta_encode(postings);

// choose a codec, e.g., PFor
FastPForLib::PFor<32> codec;

// encode into compressed buffer
std::vector<uint32_t> compressed(deltas.size());
size_t outSize = 0;
codec.encodeArray(deltas.data(), deltas.size(), compressed.data(), outSize);

// decode back
std::vector<uint32_t> decoded(deltas.size());
size_t decodedSize = 0;
codec.decodeArray(compressed.data(), outSize, decoded.data(), decodedSize);

// restore original
auto restored = delta_decode(decoded);

Note: actual API names vary by FastPForLib version; above is schematic.

4) Simple9 / Simple16 / Variable Byte

These codecs compress integer lists without SIMD optimization but remain common in older systems:

Variable Byte (VByte): easy decode, moderate compression
Simple9 / Simple16: better compression, slightly higher decode cost
Elias / Gamma / Delta / Rice: good theoretical compression, weaker random access

Typical Uses

Legacy systems
Compatibility scenarios
Lightweight compression where SIMD is unnecessary

5) Emerging SIMD‑Optimized Schemes

Newer schemes focus on balancing compression with SIMD decode throughput:

Bit‑packing with SIMD instructions
Block‑oriented ANS / Bulk coding
Byte/bit shuffle + entropy coding (zstd‐style)

While research demonstrates high throughput, adoption in large retrieval stacks is still catching up compared to established codecs like SIMD‑PFor / FastPFor.

Industrial Comparison

Scheme	Space Efficiency	Decode Speed	Random Access	Skip Support	SIMD Friendly
Uncompressed	❌	✔️	✔️	✔️	—
Block / Skip Structure	⬆️	✔️	✔️	✔️	optional
PFor / PFORDelta	✔️	✔️✔️	⚠️	⬆️ via blocks	✔️
Simple9 / Simple16	⬆️	✔️	⚠️	⬆️	—
Variable Byte	➖	✔️✔️	⚠️	limited	—
New SIMD Schemes	✔️✔️	✔️✔️✔️	⚠️	plugin	✔️

Typical Scenarios

Uncompressed

Very short lists
Memory available
Lowest CPU cost

Compressed (PFor / PFORDelta)

Long posting lists (millions of entries)
Disk / memory bandwidth constrained
Intersection and batch processing dominant

Block / Skip Structure

Frequent skipping required
Multi‐term intersection heavy workloads
Real‐time low latency retrieval

Implementation Notes

Practical industrial systems often combine:

Block Index + Delta Encoding + SIMD‑friendly Codec

to achieve a balance of compact storage, fast sequential scan, and efficient intersection.

Posting lists are typically part of a larger inverted index which may include:

Position lists (for phrase queries)
Payloads (term frequency, field boosts)
Bitmap filters (for quick disqualification)

Engineering considerations include:

Memory pool / mmap support
Batch decode interfaces
Cache‑friendly layout
Concurrency support (read parallelism)

Posting List Encoding Example (FastPForLib)

#include "FastPFor/headers.hpp"
#include <vector>

// delta_encode / delta_decode implemented separately

std::vector<uint32_t> postings = {10, 15, 102, 205, 2200};

// compute deltas
auto deltas = delta_encode(postings);

// choose PFor codec from FastPForLib
FastPForLib::PFor<32> codec;

// encode
std::vector<uint32_t> compressed(deltas.size());
size_t outSize = 0;
codec.encodeArray(deltas.data(), deltas.size(), compressed.data(), outSize);

// decode
std::vector<uint32_t> recovered(deltas.size());
size_t recSize = 0;
codec.decodeArray(compressed.data(), outSize, recovered.data(), recSize);

// restore postings
auto restored = delta_decode(recovered);

Overview​

1) Uncompressed Posting List

Overview​

Characteristics​

Typical Use Cases​

Simple C++ Sketch​

2) Block / Skip Structures

Overview​

Characteristics​

Typical Use Cases​

3) Delta + Compressed Integer Lists

Principle​

Common Industrial Codecs​

Example Libraries​

FastPForLib​

4) Simple9 / Simple16 / Variable Byte

Typical Uses​

5) Emerging SIMD‑Optimized Schemes

Industrial Comparison

Typical Scenarios

Uncompressed​

Compressed (PFor / PFORDelta)​

Block / Skip Structure​

Implementation Notes

Posting List Encoding Example (FastPForLib)

Overview

Overview

Characteristics

Typical Use Cases

Simple C++ Sketch

Overview

Characteristics

Typical Use Cases

Principle

Common Industrial Codecs

Example Libraries

FastPForLib

Typical Uses

Uncompressed

Compressed (PFor / PFORDelta)

Block / Skip Structure