跳到主要内容

marisa

MARISA Trie

Overview

MARISA Trie is a static, space-efficient trie implementation in C++, designed for large-scale English dictionaries. It supports exact match, prefix search, and reverse lookup. It also provides memory-mapped loading (mmap) for fast initialization and minimal memory usage.

The trie is built once and read-only, optimized for offline construction and repeated deployment.


Key Features

  • Static and compact: optimized for memory efficiency in large English vocabularies.
  • Fast lookup: supports exact match, prefix search, and reverse lookup.
  • Memory-mapped support: enables zero-copy initialization and fast cold start.
  • Offline construction: build once, deploy many times.

Typical Use Cases

  • English spell-check dictionaries
  • Autocomplete / prefix suggestion for search engines
  • Static vocabulary lookup in English text analysis tools
  • Offline text filtering or preprocessing in English corpora

Not Suitable For

  • Real-time or dynamic updates
  • High-concurrency write-heavy workloads
  • Non-English or Unicode-heavy applications

Industrial Fit

RequirementMARISA Support
mmap-able & serializable dictionary✔️
Fast prefix & exact search✔️
Dynamic update
Attaching payload (bitmap / counts)

Conclusion: MARISA Trie is suitable as a static English dictionary layer or prefix lookup layer in production systems, particularly for cold-start or read-only applications.


C++ Example

Build and Install

git clone https://github.com/s-yata/marisa-trie.git
cd marisa-trie
cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-release
sudo cmake --install build-release

Construct a Trie

#include <marisa.h>
#include <iostream>
#include <vector>

int main() {
marisa::Trie trie;
std::vector<std::string> keys = {"apple", "application", "banana", "band"};

for (const auto& key : keys) {
trie.push_back(key.c_str());
}
trie.build();

trie.save("dictionary.marisa");

return 0;
}

Load and Query a Trie

#include <marisa.h>
#include <iostream>

int main() {
marisa::Trie trie;
trie.load("dictionary.marisa");

marisa::Agent agent;
agent.set_query("app");
if (trie.lookup(agent)) {
std::cout << "Found: " << agent.key() << std::endl;
}

// Prefix search
agent.set_query("ap");
trie.predictive_search(agent);
do {
std::cout << "Prefix match: " << agent.key() << std::endl;
} while (trie.predictive_search_next());

return 0;
}