marisa
MARISA Trie
- Project: MARISA — Matching Algorithm with Recursively Implemented StorAge
- Repository: https://github.com/s-yata/marisa-trie
Overview
MARISA Trie is a static, space-efficient trie implementation in C++, designed for large-scale English dictionaries. It supports exact match, prefix search, and reverse lookup. It also provides memory-mapped loading (mmap) for fast initialization and minimal memory usage.
The trie is built once and read-only, optimized for offline construction and repeated deployment.
Key Features
- Static and compact: optimized for memory efficiency in large English vocabularies.
- Fast lookup: supports exact match, prefix search, and reverse lookup.
- Memory-mapped support: enables zero-copy initialization and fast cold start.
- Offline construction: build once, deploy many times.
Typical Use Cases
- English spell-check dictionaries
- Autocomplete / prefix suggestion for search engines
- Static vocabulary lookup in English text analysis tools
- Offline text filtering or preprocessing in English corpora
Not Suitable For
- Real-time or dynamic updates
- High-concurrency write-heavy workloads
- Non-English or Unicode-heavy applications
Industrial Fit
| Requirement | MARISA Support |
|---|---|
| mmap-able & serializable dictionary | ✔️ |
| Fast prefix & exact search | ✔️ |
| Dynamic update | ❌ |
| Attaching payload (bitmap / counts) | ❌ |
Conclusion: MARISA Trie is suitable as a static English dictionary layer or prefix lookup layer in production systems, particularly for cold-start or read-only applications.
C++ Example
Build and Install
git clone https://github.com/s-yata/marisa-trie.git
cd marisa-trie
cmake -S . -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-release
sudo cmake --install build-release
Construct a Trie
#include <marisa.h>
#include <iostream>
#include <vector>
int main() {
marisa::Trie trie;
std::vector<std::string> keys = {"apple", "application", "banana", "band"};
for (const auto& key : keys) {
trie.push_back(key.c_str());
}
trie.build();
trie.save("dictionary.marisa");
return 0;
}
Load and Query a Trie
#include <marisa.h>
#include <iostream>
int main() {
marisa::Trie trie;
trie.load("dictionary.marisa");
marisa::Agent agent;
agent.set_query("app");
if (trie.lookup(agent)) {
std::cout << "Found: " << agent.key() << std::endl;
}
// Prefix search
agent.set_query("ap");
trie.predictive_search(agent);
do {
std::cout << "Prefix match: " << agent.key() << std::endl;
} while (trie.predictive_search_next());
return 0;
}