Skip to main content

index

Trie Implementations Overview

Tries are specialized data structures primarily used for SUG (suggestion / autocomplete) systems and fast prefix matching. They enable extremely low-latency lookup of prefixes in large English dictionaries or query sets. Depending on requirements such as dynamic updates, memory efficiency, and query performance, different trie implementations are suitable.

This section summarizes four widely used trie implementations in industrial English dictionary workloads: MARISA, DARTS, TSL, and Cedar.


1. MARISA Trie

  • Type: Static, memory-efficient trie
  • Primary Use Case: High-performance prefix matching / autocomplete in read-only English dictionaries
  • Strengths: Compact memory footprint, fast exact and prefix search, memory-mapped support
  • Limitations: Read-only after build; no dynamic insert/delete

2. DARTS Trie

  • Type: Static double-array trie
  • Primary Use Case: Autocomplete / SUG systems with very large static key sets
  • Strengths: Fast exact match and prefix search, compact double-array representation
  • Limitations: Read-only; no dynamic payload attachment

3. TSL Trie (Ternary Search Tree)

  • Type: Dynamic trie (ternary search tree)
  • Primary Use Case: Dynamic autocomplete or evolving dictionary scenarios
  • Strengths: Supports online insertions and deletions, prefix search, memory-efficient for sparse sets
  • Limitations: Slightly slower than double-array tries for large static datasets

4. Cedar Trie

  • Type: Fully dynamic updatable double-array trie
  • Primary Use Case: High-performance SUG systems with online updates
  • Strengths: Online insert/delete supported, exact match and prefix search, serializable and mmap-able
  • Limitations: Slightly larger memory footprint than static double-array; payload support limited

Comparison Table

TrieDynamic Insert/DeleteStatic Read-OnlyPrefix SearchExact MatchMemory-MappedNotes
MARISA✔️✔️✔️✔️Best for static SUG / autocomplete
DARTS✔️✔️✔️✔️High-performance static SUG
TSL✔️✔️✔️Dynamic SUG / evolving dictionary
Cedar✔️Partial✔️✔️✔️Fully dynamic SUG / autocomplete

Scene Selection Guidelines

  1. Static, large English SUG / autocomplete dictionaries:
  • Choose MARISA or DARTS
  • Build once, deploy many times; memory-mapped for fast cold start
  1. Dynamic autocomplete / evolving query dictionaries:
  • Choose TSL or Cedar
  • TSL if simpler dynamic operations suffice; Cedar if high-performance online updates are required
  1. Prefix-heavy SUG workloads:
  • All four support prefix search
  • Memory efficiency: MARISA > DARTS > Cedar ≈ TSL
  • Dynamic prefix queries: Cedar or TSL
  1. Persistent SUG dictionary for production:
  • MARISA and Cedar offer serialization / mmap
  • DARTS can be serialized but updates require rebuild
  • TSL usually in-memory for dynamic operations

Conclusion

  • Primary scenario: Tries excel in SUG / autocomplete and fast prefix matching

  • Static vs Dynamic: MARISA & DARTS → static, TSL & Cedar → dynamic

  • Scene fit:

  • MARISA / DARTS → read-only autocomplete dictionaries

  • TSL → dynamic autocomplete / evolving dictionary

  • Cedar → dynamic, high-performance autocomplete with serialization