
Regular Expression Engines for Retrieval

Overview

Regular expressions are widely used in retrieval systems for pattern matching, filtering, and text validation. The choice of regex engine directly impacts performance, safety, and maintainability in production environments.

Our system supports two engines: RE2 and PCRE. RE2 is preferred for most production use cases because of its linear-time matching guarantee, bounded memory use, and ease of integration. PCRE is suitable only for controlled offline processing that requires full Perl-compatible syntax.
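The linear-time guarantee is easiest to see with a pattern known to cause catastrophic backtracking. The sketch below uses Go's standard `regexp` package as a stand-in for RE2. It accepts RE2 syntax, and its documentation guarantees matching in time linear in the input size, but it is a separate implementation from the C++ RE2 library. The pattern and input sizes are illustrative assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nestedQuantifier is a classic ReDoS pattern: on input like "aaa...ab" a
// backtracking engine tries exponentially many ways to split the a's
// between the inner and outer "+" before concluding there is no match.
var nestedQuantifier = regexp.MustCompile(`(a+)+$`)

// matchesAdversarial runs the pattern against n 'a' characters followed by 'b'.
// The trailing 'b' guarantees no match, which is the worst case for backtracking.
func matchesAdversarial(n int) bool {
	return nestedQuantifier.MatchString(strings.Repeat("a", n) + "b")
}

func main() {
	for _, n := range []int{10, 1_000, 100_000} {
		start := time.Now()
		ok := matchesAdversarial(n)
		fmt.Printf("n=%d matched=%v elapsed=%v\n", n, ok, time.Since(start))
	}
}
```

On a backtracking engine the same pattern can take exponential time as `n` grows; with Go's RE2-style engine, elapsed time should grow roughly in proportion to `n`.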


Engine Comparison

| Engine | Syntax coverage | Safety / performance | Runtime pattern compilation | Industrial fit | Notes |
|---|---|---|---|---|---|
| RE2 | Subset of PCRE syntax | Guaranteed linear time in input size; bounded memory use | Yes | Online systems, large-scale text scanning, logs | No backreferences or lookaround assertions; recommended default |
| PCRE | Nearly full Perl regex syntax | Backtracking engine; worst-case exponential time, unsafe on untrusted input | Yes | Offline batch processing, advanced pattern matching | Supports backreferences, lookarounds, and other backtracking-only features |

Recommendation

  • Use RE2 as the default engine for all retrieval and search-related text processing.
  • Use PCRE only in controlled offline environments requiring full Perl-compatible features.
  • Never run user-provided patterns on a backtracking engine such as PCRE; catastrophic backtracking exposes the system to ReDoS attacks.