
Hadar – Chinese Simplified/Traditional Conversion

Project: Hadar (GitHub)
Language: C++

Overview

Hadar is a high-performance engine for Simplified ↔ Traditional Chinese text conversion. It evolved from OpenCC and keeps OpenCC's dictionary-based, rule-driven conversion approach while optimizing for speed and memory efficiency.

The engine uses static word/phrase dictionaries and applies deterministic rules for conversion. It is primarily suited for offline preprocessing, NLP pipelines, and batch text normalization.
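To make the dictionary-driven approach concrete, here is a minimal sketch of greedy longest-match conversion over a static phrase dictionary. It is not Hadar's actual implementation or API: the names `PhraseDict`, `utf8_len`, and `convert_longest_match`, the `max_chars` bound, and the sample entries are all assumptions made for illustration, and the source file is assumed to be saved as UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary type: UTF-8 source phrase -> UTF-8 target phrase.
using PhraseDict = std::unordered_map<std::string, std::string>;

// Byte length of the UTF-8 sequence starting at text[pos].
// Invalid lead bytes are treated as single bytes; the result never
// runs past the end of the string.
std::size_t utf8_len(const std::string& text, std::size_t pos) {
    assert(pos < text.size());
    const unsigned char lead = static_cast<unsigned char>(text[pos]);
    std::size_t len = 1;
    if ((lead & 0xE0) == 0xC0) len = 2;
    else if ((lead & 0xF0) == 0xE0) len = 3;
    else if ((lead & 0xF8) == 0xF0) len = 4;
    const std::size_t remaining = text.size() - pos;
    return len < remaining ? len : remaining;
}

// Greedy longest-match conversion: at each position, try the longest
// phrase (up to max_chars code points) first, then shorter ones.
// Text with no dictionary entry is copied through unchanged.
std::string convert_longest_match(const std::string& text,
                                  const PhraseDict& dict,
                                  std::size_t max_chars = 8) {
    assert(max_chars > 0);
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // End offsets of the next 1..max_chars code points from pos.
        std::vector<std::size_t> ends;
        for (std::size_t p = pos; p < text.size() && ends.size() < max_chars;) {
            p += utf8_len(text, p);
            ends.push_back(p);
        }
        bool matched = false;
        for (auto it = ends.rbegin(); it != ends.rend(); ++it) {
            const auto hit = dict.find(text.substr(pos, *it - pos));
            if (hit != dict.end()) {
                out += hit->second;
                pos = *it;
                matched = true;
                break;
            }
        }
        if (!matched) {
            // No entry: copy a single code point and move on.
            out.append(text, pos, ends.front() - pos);
            pos = ends.front();
        }
    }
    return out;
}
```

With the illustrative entries 头发 → 頭髮, 发展 → 發展, 头 → 頭, and 发 → 發, the phrase entries are matched before the single-character ones, so 发 is rendered as 髮 inside 头发 and as 發 inside 发展. The same input always produces the same output, which is what makes this style of conversion suitable for batch preprocessing.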


Key Features

  • Simplified ↔ Traditional conversion: Accurate conversion at word and phrase level, not only character-by-character.
  • Rule-driven mapping: Handles ambiguous mappings with priority rules.
  • Static dictionaries: Memory-efficient, fast lookup for large-scale corpora.
  • Batch processing optimized: Capable of processing millions of characters efficiently.
  • Unicode support: Fully supports UTF-8 encoded Chinese text.
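
One way to picture priority rules for ambiguous mappings is a dictionary whose entries carry several candidates, each with a rank. The sketch below is an assumption about how such a rule could look, not Hadar's documented data model: the `Candidate` struct, the "lower value wins" convention, the `resolve` function, and the example priorities are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical candidate record for an ambiguous source string.
struct Candidate {
    std::string target;
    int priority;  // assumed convention: lower value wins
};

// Hypothetical dictionary: source string -> ranked candidate targets.
using AmbiguousDict = std::unordered_map<std::string, std::vector<Candidate>>;

// Return the highest-priority target for source, or source itself when
// the dictionary has no entry. std::min_element returns the first
// minimum, so ties resolve by listing order and the result is
// deterministic.
std::string resolve(const AmbiguousDict& dict, const std::string& source) {
    const auto it = dict.find(source);
    if (it == dict.end()) return source;
    const std::vector<Candidate>& cands = it->second;
    assert(!cands.empty());  // every entry must list at least one candidate
    const auto best = std::min_element(
        cands.begin(), cands.end(),
        [](const Candidate& a, const Candidate& b) {
            return a.priority < b.priority;
        });
    return best->target;
}
```

For example, the Simplified character 干 can correspond to 乾, 幹, or 干 in Traditional text. If an entry ranks 幹 first, `resolve` always returns 幹 for 干, regardless of the order in which the candidates were listed.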

Typical Use Cases

  • Preprocessing text for Chinese NLP pipelines.
  • Normalizing user-generated content across Simplified and Traditional Chinese.
  • Large-scale corpus conversion for search engines or text analytics.
  • Integration into offline or batch processing systems requiring deterministic conversion.

Industrial Fit

Requirement                            Hadar Support
Simplified ↔ Traditional conversion    ✔️
Word/phrase level accuracy             ✔️
Dynamic updates
High-volume batch processing           ✔️
Unicode support                        ✔️