Hadar – Chinese Simplified/Traditional Conversion

Project: Hadar GitHub
Language: C++

Overview

Hadar is a high-performance engine for converting Chinese text between Simplified and Traditional (Simplified ↔ Traditional). It evolved from OpenCC and keeps that project's dictionary-based, rule-driven conversion approach while optimizing for speed and memory efficiency.

The engine uses static word/phrase dictionaries and applies deterministic rules for conversion. It is primarily suited for offline preprocessing, NLP pipelines, and batch text normalization.
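To make the dictionary-based approach concrete, here is a minimal sketch of phrase-level conversion using greedy forward longest match. It is not Hadar's actual API or algorithm, which this page does not document: the names PhraseDict, utf8_len, sample_dict, and convert_longest_match, the max_chars parameter, and the three sample entries are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary type (not Hadar's): UTF-8 source phrase -> UTF-8 target phrase.
using PhraseDict = std::unordered_map<std::string, std::string>;

// Byte length of the UTF-8 sequence introduced by `lead`.
// Malformed lead bytes count as length 1 so the scan always advances.
inline std::size_t utf8_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x06) return 2;
    if ((lead >> 4) == 0x0E) return 3;
    if ((lead >> 3) == 0x1E) return 4;
    return 1;
}

// Illustrative entries only. Assumes this source file is compiled as UTF-8
// (the GCC/Clang default; MSVC needs /utf-8).
inline PhraseDict sample_dict() {
    return PhraseDict{{"发", "發"}, {"头", "頭"}, {"头发", "頭髮"}};
}

// Greedy forward longest match: at each position, try the longest phrase of up
// to `max_chars` characters first, then shorter ones. Characters with no
// dictionary entry are copied through unchanged.
inline std::string convert_longest_match(std::string_view text,
                                         const PhraseDict& dict,
                                         std::size_t max_chars) {
    assert(max_chars >= 1);
    std::string out;
    out.reserve(text.size());
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Byte offsets at which a candidate of 1, 2, ..., max_chars characters ends.
        std::vector<std::size_t> ends;
        std::size_t end = pos;
        while (ends.size() < max_chars && end < text.size()) {
            end = std::min(text.size(),
                           end + utf8_len(static_cast<unsigned char>(text[end])));
            ends.push_back(end);
        }
        bool matched = false;
        for (auto it = ends.rbegin(); it != ends.rend(); ++it) {
            auto hit = dict.find(std::string(text.substr(pos, *it - pos)));
            if (hit != dict.end()) {
                out += hit->second;
                pos = *it;
                matched = true;
                break;
            }
        }
        if (!matched) {
            // No entry starts here: copy one character as-is.
            out.append(text.data() + pos, ends.front() - pos);
            pos = ends.front();
        }
    }
    return out;
}
```

With the sample dictionary, convert_longest_match("头发", sample_dict(), 2) matches the two-character phrase and yields 頭髮, while convert_longest_match("发展", sample_dict(), 2) falls back to the single-character entry for 发 and yields 發展. This is the kind of case where phrase-level lookup gives a different result from character-by-character mapping.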

Key Features

Simplified ↔ Traditional conversion: Accurate conversion at word and phrase level, not only character-by-character.
Rule-driven mapping: Handles ambiguous mappings with priority rules.
Static dictionaries: Memory-efficient, fast lookup for large-scale corpora.
Batch processing optimized: Capable of processing millions of characters efficiently.
Unicode support: Fully supports UTF-8 encoded Chinese text.
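One plausible reading of "priority rules" for ambiguous mappings is sketched below: dictionaries are consulted in a fixed order, and within an entry the first listed candidate is the default. This is an assumption, not a description of Hadar's actual rule format; PriorityDict, Candidates, resolve, and the sample entries are hypothetical names and data.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical entry type: candidate targets listed in order of preference.
using Candidates = std::vector<std::string>;
using PriorityDict = std::unordered_map<std::string, Candidates>;

// Consult the dictionaries in priority order (earlier wins) and return the
// first candidate of the first dictionary that contains `key`.
// Returns std::nullopt when no dictionary has a usable entry.
inline std::optional<std::string> resolve(
        const std::vector<const PriorityDict*>& chain,
        const std::string& key) {
    for (const PriorityDict* dict : chain) {
        auto it = dict->find(key);
        if (it != dict->end() && !it->second.empty()) {
            return it->second.front();
        }
    }
    return std::nullopt;
}
```

For example, with a base dictionary that maps 干 to the candidates 幹, 乾, 干 and a higher-priority override dictionary that maps 干 to 乾 alone, resolve returns 乾 when the chain is {&overrides, &base}. Because both the chain order and the candidate order are fixed, the result is deterministic for a given set of dictionaries.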

Typical Use Cases

Preprocessing text for Chinese NLP pipelines.
Normalizing user-generated content across Simplified and Traditional Chinese.
Large-scale corpus conversion for search engines or text analytics.
Integration into offline or batch processing systems requiring deterministic conversion.

Industrial Fit

Requirement	Hadar Support
Simplified ↔ Traditional conversion	✔️
Word/phrase level accuracy	✔️
Dynamic updates	❌
High-volume batch processing	✔️
Unicode support	✔️
