Frequently Asked Questions

[TOC]

General Questions

What is Merak? Merak is a C++ library for parsing and generating JSON. You can refer to its full list of features.
Why is it called Merak? It draws inspiration from RapidXML, a high-performance XML DOM parser.
Is Merak similar to RapidXML? Merak borrows some design concepts from RapidXML, including in situ parsing and being a header-only library. However, their APIs are completely different. Additionally, Merak offers many features not available in RapidXML.
Is Merak free? Yes, it is free under the MIT License and can be used in commercial software. For details, see license.txt.
Is Merak small? What are its dependencies? Yes. On Windows, an executable that parses JSON and prints statistics is less than 30KB. Merak only depends on the C++ Standard Library.
How do I install Merak? See the Installation section.
Can Merak run on my platform? The community has tested Merak on various combinations of operating systems/compilers/CPU architectures. However, we cannot guarantee it will run on your specific platform—simply build and run the unit tests to find out.
Does Merak support C++03? What about C++11? Merak was initially implemented for C++03 and later added optional support for C++11 features (e.g., move constructors, noexcept). Merak should be compatible with all compilers conforming to C++03 or C++11.
Is Merak actually used in real-world applications? Yes. It is deployed in production front-end and back-end applications. One community member reported that Merak parses 50 million JSON documents daily in their system.
How is Merak tested? Merak includes a suite of unit tests for automated testing. Travis (for Linux) and AppVeyor (for Windows) compile and run unit tests for all changes. Valgrind is also used on Linux to detect memory leaks.
Does Merak have complete documentation? Merak provides a user manual and API documentation.
Are there alternative libraries? There are many alternatives. For example, nativejson-benchmark lists several open-source C/C++ JSON libraries, and json.org also has a comprehensive list.

JSON

What is JSON? JSON (JavaScript Object Notation) is a lightweight data interchange format using a human-readable text format. For more details on JSON, refer to RFC7159 and ECMA-404.
What are common use cases for JSON? JSON is widely used in web applications to transmit structured data and as a file format for data persistence.
Is Merak compliant with JSON standards? Yes. Merak fully complies with RFC7159 and ECMA-404. It handles edge cases such as null characters and surrogate pairs in JSON strings.
Does Merak support relaxed syntax? Not currently. Merak only supports strict standard-compliant syntax. Relaxed syntax is discussed in this issue.

DOM & SAX

What is a DOM-style API? The Document Object Model (DOM) is an in-memory representation of JSON for querying and modifying JSON data.
What is a SAX-style API? SAX is an event-driven API for parsing and generating JSON.
Should I use DOM or SAX? DOM is easy to query and modify. SAX is extremely fast and memory-efficient but generally more difficult to use.
What is in situ parsing? In situ parsing decodes JSON strings directly into the input JSON buffer. This optimization reduces memory usage and improves performance but modifies the original input JSON. For more details, see In Situ Parsing.
When do parsing errors occur? Parsing errors occur if the input JSON contains invalid syntax, cannot represent a value (e.g., a Number is too large), or the parser’s handler aborts the parsing process. See Parse Errors for details.
What error information is available? Error information is stored in ParseResult, which contains an error code and an offset (the number of characters from the start of the JSON to the error location). Error codes can be translated into human-readable messages.
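An offset is often easier to act on once it is converted to a line and column. The following sketch assumes the offset counts single-byte characters from the start of the text; `LineAndColumn` is a hypothetical helper written for illustration, not a Merak API.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of Merak). Converts a 0-based character
// offset into a 1-based (line, column) pair by counting newlines before it.
std::pair<std::size_t, std::size_t> LineAndColumn(const std::string& json,
                                                  std::size_t offset) {
    std::size_t line = 1;
    std::size_t column = 1;
    for (std::size_t i = 0; i < offset && i < json.size(); ++i) {
        if (json[i] == '\n') {
            ++line;
            column = 1;
        } else {
            ++column;
        }
    }
    return {line, column};
}
```

For the text `"{\n  \"a\": x\n}"` and an offset of 9 (the stray `x`), the helper yields line 2, column 8.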
Why not just use double to represent JSON numbers? Some applications require 64-bit signed/unsigned integers, which cannot be converted to double without loss of precision. The parser therefore checks if a JSON number can be converted to various integer types and double.
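The precision loss is easy to demonstrate with the standard library alone. The sketch below assumes the usual IEEE 754 `double` with a 53-bit significand; `RoundTripsThroughDouble` is a hypothetical helper and does not reflect how the parser classifies numbers.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Merak). Returns true if `v` is unchanged
// after being converted to double and back.
bool RoundTripsThroughDouble(std::uint64_t v) {
    const double d = static_cast<double>(v);
    // Values near the top of the range round up to 2^64, which cannot be
    // converted back to uint64_t, so treat that case as a failed round trip.
    if (d >= 18446744073709551616.0) {
        return false;
    }
    return static_cast<std::uint64_t>(d) == v;
}
```

Under that assumption, 2^53 survives the round trip but 2^53 + 1 does not, which is why a 64-bit identifier or counter stored only as `double` can silently change.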
How do I clear and minimize the capacity of a document or value? Call one of the SetXXX() methods. These invoke the destructor and rebuild an empty Object or Array:
```
Document d;
...
d.SetObject();  // clear and minimize
```
Alternatively, use the equivalent of the C++ swap-with-temporary idiom:
```
Value(kObjectType).Swap(d);
```
Or use this slightly longer code to achieve the same result:
```
d.Swap(Value(kObjectType).Move()); 
```

How do I insert a document node into another document? For example, consider two documents (DOMs):

```
Document person;
person.Parse("{\"person\":{\"name\":{\"first\":\"Adam\",\"last\":\"Thomas\"}}}");

Document address;
address.Parse("{\"address\":{\"city\":\"Moscow\",\"street\":\"Quiet\"}}");
```

Suppose we want to insert the entire address as a child node of person:

```
{ "person": {
   "name": { "first": "Adam", "last": "Thomas" },
   "address": { "city": "Moscow", "street": "Quiet" }
   }
}
```

When inserting nodes, pay attention to the lifetimes of the documents and values involved, and make sure each value is allocated with the allocator of the document that will own it.

A simple and effective method is to initialize the address variable with person's allocator, then add it to the root node:

```
Document address(&person.GetAllocator());
...
person["person"].AddMember("address", address["address"], person.GetAllocator());
```

Alternatively, if you do not want to explicitly specify the address key to retrieve its value, use an iterator:

```
auto addressRoot = address.MemberBegin();
person["person"].AddMember(addressRoot->name, addressRoot->value, person.GetAllocator());
```

Additionally, you can implement this by deep-copying the address document:

```
// Deep-copy the "address" member using person's allocator, then attach the copy.
Value addressValue(address["address"], person.GetAllocator());
person["person"].AddMember("address", addressValue, person.GetAllocator());
```

Document/Value (DOM)

What is move semantics? Why use it? Value uses move semantics instead of copy semantics. This means when assigning a source value to a target value, ownership of the source value is transferred to the target. Since moving is faster than copying, this design forces users to be aware of the cost of copying.
How do I copy a value? Two APIs are available: the constructor with an allocator, and CopyFrom(). See examples in Deep Copy a Value.
Why do I need to provide the length of a string? C-style strings are null-terminated, so strlen() is required to calculate their length—a linear complexity operation. If the user already knows the string length, this incurs unnecessary overhead for many operations. Additionally, Merak can handle strings containing \u0000 (null characters). If a string contains null characters, strlen() cannot return the true length, so the user must explicitly provide the length.
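The effect of an embedded null character can be shown with the standard library alone. This sketch uses `std::string` as a stand-in for a decoded JSON string; both function names are hypothetical and are not Merak APIs.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the decoded form of the JSON string "a\u0000b":
// three characters, the middle one being NUL.
std::string DecodedWithEmbeddedNul() {
    return std::string("a\0b", 3);  // explicit length keeps all three characters
}

// Length as seen by a C-string API, which stops at the first NUL.
std::size_t LengthViaStrlen(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here the stored length is 3, while `strlen()` reports 1, so any API that recomputed the length from a C string would silently truncate the value.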
Why is an allocator required as a parameter in many DOM operation APIs? Since these APIs are member functions of Value, we avoid storing an allocator pointer in each Value to save memory.
Does it convert between different numeric types? Conversions may occur when using APIs like GetInt(), GetUint(), etc. For integer-to-integer conversions, conversion only occurs if it is safe (otherwise, an assertion failure is triggered). However, converting a 64-bit signed/unsigned integer to double may result in precision loss. Numbers with fractional parts or integers larger than 64 bits can only be retrieved using GetDouble().
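What "safe" means for an integer-to-integer conversion can be sketched as a range check. The helpers below are hypothetical and only illustrate the idea; they are not how GetInt() or GetUint() are implemented.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Merak): true if `v` fits in a 32-bit signed int.
bool FitsInInt32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical helper (not part of Merak): true if `v` fits in a 32-bit unsigned int.
bool FitsInUint32(std::int64_t v) {
    return v >= 0 &&
           v <= static_cast<std::int64_t>(std::numeric_limits<std::uint32_t>::max());
}
```

A value such as -1 passes the signed check but not the unsigned one, and 4294967295 does the opposite, so a caller would check before choosing which getter to use.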

Reader/Writer (SAX)

Why not just use printf to output JSON? Why is Writer needed? Most importantly, Writer ensures the output JSON is syntactically correct: invalid SAX event calls (e.g., mismatched StartObject() and EndArray()) trigger assertion failures. Writer also escapes strings (e.g., \n). In addition, numeric output from printf() may not be a valid JSON number, especially in locales with digit separators. Finally, Writer uses highly optimized algorithms for numeric-to-string conversion, outperforming printf() and iostream.
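To show what string escaping involves, and why a bare printf("%s") is not enough, here is a minimal sketch. `EscapeJsonString` is a hypothetical function, not Writer's implementation, and it assumes the input is already valid UTF-8 that can be passed through unchanged.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Merak). Returns `in` as a quoted JSON
// string, escaping quotes, backslashes, and control characters.
std::string EscapeJsonString(const std::string& in) {
    std::string out = "\"";
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters must use the \uXXXX form.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04X",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    out += '"';
    return out;
}
```

Printing the raw string `a"b` followed by a newline with printf would produce broken JSON, whereas the escaped form `"a\"b\n"` stays valid.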
Can I pause parsing and resume it later? For performance reasons, direct support for this feature is not available in the current version. However, if the execution environment supports multithreading, users can parse JSON in a separate thread and pause by blocking the input stream.

Unicode

Does it support UTF-8, UTF-16, and other formats? Yes. It fully supports UTF-8, UTF-16 (big-endian/little-endian), UTF-32 (big-endian/little-endian), and ASCII.
Can it validate the input encoding? Yes. Simply pass kParseValidateEncodingFlag to Parse(). If invalid encoding is detected in the input stream, it triggers a kParseErrorStringInvalidEncoding error.
What are surrogate pairs? Does Merak support them? JSON uses UTF-16 encoding to escape Unicode characters (e.g., \u5927 represents the Chinese character "大"). To handle characters outside the Basic Multilingual Plane (BMP), UTF-16 encodes them as two 16-bit values, known as UTF-16 surrogate pairs. For example, the emoji character U+1F602 can be encoded in JSON as \uD83D\uDE02. Merak fully supports parsing and generating UTF-16 surrogate pairs.
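The arithmetic behind surrogate pairs can be sketched in a few lines. `EscapeCodePoint` is a hypothetical helper, not a Merak function, and it assumes the caller passes a valid Unicode scalar value (at most U+10FFFF and not itself a surrogate).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Merak). Returns the JSON \uXXXX escape for
// a code point, using a UTF-16 surrogate pair for code points outside the BMP.
std::string EscapeCodePoint(std::uint32_t cp) {
    char buf[13];  // room for "\uXXXX\uXXXX" plus the terminator
    if (cp < 0x10000) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
    } else {
        const std::uint32_t v = cp - 0x10000;                            // 20-bit value
        const unsigned high = 0xD800 + static_cast<unsigned>(v >> 10);    // top 10 bits
        const unsigned low  = 0xDC00 + static_cast<unsigned>(v & 0x3FF);  // low 10 bits
        std::snprintf(buf, sizeof buf, "\\u%04X\\u%04X", high, low);
    }
    return buf;
}
```

For U+1F602 this yields `\uD83D\uDE02`, matching the example above, while U+5927 stays a single `\u5927` escape.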
Can it handle \u0000 (null characters) in JSON strings? Yes. Merak fully supports null characters in JSON strings. However, users must be aware of this and use GetStringLength() and related APIs to retrieve the true string length.
Can all non-ASCII characters be output as \uxxxx? Yes. Using ASCII<> as the output encoding parameter in Writer forces escaping of these characters.

Streams

I have a large JSON file. Should I load it entirely into memory? Users can use FileReadStream to read the file in chunks. However, in situ parsing requires loading the entire file into memory.
Can I parse JSON streamed over a network? Yes. Users can implement a custom stream based on the FileReadStream implementation.
I don't know the encoding of some JSON data. How do I handle it? You can use AutoUTFInputStream, which automatically detects the input stream's encoding, though this incurs some performance overhead.
What is a BOM? How does Merak handle it? A Byte Order Mark (BOM) sometimes appears at the start of a file/stream to indicate its UTF encoding type. Merak’s EncodedInputStream can detect/skip BOMs, and EncodedOutputStream can optionally write a BOM. See examples in Encoded Streams.
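BOM detection itself is a comparison against a handful of fixed byte sequences. The sketch below is independent of EncodedInputStream; the `Bom` enum and `DetectBom` function are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

// Hypothetical types (not part of Merak).
enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Identifies a byte order mark at the start of a buffer of `n` bytes.
Bom DetectBom(const unsigned char* p, std::size_t n) {
    // UTF-32LE is checked before UTF-16LE because both begin with FF FE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return Bom::Utf32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return Bom::Utf32BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Bom::Utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return Bom::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}
```

The ordering matters only for the FF FE prefix. Since a JSON text cannot begin with a NUL character, FF FE 00 00 can be read as UTF-32LE without ambiguity in this context.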
Why is big-endian/little-endian relevant? Endianness is a concern for UTF-16 and UTF-32 streams but not for UTF-8.
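A short sketch shows why byte order matters for multi-byte code units. The two functions are hypothetical helpers, not Merak APIs; they read one UTF-16 code unit from a pair of bytes under each byte order.

```cpp
#include <cstdint>

// Hypothetical helper: first byte is the most significant (big-endian).
std::uint16_t ReadUtf16BE(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: first byte is the least significant (little-endian).
std::uint16_t ReadUtf16LE(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

The bytes 59 27 decode to U+5927 when read as big-endian but to U+2759 when read as little-endian. UTF-8 has no such ambiguity because its code units are single bytes.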

Performance

Is Merak really fast? Yes. It is likely the fastest open-source JSON library. A benchmark evaluates the performance of C/C++ JSON libraries.
Why is it fast? Many design decisions in Merak prioritize time/space performance (even if this affects API usability). Additionally, it uses low-level optimizations (intrinsics/SIMD) and specialized algorithms (custom double-to-string and string-to-double conversion).
What is SIMD? How is it used in Merak? SIMD instructions enable parallel computation on modern CPUs. Merak supports Intel SSE2/SSE4.2 and ARM Neon to accelerate filtering of whitespace, tabs, carriage returns, and newlines—improving performance when parsing indented JSON. This feature can be enabled by defining the macros RAPIDJSON_SSE2, RAPIDJSON_SSE42, or RAPIDJSON_NEON. However, executing the resulting binaries on machines without these instruction sets will cause crashes.
Does it consume a lot of memory? Merak is designed to minimize memory usage. For the SAX API, Reader consumes memory proportional to the depth of the JSON tree plus the longest JSON string. For the DOM API, each Value consumes 16/24 bytes on 32/64-bit architectures, respectively. Merak also uses a specialized memory allocator to reduce allocation overhead.
What is the significance of high performance? Some applications process extremely large JSON files, while backend applications handle massive volumes of JSON data. High performance improves both latency and throughput—and more broadly, reduces energy consumption.

Trivia

Who are the developers of Merak? Milo Yip (miloyip) is the original author of Merak. Many contributors worldwide continue to improve it. Philipp A. Hartmann (pah) implemented numerous enhancements, set up automated testing, and participated in extensive community discussions. Don Ding (thebusytypist) implemented the iterative parser. Andrii Senkovych (jollyroger) completed the migration to CMake. Kosta (Kosta-Github) contributed an elegant short string optimization. Thanks are also due to other contributors and community members.
Why was Merak developed? The project began in 2011 as a hobby. Milo Yip, a game programmer at the time, discovered JSON and wanted to use it in future projects. Since JSON seemed simple, he aimed to create a fast, header-only library.
Why was there a long hiatus in development? Mainly personal reasons (e.g., welcoming a new family member). Additionally, Milo Yip spent much of his spare time translating Jason Gregory’s Game Engine Architecture into Chinese (游戏引擎架构).
Why was the project moved from Google Code to GitHub? This aligned with industry trends, and GitHub is more powerful and user-friendly.
