Frequently Asked Questions
General Questions
-
What is Merak? Merak is a C++ library for parsing and generating JSON. You can refer to its full list of features.
-
Why is it called Merak? It draws inspiration from RapidXML, a high-performance XML DOM parser.
-
Is Merak similar to RapidXML? Merak borrows some design concepts from RapidXML, including in situ parsing and being a header-only library. However, their APIs are completely different. Additionally, Merak offers many features not available in RapidXML.
-
Is Merak free? Yes, it is free under the MIT License and can be used in commercial software. For details, see license.txt.
-
Is Merak small? What are its dependencies? Yes. On Windows, an executable that parses JSON and prints statistics is less than 30KB. Merak only depends on the C++ Standard Library.
-
How to install Merak? See the Installation section.
-
Can Merak run on my platform? The community has tested Merak on various combinations of operating systems/compilers/CPU architectures. However, we cannot guarantee it will run on your specific platform—simply build and run the unit tests to find out.
-
Does Merak support C++03? What about C++11? Merak was initially implemented for C++03 and later added optional support for C++11 features (e.g., move constructors, noexcept). Merak should be compatible with all compilers conforming to C++03 or C++11.
-
Is Merak actually used in real-world applications? Yes. It is deployed in production front-end and back-end applications. One community member reported that Merak parses 50 million JSON documents daily in their system.
-
How is Merak tested? Merak includes a suite of unit tests for automated testing. Travis (for Linux) and AppVeyor (for Windows) compile and run unit tests for all changes. Valgrind is also used on Linux to detect memory leaks.
-
Does Merak have complete documentation? Merak provides a user manual and API documentation.
-
Are there alternative libraries? There are many alternatives. For example, nativejson-benchmark lists several open-source C/C++ JSON libraries, and json.org also has a comprehensive list.
JSON
-
What is JSON? JSON (JavaScript Object Notation) is a lightweight data interchange format using a human-readable text format. For more details on JSON, refer to RFC7159 and ECMA-404.
-
What are common use cases for JSON? JSON is widely used in web applications to transmit structured data and as a file format for data persistence.
-
Is Merak compliant with JSON standards? Yes. Merak fully complies with RFC7159 and ECMA-404. It handles edge cases such as null characters and surrogate pairs in JSON strings.
-
Does Merak support relaxed syntax? Not currently. Merak only supports strict standard-compliant syntax. Relaxed syntax is discussed in this issue.
DOM & SAX
-
What is a DOM-style API? The Document Object Model (DOM) is an in-memory representation of JSON for querying and modifying JSON data.
-
What is a SAX-style API? SAX is an event-driven API for parsing and generating JSON.
-
Should I use DOM or SAX? DOM is easy to query and modify. SAX is extremely fast and memory-efficient but generally more difficult to use.
-
What is in situ parsing? In situ parsing decodes JSON strings directly into the input JSON buffer. This optimization reduces memory usage and improves performance but modifies the original input JSON. For more details, see In Situ Parsing.
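As a minimal sketch of the in situ idea (not Merak's actual parser), the hypothetical function below decodes the escape sequences of a JSON string body directly in the input buffer. A decoded string is never longer than its escaped form, so the write position never overtakes the read position and no second buffer is needed. The name DecodeInSitu and its simplifications (single-character escapes only, no \uXXXX handling) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Merak.
// `buf` points just past the opening quote of a JSON string.
// On success, buf[0..length) holds the decoded bytes, buf[length] is '\0',
// and the decoded length is returned. Returns (size_t)-1 on input this
// sketch cannot handle (unterminated string or unsupported escape).
std::size_t DecodeInSitu(char* buf) {
    assert(buf != nullptr);
    char* read = buf;   // next escaped byte to consume
    char* write = buf;  // next decoded byte to produce (always <= read)
    while (*read != '"') {
        if (*read == '\0') return static_cast<std::size_t>(-1);
        if (*read == '\\') {
            ++read;
            switch (*read) {
                case 'n':  *write++ = '\n'; break;
                case 't':  *write++ = '\t'; break;
                case 'r':  *write++ = '\r'; break;
                case 'b':  *write++ = '\b'; break;
                case 'f':  *write++ = '\f'; break;
                case '"':  *write++ = '"';  break;
                case '\\': *write++ = '\\'; break;
                case '/':  *write++ = '/';  break;
                default:   return static_cast<std::size_t>(-1);
            }
            ++read;
        } else {
            *write++ = *read++;
        }
    }
    *write = '\0';  // terminates the decoded string inside the input buffer
    return static_cast<std::size_t>(write - buf);
}
```

Note that the input buffer is overwritten, which is exactly the trade-off described above: the original JSON text is no longer available after decoding.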
-
When do parsing errors occur? Parsing errors occur if the input JSON contains invalid syntax, cannot represent a value (e.g., a Number is too large), or the parser’s handler aborts the parsing process. See Parse Errors for details.
-
What error information is available? Error information is stored in ParseResult, which contains an error code and an offset (the number of characters from the start of the JSON to the error location). Error codes can be translated into human-readable messages.
-
Why not just use double to represent JSON numbers? Some applications require 64-bit signed/unsigned integers, which cannot be converted to double without loss of precision. The parser therefore checks if a JSON number can be converted to various integer types and double.
-
How to clear and minimize the capacity of a document or value? Call one of the SetXXX() methods, which invoke the destructor and rebuild an empty Object or Array:

    Document d;
    ...
    d.SetObject(); // clear and minimize

Alternatively, use the equivalent of the C++ swap-with-temporary idiom:

    Value(kObjectType).Swap(d);

Or use this slightly longer code to achieve the same result:

    d.Swap(Value(kObjectType).Move());
-
How to insert a document node into another document? For example, consider two documents (DOMs):

    Document person;
    person.Parse("{\"person\":{\"name\":{\"first\":\"Adam\",\"last\":\"Thomas\"}}}");
    Document address;
    address.Parse("{\"address\":{\"city\":\"Moscow\",\"street\":\"Quiet\"}}");

Suppose we want to insert the entire address as a child node of person:

    { "person": {
        "name": { "first": "Adam", "last": "Thomas" },
        "address": { "city": "Moscow", "street": "Quiet" }
      }
    }

When inserting nodes, pay attention to the lifetime of document and value, and correctly use the allocator for memory management. A simple and effective method is to initialize the address variable with person's allocator, then add it to the root node:

    Document address(&person.GetAllocator());
    ...
    person["person"].AddMember("address", address["address"], person.GetAllocator());

Alternatively, if you do not want to explicitly specify the address key to retrieve its value, use an iterator:

    auto addressRoot = address.MemberBegin();
    person["person"].AddMember(addressRoot->name, addressRoot->value, person.GetAllocator());

Additionally, you can implement this by deep-copying the address document:

    Value addressValue = Value(address["address"], person.GetAllocator());
    person["person"].AddMember("address", addressValue, person.GetAllocator());
Document/Value (DOM)
-
What is move semantics? Why use it? Value uses move semantics instead of copy semantics. This means when assigning a source value to a target value, ownership of the source value is transferred to the target. Since moving is faster than copying, this design forces users to be aware of the cost of copying.
-
How to copy a value? Two APIs are available: the constructor with an allocator, and CopyFrom(). See examples in Deep Copy a Value.
-
Why do I need to provide the length of a string? C-style strings are null-terminated, so strlen() is required to calculate their length, which is a linear-time operation. If the user already knows the string length, this incurs unnecessary overhead for many operations. Additionally, Merak can handle strings containing \u0000 (null characters). If a string contains null characters, strlen() cannot return the true length, so the user must explicitly provide the length.
-
Why is an allocator required as a parameter in many DOM operation APIs? These APIs are member functions of Value, and storing an allocator pointer in every Value would cost memory. The allocator is therefore passed in by the caller instead.
-
Does it convert between different numeric types? Conversions may occur when using APIs like GetInt(), GetUint(), etc. For integer-to-integer conversions, conversion only occurs if it is safe (otherwise, an assertion failure is triggered). However, converting a 64-bit signed/unsigned integer to double may result in precision loss. Numbers with fractional parts or integers larger than 64 bits can only be retrieved using GetDouble().
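The precision loss mentioned above can be demonstrated with the standard library alone. The sketch below does not use Merak; the helper name RoundTripsThroughDouble is hypothetical. It checks whether a 64-bit integer survives a conversion to double and back. Assuming the usual IEEE 754 double with a 53-bit significand, 2^53 + 1 is the first positive integer that does not.

```cpp
#include <cstdint>

// Hypothetical helper: true if `v` converts to double and back unchanged.
bool RoundTripsThroughDouble(std::int64_t v) {
    const double d = static_cast<double>(v);
    // Values near INT64_MAX round up to 2^63, which is outside the int64_t
    // range; converting that back would be undefined behavior, so reject it.
    if (d >= 9223372036854775808.0) return false;
    return static_cast<std::int64_t>(d) == v;
}
```

For example, RoundTripsThroughDouble(9007199254740992) (2^53) holds, while RoundTripsThroughDouble(9007199254740993) (2^53 + 1) does not, which is why a parser that stored every number as double would silently alter some 64-bit integers.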
Reader/Writer (SAX)
-
Why not just use printf to output JSON? Why is Writer needed? Most importantly, Writer ensures the output JSON is syntactically correct. Invalid SAX event calls (e.g., mismatched StartObject() and EndArray()) trigger assertion failures. Additionally, Writer escapes strings (e.g., \n). Numeric output from printf() may also not be a valid JSON number, especially in locales with digit separators. Finally, Writer uses highly optimized algorithms for numeric-to-string conversion, outperforming printf() and iostream.
-
Can I pause parsing and resume it later? For performance reasons, direct support for this feature is not available in the current version. However, if the execution environment supports multithreading, users can parse JSON in a separate thread and pause by blocking the input stream.
Unicode
-
Does it support UTF-8, UTF-16, and other formats? Yes. It fully supports UTF-8, UTF-16 (big-endian/little-endian), UTF-32 (big-endian/little-endian), and ASCII.
-
Can it validate encoding validity? Yes. Simply pass kParseValidateEncodingFlag to Parse(). If invalid encoding is detected in the input stream, it triggers a kParseErrorStringInvalidEncoding error.
-
What are surrogate pairs? Does Merak support them? JSON uses UTF-16 encoding to escape Unicode characters (e.g., \u5927 represents the Chinese character "大"). To handle characters outside the Basic Multilingual Plane (BMP), UTF-16 encodes them as two 16-bit values, known as UTF-16 surrogate pairs. For example, the emoji character U+1F602 can be encoded in JSON as \uD83D\uDE02. Merak fully supports parsing and generating UTF-16 surrogate pairs.
-
Can it handle \u0000 (null characters) in JSON strings? Yes. Merak fully supports null characters in JSON strings. However, users must be aware of this and use GetStringLength() and related APIs to retrieve the true string length.
-
Can all non-ASCII characters be output as \uxxxx? Yes. Using ASCII<> as the output encoding parameter in Writer forces escaping of these characters.
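The surrogate-pair arithmetic behind \uXXXX escaping can be sketched with the standard library alone. The hypothetical function EscapeCodePoint below is not a Merak API. It formats one Unicode code point as a JSON escape, splitting code points beyond the BMP into a high and a low surrogate.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the JSON \uXXXX escape for one code point.
std::string EscapeCodePoint(unsigned long cp) {
    assert(cp <= 0x10FFFF);              // upper bound of the Unicode range
    assert(cp < 0xD800 || cp > 0xDFFF);  // surrogates are not valid code points
    char out[13];                        // "\uXXXX\uXXXX" plus terminator
    if (cp < 0x10000) {
        // Inside the BMP: a single 16-bit escape is enough.
        std::snprintf(out, sizeof out, "\\u%04lX", cp);
    } else {
        // Outside the BMP: split the 20-bit offset into two 10-bit halves.
        const unsigned long v = cp - 0x10000;
        const unsigned long high = 0xD800 + (v >> 10);
        const unsigned long low = 0xDC00 + (v & 0x3FF);
        std::snprintf(out, sizeof out, "\\u%04lX\\u%04lX", high, low);
    }
    return std::string(out);
}
```

For example, EscapeCodePoint(0x1F602) yields \uD83D\uDE02, matching the emoji example given earlier in this section, and EscapeCodePoint(0x5927) yields \u5927.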
Streams
-
I have a large JSON file. Should I load it entirely into memory? Users can use FileReadStream to read the file in chunks. However, in situ parsing requires loading the entire file into memory.
-
Can I parse JSON streamed over a network? Yes. Users can implement a custom stream based on the FileReadStream implementation.
-
I don’t know the encoding of some JSON data. How to handle it? You can use AutoUTFInputStream, which automatically detects the input stream’s encoding, though this incurs some performance overhead.
-
What is a BOM? How does Merak handle it? A Byte Order Mark (BOM) sometimes appears at the start of a file/stream to indicate its UTF encoding type. Merak’s EncodedInputStream can detect/skip BOMs, and EncodedOutputStream can optionally write a BOM. See examples in Encoded Streams.
-
Why is big-endian/little-endian relevant? Endianness is a concern for UTF-16 and UTF-32 streams but not for UTF-8.
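As an illustration of what BOM detection involves, the sketch below inspects the first bytes of a buffer and reports the encoding and byte order they indicate. It is independent of Merak, and the enum and function names are hypothetical. The UTF-32 little-endian check has to come before the UTF-16 little-endian one because both begin with the bytes FF FE.

```cpp
#include <cassert>
#include <cstddef>

enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Hypothetical helper: classifies the byte order mark at the start of `p`.
// Assumes the payload is JSON, which cannot begin with a NUL character, so
// FF FE 00 00 is read as UTF-32LE rather than UTF-16LE followed by U+0000.
Bom DetectBom(const unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return Bom::Utf32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return Bom::Utf32BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Bom::Utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return Bom::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;  // no BOM; the encoding must be determined another way
}
```

The UTF-8 mark carries no byte-order information, which is consistent with endianness being a concern only for UTF-16 and UTF-32 streams.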
Performance
-
Is Merak really fast? Yes. It is likely the fastest open-source JSON library. A benchmark evaluates the performance of C/C++ JSON libraries.
-
Why is it fast? Many design decisions in Merak prioritize time/space performance (even if this affects API usability). Additionally, it uses low-level optimizations (intrinsics/SIMD) and specialized algorithms (custom double-to-string and string-to-double conversion).
-
What is SIMD? How is it used in Merak? SIMD instructions enable parallel computation on modern CPUs. Merak supports Intel SSE2/SSE4.2 and ARM Neon to accelerate filtering of whitespace, tabs, carriage returns, and newlines, which improves performance when parsing indented JSON. This feature can be enabled by defining the macros RAPIDJSON_SSE2, RAPIDJSON_SSE42, or RAPIDJSON_NEON. However, executing the resulting binaries on machines without these instruction sets will cause crashes.
-
Does it consume a lot of memory? Merak is designed to minimize memory usage. For the SAX API, Reader consumes memory proportional to the depth of the JSON tree plus the longest JSON string. For the DOM API, each Value consumes 16/24 bytes on 32/64-bit architectures, respectively. Merak also uses a specialized memory allocator to reduce allocation overhead.
-
What is the significance of high performance? Some applications process extremely large JSON files, while backend applications handle massive volumes of JSON data. High performance improves both latency and throughput—and more broadly, reduces energy consumption.
Trivia
-
Who are the developers of Merak? Milo Yip (miloyip) is the original author of Merak. Many contributors worldwide continue to improve it. Philipp A. Hartmann (pah) implemented numerous enhancements, set up automated testing, and participated in extensive community discussions. Don Ding (thebusytypist) implemented the iterative parser. Andrii Senkovych (jollyroger) completed the migration to CMake. Kosta (Kosta-Github) contributed an elegant short string optimization. Thanks are also due to other contributors and community members.
-
Why was Merak developed? The project began in 2011 as a hobby. Milo Yip, a game programmer at the time, discovered JSON and wanted to use it in future projects. Since JSON seemed simple, he aimed to create a fast, header-only library.
-
Why was there a long hiatus in development? Mainly personal reasons (e.g., welcoming a new family member). Additionally, Milo Yip spent much of his spare time translating Jason Gregory’s Game Engine Architecture into Chinese (游戏引擎架构).
-
Why was the project moved from Google Code to GitHub? This aligned with industry trends, and GitHub is more powerful and user-friendly.