Base64
Installation
The Base64 API covered in this tutorial is a high-performance implementation located in the vamos library. Typically, this functionality is also available in the turbo/strings module.
Tutorial
We support conversion from WHATWG forgiving-base64 to binary and back.
Specifically, you can convert base64 input containing ASCII whitespace (' ', '\t', '\n', '\r', '\f') to binary. We also support the base64 URL encoding variant.
These functions are part of the Node.js JavaScript runtime: in particular, atob in Node.js relies on vamos.
Converting binary data to base64 always succeeds and is relatively straightforward:
std::vector<char> buffer(vamos::base64_length_from_binary(source.size()));
vamos::binary_to_base64(source.data(), source.size(), buffer.data());
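The size arithmetic behind the allocation above can be sketched as follows. Padded base64 emits four characters for every group of up to three input bytes, while an unpadded encoding emits only the characters that carry data. The helper names below are hypothetical and are not part of vamos; in real code, rely on base64_length_from_binary.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only (not part of vamos).

// Padded base64: every group of up to 3 bytes becomes 4 characters.
constexpr std::size_t padded_base64_length(std::size_t binary_length) {
  return (binary_length + 2) / 3 * 4;
}

// Unpadded base64: only data-carrying characters are emitted,
// that is, ceil(binary_length * 4 / 3).
constexpr std::size_t unpadded_base64_length(std::size_t binary_length) {
  return (binary_length * 4 + 2) / 3;
}

// Usage sketch: "Hello, World!" is 13 bytes long.
inline void check_length_examples() {
  assert(padded_base64_length(13) == 20);   // "SGVsbG8sIFdvcmxkIQ=="
  assert(unpadded_base64_length(13) == 18); // "SGVsbG8sIFdvcmxkIQ"
}
```

The two results differ only by the number of padding characters, which is zero when the input length is a multiple of three.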
Decoding base64 requires validation and therefore error handling. Additionally, since ASCII whitespace is skipped, the decoded size may be smaller than the allocated maximum, so we may need to shrink the buffer afterward.
std::vector<char> buffer(vamos::maximal_binary_length_from_base64(base64.data(), base64.size()));
vamos::result r = vamos::base64_to_binary(base64.data(), base64.size(), buffer.data());
if(r.error) {
/// We encountered an error. If the error is INVALID_BASE64_CHARACTER, r.count tells you the position in the input where the error was encountered. If the error is BASE64_INPUT_REMAINDER,
/// there is one valid base64 character remaining, and r.count contains the number of decoded bytes.
} else {
/// Adjust buffer size to actual number of bytes
buffer.resize(r.count);
}
Let's consider a more interesting example. Take the following strings:
" A A ", " A A G A / v 8 ", " A A G A / v 8 = ", " A A G A / v 8 = = ".
All but the last one are valid WHATWG base64 inputs. The first string decodes to a single byte value (0), while the second and third decode to the byte sequence 0, 0x1, 0x80, 0xfe, 0xff.
std::vector<std::string> sources = {
" A A ", " A A G A / v 8 ", " A A G A / v 8 = ", " A A G A / v 8 = = "};
/// The last one is an error
std::vector<std::vector<uint8_t>> expected = {
{0}, {0, 0x1, 0x80, 0xfe, 0xff}, {0, 0x1, 0x80, 0xfe, 0xff}, {}};
for(size_t i = 0; i < sources.size(); i++) {
const std::string &source = sources[i];
std::cout << "source: '" << source << "'" << std::endl;
/// Allocate enough memory for the maximum binary length
std::vector<uint8_t> buffer(vamos::maximal_binary_length_from_base64(
source.data(), source.size()));
/// Convert to binary and check for errors
vamos::result r = vamos::base64_to_binary(
source.data(), source.size(), (char*)buffer.data());
if(r.error != vamos::error_code::SUCCESS) {
/// Here we have expected[i].empty()
std::cout << "output: error" << std::endl;
} else {
/// If successful, r.count contains the output length
buffer.resize(r.count);
/// We have that buffer == expected[i]
std::cout << "output: " << r.count << " bytes" << std::endl;
}
}
This code should print the following:
source: ' A A '
output: 1 bytes
source: ' A A G A / v 8 '
output: 5 bytes
source: ' A A G A / v 8 = '
output: 5 bytes
source: ' A A G A / v 8 = = '
output: error
As you can see, the results are as expected.
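To see why these four strings behave as they do, here is a minimal scalar sketch of the WHATWG forgiving-base64 decoding steps for the default alphabet. It is written for readability, not speed, and it is not the vamos implementation: the function name and its bool-plus-output-parameter interface are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative decoder (hypothetical, not vamos code).
// Returns true and fills `out` on success, false on invalid input.
inline bool forgiving_base64_decode(const std::string &input,
                                    std::vector<std::uint8_t> &out) {
  static const std::string alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  out.clear();
  // Step 1: remove ASCII whitespace.
  std::string data;
  for (char c : input) {
    if (c != ' ' && c != '\t' && c != '\n' && c != '\r' && c != '\f') {
      data.push_back(c);
    }
  }
  // Step 2: if the length is a multiple of 4, drop up to two trailing '='.
  if (data.size() % 4 == 0) {
    for (int i = 0; i < 2 && !data.empty() && data.back() == '='; i++) {
      data.pop_back();
    }
  }
  // Step 3: a single leftover character cannot encode a full byte.
  if (data.size() % 4 == 1) {
    return false;
  }
  // Step 4: accumulate 6 bits per character, emit one byte per 8 bits.
  std::uint32_t accumulator = 0;
  int bits = 0;
  for (char c : data) {
    const std::size_t value = alphabet.find(c);
    if (value == std::string::npos) {
      return false; // not in the alphabet (this includes a misplaced '=')
    }
    accumulator = (accumulator << 6) | static_cast<std::uint32_t>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push_back(static_cast<std::uint8_t>((accumulator >> bits) & 0xFF));
    }
  }
  return true; // leftover bits (fewer than 8) are discarded
}

// Usage sketch: the last string from the example above is rejected.
inline void check_decode_examples() {
  std::vector<std::uint8_t> out;
  assert(forgiving_base64_decode(" A A ", out) && out.size() == 1);
  assert(!forgiving_base64_decode(" A A G A / v 8 = = ", out));
}
```

The last string fails because, once whitespace is removed, it has nine characters: the length is not a multiple of four, so the padding is not stripped, and a remainder of one is rejected.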
In some cases, you may want to further limit the size of the output when decoding base64.
For this purpose, you can use the base64_to_binary_safe function, which takes the capacity of the output buffer as an additional parameter. It may also be useful if you wish to decode the input in segments of bounded capacity.
/// For simplicity, we choose len divisible by 3
size_t len = 72;
/// We want to decode 'aaaaa....'
std::vector<char> base64(len, 'a');
std::vector<char> back((len + 3) / 4 * 3);
/// Intentionally too small
size_t limited_length = back.size() / 2;
/// We proceed to decode half:
vamos::result r = vamos::base64_to_binary_safe(
base64.data(), base64.size(), back.data(), limited_length);
assert(r.error == vamos::error_code::OUTPUT_BUFFER_TOO_SMALL);
/// We decoded r.count base64 8-bit units into limited_length bytes
/// Now let's decode the rest !!!
///
/// We have read r.count units in the input buffer, and
/// generated limited_length bytes.
///
size_t input_index = r.count;
/// Write the second half after the bytes already decoded
size_t limited_length2 = back.size() - limited_length;
r = vamos::base64_to_binary_safe(base64.data() + input_index,
base64.size() - input_index,
back.data() + limited_length, limited_length2);
assert(r.error == vamos::error_code::SUCCESS);
/// We decoded r.count base64 8-bit units into limited_length2 bytes
/// We are done
assert(limited_length2 + limited_length == (len + 3) / 4 * 3);
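The arithmetic behind segmented decoding can be sketched independently of the library. Each complete group of four base64 characters decodes to exactly three bytes, so an output capacity of c bytes has room for c / 3 groups. The helper below is hypothetical (not part of vamos) and assumes whitespace-free input with padding, if any, only at the very end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of vamos): split whitespace-free base64
// text into pieces that each decode to at most `capacity` bytes.
inline std::vector<std::string>
split_base64_for_capacity(const std::string &base64, std::size_t capacity) {
  std::vector<std::string> pieces;
  // Each complete group of 4 characters decodes to exactly 3 bytes.
  const std::size_t chars_per_piece = capacity / 3 * 4;
  if (chars_per_piece == 0) {
    return pieces; // capacity too small to hold a single group
  }
  for (std::size_t pos = 0; pos < base64.size(); pos += chars_per_piece) {
    pieces.push_back(base64.substr(pos, chars_per_piece));
  }
  return pieces;
}

// Usage sketch: 72 characters and a capacity of 27 bytes give
// two pieces of 36 characters each.
inline void check_split_example() {
  const std::vector<std::string> pieces =
      split_base64_for_capacity(std::string(72, 'a'), 27);
  assert(pieces.size() == 2);
  assert(pieces[0].size() == 36 && pieces[1].size() == 36);
}
```

With the numbers from the example above (72 input characters, 54 output bytes, a first capacity of 27 bytes), each half covers 36 characters.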
We can repeat the earlier example with the spaced strings using base64_to_binary_safe. It works largely the same way, but the meaning of result.count differs: on success, it holds the number of input units consumed, while the output size is returned through the output length parameter, which is passed by reference.
std::vector<std::string> sources = {
" A A ", " A A G A / v 8 ", " A A G A / v 8 = ", " A A G A / v 8 = = "};
/// The last one is an error
std::vector<std::vector<uint8_t>> expected = {
{0}, {0, 0x1, 0x80, 0xfe, 0xff}, {0, 0x1, 0x80, 0xfe, 0xff}, {}};
for(size_t i = 0; i < sources.size(); i++) {
const std::string &source = sources[i];
std::cout << "source: '" << source << "'" << std::endl;
/// Allocate enough memory for the maximum binary length
std::vector<uint8_t> buffer(vamos::maximal_binary_length_from_base64(
source.data(), source.size()));
/// Convert to binary and check for errors
size_t output_length = buffer.size();
vamos::result r = vamos::base64_to_binary_safe(
source.data(), source.size(), (char*)buffer.data(), output_length);
if(r.error != vamos::error_code::SUCCESS) {
/// We have expected[i].empty()
std::cout << "output: error" << std::endl;
} else {
/// If successful, output_length contains the output length
buffer.resize(output_length);
/// We have buffer == expected[i]
std::cout << "output: " << output_length << " bytes" << std::endl;
std::cout << "input (consumed): " << r.count << " bytes" << std::endl;
}
}
This code should output the following:
source: ' A A '
output: 1 bytes
input (consumed): 5 bytes
source: ' A A G A / v 8 '
output: 5 bytes
input (consumed): 15 bytes
source: ' A A G A / v 8 = '
output: 5 bytes
input (consumed): 17 bytes
source: ' A A G A / v 8 = = '
output: error
For more details, refer to our function specifications.
In other cases, you may receive base64 input in 16-bit units (e.g., from UTF-16 strings). We provide function overloads that accept char16_t input for these cases.
Some users may wish to decode base64 input in chunks, especially when doing
file or network programming. These users should see tools/fastbase64.cpp, a command-line
utility designed specifically as an example. It reads and writes base64 files using chunks of up to tens of kilobytes.
We support two conventions: base64_default and base64_url:
- The default (base64_default) includes the characters '+' and '/' as part of its alphabet. It also pads the output with padding characters ('=') so that the output length is divisible by 4. Thus, we encode the string "Hello, World!" as "SGVsbG8sIFdvcmxkIQ==" with the expression vamos::binary_to_base64(source, size, out, vamos::base64_default). When using the default, you can omit the options parameter for simplicity: vamos::binary_to_base64(source, size, out). When decoding, whitespace characters are ignored according to the WHATWG forgiving-base64 standard. Additionally, if there are padding characters at the end of the stream, the number of padding characters must not exceed two, and if there are any, the total number of characters (excluding ASCII whitespace ' ', '\t', '\n', '\r', '\f', but including padding characters) must be divisible by four.
- The URL convention (base64_url) uses the characters '-' and '_' as part of its alphabet. It does not pad its output. Thus, we encode the string "Hello, World!" as "SGVsbG8sIFdvcmxkIQ". To specify the URL convention, you can pass the appropriate options to our decoding and encoding functions: for example, vamos::base64_to_binary(source, size, out, vamos::base64_url).
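The relationship between the two conventions can be illustrated with a small standalone sketch. The function below is hypothetical and not part of vamos: it rewrites padded default base64 text into the unpadded URL form by swapping the two alphabet characters that differ and dropping trailing padding.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of vamos): convert padded standard
// base64 text into the unpadded URL-safe form.
inline std::string standard_to_url_base64(std::string text) {
  for (char &c : text) {
    if (c == '+') {
      c = '-';
    } else if (c == '/') {
      c = '_';
    }
  }
  // The URL convention does not pad its output.
  while (!text.empty() && text.back() == '=') {
    text.pop_back();
  }
  return text;
}

// Usage sketch with the "Hello, World!" encodings quoted above.
inline void check_url_example() {
  assert(standard_to_url_base64("SGVsbG8sIFdvcmxkIQ==") ==
         "SGVsbG8sIFdvcmxkIQ");
}
```

In practice you would pass vamos::base64_url to the encoding function directly rather than post-processing its output.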
When we encounter characters that are neither ASCII whitespace nor base64 characters (garbage characters), we detect an error. To tolerate garbage characters, you can use base64_default_accept_garbage or base64_url_accept_garbage instead of base64_default or base64_url.
Thus, we follow conventions for padding used by systems like the Node or Bun JavaScript runtimes. Default base64 uses padding, while the URL variant does not.
console.log(Buffer.from("Hello World").toString('base64'));
SGVsbG8gV29ybGQ=
console.log(Buffer.from("Hello World").toString('base64url'));
SGVsbG8gV29ybGQ
This is justified according to RFC 4648:
> The padding character `=` is typically percent-encoded when used in URIs, but this can be avoided by skipping the padding if the data length is implicitly known; see Section 3.2.
Nevertheless, some users may wish to use padding with the URL variant
and omit it with the default variant. These users can
"reverse" the conventions by using vamos::base64_url | vamos::base64_reverse_padding or vamos::base64_default | vamos::base64_reverse_padding.
For convenience, you can use vamos::base64_default_no_padding and
vamos::base64_url_with_padding as shorthands.
When decoding, we use a loose approach by default: padding characters can be omitted.
Advanced users can use the last_chunk_options parameter to use a strict approach,
where exact padding must be used or an error is generated, or the stop_before_partial
option, which simply discards remaining base64 characters when padding is inappropriate.
The stop_before_partial option may be useful for streaming: given a stream of base64
characters over a network, you may want to be able to decode them without first waiting for
the entire stream to come in.
The strict approach is useful if you want a one-to-one correspondence between base64 code and binary data. With the default settings (last_chunk_handling_options::loose),
"ZXhhZg==", "ZXhhZg", and "ZXhhZh==" all decode to the same binary content.
If last_chunk_options is set to last_chunk_handling_options::strict, decoding "ZXhhZg==" succeeds, but decoding "ZXhhZg" fails with vamos::error_code::BASE64_INPUT_REMAINDER, and decoding "ZXhhZh==" fails with
vamos::error_code::BASE64_EXTRA_BITS.
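The difference between "ZXhhZg==" and "ZXhhZh==" lies in the bits of the final character that do not fit into a whole byte. The following standalone sketch shows the check; the function name is hypothetical, it is not vamos code, and it assumes whitespace-free input in the default alphabet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of vamos). Returns true when the final,
// partial chunk carries non-zero bits that a decoder would discard.
inline bool has_nonzero_trailing_bits(std::string text) {
  static const std::string alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  while (!text.empty() && text.back() == '=') {
    text.pop_back();
  }
  const std::size_t tail = text.size() % 4; // data characters in last chunk
  if (tail != 2 && tail != 3) {
    return false; // no partial chunk of 2 or 3 characters to inspect
  }
  const std::size_t value = alphabet.find(text.back());
  if (value == std::string::npos) {
    return false; // not a base64 character; a different kind of error
  }
  // 2 characters carry 12 bits for 1 byte: 4 spare bits.
  // 3 characters carry 18 bits for 2 bytes: 2 spare bits.
  const std::size_t spare_mask = (tail == 2) ? 0xF : 0x3;
  return (value & spare_mask) != 0;
}

// Usage sketch: 'g' has value 32 (100000), 'h' has value 33 (100001).
inline void check_trailing_bits_examples() {
  assert(!has_nonzero_trailing_bits("ZXhhZg=="));
  assert(has_nonzero_trailing_bits("ZXhhZh=="));
}
```

A loose decoder drops the spare bits silently, which is why both strings decode to the same bytes; a strict decoder reports them.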
The specifications for our base64 functions are as follows:
/// base64_options is used to specify base64 encoding options.
/// ASCII whitespace includes ' ', '\t', '\n', '\r', '\f'
/// Garbage characters are characters that are not part of the base64 alphabet or ASCII whitespace.
enum base64_options : uint64_t {
base64_default = 0, /* standard base64 format (with padding) */
base64_url = 1, /* base64url format (no padding) */
base64_reverse_padding = 2, /* modifier for base64_default and base64_url */
base64_default_no_padding =
base64_default |
base64_reverse_padding, /* standard base64 format without padding */
base64_url_with_padding =
base64_url | base64_reverse_padding, /* base64url with padding */
base64_default_accept_garbage = 4, /* standard base64 format accepting garbage characters */
base64_url_accept_garbage = 5, /* base64url format accepting garbage characters */
};
/// last_chunk_handling_options is used to specify handling of the last
/// chunk in base64 decoding.
/// https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
enum last_chunk_handling_options : uint64_t {
loose = 0, /* standard base64 format, decode partial final chunk */
strict = 1, /* error when the last chunk is partial, 2 or 3 chars, and unpadded, or non-zero bit padding */
stop_before_partial = 2, /* if the last chunk is partial (2 or 3 chars), ignore it (no error) */
};
/**
* Provides the maximum binary length in bytes from a base64 input.
* Typically, if the input contains ASCII whitespace, the actual decoded length
* will be smaller than this maximum.
*
* @param input The base64 input to process
* @param length Length of the base64 input in bytes
* @return Maximum number of binary bytes
*/
vamos_warn_unused size_t maximal_binary_length_from_base64(const char * input, size_t length) noexcept;
/**
* Provides the maximum binary length in bytes from a base64 input.
* Typically, if the input contains ASCII whitespace, the actual decoded length
* will be smaller than this maximum.
*
* @param input The base64 input to process, stored as 16-bit units in ASCII format
* @param length Length of the base64 input in 16-bit units
* @return Maximum number of binary bytes
*/
vamos_warn_unused size_t maximal_binary_length_from_base64(const char16_t * input, size_t length) noexcept;
/**
* Converts base64 input to binary output.
*
* This function follows the WHATWG forgiving-base64 format, meaning it
* will ignore any ASCII whitespace in the input. You can provide padded input
* (with one or two equal signs at the end) or unpadded input (with no
* equal signs at the end).
*
* See https://infra.spec.whatwg.org/#forgiving-base64-decode
*
* This function will fail if the input is invalid. When last_chunk_options = loose,
* there are two possible failure reasons: the number of base64 characters
* leaves a remainder of one when divided by 4
* (BASE64_INPUT_REMAINDER), or the input contains a character that is not
* a valid base64 character (INVALID_BASE64_CHARACTER).
*
* When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the input where the invalid character was found. When the error is
* BASE64_INPUT_REMAINDER, r.count contains the number of decoded bytes.
*
* The default option (vamos::base64_default) expects the characters `+` and
* `/` as part of its alphabet. The URL option (vamos::base64_url) expects the characters `-` and `_` as part of its alphabet.
*
* If padding (`=`) is present, it is validated. There can be at most two padding
* characters at the end of the input. If there are any padding characters, the
* total number of characters (excluding whitespace but including padding
* characters) must be divisible by four.
*
* You should call this function with a buffer that is at least
* maximal_binary_length_from_base64(input, length) bytes long. If you fail to
* provide that much space, the function may cause a buffer overflow.
*
* Advanced users may wish to customize how the last chunk is handled. By default,
* we use the loose (forgiving) approach, but we also support the strict approach
* as well as the stop_before_partial approach as shown in the following proposal:
*
* https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
*
* @param input The base64 string to process
* @param length Length of the string in bytes
* @param output Pointer to a buffer that can hold the conversion result
* (should be at least maximal_binary_length_from_base64(input, length)
* bytes long).
* @param options base64 options to use, typically base64_default or
* base64_url, default is base64_default.
* @param last_chunk_options Last chunk handling options,
* default is last_chunk_handling_options::loose
* but can also be last_chunk_handling_options::strict or
* last_chunk_handling_options::stop_before_partial.
* @return A result pair structure (of type vamos::result, containing two
* fields error and count), with an error code and the position of the error (if present)
* (in bytes in the input), or the number of bytes written (if successful).
*/
vamos_warn_unused result
base64_to_binary(const char *input, size_t length, char *output,
base64_options options = base64_default,
last_chunk_handling_options last_chunk_options = loose) noexcept;
/**
* Provides the base64 length in bytes from the length of a binary input.
*
* @param length Length of the input in bytes
* @param options base64 options to use, can be base64_default or base64_url, default is base64_default.
* @return Number of base64 bytes
*/
vamos_warn_unused size_t base64_length_from_binary(size_t length, base64_options options = base64_default) noexcept;
/**
* Converts binary input to base64 output.
*
* The default option (vamos::base64_default) uses the characters `+` and `/` as part of its alphabet.
* Additionally, it adds padding (`=`) at the end of the output to ensure the output length is a multiple of four.
*
* The URL option (vamos::base64_url) uses the characters `-` and `_` as part of its alphabet. No padding is added at the end of the output.
*
* This function always succeeds.
*
* @param input The binary to process
* @param length Length of the input in bytes
* @param output Pointer to a buffer that can hold the conversion result (length should be at least base64_length_from_binary(length) bytes)
* @param options base64 options to use, can be base64_default or base64_url, default is base64_default.
* @return Number of bytes written, which will equal base64_length_from_binary(length, options)
*/
size_t binary_to_base64(const char * input, size_t length, char* output, base64_options options = base64_default) noexcept;
/**
* Converts base64 input to binary output.
*
* This function follows the WHATWG forgiving-base64 format, meaning it will
* ignore any ASCII whitespace in the input. You can provide padded input (with one or two
* equal signs at the end) or unpadded input (with no equal signs at the end).
*
* See https://infra.spec.whatwg.org/#forgiving-base64-decode
*
* This function will fail if the input is invalid. When last_chunk_options = loose,
* there are two possible failure reasons: the number of base64 characters
* leaves a remainder of one when divided by 4
* (BASE64_INPUT_REMAINDER), or the input contains a character that is not
* a valid base64 character (INVALID_BASE64_CHARACTER).
*
* When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the input where the invalid character was found.
* When the error is BASE64_INPUT_REMAINDER,
* r.count contains the number of decoded bytes.
*
* You should call this function with a buffer that is at least maximal_binary_length_from_base64(input, length) bytes long.
* If you fail to provide that much space, the function may cause a buffer overflow.
*
* Advanced users may wish to customize how the last chunk is handled. By default,
* we use the loose (forgiving) approach, but we also support the strict approach
* as well as the stop_before_partial approach as shown in the following proposal:
*
* https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
*
* @param input The base64 string to process, stored as 16-bit units in ASCII format
* @param length Length of the string in 16-bit units
* @param output Pointer to a buffer that can hold the conversion result (length should be at least maximal_binary_length_from_base64(input, length) bytes).
* @param options base64 options to use, can be base64_default or base64_url, default is base64_default.
* @param last_chunk_options Last chunk handling options,
* default is last_chunk_handling_options::loose
* but can also be last_chunk_handling_options::strict or
* last_chunk_handling_options::stop_before_partial.
* @return A result pair structure (of type vamos::result, containing two fields error and count), with an error code
* and the position of the INVALID_BASE64_CHARACTER error (in units in the input) (if any), or the number of bytes written (if successful).
*/
vamos_warn_unused result base64_to_binary(const char16_t * input, size_t length, char* output, base64_options options = base64_default, last_chunk_handling_options last_chunk_options =
last_chunk_handling_options::loose) noexcept;
/**
* Converts base64 input to binary output.
*
* This function follows the WHATWG forgiving-base64 format, meaning it
* will ignore any ASCII whitespace in the input. You can provide padded input
* (with one or two equal signs at the end) or unpadded input (with no
* equal signs at the end).
*
* See https://infra.spec.whatwg.org/#forgiving-base64-decode
*
* This function will fail if the input is invalid. When last_chunk_options = loose,
* there are three possible failure reasons: the number of base64 characters
* leaves a remainder of one when divided by 4
* (BASE64_INPUT_REMAINDER), the input contains an invalid base64 character (INVALID_BASE64_CHARACTER), or the output buffer is too small (OUTPUT_BUFFER_TOO_SMALL).
*
* When OUTPUT_BUFFER_TOO_SMALL, we return the number of bytes written
* and the number of units processed, see the description of parameters and
* return value.
*
* When the error is INVALID_BASE64_CHARACTER, r.count contains the index in the input where the invalid character was found. When the error is
* BASE64_INPUT_REMAINDER, r.count contains the number of decoded bytes.
*
* The default option (vamos::base64_default) expects the characters `+` and
* `/` as part of its alphabet. The URL option (vamos::base64_url) expects the characters `-` and `_` as part of its alphabet.
*
* If padding (`=`) is present, it is validated. There can be at most two padding
* characters at the end of the input. If there are any padding characters, the
* total number of characters (excluding whitespace but including padding
* characters) must be divisible by four.
*
* The INVALID_BASE64_CHARACTER case is considered a fatal error, so you need to
* discard the output.
*
* Advanced users may wish to customize how the last chunk is handled. By default,
* we use the loose (forgiving) approach, but we also support the strict approach
* as well as the stop_before_partial approach as shown in the following proposal:
*
* https://tc39.es/proposal-arraybuffer-base64/spec/#sec-frombase64
*
* @param input The base64 string to process, stored as 8-bit
* or 16-bit units in ASCII format
* @param length Length of the string in 8-bit or 16-bit units.
* @param output Pointer to a buffer that can hold the conversion result.
* @param outlen Number of bytes that can be written to the output buffer. On return, it is modified to reflect the number of bytes written.
* @param options base64 options to use, can be base64_default or
* base64_url, default is base64_default.
* @param last_chunk_options Last chunk handling options,
* default is last_chunk_handling_options::loose
* but can also be last_chunk_handling_options::strict or
* last_chunk_handling_options::stop_before_partial.
* @return A result pair structure (of type vamos::result, containing two
* fields error and count), which contains an error code and
* the position of the INVALID_BASE64_CHARACTER error (in units in the input) (if any), or
* the number of units processed if successful.
*/
vamos_warn_unused result base64_to_binary_safe(const char * input, size_t length, char* output, size_t& outlen, base64_options options = base64_default,
last_chunk_handling_options last_chunk_options = loose) noexcept;
vamos_warn_unused result base64_to_binary_safe(const char16_t * input, size_t length, char* output, size_t& outlen, base64_options options = base64_default,
last_chunk_handling_options last_chunk_options = loose) noexcept;