跳到主要内容

JSON 处理函数

JSON 提取函数

有两类提取函数,并各自对应操作符。操作符仅在字符串以 JSON 逻辑类型存储时可用。 这些函数与 JSON 标量函数支持相同的两种路径表示法。

函数别名操作符说明
json_exists(json, path)若给定路径在 json 中存在则返回 true,否则返回 false
json_extract(json, path)json_extract_path->json 的给定 path 提取 JSON。若 pathLIST,结果为 JSONLIST
json_extract_string(json, path)json_extract_path_text->>json 的给定 path 提取 VARCHAR。若 pathLIST,结果为 VARCHARLIST
json_value(json, path)json 的给定 path 提取 JSON。若该路径值不是标量,则返回 NULL

注意,JSON 提取用的 -> 操作符优先级较低(它也用于 lambda 函数)。因此在表达等值比较(=)等操作时,需要给 -> 表达式加括号。 例如:

SELECT ((JSON '{"field": 42}')->'field') = 42;

警告:Goose 的 JSON 数据类型使用从 0 开始的索引

示例:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
SELECT json_extract(j, '$.family') FROM example;
"anatidae"
SELECT j->'$.family' FROM example;
"anatidae"
SELECT j->'$.species[0]' FROM example;
"duck"
SELECT j->'$.species[*]' FROM example;
["duck", "goose", "swan", null]
SELECT j->>'$.species[*]' FROM example;
[duck, goose, swan, null]
SELECT j->'$.species'->0 FROM example;
"duck"
SELECT j->'species'->['/0', '/1'] FROM example;
['"duck"', '"goose"']
SELECT json_extract_string(j, '$.family') FROM example;
anatidae
SELECT j->>'$.family' FROM example;
anatidae
SELECT j->>'$.species[0]' FROM example;
duck
SELECT j->'species'->>0 FROM example;
duck
SELECT j->'species'->>['/0', '/1'] FROM example;
[duck, goose]

注意,Goose 的 JSON 数据类型使用从 0 开始的索引

如果需要从同一个 JSON 中提取多个值,提取“路径列表”会更高效:

如下写法会导致 JSON 被解析两次: 因此查询更慢且占用更多内存:

SELECT
json_extract(j, 'family') AS family,
json_extract(j, 'species') AS species
FROM example;
familyspecies
"anatidae"["duck","goose","swan",null]

下面写法结果相同,但速度更快、内存更省:

WITH extracted AS (
SELECT json_extract(j, ['family', 'species']) AS extracted_list
FROM example
)
SELECT
extracted_list[1] AS family,
extracted_list[2] AS species
FROM extracted;

JSON 标量函数

以下 JSON 标量函数可用于获取已存储 JSON 值的信息。 除 json_valid(json) 外,其余 JSON 函数在输入非法 JSON 时都会报错。

我们支持两种 JSON 路径表示: JSON Pointer 与 JSONPath。

函数说明
json_array_length(json[, path])Return the number of elements in the JSON array json, or 0 if it is not a JSON array. If path is specified, return the number of elements in the JSON array at the given path. If path is a LIST, the result will be LIST of array lengths.
json_contains(json_haystack, json_needle)Returns true if json_needle is contained in json_haystack. Both parameters are of JSON type, but json_needle can also be a numeric value or a string, however the string must be wrapped in double quotes.
json_keys(json[, path])Returns the keys of json as a LIST of VARCHAR, if json is a JSON object. If path is specified, return the keys of the JSON object at the given path. If path is a LIST, the result will be LIST of LIST of VARCHAR.
json_structure(json)Return the structure of json. Defaults to JSON if the structure is inconsistent (e.g., incompatible types in an array).
json_type(json[, path])Return the type of the supplied json, which is one of ARRAY, BIGINT, BOOLEAN, DOUBLE, OBJECT, UBIGINT, VARCHAR and NULL. If path is specified, return the type of the element at the given path. If path is a LIST, the result will be LIST of types.
json_valid(json)Return whether json is valid JSON.
json(json)Parse and minify json.

JSON Pointer 语法使用 / 分隔每一层字段。 例如,要提取 duck 键对应数组的第一个元素:

SELECT json_extract('{"duck": [1, 2, 3]}', '/duck/0');
1

JSONPath 语法使用 . 分隔字段、[i] 访问数组元素,并且总是以 $ 开头。使用同一示例:

SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[0]');
1

注意,Goose 的 JSON 数据类型使用从 0 开始的索引

JSONPath 表达能力更强,还支持从列表尾部访问:

SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[#-1]');
3

JSONPath 也支持用双引号转义语法符号:

SELECT json_extract('{"duck.goose": [1, 2, 3]}', '$."duck.goose"[1]');
2

以下示例基于雁形目鸭科(anatidae)

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
SELECT json(j) FROM example;
{"family":"anatidae","species":["duck","goose","swan",null]}
SELECT j.family FROM example;
"anatidae"
SELECT j.species[0] FROM example;
"duck"
SELECT json_valid(j) FROM example;
true
SELECT json_valid('{');
false
SELECT json_array_length('["duck", "goose", "swan", null]');
4
SELECT json_array_length(j, 'species') FROM example;
4
SELECT json_array_length(j, '/species') FROM example;
4
SELECT json_array_length(j, '$.species') FROM example;
4
SELECT json_array_length(j, ['$.species']) FROM example;
[4]
SELECT json_type(j) FROM example;
OBJECT
SELECT json_keys(j) FROM example;
[family, species]
SELECT json_structure(j) FROM example;
{"family":"VARCHAR","species":["VARCHAR"]}
SELECT json_structure('["duck", {"family": "anatidae"}]');
["JSON"]
SELECT json_contains('{"key": "value"}', '"value"');
true
SELECT json_contains('{"key": 1}', '1');
true
SELECT json_contains('{"top_key": {"key": "value"}}', '{"key": "value"}');
true

JSON 聚合函数

JSON 聚合函数共有三种。

函数说明
json_group_array(any)返回包含聚合中所有 any 值的 JSON 数组。
json_group_object(key, value)返回包含聚合中所有 keyvalue 对的 JSON 对象。
json_group_structure(json)返回聚合中所有 json 的合并 json_structure

示例:

CREATE TABLE example1 (k VARCHAR, v INTEGER);
INSERT INTO example1 VALUES ('duck', 42), ('goose', 7);
SELECT json_group_array(v) FROM example1;
[42, 7]
SELECT json_group_object(k, v) FROM example1;
{"duck":42,"goose":7}
CREATE TABLE example2 (j JSON);
INSERT INTO example2 VALUES
('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
SELECT json_group_structure(j) FROM example2;
{"family":"VARCHAR","species":["VARCHAR"],"coolness":"DOUBLE","hair":"BOOLEAN"}

将 JSON 转换为嵌套类型

很多场景下逐个从 JSON 提取值效率较低。 更好的做法是一次性“提取”全部值,把 JSON 转为 LISTSTRUCT 等嵌套类型。

函数说明
json_transform(json, structure)按指定 structure 转换 json
from_json(json, structure)json_transform 的别名。
json_transform_strict(json, structure)json_transform 相同,但类型转换失败时会抛错。
from_json_strict(json, structure)json_transform_strict 的别名。

structure 参数使用与 json_structure 返回值相同的 JSON 结构。 你可以修改该 structure,把 JSON 转成目标结构与类型。 既可以提取少于原始 JSON 的键值对,也可以提取更多:缺失键会填充为 NULL

示例:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
SELECT json_transform(j, '{"family": "VARCHAR", "coolness": "DOUBLE"}') FROM example;
{'family': anatidae, 'coolness': 42.420000}
{'family': canidae, 'coolness': NULL}
SELECT json_transform(j, '{"family": "TINYINT", "coolness": "DECIMAL(4, 2)"}') FROM example;
{'family': NULL, 'coolness': 42.42}
{'family': NULL, 'coolness': NULL}
SELECT json_transform_strict(j, '{"family": "TINYINT", "coolness": "DOUBLE"}') FROM example;
Invalid Input Error: Failed to cast value: "anatidae"

JSON 表函数

Goose 实现了两个 JSON 表函数:输入一个 JSON 值并输出一张表。

函数说明
json_each(json[ ,path]遍历 json,并为顶层数组或对象中的每个元素返回一行。
json_tree(json[ ,path]以深度优先方式遍历 json,并为结构中的每个元素返回一行。

若元素不是数组或对象,则返回该元素本身。 若提供可选参数 path,遍历会从该路径对应元素开始,而非从根元素开始。

结果表包含以下列:

字段类型说明
keyVARCHAR元素相对于父元素的键
valueJSON元素值
typeVARCHAR该元素的 json_type(函数)结果
atomJSON该元素的 json_value(函数)结果
idUBIGINT元素标识符,按解析顺序编号
parentUBIGINT父元素的 id
fullkeyVARCHAR指向该元素的 JSON 路径
pathVARCHAR指向父元素的 JSON 路径
jsonJSON (Virtual)json 参数
rootTEXT (Virtual)path 参数
rowidBIGINT (Virtual)行标识符

这两个函数与 SQLite 同名函数 类似。 注意:由于 json_eachjson_tree 会引用同一 FROM 子句中前面的子查询,因此它们属于 lateral join

示例:

CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
SELECT je.*, je.rowid
FROM example AS e, json_each(e.j) AS je;
keyvaluetypeatomidparentfullkeypathrowid
family"anatidae"VARCHAR"anatidae"2NULL$.family$0
species["duck","goose"]ARRAYNULL4NULL$.species$1
coolness42.42DOUBLE42.428NULL$.coolness$2
family"canidae"VARCHAR"canidae"2NULL$.family$0
species["labrador","bulldog"]ARRAYNULL4NULL$.species$1
hairtrueBOOLEANtrue8NULL$.hair$2
SELECT je.*, je.rowid
FROM example AS e, json_each(e.j, '$.species') AS je;
keyvaluetypeatomidparentfullkeypathrowid
0"duck"VARCHAR"duck"5NULL$.species[0]$.species0
1"goose"VARCHAR"goose"6NULL$.species[1]$.species1
0"labrador"VARCHAR"labrador"5NULL$.species[0]$.species0
1"bulldog"VARCHAR"bulldog"6NULL$.species[1]$.species1
SELECT je.key, je.value, je.type, je.id, je.parent, je.fullkey, je.rowid
FROM example AS e, json_tree(e.j) AS je;
keyvaluetypeidparentfullkeyrowid
NULL{"family":"anatidae","species":["duck","goose"],"coolness":42.42}OBJECT0NULL$0
family"anatidae"VARCHAR20$.family1
species["duck","goose"]ARRAY40$.species2
0"duck"VARCHAR54$.species[0]3
1"goose"VARCHAR64$.species[1]4
coolness42.42DOUBLE80$.coolness5
NULL{"family":"canidae","species":["labrador","bulldog"],"hair":true}OBJECT0NULL$0
family"canidae"VARCHAR20$.family1
species["labrador","bulldog"]ARRAY40$.species2
0"labrador"VARCHAR54$.species[0]3
1"bulldog"VARCHAR64$.species[1]4
hairtrueBOOLEAN80$.hair5