schema_of_json_agg aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.2 and later

Returns the combined schema of all JSON strings in a group in DDL format.

Syntax

schema_of_json_agg(jsonStr [, options] ) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • jsonStr: A STRING literal with JSON.

  • options: An optional MAP literal with keys and values being STRING. For details on options, see from_json function.

  • cond: An optional BOOLEAN expression filtering the rows used for aggregation.

Returns

A STRING holding a definition of an array of structs with n fields of strings where the column names are derived from the distinct set of JSON keys . The field values hold the derived formatted SQL types.

The schema of each record is merged together by field name. When two fields with the same name have a different type across records, Databricks uses the least common type. When no such type exists, the type is derived as a STRING. For example, INT and DOUBLE become DOUBLE, while STRUCT<i INT> and STRING become STRING.

The schema obtained from reading a column of JSON data is the same as the one derived from the following.

SELECT * FROM json.`/my/data`;

To derive the schema of a single JSON string, use schema_of_json function.

Examples

> SELECT schema_of_json_agg(a) FROM VALUES('{"foo": "bar"}') AS data(a);
  STRUCT<foo: STRING>

> SELECT schema_of_json_agg(a) FROM VALUES('[1]') AS data(a);
  ARRAY<BIGINT>

> CREATE TEMPORARY VIEW data(a) AS VALUES
  ('{"foo": "bar",   "wing": {"ding": "dong"}}'),
  ('{"top": "level", "wing": {"stop": "go"}}')

> SELECT schema_of_json_agg(a) FROM data;
  STRUCT<foo: STRING,top: STRING,wing: STRUCT<ding: STRING, stop: STRING>>