schema_of_json_agg
aggregate function
Applies to: Databricks SQL Databricks Runtime 13.2 and later
Returns the combined schema of all JSON strings in a group in DDL format.
Syntax
schema_of_json_agg(jsonStr [, options] ) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER
clause.
Arguments
jsonStr
: ASTRING
literal withJSON
.options
: An optionalMAP
literal with keys and values beingSTRING
. For details on options, see from_json function.cond
: An optionalBOOLEAN
expression filtering the rows used for aggregation.
Returns
A STRING
holding a definition of an array of structs with n
fields of strings where the column names are derived from the distinct set of JSON
keys .
The field values hold the derived formatted SQL types.
The schema of each record is merged together by field name.
When two fields with the same name have a different type across records, Databricks uses the least common type.
When no such type exists, the type is derived as a STRING
.
For example, INT
and DOUBLE
become DOUBLE
, while STRUCT<i INT>
and STRING
become STRING
.
The schema obtained from reading a column of JSON
data is the same as the one derived from the following.
SELECT * FROM json.`/my/data`;
To derive the schema of a single JSON
string, use schema_of_json function.
Examples
> SELECT schema_of_json_agg(a) FROM VALUES('{"foo": "bar"}') AS data(a);
STRUCT<foo: STRING>
> SELECT schema_of_json_agg(a) FROM VALUES('[1]') AS data(a);
ARRAY<BIGINT>
> CREATE TEMPORARY VIEW data(a) AS VALUES
('{"foo": "bar", "wing": {"ding": "dong"}}'),
('{"top": "level", "wing": {"stop": "go"}}')
> SELECT schema_of_json_agg(a) FROM data;
STRUCT<foo: STRING,top: STRING,wing: STRUCT<ding: STRING, stop: STRING>>