schema_of_variant_agg aggregate function

July 26, 2024

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 15.3 and later

Returns the combined schema of all VARIANT values in a group in DDL format.

Syntax

schema_of_variant_agg ( variantExpr ) [FILTER ( WHERE cond ) ]

Arguments

  • variantExpr: A VARIANT expression.

  • cond: An optional BOOLEAN expression filtering the rows used for aggregation.

Returns

A STRING holding a schema definition of the variantExpr. The types in the schema are the derived formatted SQL types.

The schema of each VARIANT value is merged together by field name. When two fields with the same name have a different type across records, Databricks uses the least common type. When no such type exists, the type is derived as a VARIANT. For example, INT and DOUBLE become DOUBLE, while TIMESTAMP and STRING become VARIANT.

To derive the schema of a single VARIANT value, use schema_of_variant function.

Examples

SQL
-- Simple example
> SELECT schema_of_variant_agg(a) FROM VALUES(parse_json('{"foo": "bar"}')) AS data(a);
  OBJECT<foo: STRING>

> SELECT schema_of_variant_agg(a) FROM VALUES(parse_json('[1]')) AS data(a);
  ARRAY<BIGINT>

> CREATE TEMPORARY VIEW data(a) AS VALUES
  (parse_json('{"foo": "bar", "wing": {"ding": "dong"}}')),
  (parse_json('{"wing": 123}'));
> SELECT schema_of_variant_agg(a) FROM data;
  OBJECT<foo: STRING, wing: VARIANT>