Databricks SQL type precedence

Databricks SQL uses several rules that govern how conflicts between data types are resolved. At the heart of this conflict resolution is the type precedence list which defines whether values of a given data type can be promoted to another data type implicitly.

Data type Precedence list (from narrowest to widest)
TINYINT TINYINT -> SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT- -> DOUBLE
SMALLINT SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
INT INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
BIGINT BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
DECIMAL DECIMAL -> FLOAT (1) -> DOUBLE
FLOAT FLOAT (1) -> DOUBLE
DOUBLE DOUBLE
DATE DATE -> TIMESTAMP
TIMESTAMP TIMESTAMP
ARRAY ARRAY (2)
BINARY BINARY
BOOLEAN BOOLEAN
INTERVAL INTERVAL
MAP MAP (2)
STRING STRING
STRUCT STRUCT (2)

(1) For least common type resolution FLOAT is skipped to avoid loss of precision.

(2) For a complex type the precedence rule applies recursively to its component elements.

String literals and NULL

Special rules apply for string literals and untyped NULL.

A NULL can be promoted to any other type. A string literal can be promoted to any simple data type.

This is a graphical depiction of the precedence hierarchy:

Graphical representation of precedence rules

Least common type resolution

The least common type from a set of types is the narrowest type reachable from the precedence list by all elements of the set of types.

The least common type resolution is used to:

  • Decide whether a function expecting a parameter of a type can be invoked using an argument of a narrower type.
  • Derive the argument type for functions which expect a shared argument type for multiple parameters, such as coalesce, least, or greatest.
  • Derive the operand types for operators such as arithmetic operations or comparisons.
  • Derive the result type for expressions such as the case expression.
  • Derive the element, key, or value types for array and map constructors.

Special rules are applied if the least common type resolves to FLOAT:

If any of the contributing types is INT, BIGINT, or DECIMAL the least common type is pushed to DOUBLE to avoid potential loss of digits.

Examples

The coalesce function (Databricks SQL) accepts any set of argument types as long as they share a least common type.

The result type is the least common type of the arguments.

> SELECT typeof(coalesce(1Y, 1L, NULL));
  BIGINT

> SELECT typeof(coalesce(1, DATE'2020-01-01'));
Error: Incompatible types [INT, DATE]

> SELECT typeof(coalesce(ARRAY(1Y), ARRAY(1L)))
  ARRAY<BIGINT>

> SELECT typeof(coalesce(1, 1F))
  DOUBLE

> SELECT typeof(coalesce(1L, 1F))
  DOUBLE

> SELECT (typeof(coalesce(1BD, 1F))
  DOUBLE

The substring function (Databricks SQL) expects arguments of type INT for the start and length parameters.

> SELECT substring('hello', 1Y, 2);
he

> SELECT substring('hello', 1, 2);
He

> SELECT substring('hello', '1', 2);
he

> SELECT substring('hello', 1L, 2);
Error: Argument 2 requires an INT type.

> SELECT substring('hello', str, 2)
  FROM VALUES(CAST('1' AS STRING)) AS T(str);
Error: Argument 2 requires an INT type.