SQL data type rules (Databricks SQL)

Databricks SQL uses several rules to resolve conflicts among data types. At the heart of this conflict resolution is the type precedence list which defines whether values of a given data type can be promoted to another data type implicitly.

Additionally, when Databricks SQL invokes a function it also employs implicit downcasting and crosscasting to match function arguments to function parameters where possible.

Type precedence

Databricks SQL uses several rules that govern how conflicts between data types are resolved. At the heart of this conflict resolution is the type precedence list which defines whether values of a given data type can be promoted to another data type implicitly.

Data type Precedence list (from narrowest to widest)
TINYINT TINYINT -> SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT- -> DOUBLE
SMALLINT SMALLINT -> INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
INT INT -> BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
BIGINT BIGINT -> DECIMAL -> FLOAT (1) -> DOUBLE
DECIMAL DECIMAL -> FLOAT (1) -> DOUBLE
FLOAT FLOAT (1) -> DOUBLE
DOUBLE DOUBLE
DATE DATE -> TIMESTAMP
TIMESTAMP TIMESTAMP
ARRAY ARRAY (2)
BINARY BINARY
BOOLEAN BOOLEAN
INTERVAL INTERVAL
MAP MAP (2)
STRING STRING
STRUCT STRUCT (2)

(1) For least common type resolution FLOAT is skipped to avoid loss of precision.

(2) For a complex type the precedence rule applies recursively to its component elements.

String literals and NULL

Special rules apply for string literals and untyped NULL.

A NULL can be promoted to any other type. A string literal can be promoted to any simple data type.

This is a graphical depiction of the precedence hierarchy:

Graphical representation of precedence rules

Least common type resolution

The least common type from a set of types is the narrowest type reachable from the precedence list by all elements of the set of types.

The least common type resolution is used to:

  • Decide whether a function that expects a parameter of a given type can be invoked using an argument of a narrower type.
  • Derive the argument type for a function that expects a shared argument type for multiple parameters, such as coalesce, in, least, or greatest.
  • Derive the operand types for operators such as arithmetic operations or comparisons.
  • Derive the result type for expressions such as the case expression.
  • Derive the element, key, or value types for array and map constructors.
  • Derive the result type of UNION, INTERSECT, or EXCEPT set operators.

Special rules are applied if the least common type resolves to FLOAT:

If any of the contributing types is INTEGER, BIGINT, or DECIMAL the least common type is pushed to DOUBLE to avoid potential loss of digits.

Implicit downcasting and crosscasting

Databricks SQL employs these forms of implicit casting only on function and operator invocation, and only where it can unambiguously determine the intent.

  • Implicit downcasting

    Implicit downcasting automatically casts a wider type to a narrower type without requiring you to specify the cast explicitly. Downcasting is convenient, but it carries the risk of unexpected runtime errors if the actual value fails to be representable in the narrow type.

  • Implicit crosscasting

    Implicit crosscasting casts a value from one type family to another without requiring you to specify the cast explicitly. Databricks SQL supports implicit casting from any simple type to STRING, but not the reverse. However, a STRING literal can be cast to another simple type, because Databricks SQL can derive the safety of the cast during query compilation. While values of all simple types can be represented as STRINGs, this is obviously not the case when casting a STRING value to any other simple type.

Casting on function invocation

Given a resolved function or operator, the following rules apply, in the order they are listed, for each parameter and argument pair:

  • If a supported parameter type is part of the argument’s type precedence chain, Databricks SQL promotes the argument to that parameter type.

    In most cases the function description explicitly states the supported types or chain, such as “any numeric type”.

    For example, sin(expr) operates on DOUBLE but will accept any numeric.

  • If the expected parameter type is a STRING and the argument is a simple type Databricks SQL crosscasts the argument to the string parameter type.

    For example, substr(str, start, len) expects str to be a STRING. Instead, you can pass a numeric or datetime type.

  • If the function expects a numeric type, such as an INTEGER, or a DATE type, but the argument is a more general type, such as a DOUBLE or TIMESTAMP, Databricks SQL implicitly downcasts the argument to that parameter type.

    For example, a date_add(date, days) expects a DATE and an INTEGER.

    If you invoke date_add() with a TIMESTAMP and a BIGINT, Databricks SQL downcasts the TIMESTAMP to DATE by removing the time component and the BIGINT to an INTEGER.

  • Otherwise, Databricks SQL raises an error.

Examples

The coalesce function (Databricks SQL) accepts any set of argument types as long as they share a least common type.

The result type is the least common type of the arguments.

-- The least common type of TINYINT and BIGINT is BIGINT
> SELECT typeof(coalesce(1Y, 1L, NULL));
  BIGINT

-- INTEGER and DATE do not share a precedence chain
> SELECT typeof(coalesce(1, DATE'2020-01-01'));
Error: Incompatible types [INT, DATE]

-- Both are ARRAYs and the elements have a least common type
> SELECT typeof(coalesce(ARRAY(1Y), ARRAY(1L)))
  ARRAY<BIGINT>

-- The least common type of INT and FLOAT is DOUBLE
> SELECT typeof(coalesce(1, 1F))
  DOUBLE

> SELECT typeof(coalesce(1L, 1F))
  DOUBLE

> SELECT (typeof(coalesce(1BD, 1F))
  DOUBLE

The substring function (Databricks SQL) expects arguments of type STRING for the string and INTEGER for the start and length parameters.

-- Promotion of TINYINT to INTEGER
> SELECT substring('hello', 1Y, 2);
he

-- No casting
> SELECT substring('hello', 1, 2);
He

-- Casting of a literal string
> SELECT substring('hello', '1', 2);
he

-- Downcasting of a BIGINT to an INT
> SELECT substring('hello', 1L, 2);
he

-- No crosscasting from STRING to INTEGER
> SELECT substring('hello', str, 2)
  FROM VALUES(CAST('1' AS STRING)) AS T(str);
Error: Argument 2 requires an INT type.

-- Crosscasting from INTEGER to STRING
> SELECT substring(12345, 2, 2);
 23

|| (CONCAT) allows implicit crosscasting to string.

-- A numeric is cast to STRING
SELECT 'This is a numeric: ' || 5.4E10;
This is a numeric: 5.4E10

-- A date is cast to STRING
SELECT 'This is a date: ' || DATE'2021-11-30';
This is a date: 2021-11-30

date_add can be invoked with a TIMESTAMP or BIGINT due to implicit downcasting.

SELECT date_add(TiMESTAMP'2011-11-30 08:30:00', 5L);
2011-12-05