Databricks SQL data types

For rules governing how conflicts between data types are resolved, see Databricks SQL type precedence.

Supported data types

Databricks SQL supports the following data types:

Data Type Description
BIGINT Represents 8-byte signed integer numbers.
BINARY Represents byte sequence values.
BOOLEAN Represents Boolean values.
DATE Represents values comprising values of fields year, month and day, without a time-zone.
DECIMAL(p,s) Represents numbers with maximum precision p and fixed scale s.
DOUBLE Represents 8-byte double-precision floating point numbers.
FLOAT Represents 4-byte single-precision floating point numbers.
INT Represents 4-byte signed integer numbers.
INTERVAL intervalQualifier Represents intervals of time either on a scale of seconds or months.
NULL Represents the untyped NULL.
SMALLINT Represents 4-byte signed integer numbers.
STRING Represents character string values.
TIMESTAMP Represents values comprising values of fields year, month, day, hour, minute, and second, with the session local timezone.
TINYINT Represents 1-byte signed integer numbers.
ARRAY<elementType> Represents values comprising a sequence of elements with the type of elementType.
MAP<keyType,valueType> Represents values comprising a set of key-value pairs.
STRUCT<[fieldName:fieldType [NOT NULL][COMMENT str][, …]]> Represents values with the structure described by a sequence of fields.

Data type classification

Data types are grouped into the following classes:

  • Integral numeric types represent whole numbers:
  • Exact numeric types represent base-10 numbers:
  • Binary floating point types use exponents and a binary representation to cover a large range of numbers:
  • Numeric types represents all numeric data types:
    • Exact numeric types
    • Binary floating point types
  • Date-time types represent date and time components:
  • Simple data types are types defined by holding singleton values:
  • Complex types are types composed of multiple components of complex or simple types:

Special floating point values

Several special floating point values are treated in a case-insensitive manner:

  • Inf, +Inf, Infinity, +Infinity: positive infinity
  • -Inf, -Infinity: negative infinity
  • NaN: not a number

Positive and negative infinity semantics

Positive and negative infinity have the following semantics:

  • Positive infinity multiplied by any positive value returns positive infinity.
  • Negative infinity multiplied by any positive value returns negative infinity.
  • Positive infinity multiplied by any negative value returns negative infinity.
  • Negative infinity multiplied by any negative value returns positive infinity.
  • Positive or negative infinity multiplied by 0 returns NaN.
  • Positive or negative infinity is equal to itself.
  • In aggregations, all positive infinity values are grouped together. Similarly, all negative infinity values are grouped together.
  • Positive infinity and negative infinity are treated as normal values in join keys.
  • Positive infinity sorts lower than NaN and higher than any other values.
  • Negative infinity sorts lower than any other values.

NaN semantics

When dealing with float or double types that do not exactly match standard floating point semantics, NaN has the following semantics:

  • NaN = NaN returns true.
  • In aggregations, all NaN values are grouped together.
  • NaN is treated as a normal value in join keys.
  • NaN values go last when in ascending order, larger than any other numeric value.

Examples

> SELECT double('infinity');
 Infinity

> SELECT float('-inf');
 -Infinity

> SELECT float('NaN');
 NaN

> SELECT double('infinity') * 0;
 NaN

> SELECT double('-infinity') * (-1234567);
 Infinity

> SELECT double('infinity') < double('NaN');
 true

> SELECT double('NaN') = double('NaN');
 true

> SELECT double('inf') = double('infinity');
 true

> SELECT COUNT(*), c2
    FROM VALUES (1, double('infinity')),
                (2, double('infinity')),
                (3, double('inf')),
                (4, double('-inf')),
                (5, double('NaN')),
                (6, double('NaN')),
                (7, double('-infinity'))
        AS test(c1, c2)
    GROUP BY c2;
        2       NaN
        2 -Infinity
        3  Infinity