Data types

Applies to: Databricks SQL, Databricks Runtime

For rules governing how conflicts between data types are resolved, see SQL data type rules.

Supported data types

Databricks supports the following data types:

| Data Type | Description |
| --- | --- |
| BIGINT | Represents 8-byte signed integer numbers. |
| BINARY | Represents byte sequence values. |
| BOOLEAN | Represents Boolean values. |
| DATE | Represents values comprising the fields year, month, and day, without a time zone. |
| DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s. |
| DOUBLE | Represents 8-byte double-precision floating point numbers. |
| FLOAT | Represents 4-byte single-precision floating point numbers. |
| INT | Represents 4-byte signed integer numbers. |
| INTERVAL intervalQualifier | Represents intervals of time on a scale of either seconds or months. |
| VOID | Represents the untyped NULL. |
| SMALLINT | Represents 2-byte signed integer numbers. |
| STRING | Represents character string values. |
| TIMESTAMP | Represents values comprising the fields year, month, day, hour, minute, and second, with the session local time zone. |
| TIMESTAMP_NTZ | Represents values comprising the fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. |
| TINYINT | Represents 1-byte signed integer numbers. |
| ARRAY < elementType > | Represents values comprising a sequence of elements with the type of elementType. |
| MAP < keyType,valueType > | Represents values comprising a set of key-value pairs. |
| STRUCT < [fieldName : fieldType [NOT NULL][COMMENT str][, …]] > | Represents values with the structure described by a sequence of fields. |
| VARIANT | Represents semi-structured data. |
| OBJECT | Represents values in a VARIANT with the structure described by a set of fields. |

Important

Delta Lake does not support the VOID type.

Data type classification

Data types are grouped into the following classes:

  • Binary floating point types (FLOAT and DOUBLE) use exponents and a binary representation to cover a large range of numbers.
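As a rough illustration of what a binary representation implies in practice, the following Scala sketch compares arithmetic on Double with arithmetic on scala.math.BigDecimal (which wraps java.math.BigDecimal, the value type listed for DECIMAL). It uses only the Scala standard library, and the variable names are illustrative, not part of any Databricks API.

```scala
// Double is a binary floating point type: 0.1 and 0.2 have no exact
// binary representation, so their sum is not exactly 0.3.
val doubleSum: Double = 0.1 + 0.2
val doubleIsExact: Boolean = doubleSum == 0.3

// BigDecimal stores decimal digits exactly, so the same sum compares equal.
val decimalSum: BigDecimal = BigDecimal("0.1") + BigDecimal("0.2")
val decimalIsExact: Boolean = decimalSum == BigDecimal("0.3")

// The exponent is what lets a 4-byte Float cover a far larger range
// than a 4-byte Int, at the cost of precision.
val floatRangeExceedsInt: Boolean = Float.MaxValue > Int.MaxValue.toDouble
```

Here doubleIsExact is false and decimalIsExact is true, which is the usual reason to prefer DECIMAL over FLOAT or DOUBLE when exact decimal results matter.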

Language mappings

Applies to: Databricks Runtime

Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:

Scala
import org.apache.spark.sql.types._

| SQL type | Data type | Value type | API to access or create data type |
| --- | --- | --- | --- |
| TINYINT | ByteType | Byte | ByteType |
| SMALLINT | ShortType | Short | ShortType |
| INT | IntegerType | Int | IntegerType |
| BIGINT | LongType | Long | LongType |
| FLOAT | FloatType | Float | FloatType |
| DOUBLE | DoubleType | Double | DoubleType |
| DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType |
| STRING | StringType | String | StringType |
| BINARY | BinaryType | Array[Byte] | BinaryType |
| BOOLEAN | BooleanType | Boolean | BooleanType |
| TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType |
| TIMESTAMP_NTZ | TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
| DATE | DateType | java.sql.Date | DateType |
| year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3) |
| day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3) |
| ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]). (2) |
| MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]). (2) |
| STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields). fields is a Seq of StructField. (4) |
| | StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]). (4) |
| VARIANT | VariantType | org.apache.spark.unsafe.type.VariantVal | VariantType |
| OBJECT | Not supported | Not supported | Not supported |
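To make the mapping between SQL type names and Scala data types more concrete, here is a minimal sketch that parses a DDL string with StructType.fromDDL and inspects the resulting data types. It assumes the Spark SQL library is on the classpath; the column names (id, name, price, tags) are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.types._

// Parse SQL type names into the corresponding Spark data types.
// The column names are made up for this example.
val parsed: StructType =
  StructType.fromDDL("id BIGINT, name STRING, price DECIMAL(10,2), tags ARRAY<STRING>")

// Look up each field by name and read its data type.
val idType: DataType = parsed("id").dataType        // BIGINT maps to LongType
val nameType: DataType = parsed("name").dataType    // STRING maps to StringType
val priceType: DataType = parsed("price").dataType  // DECIMAL(10,2) maps to DecimalType(10, 2)
val tagsType: DataType = parsed("tags").dataType    // ARRAY<STRING> maps to ArrayType(StringType)
```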

(1) Numbers are converted to the domain at runtime. Make sure that numbers are within range.

(2) The optional value defaults to TRUE.
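The default described in note (2) can be checked with a short sketch like the following, assuming the Spark SQL library is available. The value names are illustrative only.

```scala
import org.apache.spark.sql.types._

// Optional flags omitted: both default to true.
val tags: ArrayType = ArrayType(StringType)
val scores: MapType = MapType(StringType, IntegerType)

// Optional flags given explicitly to forbid null elements and null map values.
val strictTags: ArrayType = ArrayType(StringType, containsNull = false)
val strictScores: MapType = MapType(StringType, IntegerType, valueContainsNull = false)
```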

(3) Interval types

  • YearMonthIntervalType([startField,] endField): Represents a year-month interval made up of a contiguous subset of the fields YEAR and MONTH.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).

  • DayTimeIntervalType([startField,] endField): Represents a day-time interval made up of a contiguous subset of the fields DAY, HOUR, MINUTE, and SECOND.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
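A minimal sketch of constructing interval types, assuming a Spark version that ships YearMonthIntervalType and DayTimeIntervalType (Spark 3.2 or later). It uses the named constants on the companion objects rather than raw numbers, so the code does not depend on remembering the numeric field codes. The value names are illustrative.

```scala
import org.apache.spark.sql.types._

// Year-month intervals: a start field and an end field, or a single field.
val yearToMonth = YearMonthIntervalType(YearMonthIntervalType.YEAR, YearMonthIntervalType.MONTH)
val monthOnly = YearMonthIntervalType(YearMonthIntervalType.MONTH)

// Day-time intervals: any contiguous range between DAY and SECOND.
val dayToSecond = DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.SECOND)
val hourToMinute = DayTimeIntervalType(DayTimeIntervalType.HOUR, DayTimeIntervalType.MINUTE)
```

Passing a single field, as in monthOnly, produces a type whose start and end fields are the same.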

(4) StructType

  • StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.
  • StructField(name, dataType, nullable): Represents a field in a StructType. name is the name of the field and dataType is its data type. nullable indicates whether values of this field can be null; it defaults to true.
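The following sketch shows one way to assemble a StructType from StructFields, assuming the Spark SQL library is available. The schema and its field names (id, name, price, tags) are hypothetical.

```scala
import org.apache.spark.sql.types._

// A hypothetical schema combining simple and complex types.
val schema: StructType = StructType(Seq(
  StructField("id", LongType, nullable = false), // explicitly not nullable
  StructField("name", StringType),               // nullable omitted, so it defaults to true
  StructField("price", DecimalType(10, 2)),
  StructField("tags", ArrayType(StringType))
))

// Collect the nullability of each field by name.
val nullableByName: Map[String, Boolean] =
  schema.fields.map(f => f.name -> f.nullable).toMap
```

Looking up nullableByName("id") gives false, while the fields that omit the nullable argument report true.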