Data types

Supported data types

Databricks Runtime SQL and DataFrames support the following data types:

Data type | Description
BIGINT | Represents 8-byte signed integer numbers.
BINARY | Represents byte sequence values.
BOOLEAN | Represents Boolean values.
DATE | Represents values comprising the fields year, month, and day, without a time zone.
DECIMAL(p,s) | Represents numbers with maximum precision p and fixed scale s.
DOUBLE | Represents 8-byte double-precision floating point numbers.
FLOAT | Represents 4-byte single-precision floating point numbers.
INT | Represents 4-byte signed integer numbers.
INTERVAL intervalQualifier | Represents intervals of time on a scale of either seconds or months.
VOID | Represents the untyped NULL.
SMALLINT | Represents 2-byte signed integer numbers.
STRING | Represents character string values.
TIMESTAMP | Represents values comprising the fields year, month, day, hour, minute, and second, interpreted in the session local time zone.
TINYINT | Represents 1-byte signed integer numbers.
ARRAY<elementType> | Represents values comprising a sequence of elements of type elementType.
MAP<keyType,valueType> | Represents values comprising a set of key-value pairs.
STRUCT<[fieldName:fieldType [NOT NULL][COMMENT str][, …]]> | Represents values with the structure described by a sequence of fields.
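The difference between DATE and TIMESTAMP is easiest to see with one concrete instant. The following plain-Python sketch (standard library only, not a Spark API) assumes that a TIMESTAMP value is stored as a single instant, modeled here as microseconds since the Unix epoch in UTC, and that the session time zone only changes how that instant is displayed. The two session time zones are hypothetical fixed offsets, and render_timestamp is an illustrative helper.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: a TIMESTAMP is modeled as one instant, in microseconds since 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Two hypothetical session time zones, expressed as fixed UTC offsets (no daylight-saving rules).
SESSION_UTC = timezone.utc
SESSION_UTC_MINUS_8 = timezone(timedelta(hours=-8))


def render_timestamp(micros: int, session_tz: timezone) -> str:
    """Display the stored instant as a session in `session_tz` would show it."""
    local = (EPOCH + timedelta(microseconds=micros)).astimezone(session_tz)
    return local.strftime("%Y-%m-%d %H:%M:%S")


instant = 1_700_000_000 * 1_000_000  # one stored TIMESTAMP value

print(render_timestamp(instant, SESSION_UTC))          # 2023-11-14 22:13:20
print(render_timestamp(instant, SESSION_UTC_MINUS_8))  # 2023-11-14 14:13:20

# A DATE has only year, month, and day fields, so no time zone is involved.
print(date(2023, 11, 14).isoformat())                  # 2023-11-14
```

The stored value never changes; only the rendering does, which is what "with the session local time zone" means for TIMESTAMP.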

Data type classification

Data types are grouped into the following classes:

Language mappings

Scala

Spark SQL data types are defined in the package org.apache.spark.sql.types. You access them by importing the package:

import org.apache.spark.sql.types._

SQL type | Data type | Value type | API to access or create data type
TINYINT | ByteType | Byte | ByteType
SMALLINT | ShortType | Short | ShortType
INT | IntegerType | Int | IntegerType
BIGINT | LongType | Long | LongType
FLOAT | FloatType | Float | FloatType
DOUBLE | DoubleType | Double | DoubleType
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DecimalType
STRING | StringType | String | StringType
BINARY | BinaryType | Array[Byte] | BinaryType
BOOLEAN | BooleanType | Boolean | BooleanType
TIMESTAMP | TimestampType | java.sql.Timestamp | TimestampType
DATE | DateType | java.sql.Date | DateType
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3)
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3)
ARRAY | ArrayType | scala.collection.Seq | ArrayType(elementType [, containsNull]) (2)
MAP | MapType | scala.collection.Map | MapType(keyType, valueType [, valueContainsNull]) (2)
STRUCT | StructType | org.apache.spark.sql.Row | StructType(fields), where fields is a Seq of StructField (4)
(none) | StructField | The value type of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType [, nullable]) (4)

Java

Spark SQL data types are defined in the package org.apache.spark.sql.types. To access or create a data type, use the factory methods provided in org.apache.spark.sql.types.DataTypes.

SQL type | Data type | Value type | API to access or create data type
TINYINT | ByteType | byte or Byte | DataTypes.ByteType
SMALLINT | ShortType | short or Short | DataTypes.ShortType
INT | IntegerType | int or Integer | DataTypes.IntegerType
BIGINT | LongType | long or Long | DataTypes.LongType
FLOAT | FloatType | float or Float | DataTypes.FloatType
DOUBLE | DoubleType | double or Double | DataTypes.DoubleType
DECIMAL(p,s) | DecimalType | java.math.BigDecimal | DataTypes.createDecimalType() or DataTypes.createDecimalType(precision, scale)
STRING | StringType | String | DataTypes.StringType
BINARY | BinaryType | byte[] | DataTypes.BinaryType
BOOLEAN | BooleanType | boolean or Boolean | DataTypes.BooleanType
TIMESTAMP | TimestampType | java.sql.Timestamp | DataTypes.TimestampType
DATE | DateType | java.sql.Date | DataTypes.DateType
year-month interval | YearMonthIntervalType | java.time.Period | YearMonthIntervalType (3)
day-time interval | DayTimeIntervalType | java.time.Duration | DayTimeIntervalType (3)
ARRAY | ArrayType | java.util.List | DataTypes.createArrayType(elementType [, containsNull]) (2)
MAP | MapType | java.util.Map | DataTypes.createMapType(keyType, valueType [, valueContainsNull]) (2)
STRUCT | StructType | org.apache.spark.sql.Row | DataTypes.createStructType(fields), where fields is a List or array of StructField (4)
(none) | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | DataTypes.createStructField(name, dataType, nullable) (4)

Python

Spark SQL data types are defined in the package pyspark.sql.types. You access them by importing the package:

from pyspark.sql.types import *

SQL type | Data type | Value type | API to access or create data type
TINYINT | ByteType | int (1) | ByteType()
SMALLINT | ShortType | int (1) | ShortType()
INT | IntegerType | int | IntegerType()
BIGINT | LongType | int (1) | LongType()
FLOAT | FloatType | float (1) | FloatType()
DOUBLE | DoubleType | float | DoubleType()
DECIMAL(p,s) | DecimalType | decimal.Decimal | DecimalType()
STRING | StringType | string | StringType()
BINARY | BinaryType | bytearray | BinaryType()
BOOLEAN | BooleanType | bool | BooleanType()
TIMESTAMP | TimestampType | datetime.datetime | TimestampType()
DATE | DateType | datetime.date | DateType()
year-month interval | YearMonthIntervalType | Not supported | Not supported
day-time interval | DayTimeIntervalType | datetime.timedelta | DayTimeIntervalType (3)
ARRAY | ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]) (2)
MAP | MapType | dict | MapType(keyType, valueType, [valueContainsNull]) (2)
STRUCT | StructType | list or tuple | StructType(fields), where fields is a list of StructField (4)
(none) | StructField | The value type of the data type of this field (for example, int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) (4)
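Because DECIMAL(p,s) maps to decimal.Decimal in Python, values can be checked on the Python side before they are loaded. The sketch below uses a hypothetical helper, fits_decimal, that treats p as the total number of digits and s as the number of digits after the decimal point, and reports whether a value fits exactly, without rounding. It is a stdlib-only model of those semantics, not part of pyspark.

```python
from decimal import Decimal, localcontext


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Hypothetical helper: True if `value` fits DECIMAL(precision, scale) exactly.

    precision = total number of digits, scale = digits after the decimal point.
    """
    if not 0 <= scale <= precision:
        raise ValueError("expected 0 <= scale <= precision")
    if not value.is_finite():
        return False  # NaN and infinity have no DECIMAL representation
    with localcontext() as ctx:
        ctx.prec = 100  # working precision large enough that the shift below stays exact
        shifted = value.scaleb(scale)  # move the decimal point `scale` places to the right
        if shifted != shifted.to_integral_value():
            return False  # more than `scale` fractional digits would need rounding
        return abs(int(shifted)) < 10 ** precision  # at most `precision` digits in total


print(fits_decimal(Decimal("123.45"), 5, 2))   # True
print(fits_decimal(Decimal("123.456"), 5, 2))  # False: three fractional digits
print(fits_decimal(Decimal("1234.5"), 5, 2))   # False: needs six digits in total
```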

R

SQL type | Data type | Value type | API to access or create data type
TINYINT | ByteType | integer (1) | 'byte'
SMALLINT | ShortType | integer (1) | 'short'
INT | IntegerType | integer | 'integer'
BIGINT | LongType | integer (1) | 'long'
FLOAT | FloatType | numeric (1) | 'float'
DOUBLE | DoubleType | numeric | 'double'
DECIMAL(p,s) | DecimalType | Not supported | Not supported
STRING | StringType | character | 'string'
BINARY | BinaryType | raw | 'binary'
BOOLEAN | BooleanType | logical | 'bool'
TIMESTAMP | TimestampType | POSIXct | 'timestamp'
DATE | DateType | Date | 'date'
year-month interval | YearMonthIntervalType | Not supported | Not supported
day-time interval | DayTimeIntervalType | Not supported | Not supported
ARRAY | ArrayType | vector or list | list(type='array', elementType=elementType, containsNull=[containsNull]) (2)
MAP | MapType | environment | list(type='map', keyType=keyType, valueType=valueType, valueContainsNull=[valueContainsNull]) (2)
STRUCT | StructType | named list | list(type='struct', fields=fields), where fields is a list of StructField (4)
(none) | StructField | The value type of the data type of this field (for example, integer for a StructField with the data type IntegerType) | list(name=name, type=dataType, nullable=[nullable]) (4)

(1) Numbers are converted to the target type (for example, a 1-byte signed integer for TINYINT) at runtime. Make sure that values are within that type's range.
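The ranges involved follow directly from the byte widths in the supported data types table. The following stdlib-only sketch computes them for the signed integer types and shows how storing an 8-byte Python float as a 4-byte FLOAT loses precision, assuming two's-complement integers and IEEE 754 single precision. int_range, check_int, and as_float32 are hypothetical helpers for validating values yourself; they are not Spark APIs.

```python
import struct
from typing import Tuple

# Byte widths of the signed integer types, taken from the supported data types table.
BYTE_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def int_range(sql_type: str) -> Tuple[int, int]:
    """Return (min, max) of an n-byte signed two's-complement integer."""
    bits = 8 * BYTE_WIDTHS[sql_type]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def check_int(value: int, sql_type: str) -> int:
    """Hypothetical pre-check: raise ValueError if `value` does not fit `sql_type`."""
    low, high = int_range(sql_type)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {sql_type} range [{low}, {high}]")
    return value


def as_float32(value: float) -> float:
    """Round-trip an 8-byte Python float through a 4-byte IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(int_range("TINYINT"))  # (-128, 127)
print(as_float32(0.1))       # 0.10000000149011612
```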

(2) The optional containsNull or valueContainsNull argument defaults to true.

(3) Interval types

  • YearMonthIntervalType([startField,] endField): Represents a year-month interval made up of a contiguous subset of the fields YEAR and MONTH.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (YEAR) and 1 (MONTH).

  • DayTimeIntervalType([startField,] endField): Represents a day-time interval made up of a contiguous subset of the fields DAY, HOUR, MINUTE, and SECOND.

    startField is the leftmost field, and endField is the rightmost field of the type. Valid values of startField and endField are 0 (DAY), 1 (HOUR), 2 (MINUTE), and 3 (SECOND).
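The rule that an interval type covers a contiguous run of fields, from startField on the left to endField on the right, can be modeled in a few lines. This sketch is illustrative only: interval_qualifier is a hypothetical helper that works with field names rather than numeric codes and returns the matching SQL interval qualifier string.

```python
from typing import Optional

# Interval fields ordered from leftmost (most significant) to rightmost.
YEAR_MONTH_FIELDS = ["YEAR", "MONTH"]
DAY_TIME_FIELDS = ["DAY", "HOUR", "MINUTE", "SECOND"]


def interval_qualifier(start: str, end: Optional[str] = None) -> str:
    """Hypothetical helper: SQL qualifier for the contiguous fields start..end."""
    end = end or start  # a single field means startField == endField
    for fields in (YEAR_MONTH_FIELDS, DAY_TIME_FIELDS):
        if start in fields:
            if end not in fields:
                raise ValueError(f"{start} and {end} belong to different interval families")
            if fields.index(start) > fields.index(end):
                raise ValueError(f"{start} must not be to the right of {end}")
            return f"INTERVAL {start}" if start == end else f"INTERVAL {start} TO {end}"
    raise ValueError(f"unknown interval field: {start}")


print(interval_qualifier("YEAR", "MONTH"))   # INTERVAL YEAR TO MONTH
print(interval_qualifier("HOUR", "SECOND"))  # INTERVAL HOUR TO SECOND
print(interval_qualifier("DAY"))             # INTERVAL DAY
```

Because any start field at or to the left of the end field selects every field in between, contiguity follows from the ordering alone.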

(4) StructType

  • StructType(fields): Represents values with the structure described by a sequence, list, or array of StructFields (fields). Two fields with the same name are not allowed.

  • StructField(name, dataType, nullable): Represents a field in a StructType. name is the name of the field, dataType is its data type, and nullable indicates whether values of the field can be null. nullable defaults to true.
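The two StructType rules above, unique field names and nullable defaulting to true, can be sketched without Spark. Field and struct_ddl below are hypothetical stand-ins for StructField and StructType, and the output follows the STRUCT<fieldName:fieldType [NOT NULL], …> syntax from the supported data types table. Names are compared exactly, which is an assumption of this sketch; it does not address names that differ only in letter case.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for StructField."""
    name: str
    data_type: str         # SQL type name, for example "BIGINT"
    nullable: bool = True  # nullable defaults to true, as for StructField


def struct_ddl(fields: List[Field]) -> str:
    """Render fields as a STRUCT<...> type string, rejecting duplicate names."""
    seen = set()
    parts = []
    for field in fields:
        if field.name in seen:  # exact, case-sensitive comparison (an assumption of this sketch)
            raise ValueError(f"duplicate field name: {field.name}")
        seen.add(field.name)
        parts.append(f"{field.name}:{field.data_type}" + ("" if field.nullable else " NOT NULL"))
    return "STRUCT<" + ", ".join(parts) + ">"


print(struct_ddl([Field("id", "BIGINT", nullable=False), Field("name", "STRING")]))
# STRUCT<id:BIGINT NOT NULL, name:STRING>
```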