Built-in functions

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

This article presents links to and descriptions of built-in operators and functions for strings and binary types, numeric scalars, aggregations, windows, arrays, maps, dates and timestamps, casting, CSV data, JSON data, XPath manipulation, and other miscellaneous functions.

Also see:

Operators and predicates

For information on how operators are parsed with respect to each other, see Operator precedence.

Operator

Syntax

Description

&

expr1 & expr2

Returns the bitwise AND of expr1 and expr2.

and

expr1 and expr2

Returns the logical AND of expr1 and expr2.

*

multiplier * multiplicand

Returns multiplier multiplied by multiplicand.

!=

expr1 != expr2

Returns true if expr1 does not equal expr2, or false otherwise.

!

!expr

Returns the logical NOT of a Boolean expression.

between

expr1 [not] between expr2 and expr2

Tests whether expr1 is greater or equal than expr2 and less than or equal to expr3.

[ ]

arrayExpr [ indexExpr ]

Returns indexExprnd element of ARRAY arrayExpr

[ ]

mapExpr [ keyExpr ]

Returns value at keyExpr of MAP mapExpr

^

expr1 ^ expr2

Returns the bitwise exclusive OR (XOR) of expr1 and expr2.

:

jsonStr : jsonPath

Returns fields extracted from the jsonStr.

::

expr :: type

Casts the value expr to the target data type type.

div

divisor div dividend

Returns the integral part of the division of divisor by dividend.

.

mapExpr . keyIdentifier

Returns a MAP value by keyIdentifier.

.

structExpr . fieldIdentifier

Returns a STRUCT field by fieldIdentifier.

==

expr1 == expr2

Returns true if expr1 equals expr2, or false otherwise.

=

expr1 = expr2

Returns true if expr1 equals expr2, or false otherwise.

>=

expr1 >= expr2

Returns true if expr1 is greater than or equal to expr2, or false otherwise.

>

expr1 > expr2

Returns true if expr1 is greater than expr2, or false otherwise.

exists

exists(query)

Returns true if query returns at least one row, or false otherwise.

ilike

str [not] ilike (pattern[ESCAPE escape])

Returns true if str does (not) match pattern with escape case-insensitively.

ilike

str [not] ilike {ANY|SOME|ALL}([pattern[, ...]])

Returns true if str does (not) match any/all patterns case-insensitively.

in

elem [not] in (expr1[, ...])

Returns true if elem does (not) equal any exprN.

in

elem [not] in (query)

Returns true if elem does (not) equal any row in query.

is distinct

expr1 is [not] distinct from expr2

Tests whether the arguments do (not) have different values where NULLs are considered as comparable values.

is false

expr is [not] false

Tests whether expr is (not) false.

is null

expr is [not] null

Returns true if expr is (not) NULL.

is true

expr is [not] true

Tests whether expr is (not) true.

like

str [not] like (pattern[ESCAPE escape])

Returns true if str does (not) match pattern with escape.

like

str [not] like {ANY|SOME|ALL}([pattern[, ...]])

Returns true if str does (not) match any/all patterns.

<=>

expr1 <=> expr2

Returns the same result as the EQUAL(=) for non-null operands, but returns true if both are NULL, false if one of the them is NULL.

<=

expr1 <= expr2

Returns true if expr1 is less than or equal to expr2, or false otherwise.

<>

expr1 <> expr2

Returns true if expr1 does not equal expr2, or false otherwise.

<

expr1 < expr2

Returns true if expr1 is less than expr2, or false otherwise.

-

expr1 - expr2

Returns the subtraction of expr2 from expr1.

not

not expr

Returns the logical NOT of a Boolean expression.

or

expr1 or expr2

Returns the logical OR of expr1 and expr2.

%

dividend % divisor

Returns the remainder after dividend / divisor.

||

expr1 || expr2

Returns the concatenation of expr1 and expr2.

|

expr1 | expr2

Returns the bitwise OR of expr1 and expr2.

+

expr1 + expr2

Returns the sum of expr1 and expr2.

regexp

str [not] regexp regex

Returns true if str does (not) match regex.

regexp_like

str [not] regexp_like regex

Returns true if str does (not) match regex.

rlike

str [not] rlike regex

Returns true if str does (not) match regex.

/

dividend / divisor

Returns dividend divided by divisor.

~

~ expr

Returns the bitwise NOT of expr.

Operator precedence

Precedence

Operator

1

:, ::, [ ]

2

-(unary), +(unary), ~

3

*, /, %, div

4

+, -, ||

5

&

6

^

7

|

8

=, ==, <=>, <>, !=, <, <=, >, >=

9

not, exists

10

between, in, rlike, regexp, ilike, like, is [not] [NULL, true, false], is [not] distinct from

11

and

12

or

String and binary functions

Function

Description

expr1 || expr2

Returns the concatenation of expr1 and expr2.

aes_decrypt(expr, key[, mode[, padding[, aad]]])

Decrypts a binary expr using AES encryption.

aes_encrypt(expr, key[, mode[, padding[, iv[, aad]]]])

Encrypts a binary expr using AES encryption.

ascii(str)

Returns the ASCII code point of the first character of str.

base64(expr)

Converts expr to a base 64 string.

bin(expr)

Returns the binary representation of expr.

binary(expr)

Casts the value of expr to BINARY.

bit_length(expr)

Returns the bit length of string data or number of bits of binary data.

bitmap_count(expr)

Returns the number of bits set in a BINARY string representing a bitmap.

btrim(str [, trimStr])

Returns str with leading and trailing characters removed.

char(expr)

Returns the character at the supplied UTF-16 code point.

char_length(expr)

Returns the character length of string data or number of bytes of binary data.

character_length(expr)

Returns the character length of string data or number of bytes of binary data.

charindex(substr, str[, pos])

Returns the position of the first occurrence of substr in str after position pos.

chr(expr)

Returns the character at the supplied UTF-16 code point.

concat(expr1, expr2[, …])

Returns the concatenation of the arguments.

concat_ws(sep[, expr1[, …]])

Returns the concatenation strings separated by sep.

contains(expr, subExpr)

Returns true if expr STRING or BINARY contains subExpr.

crc32(expr)

Returns a cyclic redundancy check value of expr.

decode(expr, charSet)

Translates binary expr to a string using the character set encoding charSet.

encode(expr, charSet)

Returns the binary representation of a string using the charSet character encoding.

endswith(expr, endExpr)

Returns true if expr STRING or BINARY ends with endExpr.

find_in_set(searchExpr, sourceExpr)

Returns the position of a string within a comma-separated list of strings.

format_number(expr, scale)

Formats expr like #,###,###.##, rounded to scale decimal places.

format_number(expr, fmt)

Formats expr like fmt.

format_string(strfmt[, obj1 [, …]])

Returns a formatted string from printf-style format strings.

hex(expr)

Converts expr to hexadecimal.

str ilike (pattern[ESCAPE escape])

Returns true if str matches pattern with escape case insensitively.

initcap(expr)

Returns expr with the first letter of each word in uppercase.

instr(str, substr)

Returns the (1-based) index of the first occurrence of substr in str.

lcase(expr)

Returns expr with all characters changed to lowercase.

left(str, len)

Returns the leftmost len characters from str.

len(expr)

Returns the character length of string data or number of bytes of binary data.

length(expr)

Returns the character length of string data or number of bytes of binary data.

levenshtein(str1, str2)

Returns the Levenshtein distance between the strings str1 and str2.

str like (pattern[ESCAPE escape])

Returns true if str matches pattern with escape.

locate(substr, str[, pos])

Returns the position of the first occurrence of substr in str after position pos.

lower(expr)

Returns expr with all characters changed to lowercase.

lpad(expr, len[, pad])

Returns expr, left-padded with pad to a length of len.

ltrim([trimstr,] str)

Returns str with leading characters within trimStr removed.

mask(str[, upperChar[, lowerChar[, digitChar[, otherChar]]]])

Returns a masked version of the input str.

md5(expr)

Returns an MD5 128-bit checksum of expr as a hex string.

octet_length(expr)

Returns the byte length of string data or number of bytes of binary data.

overlay(input PLACING replace FROM pos [FOR len])

Replaces input with replace that starts at pos and is of length len.

parse_url(url, partToExtract[, key])

Extracts a part from url.

position(substr, str[, pos])

Returns the position of the first occurrence of substr in str after position pos.

position(subtr IN str)

Returns the position of the first occurrence of substr in str after position pos.

printf(strfmt[, obj1 [, …]])

Returns a formatted string from printf-style format strings.

str regexp regex

Returns true if str matches regex.

str regexp_like regex

Returns true if str matches regex.

regexp_count(str, regexp)

Returns the number of times str matches the regexp pattern.

regexp_extract(str, regexp[, idx])

Extracts the first string in str that matches the regexp expression and corresponds to the regex group index.

regexp_extract_all(str, regexp[, idx])

Extracts the all strings in str that matches the regexp expression and corresponds to the regex group index.

regexp_instr(str, regexp)

Returns the position of the first substring in str that matches regexp.

regexp_replace(str, regexp, rep[, position])

Replaces all substrings of str that match regexp with rep.

regexp_substr(str, regexp)

Returns the first substring in str that matches regexp.

repeat(expr, n)

Returns the string that repeats expr n times.

replace(str, search [, replace])

Replaces all occurrences of search with replace.

reverse(expr)

Returns a reversed string or an array with reverse order of elements.

right(str, len)

Returns the rightmost len characters from the string str.

str rlike regex

Returns true if str matches regex.

rpad(expr, len[, pad])

Returns expr, right-padded with pad to a length of len.

rtrim([trimStr,] str)

Returns str with trailing characters removed.

sentences(str[, lang, country])

Splits str into an array of array of words.

sha(expr)

Returns a sha1 hash value as a hex string of expr.

sha1(expr)

Returns a sha1 hash value as a hex string of expr.

sha2(expr, bitLength)

Returns a checksum of the SHA-2 family as a hex string of expr.

soundex(expr)

Returns the soundex code of the string.

space(n)

Returns a string consisting of n spaces.

split(str, regex[, limit])

Splits str around occurrences that match regex and returns an array with a length of at most limit.

split_part(str, delim, partNum)

Splits str around occurrences of delim and returns the partNum part.

startswith(expr, startExpr)

Returns true if expr STRING or BINARY starts with startExpr.

string(expr)

Casts the value expr to STRING.

substr(expr, pos[, len])

Returns the substring of expr that starts at pos and is of length len.

substr(expr FROM pos[ FOR len])

Returns the substring of expr that starts at pos and is of length len.

substring(expr, pos[, len])

Returns the substring of expr that starts at pos and is of length len.

substring(expr FROM pos[ FOR len])

Returns the substring of expr that starts at pos and is of length len.

substring_index(expr, delim, count)

Returns the substring of expr before count occurrences of the delimiter delim.

to_binary(expr[, fmt])

Returns expr cast to a Binary based on fmt.

to_char(numExpr, fmt)

Returns numExpr cast to STRING using formatting fmt.”

to_varchar(numExpr, fmt)

Returns numExpr cast to STRING using formatting fmt.”

translate(expr, from, to)

Returns an expr where all characters in from have been replaced with those in to.

trim([[BOTH | LEADING | TRAILING] [trimStr] FROM] str)

Trim characters from a string.

try_aes_decrypt(expr, key[, mode[, padding[, aad]]])

Decrypts a binary expr using AES encryption, and return NULL in case of error.

try_to_binary(expr [, fmt])

Returns expr cast to BINARY based on fmt, or NULL if the input is invalid.

ucase(expr)

Returns expr with all characters changed to uppercase.

unbase64(expr)

Returns a decoded base64 string as binary.

unhex(expr)

Converts hexadecimal expr to BINARY.

upper(expr)

Returns expr with all characters changed to uppercase.

url_decode(str)

Translates a string back from application/x-www-form-urlencoded format.

url_encode(str)

Translates a string into application/x-www-form-urlencoded format.

Numeric scalar functions

Function

Description

~ expr

Returns the bitwise NOT of expr.

dividend / divisor

Returns dividend divided by divisor.

expr1 | expr2

Returns the bitwise OR of expr1 and expr2.

- expr

Returns the negated value of expr.

expr1 - expr2

Returns the subtraction of expr2 from expr1.

+ expr

Returns the value of expr.

expr1 + expr2

Returns the sum of expr1 and expr2.

dividend % divisor

Returns the remainder after dividend / divisor.

expr1 ^ expr2

Returns the bitwise exclusive OR (XOR) of expr1 and expr2.

expr1 & expr2

Returns the bitwise AND of expr1 and expr2.

multiplier * multiplicand

Returns multiplier multiplied by multiplicand.

abs(expr)

Returns the absolute value of the numeric value in expr.

acos(expr)

Returns the inverse cosine (arccosine) of expr.

acosh(expr)

Returns the inverse hyperbolic cosine of expr.

asin(expr)

Returns the inverse sine (arcsine) of expr.

asinh(expr)

Returns the inverse hyperbolic sine of expr.

atan(expr)

Returns the inverse tangent (arctangent) of expr.

atan2(exprY, exprX)

Returns the angle in radians between the positive x-axis of a plane and the point specified by the coordinates (exprX, exprY).

atanh(expr)

Returns inverse hyperbolic tangent of expr.

bigint(expr)

Casts the value expr to BIGINT.

bit_count(expr)

Returns the number of bits set in the argument.

bit_get(expr, pos)

Returns the value of a bit in a binary representation of an integral numeric.

bit_reverse(expr)

Returns the value obtained by reversing the order of the bits in the argument.

bitmap_bit_position(expr)

Returns the 0-based bit position of a given BIGINT number within a bucket.

bitmap_bucket_number(expr)

Returns the bitmap bucket number for a given BIGINT number.

bround(expr[,targetScale])

Returns the rounded expr using HALF_EVEN rounding mode.

cbrt(expr)

Returns the cube root of expr.

ceil(expr[,targetScale])

Returns the smallest number not smaller than expr rounded up to targetScale digits relative to the decimal point.

ceiling(expr[,targetScale])

Returns the smallest number not smaller than expr rounded up to targetScale digits relative to the decimal point.

conv(num, fromBase, toBase)

Converts num from fromBase to toBase.

convert_timezone([sourceTz, ]targetTz, sourceTs)

Converts the TIMESTAMP_NTZ sourceTs from the sourceTz time zone to targetTz.

cos(expr)

Returns the cosine of expr.

cosh(expr)

Returns the hyperbolic cosine of expr.

cot(expr)

Returns the cotangent of expr.

csc(expr)

Returns the cosecant of expr.

decimal(expr)

Casts the value expr to DECIMAL.

degrees(expr)

Converts radians to degrees.

divisor div dividend

Returns the integral part of the division of divisor by dividend.

double(expr)

Casts the value expr to DOUBLE.

e()

Returns the constant e.

exp(expr)

Returns e to the power of expr.

expm1(expr)

Returns exp(expr) - 1.

factorial(expr)

Returns the factorial of expr.

float(expr)

Casts the value expr to FLOAT.

floor(expr[,targetScale])

Returns the largest number not smaller than expr rounded down to targetScale digits relative to the decimal point.

getbit(expr, pos)

Returns the value of a bit in a binary representation of an integral numeric.

hypot(expr1, expr2)

Returns sqrt(expr1 * expr1 + expr2 * expr2).

int(expr)

Casts the value expr to INTEGER.

isnan(expr)

Returns true if expr is NaN.

ln(expr)

Returns the natural logarithm (base e) of expr.

log([base,] expr)

Returns the logarithm of expr with base.

log1p(expr)

Returns log(1 + expr).

log2(expr)

Returns the logarithm of expr with base 2.

log10(expr)

Returns the logarithm of expr with base 10.

mod(dividend, divisor)

Returns the remainder after dividend / divisor.

nanvl(expr1, expr2)

Returns expr1 if it’s not NaN, or expr2 otherwise.

negative(expr)

Returns the negated value of expr.

pi()

Returns pi.

pmod(dividend, divisor)

Returns the positive remainder after dividend / divisor.

positive(expr)

Returns the value of expr.

pow(expr1, expr2)

Raises expr1 to the power of expr2.

power(expr1, expr2)

Raises expr1 to the power of expr2.

radians(expr)

Converts expr in degrees to radians.

rand([seed])

Returns a random value between 0 and 1.

randn([seed])

Returns a random value from a standard normal distribution.

random([seed])

Returns a random value between 0 and 1.

rint(expr)

Returns expr rounded to a whole number as a DOUBLE.

round(expr[,targetScale])

Returns the rounded expr using HALF_UP rounding mode.

sec(expr)

Returns the secant of expr.

sin(expr)

Returns the sine of expr.

shiftleft(expr, n)

Returns a bitwise left shifted by n bits.

shiftright(expr, n)

Returns a bitwise signed signed integral number right shifted by n bits.

shiftrightunsigned(expr, n)

Returns a bitwise unsigned signed integral number right shifted by n bits.

sign(expr)

Returns -1.0, 0.0, or 1.0 as expr is negative, 0, or positive.

signum(expr)

Returns -1.0, 0.0, or 1.0 as expr is negative, 0, or positive.

sinh(expr)

Returns the hyperbolic sine of expr.

smallint(expr)

Casts the value expr to SMALLINT.

sqrt(expr)

Returns the square root of expr.

tan(expr)

Returns the tangent of expr.

tanh(expr)

Returns the hyperbolic tangent of expr.

tinyint(expr)

Casts expr to TINYINT.

to_number(expr, fmt )

Returns expr cast to DECIMAL using formatting fmt.

try_add(expr1, expr2)

Returns the sum of expr1 and expr2, or NULL in case of error.

try_divide(dividend, divisor)

Returns dividend divided by divisor, or NULL if divisor is 0.

try_multiply(multiplier, multiplicand)

Returns multiplier multiplied by multiplicand, or NULL on overflow.

try_subtract(expr1, expr2)

Returns the subtraction of expr2 from expr1, or NULL on overflow.

try_to_number(expr, fmt )

Returns expr cast to DECIMAL using formatting fmt, or NULL if expr does not match the format.

width_bucket(expr, minExpr, maxExpr, numBuckets)

Returns the bucket number for a value in an equi-width histogram.

Aggregate functions

Function

Description

any(expr)

Returns true if at least one value of expr in the group is true.

any_value(expr[,ignoreNull])

Returns any random value of expr for a group of rows.

approx_count_distinct(expr[,relativeSD])

Returns the estimated number of distinct values in expr within the group.

approx_percentile(expr,percentage[,accuracy])

Returns the approximate percentile of the expr within the group.

approx_top_k(expr[,k[,maxItemsTracked]])

Returns the top k most frequently occurring item values in an expr along with their approximate counts.

array_agg(expr)

Returns an array consisting of all values in expr within the group.

avg(expr)

Returns the mean calculated from values of a group.

bit_and(expr)

Returns the bitwise AND of all input values in the group.

bit_or(expr)

Returns the bitwise OR of all input values in the group.

bit_xor(expr)

Returns the bitwise XOR of all input values in the group.

bitmap_construct_agg(expr)

Returns the bitwise OR of all bit position values in the group. between 0 and 32767 in a group as a BINARY.

bitmap_or_agg(expr)

Returns the bitwise OR of all BINARY input values in the group.

bool_and(expr)

Returns true if all values in expr are true within the group.

bool_or(expr)

Returns true if at least one value in expr is true within the group.

collect_list(expr)

Returns an array consisting of all values in expr within the group.

collect_set(expr)

Returns an array consisting of all unique values in expr within the group.

corr(expr1,expr2)

Returns Pearson coefficient of correlation between a group of number pairs.

count(*)

Returns the total number of retrieved rows in a group, including rows containing null.

count(expr[, …])

Returns the number of rows in a group for which the supplied expressions are all non-null.

count_if(expr)

Returns the number of true values for the group in expr.

count_min_sketch(column,epsilon,confidence,seed)

Returns a count-min sketch of all values in the group in column with the epsilon, confidence and seed.

covar_pop(expr1,expr2)

Returns the population covariance of number pairs in a group.

covar_samp(expr1,expr2)

Returns the sample covariance of number pairs in a group.

every(expr)

Returns true if all values of expr in the group are true.

first(expr[,ignoreNull])

Returns the first value of expr for a group of rows.

first_value(expr[,ignoreNull])

Returns the first value of expr for a group of rows.

hll_sketch_agg(expr[,lgConfigK])

Returns a HyperLogLog sketch used to approximate a distinct values count.

hll_union_agg(expr[,allowDifferentLgConfigK])

Aggregates HyperLogLog sketches for a group of rows.

kurtosis(expr)

Returns the kurtosis value calculated from values of a group.

last(expr[,ignoreNull])

Returns the last value of expr for the group of rows.

last_value(expr[,ignoreNull])

Returns the last value of expr for the group of rows.

max(expr)

Returns the maximum value of expr in a group.

max_by(expr1,expr2)

Returns the value of an expr1 associated with the maximum value of expr2 in a group.

mean(expr)

Returns the mean calculated from values of a group.

median(expr)

Returns the median calculated from values of a group.

min(expr)

Returns the minimum value of expr in a group.

min_by(expr1, expr2)

Returns the value of an expr1 associated with the minimum value of expr2 in a group.

mode(expr [,deterministic])

Returns the most frequent, not NULL, value of expr in a group.

percentile(expr, percentage [,frequency])

Returns the exact percentile value of expr at the specified percentage.

percentile_approx(expr,percentage[,accuracy])

Returns the approximate percentile of the expr within the group.

percentile_cont(pct) WITHIN GROUP (ORDER BY key)

Returns the interpolated percentile of the key within the group.

percentile_disc(pct) WITHIN GROUP (ORDER BY key)

Returns the discrete percentile of the key within the group.

regr_avgx(yExpr, xExpr)

Returns the mean of xExpr calculated from values of a group where xExpr and yExpr are NOT NULL.

regr_avgy(yExpr, xExpr)

Returns the mean of yExpr calculated from values of a group where xExpr and yExpr are NOT NULL.

regr_count(yExpr, xExpr)

Returns the number of non-null value pairs yExpr, xExpr in the group.

regr_intercept(yExpr, xExpr)

Returns the intercept of the uni-variate linear regression line in a group where xExpr and yExpr are NOT NULL.

regr_r2(yExpr, xExpr)

Returns the coefficient of determination from values of a group where xExpr and yExpr are NOT NULL.

regr_slope(yExpr, xExpr)

Returns the slope of the linear regression line of non-null value pairs yExpr, xExpr in the group.

regr_sxx(yExpr, xExpr)

Returns the sum of squares of the xExpr values of a group where xExpr and yExpr are NOT NULL.

regr_sxy(yExpr, xExpr)

Returns the sum of products of yExpr and xExpr calculated from values of a group where xExpr and yExpr are NOT NULL.

regr_syy(yExpr, xExpr)

Returns the sum of squares of the yExpr values of a group where xExpr and yExpr are NOT NULL.

skewness(expr)

Returns the skewness value calculated from values of a group.

some(expr)

Returns true if at least one value of expr in a group is true.

std(expr)

Returns the sample standard deviation calculated from the values within the group.

stddev(expr)

Returns the sample standard deviation calculated from the values within the group.

stddev_pop(expr)

Returns the population standard deviation calculated from values of a group.

stddev_samp(expr)

Returns the sample standard deviation calculated from values of a group.

sum(expr)

Returns the sum calculated from values of a group.

try_avg(expr)

Returns the mean calculated from values of a group, NULL if there is an overflow.

try_sum(expr)

Returns the sum calculated from values of a group, NULL if there is an overflow.

var_pop(expr)

Returns the population variance calculated from values of a group.

var_samp(expr)

Returns the sample variance calculated from values of a group.

variance(expr)

Returns the sample variance calculated from values of a group.

Ranking window functions

Function

Description

dense_rank()

Returns the rank of a value compared to all values in the partition.

ntile(n)

Divides the rows for each window partition into n buckets ranging from 1 to at most n.

percent_rank()

Computes the percentage ranking of a value within the partition.

rank()

Returns the rank of a value compared to all values in the partition.

row_number()

Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.

Analytic window functions

Function

Description

cume_dist()

Returns the position of a value relative to all values in the partition.

lag(expr[,offset[,default]])

Returns the value of expr from a preceding row within the partition.

lead(expr[,offset[,default]])

Returns the value of expr from a subsequent row within the partition.

nth_value(expr, offset[, ignoreNulls])

Returns the value of expr at a specific offset in the window.

Array functions

Function

Description

arrayExpr[indexExpr]

Returns element at position indexExpr of ARRAY arrayExpr.

aggregate(expr,start,merge[,finish])

Aggregates elements in an array using a custom aggregator.

array([expr [, …]])

Returns an array with the elements in expr.

array_append(array, elem)

Returns array appended by elem.

array_compact(array)

Removes NULL values from array.

array_contains(array,value)

Returns true if array contains value.

array_distinct(array)

Removes duplicate values from array.

array_except(array1,array2)

Returns an array of the elements in array1 but not in array2.

array_insert(array, index, elem)

Returns an expanded array where elem is inserted at the index position.

array_intersect(array1,array2)

Returns an array of the elements in the intersection of array1 and array2.

array_join(array,delimiter[,nullReplacement])

Concatenates the elements of array.

array_max(array)

Returns the maximum value in array.

array_min(array)

Returns the minimum value in array.

array_position(array,element)

Returns the position of the first occurrence of element in array.

array_prepend(array, elem)

Returns array prepended by elem.

array_remove(array,element)

Removes all occurrences of element from array.

array_repeat(element,count)

Returns an array containing element count times.

array_size(array)

Returns the number of elements in array.

array_sort(array,func)

Returns array sorted according to func.

array_union(array1,array2)

Returns an array of the elements in the union of array1 and array2 without duplicates.

arrays_overlap(array1, array2)

Returns true if the intersection of array1 and array2 is not empty.

arrays_zip(array1 [, …])

Returns a merged array of structs in which the nth struct contains all Nth values of input arrays.

cardinality(expr)

Returns the size of expr.

concat(expr1, expr2 [, …])

Returns the concatenation of the arguments.

element_at(arrayExpr, index)

Returns the element of an arrayExpr at index.

exists(expr, pred)

Returns true if pred is true for any element in expr.

explode(collection)

Returns rows by un-nesting collection.

explode_outer(collection)

Returns rows by un-nesting collection using outer semantics.

filter(expr,func)

Filters the array in expr using the function func.

flatten(arrayOfArrays)

Transforms an array of arrays into a single array.

forall(expr, predFunc)

Tests whether predFunc holds for all elements in the array.

get(arrayExpr, index)

Returns the element of an arrayExpr at index, starting at 0.

inline(expr)

Explodes an array of structs into a table.

inline_outer(expr)

Explodes an array of structs into a table with outer semantics.

posexplode(expr)

Returns rows by un-nesting the array with numbering of positions.

posexplode_outer(expr)

Returns rows by un-nesting the array with numbering of positions using OUTER semantics.

reduce(expr,start,merge[,finish])

Aggregates elements in an array using a custom aggregator.

reverse(array)

Returns a reversed string or an array with reverse order of elements.

sequence(start,stop,step)

Generates an array of elements from start to stop (inclusive), incrementing by step.

shuffle(array)

Returns a random permutation of the array in expr.

size(expr)

Returns the cardinality of expr.

slice(expr,start,length)

Returns a subset of an array.

sort_array(expr[,ascendingOrder])

Returns the array in expr in sorted order.

transform(expr, func)

Transforms elements in an array in expr using the function func.

try_element_at(arrayExpr, index)

Returns the element of an arrayExpr at index, or NULL if index is out of bound.

zip_with(expr1, expr2, func)

Merges the arrays in expr1 and expr2, element-wise, into a single array using func.

Map functions

Function

Description

mapExpr[keyExpr]

Returns value at keyExpr of MAP mapExpr.

cardinality(expr)

Returns the size of expr.

element_at(mapExpr, key)

Returns the value of mapExpr for key.

explode(expr)

Returns rows by un-nesting expr.

explode_outer(expr)

Returns rows by un-nesting expr using outer semantics.

map([{key1, value1}[, …]])

Creates a map with the specified key-value pairs.

map_concat([expr1 [, …]])

Returns the union of all expr map expressions.

map_contains_key(map, key)

Returns true if map contains key, false otherwise.

map_entries(map)

Returns an unordered array of all entries in map.

map_filter(expr, func)

Filters entries in the map in expr using the function func.

map_from_arrays(keys, values)

Creates a map with a pair of the keys and values arrays.

map_from_entries(expr)

Creates a map created from the specified array of entries.

map_keys(map)

Returns an unordered array containing the keys of map.

map_values(map)

Returns an unordered array containing the values of map.

map_zip_with(map1, map2, func)

Merges map1 and map2 into a single map.

size(expr)

Returns the cardinality of expr.

str_to_map(expr[,pairDelim[,keyValueDelim]])

Returns a map after splitting expr into key-value pairs using delimiters.

transform_keys(expr, func)

Transforms keys in a map in expr using the function func.

transform_values(expr, func)

Transforms values in a map in expr using the function func.

try_element_at(mapExpr, key)

Returns the value of mapExpr for key, or NULL if key does not exist.

Date, timestamp, and interval functions

For information on date and timestamp formats, see Datetime patterns.

Function

Description

intervalExpr / divisor

Returns interval divided by divisor.

- intervalExpr

Returns the negated value of intervalExpr.

intervalExpr1 - intervalExpr2

Returns the subtraction of intervalExpr2 from intervalExpr1.

datetimeExpr1 - datetimeExpr2

Returns the subtraction of datetimeExpr2 from datetimeExpr1.

+ intervalExpr

Returns the value of intervalExpr.

intervalExpr1 + intervalExpr2

Returns the sum of intervalExpr1 and intervalExpr2.

intervalExpr * multiplicand

Returns intervalExpr multiplied by multiplicand.

abs(expr)

Returns the absolute value of the interval value in expr.

add_months(startDate,numMonths)

Returns the date that is numMonths after startDate.

curdate()

Returns the current date at the start of query evaluation.

current_date()

Returns the current date at the start of query evaluation.

current_timestamp()

Returns the current timestamp at the start of query evaluation.

current_timezone()

Returns the current session local timezone.

date(expr)

Casts the value expr to DATE.

date_add(startDate,numDays)

Returns the date numDays after startDate.

date_add(unit, value, expr)

Adds value units to a timestamp expr.

date_diff(unit, start, stop)

Returns the difference between two timestamps measured in units.

date_format(expr,fmt)

Converts a timestamp to a string in the format fmt.

date_from_unix_date(days)

Creates a date from the number of days since 1970-01-01.

date_part(field,expr)

Extracts a part of the date, timestamp, or interval.

date_sub(startDate,numDays)

Returns the date numDays before startDate.

date_trunc(unit,expr)

Returns timestamp truncated to the unit specified in unit.

dateadd(startDate,numDays)

Returns the date numDays after startDate.

dateadd(unit, value, expr)

Adds value units to a timestamp expr.

datediff(endDate,startDate)

Returns the number of days from startDate to endDate.

datediff(unit, start, stop)

Returns the difference between two timestamps measured in units.

day(expr)

Returns the day of month of the date or timestamp.

dayofmonth(expr)

Returns the day of month of the date or timestamp.

dayofweek(expr)

Returns the day of week of the date or timestamp.

dayofyear(expr)

Returns the day of year of the date or timestamp.

divisor div dividend

Returns the integral part of the division of interval divisor by interval dividend.

extract(field FROM source)

Returns field of source.

from_unixtime(unixTime,fmt)

Returns unixTime in fmt.

from_utc_timestamp(expr,timezone)

Returns a timestamp in expr specified in UTC in the timezone timeZone.

getdate()

Returns the current timestamp at the start of query evaluation.

hour(expr)

Returns the hour component of a timestamp.

last_day(expr)

Returns the last day of the month that the date belongs to.

make_date(year,month,day)

Creates a date from year, month, and day fields.

make_dt_interval([days[, hours[, mins[, secs]]]])

Creates an day-time interval from days, hours, mins and secs.

make_interval(years, months, weeks, days, hours, mins, secs)

Deprecated: Creates an interval from years, months, weeks, days, hours, mins and secs.

make_timestamp(year,month,day,hour,min,sec[,timezone])

Creates a timestamp from year, month, day, hour, min, sec, and timezone fields.

make_ym_interval([years[, months]])

Creates a year-month interval from years, and months.

minute(expr)

Returns the minute component of the timestamp in expr.

month(expr)

Returns the month component of the timestamp in expr.

months_between(expr1,expr2[,roundOff])

Returns the number of months elapsed between dates or timestamps in expr1 and expr2.

next_day(expr,dayOfWeek)

Returns the first date which is later than expr and named as in dayOfWeek.

now()

Returns the current timestamp at the start of query evaluation.

quarter(expr)

Returns the quarter of the year for expr in the range 1 to 4.

second(expr)

Returns the second component of the timestamp in expr.

session_window(expr, gpDuration)

Creates a session-window over a timestamp expression.

sign(expr)

Returns -1.0, 0.0, or 1.0 as interval expr is negative, 0, or positive.

signum(expr)

Returns -1.0, 0.0, or 1.0 as interval expr is negative, 0, or positive.

timediff(unit, start, stop)

Returns the difference between two timestamps measured in units.

timestamp(expr)

Casts expr to TIMESTAMP.

timestamp_micros(expr)

Creates a timestamp expr microseconds since UTC epoch.

timestamp_millis(expr)

Creates a timestamp expr milliseconds since UTC epoch.

timestamp_seconds(expr)

Creates timestamp expr seconds since UTC epoch.

timestampadd(unit, value, expr)

Adds value units to a timestamp expr.

timestampdiff(unit, start, stop)

Returns the difference between two timestamps measured in units.

to_date(expr[,fmt])

Returns expr cast to a date using an optional formatting.

to_timestamp(expr[,fmt])

Returns expr cast to a timestamp using an optional formatting.

to_unix_timestamp(expr[,fmt])

Returns the timestamp in expr as a UNIX timestamp.

to_utc_timestamp(expr,timezone)

Returns the timestamp in expr in a different timezone as UTC.

trunc(expr, fmt)

Returns a date with the a portion of the date truncated to the unit specified by the format model fmt.

try_add(expr1, expr2)

Returns the sum of expr1 and expr2, or NULL in case of error.

try_divide(dividend, divisor)

Returns dividend divided by divisor, or NULL if divisor is 0.

try_multiply(multiplier, multiplicand)

Returns multiplier multiplied by multiplicand, or NULL on overflow.

try_subtract(expr1, expr2)

Returns the subtraction of expr2 from expr1, or NULL on overflow.

try_to_timestamp(expr[,fmt])

Returns expr cast to a timestamp using an optional formatting, or NULL if the cast fails.

unix_date(expr)

Returns the number of days since 1970-01-01.

unix_micros(expr)

Returns the number of microseconds since 1970-01-01 00:00:00 UTC.

unix_millis(expr)

Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.

unix_seconds(expr)

Returns the number of seconds since 1970-01-01 00:00:00 UTC.

unix_timestamp([expr[, fmt]])

eturns the UNIX timestamp of current or specified time.

weekday(expr)

Returns the day of the week of expr.

weekofyear(expr)

Returns the week of the year of expr.

year(expr)

Returns the year component of expr.

window(expr, width[, step[, start]])

Creates a hopping based sliding-window over a timestamp expression.

window_time(window)

Returns the inclusive end time of a sliding-window produced by the window or session_window functions.

H3 geospatial functions

For information about H3 geospatial functions, see H3 geospatial functions.

Cast functions and constructors

For information on casting between types, see cast function and try_cast function.

Function

Description

array([expr [, …]])

Returns an array with the elements in expr.

bigint(expr)

Casts the value expr to BIGINT.

binary(expr)

Casts the value of expr to BINARY.

boolean(expr)

Casts expr to BOOLEAN.

cast(expr AS type)

Casts the value expr to the target data type type.

expr :: type

Casts the value expr to the target data type type.

date(expr)

Casts the value expr to DATE.

decimal(expr)

Casts the value expr to DECIMAL.

double(expr)

Casts the value expr to DOUBLE.

float(expr)

Casts the value expr to FLOAT.

int(expr)

Casts the value expr to INTEGER.

make_date(year,month,day)

Creates a date from year, month, and day fields.

make_dt_interval([days[, hours[, mins[, secs]]]])

Creates an day-time interval from days, hours, mins and secs.

make_interval(years, months, weeks, days, hours, mins, secs)

Creates an interval from years, months, weeks, days, hours, mins and secs.

make_timestamp(year,month,day,hour,min,sec[,timezone])

Creates a timestamp from year, month, day, hour, min, sec, and timezone fields.

make_ym_interval([years[, months]])

Creates a year-month interval from years, and months.

map([{key1, value1} [, …]])

Creates a map with the specified key-value pairs.

named_struct({name1, val1} [, …])

Creates a struct with the specified field names and values.

smallint(expr)

Casts the value expr to SMALLINT.

string(expr)

Casts the value expr to STRING.

struct(expr1 [, …])

Creates a STRUCT with the specified field values.

tinyint(expr)

Casts expr to TINYINT.

timestamp(expr)

Casts expr to TIMESTAMP.

to_char(expr, fmt)

Returns expr cast to STRING using formatting fmt.”

to_date(expr[,fmt])

Returns expr cast to a date using an optional formatting.

to_number(expr, fmt)

Returns expr cast to DECIMAL using formatting fmt.

to_timestamp(expr[,fmt])

Returns expr cast to a timestamp using an optional formatting.

to_varchar(expr, fmt)

Returns expr cast to STRING using formatting fmt.”

try_cast(expr AS type)

Casts the value expr to the target data type type safely.

try_to_number(expr, fmt)

Returns expr cast to DECIMAL using formatting fmt, or NULL if expr is not a valid.

CSV functions

Function

Description

from_csv(csvStr, schema[, options])

Returns a struct value with the csvStr and schema.

schema_of_csv(csv[, options])

Returns the schema of a CSV string in DDL format.

to_csv(expr[, options])

Returns a CSV string with the specified struct value.

JSON functions

Function

Description

jsonStr : jsonPath

Returns fields extracted from the jsonStr.

from_json(jsonStr, schema[, options])

Returns a struct value with the jsonStr and schema.

get_json_object(expr, path)

Extracts a JSON object from path.

json_array_length(jsonArray)

Returns the number of elements in the outermost JSON array.

json_object_keys(jsonObject)

Returns all the keys of the outermost JSON object as an array.

json_tuple(jsonStr, path1 [, …])

Returns multiple JSON objects as a tuple.

schema_of_json(jsonStr[, options])

Returns the schema of a JSON string in DDL format.

to_json(expr[, options])

Returns a JSON string with the struct specified in expr.

XPath and XML functions

Function

Description

from_xml(xmlStr, schema[, options])

Returns a struct value parsed from the xmlStr using schema.

schema_of_xml(xmlStr[, options])

Returns the schema of a XML string in DDL format.

xpath(xml, xpath)

Returns values within the nodes of xml that match xpath.

xpath_boolean(xml, xpath)

Returns true if the xpath expression evaluates to true, or if a matching node in xml is found.

xpath_double(xml, xpath)

Returns a DOUBLE value from an XML document.

xpath_float(xml, xpath)

Returns a FLOAT value from an XML document.

xpath_int(xml, xpath)

Returns a INTEGER value from an XML document.

xpath_long(xml, xpath)

Returns a BIGINT value from an XML document.

xpath_number(xml, xpath)

Returns a DOUBLE value from an XML document.

xpath_short(xml, xpath)

Returns a SHORT value from an XML document.

xpath_string(xml, xpath)

Returns the contents of the first XML node that matches the XPath expression.

AI functions

Function

Description

ai_analyze_sentiment(content)

Returns the sentiment of a text.

ai_classify(content, labels)

Classifies the provided content into one of the provided labels.

ai_extract(content, labels)

Extracts entities specified by labels from a given text.

ai_fix_grammar(content)

Corrects grammatical errors in a given text.

ai_gen(content)

Invokes a state-of-the-art generative AI model from Databricks Foundation Model APIs to answer the user-provided prompt.

ai_generate_text(prompt, modelName[, param1, value1] […])

Deprecated: Returns text generated by a selected large language model (LLM) given the prompt.

ai_mask(content, labels)

Masks specified entities within a given text.

ai_query(endpointName, request, returnType)

Invokes an existing Databricks Model Serving endpoint and parses and returns its response.

ai_similarity(strExpr1, strExpr2)

Compares two strings and computes the semantic similarity score.

ai_summarize(content[, max_words])

Generates a summary of a given text.

ai_translate(content, to_lang)

Translates text to a specified target language.

Read functions

Function

Description

read_files(path, [optionKey => optionValue] [, …])

Reads data files on cloud storage and returns it in tabular format.

read_kafka([optionKey => optionValue] [, …])

Reads records from an Apache Kafka cluster and returns it in tabular format.

read_kinesis({parameter => value} [, …])

Returns a table with records read from Kinesis from one or more streams.

read_pubsub([parameter => value] [, …])

A table valued function for reading records from Pub/Sub from a topic.

read_pulsar({optionKey => optionValue} [, …])

Returns a table with records read from Pulsar.

read_state_metadata(path)

Returns a table with rows that represent the metadata of a streaming query state.

read_statestore(path [, option_key => option_value] […])

Returns records from the state store of streaming queries.

Miscellaneous functions

Function

Description

assert_true(expr)

Returns an error if expr is not true.

CASE expr { WHEN opt1 THEN res1 } […] [ELSE def] END

Returns resN for the first optN that equals expr or def if none matches.

CASE { WHEN cond1 THEN res1 } […] [ELSE def] END

Returns resN for the first condN that evaluates to true, or def if none found.

cloud_files_state( { TABLE(table) | checkpoint } )

Returns the file-level state of an autoloader cloud_files source.

coalesce(expr1, expr2 [, …])

Returns the first non-null argument.

cube (expr1 [, …])

Creates a multi-dimensional cube using the specified expression columns.

current_catalog()

Returns the current catalog.

current_database()

Returns the current schema.

current_metastore()

Returns the current Unity Catalog Metastore id.

current_recipient(key)

Returns a property for the current recipient in a view shared with Delta Sharing.

current_schema()

Returns the current schema.

current_user()

Returns the user executing the statement.

current_version()

Returns the current version of Databricks.

decode(expr, { key, value } [, …] [,defValue])

Returns the value matching the key.

elt(index, expr1 [, …] )

Returns the nth expression.

equal_null(expr1, expr2)

Returns true if expr1 equals expr2 or both expressions are NULL, or false otherwise.

event_log( { TABLE(table) | pipeline_id } )

Returns a table of the refresh history for a materialized view, streaming table, or DLT pipeline.

greatest(expr1, expr2 [, …])

Returns the largest value of all arguments, skipping null values.

grouping(col)

Indicates whether a specified column in a GROUPING SET, ROLLUP, or CUBE represents a subtotal.

grouping_id([col1 [, …]])

Returns the level of grouping for a set of columns.

hash(expr1 [, …])

Returns a hashed value of the arguments.

hll_sketch_estimate(expr)

Etimates number of distinct values collected in a HyperLogLog sketch.

hll_union(expr1, expr2 [,allowDifferentLgConfigK])

Combines two HyperLogLog sketches.

java_method(class, method[, arg1 [, …]])

Calls a method with reflection.

if(cond, expr1, expr2)

Returns expr1 if cond is true, or expr2 otherwise.

iff(cond, expr1, expr2)

Returns expr1 if cond is true, or expr2 otherwise.

ifnull(expr1, expr2)

Returns expr2 if expr1 is NULL, or expr1 otherwise.

input_file_block_length()

Returns the length in bytes of the block being read.

input_file_block_start()

Returns the start offset in bytes of the block being read.

input_file_name()

Returns the name of the file being read, or empty string if not available.

is_account_group_member(group)

Returns true if the current user is a member of group at the account level.

is_member(group)

Returns true if the current user is a member of group at the workspace level.

isnull(expr)

Returns true if expr is NULL.

isnotnull(expr)

Returns true if expr is not NULL.

least(expr1, expr2 [, …])

Returns the smallest value of all arguments, skipping null values.

list_secrets()

Returns the keys which the user is authorized to see from Databricks secret service.

luhn_check(numStr)

Returns true if numStr passes the Luhn algorithm check.

monotonically_increasing_id()

Returns monotonically increasing 64-bit integers.

nullif(expr1, expr2)

Returns NULL if expr1 equals expr2, or expr1 otherwise.

nvl(expr1, expr2)

Returns expr2 if expr1 is NULL, or expr1 otherwise.

nvl2(expr1, expr2, expr3)

Returns expr2 if expr1 is not NULL, or expr3 otherwise.

raise_error(expr)

Throws an exception with expr as the message.

range(end)

Returns a table of values within a specified range.

range(start, end [, step [, numParts]])

Returns a table of values within a specified range.

reflect(class, method[, arg1 [, …]])

Calls a method with reflection.

secret(scope, key)

Extracts a secret value with the given scope and key from Databricks secret service.

session_user()

Returns the user connected to Databricks.

spark_partition_id()

Returns the current partition ID.

sql_keywords()

Returns the set of SQL keywords in Databricks.

stack(numRows, expr1 [, …])

Separates expr1, …, exprN into numRows rows.

table_changes(table_str, start [, end])

Returns a log of changes to a Delta Lake table with Change Data Feed enabled.

try_reflect(class, method[, arg1 [, …]])

Calls a method with reflection, returning NULL if the method fails.

typeof(expr)

Return a DDL-formatted type string for the data type of expr.

user()

Returns the user executing the statement.

uuid()

Returns an universally unique identifier (UUID) string.

window(expr, width[, step [, start]])

Creates a hopping based sliding-window over a timestamp expression.

xxhash64(expr1 [, …])

Returns a 64-bit hashed value of the arguments.

version()

Returns the Apache Spark version.