Databricks SQL built-in functions

This article presents links to and descriptions of built-in operators, and functions for strings and binary types, numeric scalars, aggregations, windows, arrays, maps, dates and timestamps, casting, CSV data, JSON data, XPath manipulation, and miscellaneous functions.

Also see:

Operators and predicates

For information on how operators are parsed with respect to each other, see Operator precedence.

Operator Syntax Description
& expr1 & expr2 Returns the bitwise AND of expr1 and expr2.
and expr1 and expr2 Returns the logical AND of expr1 and expr2.
* multiplier * multiplicand Returns multiplier multiplied by multiplicand.
!= expr1 != expr2 Returns true if expr1 does not equal expr2, or false otherwise.
! !expr Returns the logical NOT of a Boolean expression.
between expr1 [not] between expr2 and expr2 Tests whether expr1 is greater or equal than expr2 and less than or equal to expr3.
^ expr1 ^ expr2 Returns the bitwise exclusive OR (XOR) of expr1 and expr2.
:: expr :: type Casts the value expr to the target data type type.
div divisor div dividend Returns the integral part of the division of divisor by dividend.
== expr1 == expr2 Returns true if expr1 equals expr2, or false otherwise.
= expr1 = expr2 Returns true if expr1 equals expr2, or false otherwise.
>= expr1 >= expr2 Returns true if expr1 is greater than or equal to expr2, or false otherwise.
> expr1 > expr2 Returns true if expr1 is greater than expr2, or false otherwise.
exists exists(query) Returns true if query returns at least one row, or false otherwise.
in elem [not] in (expr1[, ...]) Returns true if elem equals any exprN.
in elem [not] in (query) Returns true if elem equals any row in query.
is distinct expr1 is [not] distinct from expr2 Tests whether the arguments have different values where NULLs are considered as comparable values.
is false expr is [not] false Tests whether expr is false.
is null expr is [not] null Returns true if expr is (not) NULL.
is true expr is [not] true Tests whether expr is true.
like str [not] like (pattern[ESCAPE escape]) Returns true if str matches pattern with escape.
<=> expr1 <=> expr2 Returns the same result as the EQUAL(=) for non-null operands, but returns true if both are NULL, false if one of the them is NULL.
<> expr1 <= expr2 Returns true if expr1 does not equal expr2, or false otherwise.
<> expr1 <> expr2 Returns true if expr1 does not equal expr2, or false otherwise.
< expr1 < expr2 Returns true if expr1 is less than or equal to expr2, or false otherwise.
- expr1 - expr2 Returns the subtraction of expr2 from expr1.
not not expr Returns the logical NOT of a Boolean expression.
% dividend % divisor Returns the remainder after dividend / divisor.
|| expr1 || expr2 Returns the concatenation of expr1 and expr2.
| expr1 | expr2 Returns the bitwise OR of expr1 and expr2.
+ expr1 + expr2 Returns the sum of expr1 and expr2.
regexp str [not] regexp regex Returns true if str matches regex.
rlike str [not] rlike regex Returns true if str matches regex.
/ dividend / divisor Returns dividend divided by divisor.
~ ~ expr Returns the bitwise NOT of expr.

Operator precedence

Precedence Operator
1 ::
2 -(unary), +(unary), ~
3 *, /, %, div
4 +, -, ||
5 &
6 ^
7 |
8 =, ==, <=>, <>, !=, <, <=, >, >=
9 not, exists
10 between, in, rlike, regexp, like, is [not] [NULL, true, false], is [not] distinct from
11 and
12 or

String and binary functions

Function Description
expr1 || expr2 Returns the concatenation of expr1 and expr2.
ascii(str) Returns the ASCII code point of the first character of str.
base64(expr) Converts expr to a base 64 string.
bin(expr) Returns the binary representation of expr.
binary(expr) Casts the value of expr to BINARY.
bit_length(expr) Returns the bit length of string data or number of bits of binary data.
char(expr) Returns the character at the supplied UTF-16 code point.
char_length(expr) Returns the character length of string data or number of bytes of binary data.
character_length(expr) Returns the character length of string data or number of bytes of binary data.
charindex(substr, str[, pos]) Returns the position of the first occurrence of substr in str after position pos.
chr(expr) Returns the character at the supplied UTF-16 code point.
concat(expr1, expr2[, …]) Returns the concatenation of the arguments.
concat_ws(sep[, expr1[, …]]) Returns the concatenation strings separated by sep.
crc32(expr) Returns a cyclic redundancy check value of expr.
decode(expr, charSet) Translates binary expr to a string using the character set encoding charSet.
encode(expr, charSet) Returns the binary representation of a string using the charSet character encoding.
find_in_set(searchExpr, sourceExpr) Returns the position of a string within a comma-separated list of strings.
format_number(expr, scale) Formats expr like #,###,###.##, rounded to scale decimal places.
format_number(expr, fmt) Formats expr like fmt.
format_string(strfmt[, obj1 [, …]]) Returns a formatted string from printf-style format strings.
hex(expr) Converts expr to hexadecimal.
initcap(expr) Returns expr with the first letter of each word in uppercase.
instr(str, substr) Returns the (1-based) index of the first occurrence of substr in str.
lcase(expr) Returns expr with all characters changed to lowercase.
left(str, len) Returns the leftmost len characters from str.
length(expr) Returns the character length of string data or number of bytes of binary data.
levenshtein(str1, str2) Returns the Levenshtein distance between the strings str1 and str2.
str like (pattern[ESCAPE escape]) Returns true if str matches pattern with escape.
locate(substr, str[, pos]) Returns the position of the first occurrence of substr in str after position pos.
lower(expr) Returns expr with all characters changed to lowercase.
lpad(str, len[, pad]) Returns str, left-padded with pad to a length of len.
ltrim([trimstr,] str) Returns str with leading characters within trimStr removed.
ltrim([trimstr,] str) Returns str with leading characters within trimStr removed.
md5(expr) Returns an MD5 128-bit checksum of expr as a hex string.
octet_length(expr) Returns the byte length of string data or number of bytes of binary data.
overlay(input PLACING replace FROM pos [FOR len]) Replaces input with replace that starts at pos and is of length len.
parse_url(url, partToExtract[, key]) Extracts a part from url.
position(substr, str[, pos]) Returns the position of the first occurrence of substr in str after position pos.
position(subtr IN str) Returns the position of the first occurrence of substr in str after position pos.
printf(strfmt[, obj1 [, …]]) Returns a formatted string from printf-style format strings.
str regexp regex Returns true if str matches regex.
regexp_extract(str, regexp[, idx]) Extracts the first string in str that matches the regexp expression and corresponds to the regex group index.
regexp_extract_all(str, regexp[, idx]) Extracts the all strings in str that matches the regexp expression and corresponds to the regex group index.
regexp_replace(str, regexp, rep[, position]) Replaces all substrings of str that match regexp with rep.
repeat(expr, n) Returns the string that repeats expr n times.
regexp_replace(str, regexp, rep[, position]) Replaces all substrings of str that match regexp with rep.
reverse(expr) Returns a reversed string or an array with reverse order of elements.
right(str, len) Returns the rightmost len characters from the string str.
str rlike regex Returns true if str matches regex.
rpad(str, len[, pad]) Returns str, right-padded with pad to a length of len.
rtrim([trimStr,] str) Returns str with trailing characters removed.
sentences(str[, lang, country]) Splits str into an array of array of words.
sha(expr) Returns a sha1 hash value as a hex string of expr.
sha1(expr) Returns a sha1 hash value as a hex string of expr.
sha2(expr, bitLength) Returns a checksum of the SHA-2 family as a hex string of expr.
soundex(expr) Returns the soundex code of the string.
space(n) Returns a string consisting of n spaces.
split(str, regex[, limit]) Splits str around occurrences that match regex and returns an array with a length of at most limit.
string(expr) Casts the value expr to STRING.
substr(expr, pos[, len]) Returns the substring of expr that starts at pos and is of length len.
substr(expr FROM pos[ FOR len]) Returns the substring of expr that starts at pos and is of length len.
substring(expr, pos[, len]) Returns the substring of expr that starts at pos and is of length len.
substring(expr FROM pos[ FOR len]) Returns the substring of expr that starts at pos and is of length len.
substring_index(expr, delim, count) Returns the substring of expr before count occurrences of the delimiter delim.
translate(expr, from, to) Returns an expr where all characters in from have been replaced with those in to.
trim([[BOTH | LEADING | TRAILING] [trimStr] FROM] str) Trim characters from a string.
ucase(expr) Returns expr with all characters changed to uppercase.
unbase64(expr) Returns a decoded base64 string as binary.
unhex(expr) Converts hexadecimal expr to BINARY.
upper(expr) Returns expr with all characters changed to uppercase.

Numeric scalar functions

Function Description
~ expr Returns the bitwise NOT of expr.
dividend / divisor Returns dividend divided by divisor.
expr1 | expr2 Returns the bitwise OR of expr1 and expr2.
- expr Returns the negated value of expr.
expr1 - expr2 Returns the subtraction of expr2 from expr1.
+ expr Returns the value of expr.
expr1 + expr2 Returns the sum of expr1 and expr2.
dividend % divisor Returns the remainder after dividend / divisor.
expr1 ^ expr2 Returns the bitwise exclusive OR (XOR) of expr1 and expr2.
expr1 & expr2 Returns the bitwise AND of expr1 and expr2.
multiplier * multiplicand Returns multiplier multiplied by multiplicand.
abs(expr) Returns the absolute value of the numeric value in expr.
acos(expr) Returns the inverse cosine (also know as, arc cosine) of expr.
acosh(expr) Returns the inverse hyperbolic cosine of expr.
asin(expr) Returns the inverse sine (also know as arc sine) of expr.
asinh(expr) Returns the inverse hyperbolic sine of expr.
atan(expr) Returns the inverse tangent (also know as arc tangent) of expr.
atan2(exprY, exprX) Returns the angle in radians between the positive x-axis of a plane and the point specified by the coordinates (exprX, exprY).
atanh(expr) Returns inverse hyperbolic tangent of expr.
bigint(expr) Casts the value expr to BIGINT.
bit_count(expr) Returns the number of bits set in the argument.
bit_reverse(expr) Returns the value obtained by reversing the order of the bits in the argument.
bround(expr[,targetScale]) Returns the rounded expr using HALF_EVEN rounding mode.
cbrt(expr) Returns the cube root of expr.
ceil(expr) Returns the smallest integral number not smaller than expr.
ceiling(expr) Returns the smallest integral number not smaller than expr.
conv(num, fromBase, toBase) Converts num from fromBase to toBase.
cos(expr) Returns the cosine of expr.
cosh(expr) Returns the hyperbolic cosine of expr.
cot(expr) Returns the cotangent of expr.
decimal(expr) Casts the value expr to DECIMAL.
degrees(expr) Converts radians to degrees.
divisor div dividend Returns the integral part of the division of divisor by dividend.
double(expr) Casts the value expr to DOUBLE.
e() Returns the constant e.
exp(expr) Returns e to the power of expr.
expm1(expr) Returns exp(expr) - 1.
factorial(expr) Returns the factorial of expr.
float(expr) Casts the value expr to FLOAT.
floor(expr) Returns the smallest integral number not greater than expr.
hypot(expr1, expr2) Returns sqrt(expr1 * expr1 + expr2 * expr2).
int(expr) Casts the value expr to INTEGER.
isnan(expr) Returns true if expr is NaN.
ln(expr) Returns the natural logarithm (base e) of expr.
log([base,] expr) Returns the logarithm of expr with base.
log1p(expr) Returns log(1 + expr).
log2(expr) Returns the logarithm of expr with base 2.
log10(expr) Returns the logarithm of expr with base 10.
mod(dividend, divisor) Returns the remainder after dividend / divisor.
nanvl(expr1, expr2) Returns expr1 if it’s not NaN, or expr2 otherwise.
negative(expr) Returns the negated value of expr.
pi() Returns pi.
pmod(dividend, divisor) Returns the positive remainder after dividend / divisor.
positive(expr) Returns the value of expr.
pow(expr1, expr2) Raises expr1 to the power of expr2.
power(expr1, expr2) Raises expr1 to the power of expr2.
radians(expr) Converts expr in degrees to radians.
rand([seed]) Returns a random value between 0 and 1.
randn([seed]) Returns a random value from a standard normal distribution.
random([seed]) Returns a random value between 0 and 1.
rint(expr) Returns expr rounded to a whole number as a DOUBLE.
round(expr[,targetScale]) Returns the rounded expr using HALF_UP rounding mode.
shiftleft(expr, n) Returns a bitwise left shifted by n bits.
shiftright(expr, n) Returns a bitwise signed right shifted by n bits.
unsignedright(expr, n) Returns a bitwise unsigned right shifted by n bits.
sign(expr) Returns -1.0, 0.0, or 1.0 as expr is negative, 0, or positive.
signum(expr) Returns -1.0, 0.0, or 1.0 as expr is negative, 0, or positive.
sin(expr) Returns the sine of expr.
sinh(expr) Returns the hyperbolic sine of expr.
smallint(expr) Casts the value expr to SMALLINT.
sqrt(expr) Returns the square root of expr.
tan(expr) Returns the tangent of expr.
tanh(expr) Returns the hyperbolic tangent of expr.
tinyint(expr) Casts expr to TINYINT.

Aggregate functions

Function Description
any(expr) Returns true if at least one value of expr in the group is true.
approx_count_distinct(expr[,relativeSD]) Returns the estimated number of distinct values in expr within the group.
approx_percentile(col,percentage[,accuracy]) Returns the approximate percentile of the expr within the group.
avg(expr) Returns the mean calculated from values of a group.
bit_and(expr) Returns the bitwise AND of all input values in the group.
bit_or(expr) Returns the bitwise OR of all input values in the group.
bit_xor(expr) Returns the bitwise XOR of all input values in the group.
bool_and(expr) Returns true if all values in expr are true within the group.
bool_or(expr) Returns true if at least one value in expr is true within the group.
collect_list(expr) Returns an array consisting of all values in expr within the group.
collect_set(expr) Returns an array consisting of all unique values in expr within the group.
corr(expr1,expr2) Returns Pearson coefficient of correlation between a group of number pairs.
count(*) Returns the total number of retrieved rows in a group, including rows containing null.
count(expr[,expr…] Returns the number of rows in a group for which the supplied expressions are all non-null.
count_if(expr) Returns the number of true values for the group in expr.
count_min_sketch(expr, epsilon, confidence, seed) Returns a count-min sketch of all values in the group in expr with the epsilon, confidence and seed.
covar_pop(expr1,expr2) Returns the population covariance of number pairs in a group.
covar_samp(expr1,expr2) Returns the sample covariance of number pairs in a group.
every(expr) Returns true if all values of expr in the group are true.
first(expr[,ignoreNull]) Returns the first value of expr for a group of rows.
first_value(expr[,ignoreNull]) Returns the first value of expr for a group of rows.
kurtosis(expr) Returns the kurtosis value calculated from values of a group.
last(expr[,ignoreNull]) Returns the last value of expr for the group of rows.
last_value(expr[,ignoreNull]) Returns the last value of expr for the group of rows.
max(expr) Returns the maximum value of expr in a group.
max_by(expr1,expr2) Returns the value of an expr1 associated with the maximum value of expr2 in a group.
mean(expr) Returns the mean calculated from values of a group.
min(expr) Returns the minimum value of expr in a group.
min_by(expr1, expr2) Returns the value of an expr1 associated with the minimum value of expr2 in a group.
percentile(expr, percentage [,frequency]) Returns the exact percentile value of expr at the specified percentage.
percentile_approx(expr,percentage[,accuracy]) Returns the approximate percentile of the expr within the group.
skewness(expr) Returns the skewness value calculated from values of a group.
some(expr) Returns true if at least one value of expr in a group is true.
std(expr) Returns the sample standard deviation calculated from the values within the group.
stddev(expr) Returns the sample standard deviation calculated from the values within the group.
stddev_pop(expr) Returns the population standard deviation calculated from values of a group.
stddev_samp(expr) Returns the sample standard deviation calculated from values of a group.
sum(expr) Returns the sum calculated from values of a group.
var_pop(expr) Returns the population variance calculated from values of a group.
var_samp(expr) Returns the sample variance calculated from values of a group.
variance(expr) Returns the sample variance calculated from values of a group.

Ranking window functions

Function Description
dense_rank() Returns the rank of a value compared to all values in the partition.
ntile(n) Divides the rows for each window partition into n buckets ranging from 1 to at most n.
percent_rank() Computes the percentage ranking of a value within the partition.
rank() Returns the rank of a value compared to all values in the partition.
row_number() Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.

Analytic window functions

Function Description
cume_dist() Returns the position of a value relative to all values in the partition.
lag(expr[,offset[,default]]) Returns the value of expr from a preceding row within the partition.
lead(expr[,offset[,default]]) Returns the value of expr from a subsequent row within the partition.
nth_value(expr, offset[, ignoreNulls]) Returns the value of expr at a specific offset in the window.

Array functions

Function Description
aggregate(expr,start,merge[,finish]) Aggregates elements in an array using a custom aggregator.
array([expr][, …]) Returns an array with the elements in expr.
array_contains(array,value) Returns true if array contains value.
array_distinct(array) Removes duplicate values from array.
array_except(array1,array2) Returns an array of the elements in array1 but not in array2.
array_intersect(array1,array2) Returns an array of the elements in the intersection of array1 and array2.
array_join(array,delimiter[,nullReplacement]) Concatenates the elements of array.
array_max(array) Returns the maximum value in array.
array_min(array) Returns the minimum value in array.
array_position(array,element) Returns the position of the first occurrence of element in array.
array_remove(array,element) Removes all occurrences of element from array.
array_repeat(element,count) Returns an array containing element count times.
array_sort(array,func) Returns array sorted according to func.
array_union(array1,array2) Returns an array of the elements in the union of array1 and array2 without duplicates.
arrays_overlap(array1, array2) Returns true if the intersection of array1 and array2 is not empty.
arrays_zip([array1][, …]) Returns a merged array of structs in which the nth struct contains all nth values of input arrays.
cardinality(expr) Returns the size of expr.
concat(expr1, expr2 [, …]) Returns the concatenation of the arguments.
element_at(arrayExpr, index) Returns the element of an arrayExpr at index.
exists(expr, pred) Returns true if pred is true for any element in expr.
explode(expr) Returns rows by un-nesting expr.
explode_outer(expr) Returns rows by un-nesting expr using outer semantics.
filter(expr,func) Filters the array in expr using the function func.
flatten(arrayOfArrays) Transforms an array of arrays into a single array.
forall(expr, predFunc) Tests whether predFunc holds for all elements in the array.
inline(expr) Explodes an array of structs into a table.
inline_outer(expr) Explodes an array of structs into a table with outer semantics.
posexplode(expr) Returns rows by un-nesting the array with numbering of positions.
posexplode_outer(expr) Returns rows by un-nesting the array with numbering of positions using OUTER semantics.
reduce(expr,start,merge[,finish]) Aggregates elements in an array using a custom aggregator.
reverse(array) Returns a reversed string or an array with reverse order of elements.
sequence(start,stop,step) Generates an array of elements from start to stop (inclusive), incrementing by step.
shuffle(array) Returns a random permutation of the array in expr.
size(expr) Returns the cardinality of expr.
slice(expr,start,length) Returns a subset of an array.
sort_array(array[,ascendingOrder]) Returns the array in expr in sorted order.
transform(expr, func) Transforms elements in an array in expr using the function func.
zip_with(expr1, expr2, func) Merges the arrays in expr1 and expr2, element-wise, into a single array using func.

Map functions

Function Description
cardinality(expr) Returns the size of expr.
element_at(mapExpr, key) Returns the value of mapExpr for key.
explode(expr) Returns rows by un-nesting expr.
explode_outer(expr) Returns rows by un-nesting expr using outer semantics.
map([key1, value1][, …]) Creates a map with the specified key-value pairs.
map_concat([expr1] [,…]) Returns the union of all expr map expressions.
map_entries(map) Returns an unordered array of all entries in map.
map_filter(expr, func) Filters entries in the map in expr using the function func.
map_from_arrays(keys, values) Creates a map with a pair of the keys and values arrays.
map_from_entries(expr) Creates a map created from the specified array of entries.
map_keys(map) Returns an unordered array containing the keys of map.
map_values(map) Returns an unordered array containing the values of map.
map_zip_with(map1, map2, func) Merges map1 and map2 into a single map.
size(expr) Returns the cardinality of expr.
transform_keys(expr, func) Transforms keys in a map in expr using the function func.
transform_values(expr, func) Transforms values in a map in expr using the function func.

Date and timestamp functions

For information on date and timestamp formats, see Databricks SQL datetime patterns.

Function Description
add_months(startDate,numMonths) Returns the date that is numMonths after startDate.
current_date() Returns the current date at the start of query evaluation.
current_timestamp() Returns the current timestamp at the start of query evaluation.
current_timezone() Returns the current session local timezone.
date(expr) Casts the value expr to DATE.
date_add(startDate,numDays) Returns the date numDays after startDate.
date_format(expr,fmt) Converts a timestamp to a string in the format fmt.
date_from_unix_date(days) Creates a date from the number of days since 1970-01-01.
date_part(field,expr) Extracts a part of the date, timestamp, or interval.
date_sub(startDate,numDays) Returns the date numDays before startDate.
date_trunc(field,expr) Returns timestamp truncated to the unit specified in field.
datediff(endDate,startDate) Returns the number of days from startDate to endDate.
day(expr) Returns the day of month of the date or timestamp.
dayofmonth(expr) Returns the day of month of the date or timestamp.
dayofweek(expr) Returns the day of week of the date or timestamp.
dayofyear(expr) Returns the day of year of the date or timestamp.
extract(field FROM source) Returns field of source.
from_unixtime(unixTime,fmt) Returns unixTime in fmt.
from_utc_timestamp(expr,timezone) Returns a timestamp in expr specified in UTC in the timezone timeZone.
hour(expr) Returns the hour component of a timestamp.
last_day(expr) Returns the last day of the month that the date belongs to.
make_date(year,month,day) Creates a date from year, month, and day fields.
make_interval(years, months, weeks, days, hours, mins, secs) Creates an interval from years, months, weeks, days, hours, mins and secs.
make_timestamp(year,month,day,hour,min,sec[,timezone]) Creates a timestamp from year, month, day, hour, min, sec, and timezone fields.
minute(expr) Returns the minute component of the timestamp in expr.
month(expr) Returns the month component of the timestamp in expr.
months_between(expr1,expr2[,roundOff]) Returns the number of months elapsed between dates or timestamps in expr1 and expr2.
next_day(expr,dayOfWeek) Returns the first date which is later than expr and named as in dayOfWeek.
now() Returns the current timestamp at the start of query evaluation.
quarter(expr) Returns the quarter of the year for expr in the range 1 to 4.
second(expr) Returns the second component of the timestamp in expr.
timestamp(expr) Casts expr to TIMESTAMP.
timestamp_micros(expr) Creates a timestamp expr microseconds since UTC epoch.
timestamp_millis(expr) Creates a timestamp expr milliseconds since UTC epoch.
timestamp_seconds(expr) Creates timestamp expr seconds since UTC epoch.
to_date(expr[,fmt]) Returns expr cast to a date using an optional formatting.
to_timestamp(expr[,fmt]) Returns expr cast to a timestamp using an optional formatting.
to_unix_timestamp(expr[,fmt]) Returns the timestamp in expr as a UNIX timestamp.
to_utc_timestamp(expr,timezone) Returns the timestamp in expr in a different timezone as UTC.
trunc(expr, fmt) Returns a date with the a portion of the date truncated to the unit specified by the format model fmt.
unix_date(expr) Returns the number of days since 1970-01-01.
unix_micros(expr) Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
unix_millis(expr) Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.
unix_seconds(expr) Returns the number of seconds since 1970-01-01 00:00:00 UTC.
unix_timestamp([expr[, fmt]]) eturns the UNIX timestamp of current or specified time.
weekday(expr) Returns the day of the week of expr.
weekofyear(expr) Returns the week of the year of expr.
year(expr) Returns the year component of expr.
window(expr, width[,slide[,start)]] Creates a hopping based sliding-window over a timestamp expression.

Cast functions and constructors

For information on casting between types, see cast function and try_cast function.

Function Description
array(expr, …) Returns an array with the elements in expr.
bigint(expr) Casts the value expr to BIGINT.
binary(expr) Casts the value of expr to BINARY.
boolean(expr) Casts expr to Boolean.
cast(expr AS type) Casts the value expr to the target data type type.
expr :: type Casts the value expr to the target data type type.
date(expr) Casts the value expr to DATE.
decimal(expr) Casts the value expr to DECIMAL.
double(expr) Casts the value expr to DOUBLE.
float(expr) Casts the value expr to FLOAT.
int(expr) Casts the value expr to INTEGER.
make_interval(years, months, weeks, days, hours, mins, secs) Creates an interval from years, months, weeks, days, hours, mins and secs.
map([key1, value1][, …]) Creates a map with the specified key-value pairs.
named_struct({name1, val1}[, …]) Creates a struct with the specified field names and values.
smallint(expr) Casts the value expr to SMALLINT.
string(expr) Casts the value expr to STRING.
struct(expr1[, …]) Creates a STRUCT with the specified field values.
tinyint(expr) Casts expr to TINYINT.
timestamp(expr) Casts expr to TIMESTAMP.
try_cast(expr AS type) Casts the value expr to the target data type type safely.

CSV functions

Function Description
from_csv(csvStr, schema[, options]) Returns a struct value with the csvStr and schema.
schema_of_csv(csv[, options]) Returns the schema of a CSV string in DDL format.
to_csv(expr[, options]) Returns a CSV string with the specified struct value.
from_csv(csvStr, schema[, options]) Returns a struct value with the csvStr and schema.
schema_of_csv(csv[, options]) Returns the schema of a CSV string in DDL format.
to_csv(expr[, options]) Returns a CSV string with the specified struct value.

JSON functions

Function Description
from_json(jsonStr, schema[, options]) Returns a struct value with the jsonStr and schema.
get_json_object(expr, path) Extracts a JSON object from path.
json_array_length(jsonArray) Returns the number of elements in the outermost JSON array.
json_object_keys(jsonObject) Returns all the keys of the outermost JSON object as an array.
json_tuple(jsonStr, path1[, …]) Returns multiple JSON objects as a tuple.
schema_of_json(json[, options]) Returns the schema of a JSON string in DDL format.
to_json(expr[, options]) Returns a JSON string with the struct specified in expr.

XPath functions

Function Description
xpath(xml, xpath) Returns values within the nodes of xml that match xpath.
xpath_boolean(xml, xpath) Returns true if the xpath expression evaluates to true, or if a matching node in xml is found.
xpath_double(xml, xpath) Returns a DOUBLE value from an XML document.
xpath_float(xml, xpath) Returns a FLOAT value from an XML document.
xpath_int(xml, xpath) Returns a INTEGER value from an XML document.
xpath_long(xml, xpath) Returns a BIGINT value from an XML document.
xpath_number(xml, xpath) Returns a DOUBLE value from an XML document.
xpath_short(xml, xpath) Returns a SHORT value from an XML document.
xpath_string(xml, xpath) Returns the contents of the first XML node that matches the XPath expression.

Miscellaneous functions

Function Description
assert_true(expr) Returns an error if expr is not true.
CASE expr {WHEN opt1 THEN res1}[…][ELSE def] END Returns resN for the first optN that equals expr or def if none matches.
CASE {WHEN cond1 THEN res1}[…][ELSE def] END Returns resN for the first condN evaluating to true, or def if none found.
coalesce(expr1[, …]) Returns the first non-null argument.
cube (expr1[, …]) Creates a multi-dimensional cube using the specified expression columns.
current_catalog() Returns the current catalog.
current_database() Returns the current database (schema).
current_user() Returns the current user.
decode(expr, {key1, value1}[, …][,defValue]) Returns the value matching the key.
elt(index, expr1[, …]) Returns the nth expression.
greatest(expr1[, …]) Returns the greatest value of all arguments, skipping null values.
grouping(col) Indicates whether a specified column in a GROUPING SET, ROLLUP, or CUBE represents a subtotal.
grouping_id([col1[, …]]) Returns the level of grouping for a set of columns.
hash(expr1[, …]) Returns a hash value of the arguments.
java_method(class, method[, arg1[, …]]) Calls a method with reflection.
if(cond, expr1, expr2) Returns expr1 if cond is true, or expr2 otherwise.
iff(cond, expr1, expr2) Returns expr1 if cond is true, or expr2 otherwise.
ifnull(expr1, expr2) Returns expr2 if expr1 is NULL, or expr1 otherwise.
input_file_block_length() Returns the length in bytes of the block being read.
input_file_block_start() Returns the start offset in bytes of the block being read.
input_file_name() Returns the name of the file being read, or empty string if not available.
is_member(group) Returns true if the current user is a member of group.
isnull(expr) Returns true if expr is NULL.
isnotnull(expr) Returns true if expr is not NULL.
least(expr1[, …]) Returns the smallest value of all arguments, skipping null values.
monotonically_increasing_id() Returns monotonically increasing 64-bit integers.
nvl(expr1, expr2) Returns expr2 if expr1 is NULL, or expr1 otherwise.
nvl2(expr1, expr2, expr3) Returns expr2 if expr1 is not NULL, or expr3 otherwise.
raise_error(expr) Throws an exception with expr as the message.
range(end) Returns a table of values within a specified range.
range(start, end [, step [, numParts]]) Returns a table of values within a specified range.
range(start, end [, step [, numParts]]) Returns a table of values within a specified range.
reflect(class, method[, arg1[, …]]) Calls a method with reflection.
spark_partition_id() Returns the current partition ID.
stack(numRows, expr1[, …]) Separates expr1, …, exprN into numRows rows.
uuid() Returns an universally unique identifier (UUID) string.
window(expr, width[, step[, start]]) Creates a hopping based sliding-window over a timestamp expression.
xxhash64(expr1[, …]) Returns a 64-bit hash value of the arguments.
version() Returns the Apache Spark version.