PySpark functions

This page provides a list of PySpark SQL functions available on Databricks with links to corresponding reference documentation.

Normal functions

Function	Description
`broadcast(df)`	Marks a DataFrame as small enough for use in broadcast joins.
`call_function(funcName, *cols)`	Call a SQL function.
`col(col)`	Returns a Column based on the given column name. `column(col)` is also allowed.
`lit(col)`	Creates a Column of literal value.
`expr(str)`	Parses the expression string into the column that it represents

Sort functions

Function	Description
`asc(col)`	Returns a sort expression for the target column in ascending order.
`asc_nulls_first(col)`	Returns a sort expression based on the ascending order of the given column name, and null values return before non-null values.
`asc_nulls_last(col)`	Returns a sort expression based on the ascending order of the given column name, and null values appear after non-null values.
`desc(col)`	Returns a sort expression for the target column in descending order.
`desc_nulls_first(col)`	Returns a sort expression based on the descending order of the given column name, and null values appear before non-null values.
`desc_nulls_last(col)`	Returns a sort expression based on the descending order of the given column name, and null values appear after non-null values.

Conditional functions

Function	Description
`coalesce(*cols)`	Returns the first column that is not null.
`ifnull(col1, col2)`	Returns col2 if col1 is null, or col1 otherwise.
`nanvl(col1, col2)`	Returns col1 if it is not NaN, or col2 if col1 is NaN.
`nullif(col1, col2)`	Returns null if col1 equals to col2, or col1 otherwise.
`nullifzero(col)`	Returns null if col is equal to zero, or col otherwise.
`nvl(col1, col2)`	Returns col2 if col1 is null, or col1 otherwise.
`nvl2(col1, col2, col3)`	Returns col2 if col1 is not null, or col3 otherwise.
`when(condition, value)`	Evaluates a list of conditions and returns one of multiple possible result expressions.
`zeroifnull(col)`	Returns zero if col is null, or col otherwise.

Pattern matching functions

Function	Description
`equal_null(col1, col2)`	Returns same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of them is null.
`ilike(str, pattern[, escapeChar])`	Returns true if str matches pattern with escape case-insensitively, null if any arguments are null, false otherwise.
`isnan(col)`	An expression that returns true if the column is NaN.
`isnotnull(col)`	Returns true if col is not null, or false otherwise.
`isnull(col)`	An expression that returns true if the column is null.
`like(str, pattern[, escapeChar])`	Returns true if str matches pattern with escape, null if any arguments are null, false otherwise.
`regexp(str, regexp)`	Returns true if str matches the Java regex regexp, or false otherwise.
`regexp_like(str, regexp)`	Returns true if str matches the Java regex regexp, or false otherwise.
`rlike(str, regexp)`	Returns true if str matches the Java regex regexp, or false otherwise.

Mathematical functions

Function	Description
`abs(col)`	Computes the absolute value of the given column or expression.
`acos(col)`	Computes the inverse cosine (also known as arccosine) of the given column or expression.
`acosh(col)`	Computes the inverse hyperbolic cosine (also known as arcosh) of the given column or expression.
`asin(col)`	Computes inverse sine of the input column.
`asinh(col)`	Computes inverse hyperbolic sine of the input column.
`atan(col)`	Compute inverse tangent of the input column.
`atan2(col1, col2)`	Compute the angle in radians between the positive x-axis of a plane and the point given by the coordinates
`atanh(col)`	Computes inverse hyperbolic tangent of the input column.
`bin(col)`	Returns the string representation of the binary value of the given column.
`bround(col[, scale])`	Round the given value to scale decimal places using HALF_EVEN rounding mode if scale >= 0 or at integral part when scale < 0.
`cbrt(col)`	Computes the cube-root of the given value.
`ceil(col[, scale])`	Computes the ceiling of the given value.
`ceiling(col[, scale])`	Computes the ceiling of the given value.
`conv(col, fromBase, toBase)`	Convert a number in a string column from one base to another.
`cos(col)`	Computes cosine of the input column.
`cosh(col)`	Computes hyperbolic cosine of the input column.
`cot(col)`	Computes cotangent of the input column.
`csc(col)`	Computes cosecant of the input column.
`degrees(col)`	Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
`e()`	Returns Euler's number.
`exp(col)`	Computes the exponential of the given value.
`expm1(col)`	Computes the exponential of the given value minus one.
`factorial(col)`	Computes the factorial of the given value.
`floor(col[, scale])`	Computes the floor of the given value.
`greatest(*cols)`	Returns the greatest value of the list of column names, skipping null values.
`hex(col)`	Computes hex value of the given column, which could be StringType, BinaryType, IntegerType or LongType.
`hypot(col1, col2)`	Computes `sqrt(a^2 + b^2)` without intermediate overflow or underflow.
`least(*cols)`	Returns the least value of the list of column names, skipping null values.
`ln(col)`	Returns the natural logarithm of the argument.
`log(arg1[, arg2])`	Returns the first argument-based logarithm of the second argument.
`log10(col)`	Computes the logarithm of the given value in Base 10.
`log1p(col)`	Computes the natural logarithm of the given value plus one.
`log2(col)`	Returns the base-2 logarithm of the argument.
`negate(col)`	Returns the negative value.
`negative(col)`	Returns the negative value.
`pi()`	Returns Pi.
`pmod(dividend, divisor)`	Returns the positive value of dividend mod divisor.
`positive(col)`	Returns the value.
`pow(col1, col2)`	Returns the value of the first argument raised to the power of the second argument.
`radians(col)`	Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
`rand([seed])`	Generates a random column with independent and identically distributed (i.i.d.) samples uniformly distributed in [0.0, 1.0).
`randn([seed])`	Generates a random column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
`rint(col)`	Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
`round(col[, scale])`	Round the given value to scale decimal places using HALF_UP rounding mode if scale >= 0 or at integral part when scale < 0.
`sec(col)`	Computes secant of the input column.
`sign(col)`	Computes the signum of the given value.
`signum(col)`	Computes the signum of the given value.
`sin(col)`	Computes sine of the input column.
`sinh(col)`	Computes hyperbolic sine of the input column.
`sqrt(col)`	Computes the square root of the specified float value.
`tan(col)`	Computes tangent of the input column.
`tanh(col)`	Computes hyperbolic tangent of the input column.
`try_add(left, right)`	Returns the sum of left and right and the result is null on overflow.
`try_divide(left, right)`	Returns dividend/divisor.
`try_mod(left, right)`	Returns the remainder after dividend/divisor.
`try_multiply(left, right)`	Returns left multiplied by right and the result is null on overflow.
`try_subtract(left, right)`	Returns left minus right and the result is null on overflow.
`unhex(col)`	Inverse of hex.
`uniform(min, max[, seed])`	Returns a random value with independent and identically distributed (i.i.d.) values with the specified range of numbers.
`width_bucket(v, min, max, numBucket)`	Returns the bucket number into which the value of this expression would fall after being evaluated.

String functions

Function	Description
`ascii(col)`	Computes the numeric value of the first character of the string column.
`base64(col)`	Computes the BASE64 encoding of a binary column and returns it as a string column.
`bit_length(col)`	Calculates the bit length for the specified string column.
`btrim(str[, trim])`	Remove the leading and trailing trim characters from str.
`char(col)`	Returns the ASCII character having the binary equivalent to col.
`char_length(str)`	Returns the character length of string data or number of bytes of binary data.
`character_length(str)`	Returns the character length of string data or number of bytes of binary data.
`chr(n)`	Returns the ASCII character having the binary equivalent to n.
`collate(col, collation)`	Marks a given column with specified collation.
`collation(col)`	Returns the collation name of a given column.
`concat_ws(sep, *cols)`	Concatenates multiple input string columns together into a single string column, using the given separator.
`contains(left, right)`	Returns a boolean.
`decode(col, charset)`	Computes the first argument into a string from a binary using the provided character set (one of `US-ASCII`, `ISO-8859-1`, `UTF-8`, `UTF-16BE`, `UTF-16LE`, `UTF-16`, `UTF-32`).
`elt(*inputs)`	Returns the n-th input, e.g., returns input2 when n is 2.
`encode(col, charset)`	Computes the first argument into a binary from a string using the provided character set (one of `US-ASCII`, `ISO-8859-1`, `UTF-8`, `UTF-16BE`, `UTF-16LE`, `UTF-16`, `UTF-32`).
`endswith(str, suffix)`	Returns a boolean.
`find_in_set(str, str_array)`	Returns the index (1-based) of the given string (str) in the comma-delimited list (strArray).
`format_number(col, d)`	Formats the number X to a format like `#,--#,--#.--`, rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string.
`format_string(format, *cols)`	Formats the arguments in printf-style and returns the result as a string column.
`initcap(col)`	Translate the first letter of each word to upper case in the sentence.
`instr(str, substr)`	Locate the position of the first occurrence of substr column in the given string.
`is_valid_utf8(str)`	Returns true if the input is a valid UTF-8 string, otherwise returns false.
`lcase(str)`	Returns str with all characters changed to lowercase.
`left(str, len)`	Returns the leftmost len (len can be string type) characters from the string str, if len is less or equal than 0 the result is an empty string.
`length(col)`	Computes the character length of string data or number of bytes of binary data.
`levenshtein(left, right[, threshold])`	Computes the Levenshtein distance of the two given strings.
`locate(substr, str[, pos])`	Locate the position of the first occurrence of substr in a string column, after position pos.
`lower(col)`	Converts a string expression to lower case.
`lpad(col, len, pad)`	Left-pad the string column to width len with pad.
`ltrim(col[, trim])`	Trim the spaces from left end for the specified string value.
`make_valid_utf8(str)`	Returns a new string in which all invalid UTF-8 byte sequences, if any, are replaced by the Unicode replacement character (U+FFFD).
`mask(col[, upperChar, lowerChar, digitChar, ...])`	Masks the given string value.
`octet_length(col)`	Calculates the byte length for the specified string column.
`overlay(src, replace, pos[, len])`	Overlay the specified portion of src with replace, starting from byte position pos of src and proceeding for len bytes.
`position(substr, str[, start])`	Returns the position of the first occurrence of substr in str after position start.
`printf(format, *cols)`	Formats the arguments in printf-style and returns the result as a string column.
`quote`	Returns `str` enclosed by single quotes and each instance of single quote in it is preceded by a backslash.
`randstr(length[, seed])`	Returns a string of the specified length whose characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z.
`regexp_count(str, regexp)`	Returns a count of the number of times that the Java regex pattern regexp is matched in the string str.
`regexp_extract(str, pattern, idx)`	Extract a specific group matched by the Java regex regexp, from the specified string column.
`regexp_extract_all(str, regexp[, idx])`	Extract all strings in the str that match the Java regex regexp and corresponding to the regex group index.
`regexp_instr(str, regexp[, idx])`	Returns the position of the first substring in the str that match the Java regex regexp and corresponding to the regex group index.
`regexp_replace(string, pattern, replacement)`	Replace all substrings of the specified string value that match regexp with replacement.
`regexp_substr(str, regexp)`	Returns the first substring that matches the Java regex regexp within the string str.
`repeat(col, n)`	Repeats a string column n times, and returns it as a new string column.
`replace(src, search[, replace])`	Replaces all occurrences of search with replace.
`right(str, len)`	Returns the rightmost len (len can be string type) characters from the string str, if len is less or equal than 0 the result is an empty string.
`rpad(col, len, pad)`	Right-pad the string column to width len with pad.
`rtrim(col[, trim])`	Trim the spaces from right end for the specified string value.
`sentences(string[, language, country])`	Splits a string into arrays of sentences, where each sentence is an array of words.
`soundex(col)`	Returns the SoundEx encoding for a string
`split(str, pattern[, limit])`	Splits str around matches of the given pattern.
`split_part(src, delimiter, partNum)`	Splits str by delimiter and return requested part of the split (1-based).
`startswith(str, prefix)`	Returns a boolean.
`substr(str, pos[, len])`	Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
`substring(str, pos, len)`	Substring starts at pos and is of length len when str is String type or returns the slice of byte array that starts at pos in byte and is of length len when str is Binary type.
`substring_index(str, delim, count)`	Returns the substring from string str before count occurrences of the delimiter delim.
`to_binary(col[, format])`	Converts the input col to a binary value based on the supplied format.
`to_char(col, format)`	Convert col to a string based on the format.
`to_number(col, format)`	Convert string 'col' to a number based on the string format 'format'.
`to_varchar(col, format)`	Convert col to a string based on the format.
`translate(srcCol, matching, replace)`	Translate any character in the srcCol by a character in matching.
`trim(col[, trim])`	Trim the spaces from both ends for the specified string column.
`try_to_binary(col[, format])`	Try to perform the `to_binary` operation, but return a NULL value instead of raising an error if the conversion cannot be performed.
`try_to_number(col, format)`	Convert string `col` to a number based on the string format `format`.
`try_validate_utf8(str)`	Returns the input value if it corresponds to a valid UTF-8 string, or NULL otherwise.
`ucase(str)`	Returns str with all characters changed to uppercase.
`unbase64(col)`	Decodes a BASE64 encoded string column and returns it as a binary column.
`upper(col)`	Converts a string expression to upper case.
`validate_utf8(str)`	Returns the input value if it corresponds to a valid UTF-8 string, or emits an error otherwise.

Bitwise functions

Function	Description
`bit_count(col)`	Returns the number of bits that are set in the argument expr as an unsigned 64-bit integer, or NULL if the argument is NULL.
`bit_get(col, pos)`	Returns the value of the bit (0 or 1) at the specified position.
`bitwise_not(col)`	Computes bitwise not.
`getbit(col, pos)`	Returns the value of the bit (0 or 1) at the specified position.
`shiftleft(col, numBits)`	Shift the given value numBits left.
`shiftright(col, numBits)`	(Signed) shift the given value numBits right.
`shiftrightunsigned(col, numBits)`	Unsigned shift the given value numBits right.

Date/time functions

Function	Description
`add_months(start, months)`	Returns the date that is months months after start.
`convert_timezone(sourceTz, targetTz, sourceTs)`	Converts the timestamp without time zone sourceTs from the sourceTz time zone to targetTz.
`curdate()`	Returns the current date at the start of query evaluation as a DateType column.
`current_date()`	Returns the current date at the start of query evaluation as a DateType column.
`current_time`	Returns the current time at the start of query evaluation as a TimeType column. All calls of current_time within the same query return the same value.
`current_timestamp()`	Returns the current timestamp at the start of query evaluation as a TimestampType column.
`current_timezone()`	Returns the current session local timezone.
`date_add(start, days)`	Returns the date that is days days after start.
`date_diff(end, start)`	Returns the number of days from start to end.
`date_format(date, format)`	Converts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
`date_from_unix_date(days)`	Create date from the number of days since 1970-01-01.
`date_part(field, source)`	Extracts a part of the date/timestamp or interval source.
`date_sub(start, days)`	Returns the date that is days days before start.
`date_trunc(format, timestamp)`	Returns timestamp truncated to the unit specified by the format.
`dateadd(start, days)`	Returns the date that is days days after start.
`datediff(end, start)`	Returns the number of days from start to end.
`datepart(field, source)`	Extracts a part of the date/timestamp or interval source.
`day(col)`	Extract the day of the month of a given date/timestamp as integer.
`days`	Partition transform function: A transform for timestamps and dates to partition data into days.
`dayname(col)`	Returns the three-letter abbreviated day name from the given date.
`dayofmonth(col)`	Extract the day of the month of a given date/timestamp as integer.
`dayofweek(col)`	Extract the day of the week of a given date/timestamp as integer.
`dayofyear(col)`	Extract the day of the year of a given date/timestamp as integer.
`extract(field, source)`	Extracts a part of the date/timestamp or interval source.
`from_unixtime(timestamp[, format])`	Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
`from_utc_timestamp(timestamp, tz)`	This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE.
`hour(col)`	Extract the hours of a given timestamp as integer.
`last_day(date)`	Returns the last day of the month which the given date belongs to.
`localtimestamp()`	Returns the current timestamp without time zone at the start of query evaluation as a timestamp without time zone column.
`hours`	Partition transform function: A transform for timestamps to partition data into hours.
`make_date(year, month, day)`	Returns a column with a date built from the year, month and day columns.
`make_dt_interval([days, hours, mins, secs])`	Make DayTimeIntervalType duration from days, hours, mins and secs.
`make_interval([years, months, weeks, days, ...])`	Make interval from years, months, weeks, days, hours, mins and secs.
`make_time`	Create time from hour, minute and second fields. For invalid inputs it will throw an error.
`make_timestamp(years, months, days, hours, ...)`	Create timestamp from years, months, days, hours, mins, secs and timezone fields.
`make_timestamp_ltz(years, months, days, ...)`	Create the current timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields.
`make_timestamp_ntz(years, months, days, ...)`	Create local date-time from years, months, days, hours, mins, secs fields.
`make_ym_interval([years, months])`	Make year-month interval from years, months.
`minute(col)`	Extract the minutes of a given timestamp as integer.
`month(col)`	Extract the month of a given date/timestamp as integer.
`monthname(col)`	Returns the three-letter abbreviated month name from the given date.
`months`	Partition transform function: A transform for timestamps and dates to partition data into months.
`months_between(date1, date2[, roundOff])`	Returns number of months between dates date1 and date2.
`next_day(date, dayOfWeek)`	Returns the first date which is later than the value of the date column based on second week day argument.
`now()`	Returns the current timestamp at the start of query evaluation.
`quarter(col)`	Extract the quarter of a given date/timestamp as integer.
`second(col)`	Extract the seconds of a given date as integer.
`session_window(timeColumn, gapDuration)`	Generates session window given a timestamp specifying column.
`time_diff(unit, start, end)`	Returns the difference between two times, measured in specified units.
`time_from_micros(col)`	Creates a TIME value from microseconds since midnight.
`time_from_millis(col)`	Creates a TIME value from milliseconds since midnight.
`time_from_seconds(col)`	Creates a TIME value from seconds since midnight (supports fractional seconds).
`time_to_micros(col)`	Extracts microseconds from TIME value.
`time_to_millis(col)`	Extracts milliseconds from TIME value.
`time_to_seconds(col)`	Extracts seconds from TIME value.
`time_trunc(unit, time)`	Returns time truncated to the unit.
`timestamp_add(unit, quantity, ts)`	Gets the difference between the timestamps in the specified units by truncating the fraction part.
`timestamp_diff(unit, start, end)`	Gets the difference between the timestamps in the specified units by truncating the fraction part.
`timestamp_micros(col)`	Creates timestamp from the number of microseconds since UTC epoch.
`timestamp_millis(col)`	Creates timestamp from the number of milliseconds since UTC epoch.
`timestamp_seconds(col)`	Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z) to a timestamp.
`to_date(col[, format])`	Converts a Column into pyspark.sql.types.DateType using the optionally specified format.
`to_time(str, format)`	Converts a column into TimeType using the optionally specified format.
`to_timestamp(col[, format])`	Converts a Column into pyspark.sql.types.TimestampType using the optionally specified format.
`to_timestamp_ltz(timestamp[, format])`	Parses the timestamp with the format to a timestamp with time zone.
`to_timestamp_ntz(timestamp[, format])`	Parses the timestamp with the format to a timestamp without time zone.
`to_unix_timestamp(timestamp[, format])`	Returns the UNIX timestamp of the given time.
`to_utc_timestamp(timestamp, tz)`	This is a common function for databases supporting TIMESTAMP WITHOUT TIMEZONE.
`trunc(date, format)`	Returns date truncated to the unit specified by the format.
`try_make_interval([years, months, weeks, ...])`	Try to perform a `make_interval` operation, but return a NULL value instead of raising an error if interval cannot be created.
`try_make_timestamp(years, months, days, ...)`	Try to create timestamp from years, months, days, hours, mins, secs and timezone fields.
`try_make_timestamp_ltz(years, months, days, ...)`	Try to create the current timestamp with local time zone from years, months, days, hours, mins, secs and timezone fields.
`try_make_timestamp_ntz(years, months, days, ...)`	Try to create local date-time from years, months, days, hours, mins, secs fields.
`try_to_date`	Converts a column to a DateType using the optionally specified format, returning NULL instead of raising an error if the conversion cannot be performed.
`try_to_time(str, format)`	Converts a column into TimeType using the optionally specified format.
`try_to_timestamp(col[, format])`	Parses the col with the format to a timestamp.
`unix_date(col)`	Returns the number of days since 1970-01-01.
`unix_micros(col)`	Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
`unix_millis(col)`	Returns the number of milliseconds since 1970-01-01 00:00:00 UTC.
`unix_seconds(col)`	Returns the number of seconds since 1970-01-01 00:00:00 UTC.
`unix_timestamp([timestamp, format])`	Convert time string with given pattern ('yyyy-MM-dd HH:mm:ss', by default) to Unix time stamp (in seconds), using the default timezone and the default locale, returns null if failed.
`weekday(col)`	Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, ..., 6 = Sunday).
`weekofyear(col)`	Extract the week number of a given date as integer.
`window(timeColumn, windowDuration[, ...])`	Bucketize rows into one or more time windows given a timestamp specifying column.
`window_time(windowColumn)`	Computes the event time from a window column.
`year(col)`	Extract the year of a given date/timestamp as integer.
`years`	Partition transform function: A transform for timestamps and dates to partition data into years.

Hash functions

Function	Description
`crc32(col)`	Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.
`hash(*cols)`	Calculates the hash code of given columns, and returns the result as an int column.
`md5(col)`	Calculates the MD5 digest and returns the value as a 32 character hex string.
`sha(col)`	Returns a sha1 hash value as a hex string of the col.
`sha1(col)`	Returns the hex string result of SHA-1.
`sha2(col, numBits)`	Returns the hex string result of SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512).
`xxhash64(*cols)`	Calculates the hash code of given columns using the 64-bit variant of the xxHash algorithm, and returns the result as a long column.

Collection functions

Function	Description
`aggregate(col, initialValue, merge[, finish])`	Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.
`cardinality(col)`	Returns the length of the array or map stored in the column.
`concat(*cols)`	Concatenates multiple input columns together into a single column.
`element_at(col, extraction)`	Returns the element of array at the given (1-based) index.
`exists(col, f)`	Returns whether a predicate holds for one or more elements in the array.
`filter(col, f)`	Returns an array of elements for which a predicate holds in a given array.
`forall(col, f)`	Returns whether a predicate holds for every element in the array.
`map_filter(col, f)`	Returns a new map column whose key-value pairs satisfy a given predicate function.
`map_zip_with(col1, col2, f)`	Merges two given maps into a single map by applying a function to the key-value pairs.
`reduce(col, initialValue, merge[, finish])`	Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state.
`reverse(col)`	Returns a reversed string or an array with elements in reverse order.
`size(col)`	Returns the length of the array or map stored in the column.
`transform(col, f)`	Returns an array of elements after applying a transformation to each element in the input array.
`transform_keys(col, f)`	Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new keys for the pairs.
`transform_values(col, f)`	Applies a function to every key-value pair in a map and returns a map with the results of those applications as the new values for the pairs.
`try_element_at(col, extraction)`	Returns element of array at given (1-based) index.
`zip_with(left, right, f)`	Merge two given arrays, element-wise, into a single array using a function.

Array functions

Function	Description
`array(*cols)`	Creates a new array column from the input columns or column names.
`array_append(col, value)`	Returns a new array column by appending value to the existing array col.
`array_compact(col)`	Removes null values from the array.
`array_contains(col, value)`	Returns a boolean indicating whether the array contains the given value: null if the array is null, true if the array contains the given value, and false otherwise.
`array_distinct(col)`	Removes duplicate values from the array.
`array_except(col1, col2)`	Returns a new array containing the elements present in col1 but not in col2, without duplicates.
`array_insert(arr, pos, value)`	Inserts an item into a given array at a specified array index.
`array_intersect(col1, col2)`	Returns a new array containing the intersection of elements in col1 and col2, without duplicates.
`array_join(col, delimiter[, null_replacement])`	Returns a string column by concatenating the elements of the input array column using the delimiter.
`array_max(col)`	Returns the maximum value of the array.
`array_min(col)`	Returns the minimum value of the array.
`array_position(col, value)`	Locates the position of the first occurrence of the given value in the given array.
`array_prepend(col, value)`	Returns an array containing the given element as the first element and the rest of the elements from the original array.
`array_remove(col, element)`	Remove all elements that equal to element from the given array.
`array_repeat(col, count)`	Creates an array containing a column repeated count times.
`array_size(col)`	Returns the total number of elements in the array.
`array_sort(col[, comparator])`	Sorts the input array in ascending order.
`array_union(col1, col2)`	Returns a new array containing the union of elements in col1 and col2, without duplicates.
`arrays_overlap(a1, a2)`	Returns a boolean column indicating if the input arrays have common non-null elements, returning true if they do, null if the arrays do not contain any common elements but are not empty and at least one of them contains a null element, and false otherwise.
`arrays_zip(*cols)`	Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
`flatten(col)`	Creates a single array from an array of arrays.
`get(col, index)`	Returns the element of an array at the given (0-based) index.
`sequence(start, stop[, step])`	Generate a sequence of integers from start to stop, incrementing by step.
`shuffle(col[, seed])`	Generates a random permutation of the given array.
`slice(x, start, length)`	Returns a new array column by slicing the input array column from a start index to a specific length.
`sort_array(col[, asc])`	Sorts the input array in ascending or descending order according to the natural ordering of the array elements.

Struct functions

Function	Description
`named_struct(*cols)`	Creates a struct with the given field names and values.
`struct(*cols)`	Creates a new struct column.

Map functions

Function	Description
`create_map(*cols)`	Creates a new map column from an even number of input columns or column references.
`map_concat(*cols)`	Returns the union of all given maps.
`map_contains_key(col, value)`	Returns true if the map contains the key.
`map_entries(col)`	Returns an unordered array of all entries in the given map.
`map_from_arrays(col1, col2)`	Creates a new map from two arrays.
`map_from_entries(col)`	Transforms an array of key-value pair entries (structs with two fields) into a map.
`map_keys(col)`	Returns an unordered array containing the keys of the map.
`map_values(col)`	Returns an unordered array containing the values of the map.
`str_to_map(text[, pairDelim, keyValueDelim])`	Converts a string into a map after splitting the text into key/value pairs using delimiters.

Aggregate functions

Function	Description
`any_value(col[, ignoreNulls])`	Returns some value of col for a group of rows.
`approx_count_distinct(col[, rsd])`	Returns a new Column, which estimates the approximate distinct count of elements in a specified column or a group of columns.
`approx_percentile(col, percentage[, accuracy])`	Returns the approximate percentile of the numeric column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
`approx_top_k(col, k, maxItemsTracked)`	Returns the top k most frequently occurring item values in a string, boolean, date, timestamp, or numeric column col along with their approximate counts.
`array_agg(col)`	Returns a list of objects with duplicates.
`avg(col)`	Returns the average of the values in a group.
`bit_and(col)`	Returns the bitwise AND of all non-null input values, or null if none.
`bit_or(col)`	Returns the bitwise OR of all non-null input values, or null if none.
`bit_xor(col)`	Returns the bitwise XOR of all non-null input values, or null if none.
`bitmap_construct_agg(col)`	Returns a bitmap with the positions of the bits set from all the values from the input column.
`bitmap_or_agg(col)`	Returns a bitmap that is the bitwise OR of all of the bitmaps from the input column.
`bool_and(col)`	Returns true if all values of col are true.
`bool_or(col)`	Returns true if at least one value of col is true.
`collect_list(col)`	Collects the values from a column into a list, maintaining duplicates, and returns this list of objects.
`collect_set(col)`	Collects the values from a column into a set, eliminating duplicates, and returns this set of objects.
`corr(col1, col2)`	Returns a new Column for the Pearson Correlation Coefficient for col1 and col2.
`count(col)`	Returns the number of items in a group.
`count_distinct(col, *cols)`	Returns a new Column for distinct count of col or cols.
`count_if(col)`	Returns the number of TRUE values for the col.
`count_min_sketch(col, eps, confidence[, seed])`	Returns a count-min sketch of a column with the given esp, confidence and seed.
`covar_pop(col1, col2)`	Returns a new Column for the population covariance of col1 and col2.
`covar_samp(col1, col2)`	Returns a new Column for the sample covariance of col1 and col2.
`every(col)`	Returns true if all values of col are true.
`first(col[, ignorenulls])`	Returns the first value in a group.
`first_value(col[, ignoreNulls])`	Returns the first value of col for a group of rows.
`grouping(col)`	Indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.
`grouping_id(*cols)`	Returns the level of grouping, equals to
`histogram_numeric(col, nBins)`	Computes a histogram on numeric 'col' using nb bins.
`hll_sketch_agg(col[, lgConfigK])`	Returns the updatable binary representation of the Datasketches HllSketch configured with lgConfigK arg.
`hll_union_agg(col[, allowDifferentLgConfigK])`	Returns the updatable binary representation of the Datasketches HllSketch, generated by merging previously created Datasketches HllSketch instances via a Datasketches Union instance.
`kurtosis(col)`	Returns the kurtosis of the values in a group.
`last(col[, ignorenulls])`	Returns the last value in a group.
`last_value(col[, ignoreNulls])`	Returns the last value of col for a group of rows.
`listagg(col[, delimiter])`	Returns the concatenation of non-null input values, separated by the delimiter.
`listagg_distinct(col[, delimiter])`	Returns the concatenation of distinct non-null input values, separated by the delimiter.
`max(col)`	Returns the maximum value of the expression in a group.
`max_by(col, ord)`	Returns the value from the col parameter that is associated with the maximum value from the ord parameter.
`mean(col)`	Returns the average of the values in a group.
`median(col)`	Returns the median of the values in a group.
`min(col)`	Returns the minimum value of the expression in a group.
`min_by(col, ord)`	Returns the value from the col parameter that is associated with the minimum value from the ord parameter.
`mode(col[, deterministic])`	Returns the most frequent value in a group.
`percentile(col, percentage[, frequency])`	Returns the exact percentile(s) of numeric column expr at the given percentage(s) with value range in [0.0, 1.0].
`percentile_approx(col, percentage[, accuracy])`	Returns the approximate percentile of the numeric column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value.
`product(col)`	Returns the product of the values in a group.
`regr_avgx(y, x)`	Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_avgy(y, x)`	Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_count(y, x)`	Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_intercept(y, x)`	Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_r2(y, x)`	Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_slope(y, x)`	Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_sxx(y, x)`	Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_sxy(y, x)`	Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`regr_syy(y, x)`	Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.
`skewness(col)`	Returns the skewness of the values in a group.
`some(col)`	Returns true if at least one value of col is true.
`std(col)`	Alias for `stddev_samp`.
`stddev(col)`	Alias for `stddev_samp`.
`stddev_pop(col)`	Returns population standard deviation of the expression in a group.
`stddev_samp(col)`	Returns the unbiased sample standard deviation of the expression in a group.
`string_agg(col[, delimiter])`	Returns the concatenation of non-null input values, separated by the delimiter.
`string_agg_distinct(col[, delimiter])`	Returns the concatenation of distinct non-null input values, separated by the delimiter.
`sum(col)`	Returns the sum of all values in the expression.
`sum_distinct(col)`	Returns the sum of distinct values in the expression.
`try_avg(col)`	Returns the mean calculated from values of a group and the result is null on overflow.
`try_sum(col)`	Returns the sum calculated from values of a group and the result is null on overflow.
`var_pop(col)`	Returns the population variance of the values in a group.
`var_samp(col)`	Returns the unbiased sample variance of the values in a group.
`variance(col)`	Alias for `var_samp`

Window functions

Function	Description
`cume_dist()`	Returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are below the current row.
`dense_rank()`	Returns the rank of rows within a window partition, without any gaps.
`lag(col[, offset, default])`	Returns the value that is offset rows before the current row, and default if there is less than offset rows before the current row.
`lead(col[, offset, default])`	Returns the value that is offset rows after the current row, and default if there is less than offset rows after the current row.
`nth_value(col, offset[, ignoreNulls])`	Returns the value that is the offsetth row of the window frame (counting from 1), and null if the size of window frame is less than offset rows.
`ntile(n)`	Returns the ntile group id (from 1 to n inclusive) in an ordered window partition.
`percent_rank()`	Returns the relative rank (i.e.
`rank()`	Returns the rank of rows within a window partition.
`row_number()`	Returns a sequential number starting at 1 within a window partition.

Generator functions

Function	Description
`explode(col)`	Returns a new row for each element in the given array or map.
`explode_outer(col)`	Returns a new row for each element in the given array or map.
`inline(col)`	Explodes an array of structs into a table.
`inline_outer(col)`	Explodes an array of structs into a table.
`posexplode(col)`	Returns a new row for each element with position in the given array or map.
`posexplode_outer(col)`	Returns a new row for each element with position in the given array or map.
`stack(*cols)`	Separates col1, ..., colk into n rows.

Partition transformation functions

Function	Description
`bucket`	Partition transform function: A transform for any type that partitions by a hash of the input column.
`partitioning.years(col)`	A transform for timestamps and dates to partition data into years.
`partitioning.months(col)`	A transform for timestamps and dates to partition data into months.
`partitioning.days(col)`	A transform for timestamps and dates to partition data into days.
`partitioning.hours(col)`	A transform for timestamps to partition data into hours.
`partitioning.bucket(numBuckets, col)`	A transform for any type that partitions by a hash of the input column.

H3 geospatial functions

Function	Description
`h3_boundaryasgeojson(col)`	Returns the boundary of an H3 cell in GeoJSON format
`h3_boundaryaswkb(col)`	Returns the boundary of an H3 cell in WKB format
`h3_boundaryaswkt(col)`	Returns the boundary of an H3 cell in WKT format
`h3_centerasgeojson(col)`	Returns the center of an H3 cell in GeoJSON format
`h3_centeraswkb(col)`	Returns the center of an H3 cell in WKB format
`h3_centeraswkt(col)`	Returns the center of an H3 cell in WKT format
`h3_compact(col)`	Compacts the input set of H3 cell IDs as best as possible
`h3_coverash3(col1,col2)`	Returns an array of cell IDs represented as long integers, corresponding to hexagons or pentagons of the specified resolution that minimally cover the input linear or areal geography
`h3_coverash3string(col1,col2)`	Returns an array of cell IDs represented as strings, corresponding to hexagons or pentagons of the specified resolution that minimally cover the input linear or areal geography
`h3_distance(col1,col2)`	Returns the grid distance between two H3 cell IDs
`h3_h3tostring(col)`	Converts an H3 cell ID to a string representing the cell ID as a hexadecimal string
`h3_hexring(col1,col2)`	Returns an array of H3 cell IDs that form a hollow hexagonal ring centered at the origin H3 cell and that are at grid distance k from the origin H3 cell
`h3_ischildof(col1,col2)`	Returns `True` if the first H3 cell ID is a child of the second H3 cell ID
`h3_ispentagon(col)`	Returns `True` if the input H3 cell ID represents a pentagon
`h3_isvalid(col)`	Returns `True` if the input represents a valid H3 cell ID
`h3_kring(col1,col2)`	Returns the H3 cell IDs that are within (grid) distance k of the origin cell ID
`h3_kringdistances(col1,col2)`	Returns all H3 cell IDs (represented as long integers or strings) within grid distance k from the origin H3 cell ID, along with their distance from the origin H3 cell ID
`h3_longlatash3(col1,col2,col3)`	Returns the H3 cell ID (as a BIGINT) corresponding to the provided longitude and latitude at the specified resolution
`h3_longlatash3string(col1,col2,col3)`	Returns the H3 cell ID (as a string) corresponding to the provided longitude and latitude at the specified resolution
`h3_maxchild(col1,col2)`	Returns the child of maximum value of the input H3 cell at the specified resolution
`h3_minchild(col1,col2)`	Returns the child of minimum value of the input H3 cell at the specified resolution
`h3_pointash3(col1,col2)`	Returns the H3 cell ID (as a BIGINT) corresponding to the provided point at the specified resolution
`h3_pointash3string(col1,col2)`	Returns the H3 cell ID (as a string) corresponding to the provided point at the specified resolution
`h3_polyfillash3(col1,col2)`	Returns an array of cell IDs represented as long integers, corresponding to hexagons or pentagons of the specified resolution that are contained by the input areal geography
`h3_polyfillash3string(col1,col2)`	Returns an array of cell IDs represented as strings, corresponding to hexagons or pentagons of the specified resolution that are contained by the input areal geography
`h3_resolution(col)`	Returns the resolution of the H3 cell ID
`h3_stringtoh3(col)`	Converts the string representation H3 cell ID to its big integer representation
`h3_tessellateaswkb(col1,col2)`	Returns an array of structs representing the chips covering geography at the specified resolution
`h3_tochildren(col1,col2)`	Returns the children H3 cell IDs of the input H3 cell ID at the specified resolution
`h3_toparent(col1,col2)`	Returns the parent H3 cell ID of the input H3 cell ID at the specified resolution
`h3_try_coverash3(col1,col2)`	Returns an array of cell IDs represented as long integers, corresponding to hexagons or pentagons of the specified resolution that minimally cover the input linear or areal geography
`h3_try_coverash3string(col1,col2)`	Returns an array of cell IDs represented as strings, corresponding to hexagons or pentagons of the specified resolution that minimally cover the input linear or areal geography
`h3_try_distance(col1,col2)`	Returns the grid distance between two H3 cell IDs of the same resolution, or `None` if the distance if undefined
`h3_try_polyfillash3(col1,col2)`	Returns an array of cell IDs represented as long integers, corresponding to hexagons or pentagons of the specified resolution that are contained by the input areal geography
`h3_try_polyfillash3string(col1,col2)`	Returns an array of cell IDs represented as strings, corresponding to hexagons or pentagons of the specified resolution that are contained by the input areal geography
`h3_try_tessellateaswkb(col1,col2)`	Returns an array of structs representing the chips covering geography at the specified resolution, or null if the geometry is invalid
`h3_try_validate(col)`	Returns the input value if it is a valid H3 cell or None otherwise
`h3_uncompact(col1,col2)`	Uncompacts the input set of H3 cell IDs to the specified resolution
`h3_validate(col)`	Returns the input value if it is a valid H3 cell or emits an error otherwise

ST geospatial functions

Function	Description
`st_addpoint(col1,col2,col3)`	Adds a new point to the n-th position in the input linestring Geography or Geometry
`st_area(col)`	Returns the area of the input geography or geometry
`st_azimuth(col1,col2)`	Returns the north-based azimuth from the first point to the second in radians in `[0, 2π)`
`st_asbinary(col1,col2)`	Returns the input Geography or Geometry value in WKB format
`st_asewkb(col1,col2)`	Returns the input Geometry value in EWKB format
`st_asewkt(col)`	Returns the input Geography or Geometry value in EWKT format
`st_asgeojson(col)`	Returns the input Geography or Geometry value in GeoJSON format
`st_astext(col)`	Returns the input Geography or Geometry value in WKT format
`st_aswkb(col1,col2)`	Returns the input Geography or Geometry value in WKB format
`st_aswkt(col)`	Returns the input Geography or Geometry value in WKT format
`st_buffer(col1,col2)`	Returns the buffer of the input geometry using the specified radius
`st_boundary(col)`	Returns the boundary of the input geometry as a geometry
`st_centroid(col)`	Returns the centroid of the input geometry as a 2D point geometry
`st_closestpoint(col1,col2)`	Returns the 2D projection of a point on the first geometry that is closest to the second geometry
`st_concavehull(col1,col2,col3)`	Returns the concave hull of the input geometry as a geometry using the specified length ratio
`st_contains(col1,col2)`	Returns `True` if the first geometry contains the second geometry
`st_convexhull(col)`	Returns the convex hull of the input geometry as a geometry
`st_covers(col1,col2)`	Returns `True` if the first geometry covers the second geometry
`st_difference(col1,col2)`	Returns the point-set different of the two input geometries as a 2D geometry
`st_dimension(col)`	Returns the topological dimension of the 2D projection of the input geometry
`st_disjoint(col1,col2)`	Returns `True` if the two geometries are disjoint
`st_distance(col1,col2)`	Returns the 2D Cartesian distance between the two input geometries
`st_distancesphere(col1,col2)`	Returns the spherical distance (in meters) between two point geometries, measured on a sphere whose radius is the mean radius of the WGS84 ellipsoid
`st_distancespheroid(col1,col2)`	Returns the geodesic distance (in meters) between two point geometries on the WGS84 ellipsoid
`st_dump(col)`	Returns an array containing the single geometries in the input geometry
`st_dwithin(col1,col2,col3)`	Returns `True` if the 2D Cartesian distance between the two input geometries is smaller than or equal to the input distance
`st_endpoint(col)`	Returns the last point of the input linestring, or `None` if it doesn't exist
`st_envelope(col)`	Returns the 2D Cartesian axis-aligned minimum bounding box (envelope) of the input non-empty geometry, as a geometry
`st_envelope_agg(col)`	Returns the envelope of all the geometries in the column, or `None` if the column has zero rows, or contains only `None` values
`st_equals(col1,col2)`	Returns `True` if the two geometries are geometrically equal
`st_exteriorring(col)`	Returns the exterior ring (shell), as a linestring, of the input Geography or Geometry value representing a polygon
`st_flipcoordinates(col)`	Swaps X and Y coordinates of the input geometry
`st_geogfromgeojson(col)`	Parses the GeoJSON description and returns the corresponding Geography value
`st_geogfromtext(col)`	Parses the WKT description and returns the corresponding Geography value
`st_geogfromwkb(col)`	Parses the input WKB description and returns the corresponding Geography value
`st_geogfromewkt(col)`	Parses an Extended Well-Known Text (EWKT) description and returns the corresponding Geography value
`st_geogfromwkt(col)`	Parses the WKT description and returns the corresponding Geography value
`st_geohash(col1,col2)`	Returns the geohash of the input Geometry
`st_geometryn(col1,col2)`	Returns the 1-based n-th element of the input multi geometry, or `None` if it doesn't exist
`st_geometrytype(col)`	Returns the type of the input Geography or Geometry value as a string
`st_geomfromewkb(col)`	Parses the input EWKB description and returns the corresponding Geometry value
`st_geomfromewkt(col)`	Parses an Extended Well-Known Text (EWKT) description and returns the corresponding Geometry value
`st_geomfromgeohash(col)`	Returns the geohash grid box corresponding to the input geohash value as a 2D polygon geometry
`st_geomfromgeojson(col)`	Parses the GeoJSON description and returns the corresponding Geometry value
`st_geomfromtext(col1,col2)`	Parses the WKT description and returns the corresponding Geometry value
`st_geomfromwkb(col1,col2)`	Parses the input WKB description and returns the corresponding Geometry value
`st_geomfromwkt(col1,col2)`	Parses the WKT description and returns the corresponding Geometry value
`st_intersection(col1,col2)`	Returns the point-set intersection of the two input geometries as a 2D geometry
`st_intersects(col1,col2)`	Returns `True` if the two geometries intersect
`st_interiorringn(col1,col2)`	Returns the n-th interior ring of the input polygon as a linestring
`st_isempty(col)`	Returns `True` if the input Geography or Geometry value does not contain any non-empty points
`st_isvalid(col)`	Returns `True` if the input geometry is a valid geometry in the OGC sense
`st_length(col)`	Returns the length of the input geometry or geography value
`st_m(col)`	Returns the M coordinate of the input point geometry, or `None` if the input point geometry is empty or if it does not have an M coordinate
`st_makeline(col)`	Returns a linestring geometry whose points are the non-empty points of the geometries in the input array of geometries, which are expected to be points, linestrings, or multipoints
`st_makepolygon(col1,col2)`	Constructs a polygon from the input outer boundary and optional array of inner boundaries, represented as closed linestrings
`st_multi(col)`	Returns the input Geography or Geometry value as an equivalent multi geospatial value, keeping the original SRID
`st_ndims(col)`	Returns the coordinate dimension of the input Geography or Geometry value
`st_npoints(col)`	Returns the number of non-empty points in the input Geography or Geometry value
`st_numgeometries(col)`	Returns the number of geometries in the input geometry
`st_numinteriorrings(col)`	Returns the number of interior rings of the input polygon
`st_perimeter(col)`	Returns the perimeter of the input geography or geometry
`st_point(col1,col2,col3)`	Returns a 2D point Geometry with the given x and y coordinates and SRID value
`st_pointfromgeohash(col)`	Returns the center of the geohash grid box corresponding to the input geohash value as a 2D point geometry
`st_pointn(col1,col2)`	Returns the 1-based n-th point of the input linestring, or `None` if it doesn't exist
`st_removepoint(col1,col2)`	Removes the n-th point from the input linestring Geography or Geometry
`st_reverse(col)`	Reverses the order of vertices in the input Geography or Geometry value
`st_rotate(col1,col2)`	Rotates the input geometry around the Z axis by the given rotation angle (in radians)
`st_scale(col1,col2,col3,col4)`	Scales the input geometry in the X, Y, and Z (optional) directions using the given factors
`st_setpoint(col1,col2,col3)`	Sets the n-th point in the input linestring Geography or Geometry
`st_setsrid(col1,col2)`	Returns a new Geometry value whose SRID is the specified SRID value
`st_simplify(col1,col2)`	Simplifies the input geometry using the Douglas-Peucker algorithm
`st_srid(col)`	Returns the SRID of the input geospatial value
`st_startpoint(col)`	Returns the first point of the input linestring, or `None` if it doesn't exist
`st_touches(col1,col2)`	Returns `True` if the two geometries touch each other
`st_transform(col1,col2)`	Transforms the X and Y coordinates of the input geometry to the coordinate reference system (CRS) described by the provided SRID value
`st_translate(col1,col2,col3,col4)`	Translates the input geometry in the X, Y, and Z (optional) directions using the provided offsets
`st_union(col1,col2)`	Returns the point-set union of the two input geometries as a 2D geometry
`st_union_agg(col)`	Returns the point-wise union of all the geometries in the column, or `None` if the column has zero rows, or contains only `None` values
`st_within(col1,col2)`	Returns `True` if the first geometry is within the second geometry
`st_x(col)`	Returns the X coordinate of the input point geometry, or `None` if the input point geometry is empty
`st_xmax(col)`	Returns the maximum X coordinate of the input geometry, or `None` if the input geometry is empty
`st_xmin(col)`	Returns the minimum X coordinate of the input geometry, or `None` if the input geometry is empty
`st_y(col)`	Returns the Y coordinate of the input point geometry, or `None` if the input point geometry is empty
`st_ymax(col)`	Returns the maximum Y coordinate of the input geometry, or `None` if the input geometry is empty
`st_ymin(col)`	Returns the minimum Y coordinate of the input geometry, or `None` if the input geometry is empty
`st_z(col)`	Returns the Z coordinate of the input point geometry, or `None` if the input point geometry is empty or if it does not have a Z coordinate
`st_zmax(col)`	Returns the maximum Z coordinate of the input geometry, or `None` if the input geometry is empty or does not contain Z coordinates
`st_zmin(col)`	Returns the minimum Z coordinate of the input geometry, or `None` if the input geometry is empty or does not contain Z coordinates
`to_geography(col)`	Parses the input BINARY or string value and returns the corresponding Geography value
`to_geometry(col)`	Parses the input BINARY or string value and returns the corresponding Geometry value
`try_to_geography(col)`	Parses the input BINARY or string value and returns the corresponding Geography value, or `None` if the input is invalid
`try_to_geometry(col)`	Parses the input BINARY or string value and returns the corresponding Geometry value, or `None` if the input is invalid

CSV functions

Function	Description
`from_csv(col, schema[, options])`	Parses a column containing a CSV string into a row with the specified schema.
`schema_of_csv(csv[, options])`	Parses a CSV string and infers its schema in DDL format.
`to_csv(col[, options])`	Converts a column containing a StructType into a CSV string.

JSON functions

Function	Description
`from_json(col, schema[, options])`	Parses a column containing a JSON string into a MapType with StringType as keys type, StructType or ArrayType with the specified schema.
`get_json_object(col, path)`	Extracts json object from a json string based on json path specified, and returns json string of the extracted json object.
`json_array_length(col)`	Returns the number of elements in the outermost JSON array.
`json_object_keys(col)`	Returns all the keys of the outermost JSON object as an array.
`json_tuple(col, *fields)`	Creates a new row for a json column according to the given field names.
`schema_of_json(json[, options])`	Parses a JSON string and infers its schema in DDL format.
`to_json(col[, options])`	Converts a column containing a StructType, ArrayType or a MapType into a JSON string.

Variant functions

Function	Description
`ai_parse_document`	Parses a column containing binary data (blob) and returns a VariantType. Throws an exception if the input blob is invalid.
`is_variant_null(v)`	Check if a variant value is a variant null.
`parse_json(col)`	Parses a column containing a JSON string into a VariantType.
`schema_of_variant(v)`	Returns schema in the SQL format of a variant.
`schema_of_variant_agg(v)`	Returns the merged schema in the SQL format of a variant column.
`try_variant_get(v, path, targetType)`	Extracts a sub-variant from v according to path, and then cast the sub-variant to targetType.
`variant_get(v, path, targetType)`	Extracts a sub-variant from v according to path, and then cast the sub-variant to targetType.
`try_parse_json(col)`	Parses a column containing a JSON string into a VariantType.
`to_variant_object(col)`	Converts a column containing nested inputs (array/map/struct) into a variants where maps and structs are converted to variant objects which are unordered unlike SQL structs.

XML functions

Function	Description
`from_xml(col, schema[, options])`	Parses a column containing a XML string to a row with the specified schema.
`schema_of_xml(xml[, options])`	Parses a XML string and infers its schema in DDL format.
`to_xml(col[, options])`	Converts a column containing a StructType into a XML string.
`xpath(xml, path)`	Returns a string array of values within the nodes of xml that match the XPath expression.
`xpath_boolean(xml, path)`	Returns true if the XPath expression evaluates to true, or if a matching node is found.
`xpath_double(xml, path)`	Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
`xpath_float(xml, path)`	Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
`xpath_int(xml, path)`	Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
`xpath_long(xml, path)`	Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
`xpath_number(xml, path)`	Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
`xpath_short(xml, path)`	Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
`xpath_string(xml, path)`	Returns the text contents of the first xml node that matches the XPath expression.

URL functions

Function	Description
`parse_url(url, partToExtract[, key])`	Extracts a specified part from a URL.
`try_parse_url(url, partToExtract[, key])`	Try to perform the `parse_url` operation, but return a NULL value instead of raising an error if the parsing cannot be performed.
`url_decode(str)`	Decodes a URL-encoded string in `application/x-www-form-urlencoded` format to its original format.
`url_encode(str)`	Encodes a string into a URL-encoded string in 'application/x-www-form-urlencoded' format.
`try_url_decode(str)`	Try to perform the `url_decode` operation, but return a NULL value instead of raising an error if the decoding cannot be performed.

Miscellaneous functions

Function	Description
`aes_decrypt(input, key[, mode, padding, aad])`	Returns a decrypted value of input using AES in mode with padding.
`aes_encrypt(input, key[, mode, padding, iv, aad])`	Returns an encrypted value of input using AES in given mode with the specified padding.
`assert_true(col[, errMsg])`	Returns null if the input column is true; throws an exception with the provided error message otherwise.
`bitmap_and_agg(col)`	Returns a bitmap that is the bitwise AND of all of the bitmaps from the input column.
`bitmap_bit_position(col)`	Returns the bit position for the given input column.
`bitmap_bucket_number(col)`	Returns the bucket number for the given input column.
`bitmap_count(col)`	Returns the number of set bits in the input bitmap.
`current_catalog()`	Returns the current catalog.
`current_database()`	Returns the current database.
`current_schema()`	Returns the current schema.
`current_user()`	Returns the current user.
`hll_sketch_estimate(col)`	Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.
`hll_union(col1, col2[, allowDifferentLgConfigK])`	Merges two binary representations of Datasketches HllSketch objects, using a Datasketches Union object.
`input_file_block_length()`	Returns the length of the block being read, or -1 if not available.
`input_file_block_start()`	Returns the start offset of the block being read, or -1 if not available.
`input_file_name()`	Creates a string column for the file name of the current Spark task.
`java_method(*cols)`	Calls a method with reflection.
`kll_merge_agg_bigint(col[, k])`	Aggregate function: merges binary KllLongsSketch representations and returns the merged sketch.
`kll_merge_agg_double(col[, k])`	Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch.
`kll_merge_agg_float(col[, k])`	Aggregate function: merges binary KllFloatsSketch representations and returns the merged sketch.
`kll_sketch_agg_bigint(col[, k])`	Aggregate function: returns the compact binary representation of the Datasketches KllLongsSketch built with the values in the input column.
`kll_sketch_agg_double(col[, k])`	Aggregate function: returns the compact binary representation of the Datasketches KllDoublesSketch built with the values in the input column.
`kll_sketch_agg_float(col[, k])`	Aggregate function: returns the compact binary representation of the Datasketches KllFloatsSketch built with the values in the input column.
`kll_sketch_get_n_bigint(col)`	Returns the number of items collected in the KLL bigint sketch.
`kll_sketch_get_n_double(col)`	Returns the number of items collected in the KLL double sketch.
`kll_sketch_get_n_float(col)`	Returns the number of items collected in the KLL float sketch.
`kll_sketch_get_quantile_bigint(sketch, rank)`	Extracts a quantile value from a KLL bigint sketch given an input rank value.
`kll_sketch_get_quantile_double(sketch, rank)`	Extracts a quantile value from a KLL double sketch given an input rank value.
`kll_sketch_get_quantile_float(sketch, rank)`	Extracts a quantile value from a KLL float sketch given an input rank value.
`kll_sketch_get_rank_bigint(sketch, quantile)`	Extracts a rank value from a KLL bigint sketch given an input quantile value.
`kll_sketch_get_rank_double(sketch, quantile)`	Extracts a rank value from a KLL double sketch given an input quantile value.
`kll_sketch_get_rank_float(sketch, quantile)`	Extracts a rank value from a KLL float sketch given an input quantile value.
`kll_sketch_merge_bigint(left, right)`	Merges two KLL bigint sketch buffers together into one.
`kll_sketch_merge_double(left, right)`	Merges two KLL double sketch buffers together into one.
`kll_sketch_merge_float(left, right)`	Merges two KLL float sketch buffers together into one.
`kll_sketch_to_string_bigint(col)`	Returns a string with human readable summary information about the KLL bigint sketch.
`kll_sketch_to_string_double(col)`	Returns a string with human readable summary information about the KLL double sketch.
`kll_sketch_to_string_float(col)`	Returns a string with human readable summary information about the KLL float sketch.
`monotonically_increasing_id()`	Generates monotonically increasing 64-bit integers.
`raise_error(errMsg)`	Throws an exception with the provided error message.
`reflect(*cols)`	Calls a method with reflection.
`session_user()`	Returns the user name of current execution context.
`spark_partition_id()`	A column for partition ID.
`theta_difference(col1, col2)`	Returns the set difference of two binary representations of Datasketches Theta Sketch objects (elements in first sketch but not in second), using a Datasketches ANotB object.
`theta_intersection(col1, col2)`	Returns the intersection of two binary representations of Datasketches Theta Sketch objects, using a Datasketches Intersection object.
`theta_intersection_agg(col)`	Aggregate function: returns the compact binary representation of the Datasketches Theta Sketch that is the intersection of the Theta sketches in the input column.
`theta_sketch_agg(col[, lgNomEntries])`	Aggregate function: returns the compact binary representation of the Datasketches Theta Sketch with the values in the input column configured with lgNomEntries nominal entries.
`theta_sketch_estimate(col)`	Returns the estimated number of unique values given the binary representation of a Datasketches Theta Sketch.
`theta_union(col1, col2[, lgNomEntries])`	Merges two binary representations of Datasketches Theta Sketch objects, using a Datasketches Union object.
`theta_union_agg(col[, lgNomEntries])`	Aggregate function: returns the compact binary representation of the Datasketches Theta Sketch that is the union of the Theta sketches in the input column.
`try_aes_decrypt(input, key[, mode, padding, aad])`	Try to perform the `aes_decrypt` operation, but return a NULL value instead of raising an error if the decryption cannot be performed.
`try_reflect(*cols)`	Try to perform a `reflect` operation, but return a NULL value instead of raising an error if the invoke method thrown exception.
`typeof(col)`	Return DDL-formatted type string for the data type of the input.
`user()`	Returns the current user.
`uuid()`	Returns an universally unique identifier (UUID) string.
`version()`	Returns the Spark version.

UDF, UDTF, UDT functions

Function	Description
`arrow_udtf`	Creates a PyArrow-native user defined table function (UDTF). This function provides a PyArrow-native interface for UDTFs, where the eval method receives PyArrow RecordBatches or Arrays and returns an Iterator of PyArrow Tables or RecordBatches. This enables true vectorized computation without row-by-row processing overhead.
`call_udf(udfName, *cols)`	Call a user-defined function.
`pandas_udf([f, returnType, functionType])`	Creates a pandas user defined function (a.k.a.
`udf([f, returnType, useArrow])`	Creates a user defined function (UDF).
`udtf([cls, returnType, useArrow])`	Creates a user defined table function (UDTF).
`unwrap_udt(col)`	Unwrap UDT data type column into its underlying type.

Table-valued functions

Function	Description
`TableValuedFunction.collations()`	Get all of the Spark SQL string collations.
`TableValuedFunction.explode(collection)`	Returns a DataFrame containing a new row for each element in the given array or map.
`TableValuedFunction.explode_outer(collection)`	Returns a DataFrame containing a new row for each element with position in the given array or map.
`TableValuedFunction.inline(input)`	Explodes an array of structs into a table.
`TableValuedFunction.inline_outer(input)`	Explodes an array of structs into a table.
`TableValuedFunction.json_tuple(input, *fields)`	Creates a new row for a json column according to the given field names.
`TableValuedFunction.posexplode(collection)`	Returns a DataFrame containing a new row for each element with position in the given array or map.
`TableValuedFunction.posexplode_outer(collection)`	Returns a DataFrame containing a new row for each element with position in the given array or map.
`TableValuedFunction.python_worker_logs()`	Returns a DataFrame of logs collected from Python workers.
`TableValuedFunction.range(start[, end, ...])`	Create a DataFrame with single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
`TableValuedFunction.sql_keywords()`	Get Spark SQL keywords.
`TableValuedFunction.stack(n, *fields)`	Separates col1, ..., colk into n rows.
`TableValuedFunction.variant_explode(input)`	Separates a variant object/array into multiple rows containing its fields/elements.
`TableValuedFunction.variant_explode_outer(input)`	Separates a variant object/array into multiple rows containing its fields/elements.

Compression functions

Function	Description
`try_zstd_decompress`	Attempts to return the decompressed value of expr using Zstandard. Supports data compressed in both single-pass mode and streaming mode. On decompression failure, it returns NULL.
`zstd_compress`	Returns a compressed value of expr using Zstandard with the specified compression level. The default level is 3. Uses single-pass mode by default.
`zstd_decompress`	Returns the decompressed value of expr using Zstandard. Supports data compressed in both single-pass mode and streaming mode. On decompression failure, it throws an exception.

Normal functions​

Sort functions​

Conditional functions​

Pattern matching functions​

Mathematical functions​

String functions​

Bitwise functions​

Date/time functions​

Hash functions​

Collection functions​

Array functions​

Struct functions​

Map functions​

Aggregate functions​

Window functions​

Generator functions​

Partition transformation functions​

H3 geospatial functions​

ST geospatial functions​

CSV functions​

JSON functions​

Variant functions​

XML functions​

URL functions​

Miscellaneous functions​

UDF, UDTF, UDT functions​

Table-valued functions​

Compression functions​

Normal functions

Sort functions

Conditional functions

Pattern matching functions

Mathematical functions

String functions

Bitwise functions

Date/time functions

Hash functions

Collection functions

Array functions

Struct functions

Map functions

Aggregate functions

Window functions

Generator functions

Partition transformation functions

H3 geospatial functions

ST geospatial functions

CSV functions

JSON functions

Variant functions

XML functions

URL functions

Miscellaneous functions

UDF, UDTF, UDT functions

Table-valued functions

Compression functions