ANSI_MODE configuration parameter (Databricks SQL)

The ANSI_MODE configuration parameter controls key behaviors of built-in functions and cast operations.

Settings

  • TRUE

    Follows the SQL standard in how it deals with certain arithmetic operations and type conversions, similar to most databases and data warehouses. Following this standard promotes better data quality, integrity, and portability.

  • FALSE

    Databricks SQL uses Hive-compatible behavior.

You can set this parameter at the session level using the SET statement and at the global level using SQL configuration parameters or Global SQL Endpoints API.

System default

The system default value is FALSE.

Detailed description

The Databricks SQL reference documentation describes SQL standard behavior.

The following sections describe the differences between ANSI_MODE TRUE (ANSI mode) and FALSE (non-ANSI mode).

Operators

In non-ANSI mode, arithmetic operations performed on numeric types may return overflowed values or NULL, while in ANSI mode such operations return an error.

Operator Description Example ANSI_MODE = true ANSI_MODE = false
dividend / divisor Returns dividend divided by divisor. 1/0 Error NULL
- expr Returns the negated value of expr. -(-128y) Error -128y (Overflow)
expr1 - expr2 Returns the subtraction of expr2 from expr1. -128y - 1y Error 127y (Overflow)
expr1 + expr2 Returns the sum of expr1 and expr2. 127y + 1y Error -128y (Overflow)
dividend % divisor Returns the remainder after dividend / divisor. 1 % 0 Error NULL
multiplier * multiplicand Returns multiplier multiplied by multiplicand. 100y * 100y Error 16y (Overflow)
arrayExpr[index] Returns the element of an arrayExpr at index. Invalid array index Error NULL
mapExpr[key] Returns the value of mapExpr for key. Invalid map key Error NULL
divisor div dividend Returns the integral part of the division of divisor by dividend. 1 div 0 Error NULL

Functions

The behavior of some built-in functions can be different under ANSI mode vs non-ANSI mode under the conditions specified below.

Operator Description Condition ANSI_MODE = true ANSI_MODE = false
abs(expr) Returns the absolute value of the numeric value in expr. abs(-128y) Error -128y (Overflow)
element_at(mapExpr, key) Returns the value of mapExpr for key. Invalid map key Error NULL
element_at(arrayExpr, index) Returns the element of an arrayExpr at index. Invalid array index Error NULL
elt(index, expr1 [, …] ) Returns the nth expression. Invalid index Error NULL
make_date(y,m,d) Creates a date from year, month, and day fields. Invalid result date Error NULL
make_timestamp(y,m,d,h,mi,s[,tz]) Creates a timestamp from fields. Invalid result timestamp Error NULL
make_interval(y,m,w,d,h,mi,s) Creates an interval from fields. Invalid result interval Error NULL
mod(dividend, divisor) Returns the remainder after dividend / divisor. mod(1, 0) Error NULL
next_day(expr,dayOfWeek) Returns the first date which is later than expr and named as in dayOfWeek. Invalid day of week Error NULL
parse_url(url, partToExtract[, key]) Extracts a part from url. Invalid URL Error NULL
pmod(dividend, divisor) Returns the positive remainder after dividend / divisor. pmod(1, 0) Error NULL
size(expr) Returns the cardinality of expr. size(NULL) NULL -1
to_date(expr[,fmt]) Returns expr cast to a date using an optional formatting. Invalid expr or format string Error NULL
to_timestamp(expr[,fmt]) Returns expr cast to a timestamp using an optional formatting. Invalid expr or format string Error NULL
to_unix_timestamp(expr[,fmt]) Returns the timestamp in expr as a UNIX timestamp. Invalid expr or format string Error NULL
unix_timestamp([expr[, fmt]]) Returns the UNIX timestamp of current or specified time. Invalid expr or format string Error NULL

Casting rules

The rules and behaviors regarding CAST are stricter in ANSI mode. They can be divided into the following three categories:

Compile-time conversion rules

Source type Target type Example ANSI_MODE = true ANSI_MODE = false
Boolean Timestamp cast(TRUE AS TIMESTAMP) Error 1970-01-01 00:00:00.000001 UTC
Date Boolean cast(DATE'2001-08-09' AS BOOLEAN) Error NULL
Timestamp Boolean cast(TIMESTAMP'1970-01-01 00:00:00Z' AS BOOLEAN) Error FALSE
Integral numeric Binary cast(15 AS BINARY) Error binary representation

Runtime errors

Source type Target type Condition Example ANSI_MODE = true ANSI_MODE = false
String Non-string Invalid input cast('a' AS INTEGER) Error NULL
Array, Struct, Map Array, Struct, Map Invalid input cast(ARRAY('1',’2','3') AS ARRAY<DATE>) Error NULL
Numeric Numeric Overflow cast(12345 AS BYTE) Error NULL

Note

For each of these casts you can use try_cast instead of cast to return NULL rather than of an error.

Implicit type coercion rules

Under ANSI_MODE = TRUE, Databricks SQL uses clear SQL data type casting rules.

By contrast ANSI_MODE = FALSE is inconsistent and more lenient. For example:

  • When using a STRING type with any arithmetic operator, the string is implicitly cast to DOUBLE.
  • When comparing a STRING to any numeric type the string is implicitly cast to the type it compares to.
  • When performing a UNION, COALESCE, or other operations where a least common type must be found all types are cast to STRING if there is any STRING type present.
  • When Databricks SQL resolves to a function that accepts only non-STRING data types, it still attempts to crosscast STRING arguments.

Databricks recommends using the explicit cast or try_cast function instead of relying on ANSI_MODE = FALSE.

Examples

> SET ansi_mode = true;

> SELECT cast(12345 AS TINYINT);
  Casting 12345 to tinyint causes overflow

> SELECT cast('a' AS INTEGER);
  Invalid input syntax for type numeric: a.
  To return NULL instead, use 'try_cast'

> SELECT try_cast('a' AS INTEGER);
  NULL

> SELECT substring('hello', c1) FROM VALUES('3') AS T(c1);
  Cannot resolve 'substring('hello', T.c1, 2147483647)' due to data type mismatch:
  argument 2 requires int type, however, 'T.c1' is of string type.

> SELECT c1 + c2 FROM VALUES('5', '7.6') AS T(c1, c2);
  Cannot resolve '(T.c1 + T.c2)' due to data type mismatch:
  '(T.c1 + T.c2)' requires (numeric or interval day to second or interval year to month or interval) type, not string

> SELECT c1 = c2 FROM VALUES(5, '10') AS T(c1, c2);
  Error in query: cannot resolve '(T.c1 = T.c2)' due to data type mismatch:
  differing types in '(T.c1 = T.c2)' (int and string).
> SET ansi_mode = false;

-- Silent binary integer overflow
> SELECT cast(12345 AS TINYINT);
57

-- Returns NULL instead of an error
> SELECT cast('a' AS INTEGER);
  NULL

-- try_cast() is safe for both modes
> SELECT try_cast('a' AS INTEGER);
  NULL

-- Implicitly casts STRING to INTEGER, risking runtime errors.
> SELECT substring('hello', c1) FROM VALUES('3') AS T(c1);
  llo

-- Implicitly casts STRING to DOUBLE, risking runtime errors.
> SELECT c1 + c2 FROM VALUES(5, '7.6') AS T(c1, c2);
  12.6

-- Implicitly casts string to integer equating 10 with 10.1
> SELECT c1 = c2 FROM VALUES(10, '10.1') AS T(c1, c2);
  true