ANSI compliance

Spark SQL has two experimental options to support compliance with the ANSI SQL standard: spark.sql.ansi.enabled and spark.sql.storeAssignmentPolicy.

When spark.sql.ansi.enabled is set to true, Spark SQL follows the standard in basic behaviors (for example, arithmetic operations, type conversion, SQL functions and SQL parsing). Moreover, Spark SQL has an independent option to control implicit casting behaviors when inserting rows in a table. The casting behaviors are defined as store assignment rules in the standard.

When spark.sql.storeAssignmentPolicy is set to ANSI, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is ANSI, while the configuration spark.sql.ansi.enabled is disabled by default.

The following table summarizes the behavior:

Property Name Default Meaning
spark.sql.ansi.enabled false

(Experimental) When true, Spark attempts to conform to the ANSI SQL specification:

  • Throws a runtime exception if an overflow occurs in any operation on an integer or decimal field.
  • Forbids using the reserved keywords of ANSI SQL as identifiers in the SQL parser.
spark.sql.storeAssignmentPolicy ANSI

(Experimental) When inserting a value into a column with a different data type, Spark performs type coercion. There are three policies for the type coercion rules: ANSI, legacy, and strict.

  • ANSI: Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions such as converting string to int or double to boolean.
  • legacy: Spark allows the type coercion as long as it is a valid Cast, which is very loose. For example, converting string to int or double to boolean is allowed. It is also the only behavior in Spark 2.x and it is compatible with Hive.
  • strict: Spark doesn’t allow any possible precision loss or data truncation in type coercion, for example, converting double to int or decimal to double is not allowed.

The following subsections present behavior changes in arithmetic operations, type conversions, and SQL parsing when ANSI mode is enabled.

Arithmetic operations

In Spark SQL, arithmetic operations performed on numeric types (with the exception of decimal) are not checked for overflows by default. This means that in case an operation causes overflows, the result is the same with the corresponding operation in a Java or Scala program (For example, if the sum of 2 integers is higher than the maximum value representable, the result is a negative number). On the other hand, Spark SQL returns null for decimal overflows. When spark.sql.ansi.enabled is set to true and an overflow occurs in numeric and interval arithmetic operations, it throws an arithmetic exception at runtime.

-- `spark.sql.ansi.enabled=true`
SELECT 2147483647 + 1;
java.lang.ArithmeticException: integer overflow

-- `spark.sql.ansi.enabled=false`
SELECT 2147483647 + 1;
+----------------+
|(2147483647 + 1)|
+----------------+
|     -2147483648|
+----------------+

Type conversion

Spark SQL has three kinds of type conversions: explicit casting, type coercion, and store assignment casting. When spark.sql.ansi.enabled is set to true, explicit casting by CAST syntax throws a runtime exception for illegal cast patterns defined in the standard, e.g. casts from a string to an integer. On the other hand, INSERT INTO syntax throws an analysis exception when the ANSI mode enabled via spark.sql.storeAssignmentPolicy=ANSI.

Currently, the ANSI mode affects explicit casting and assignment casting only. In future releases, the behavior of type coercion might change along with the other two type conversion rules.

-- Examples of explicit casting

-- `spark.sql.ansi.enabled=true`
SELECT CAST('a' AS INT);
java.lang.NumberFormatException: invalid input syntax for type numeric: a

SELECT CAST(2147483648L AS INT);
java.lang.ArithmeticException: Casting 2147483648 to int causes overflow

-- `spark.sql.ansi.enabled=false` (This is a default behavior)
SELECT CAST('a' AS INT);
+--------------+
|CAST(a AS INT)|
+--------------+
|          null|
+--------------+

SELECT CAST(2147483648L AS INT);
+-----------------------+
|CAST(2147483648 AS INT)|
+-----------------------+
|            -2147483648|
+-----------------------+

-- Examples of store assignment rules
CREATE TABLE t (v INT);

-- `spark.sql.storeAssignmentPolicy=ANSI`
INSERT INTO t VALUES ('1');
org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table '`default`.`t`':
- Cannot safely cast 'v': string to int;

-- `spark.sql.storeAssignmentPolicy=LEGACY` (This is a legacy behavior until Spark 2.x)
INSERT INTO t VALUES ('1');
SELECT * FROM t;
+---+
|  v|
+---+
|  1|
+---+

SQL functions

The behavior of some SQL functions can be different under ANSI mode (spark.sql.ansi.enabled=true).

  • size: This function returns null for null input under ANSI mode.

SQL keywords

When spark.sql.ansi.enabled is true, Spark SQL will use the ANSI mode parser. In this mode, Spark SQL has two kinds of keywords: * Reserved keywords: Keywords that are reserved and can’t be used as identifiers for table, view, column, function, alias, etc. * Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts. For example, EXPLAIN SELECT ... is a command, but EXPLAIN can be used as identifiers in other places.

When the ANSI mode is disabled, Spark SQL has two kinds of keywords:

  • Non-reserved keywords: Same definition as the one when the ANSI mode enabled.
  • Strict-non-reserved keywords: A strict version of non-reserved keywords, which cannot be used as table alias.

By default spark.sql.ansi.enabled is false.

Below is a list of all the keywords in Spark SQL.

Keyword Spark SQL ANSI Mode Spark SQL Default Mode SQL-2016
ADD non-reserved non-reserved non-reserved
AFTER non-reserved non-reserved non-reserved
ALL reserved non-reserved reserved
ALTER non-reserved non-reserved reserved
ANALYZE non-reserved non-reserved non-reserved
AND reserved non-reserved reserved
ANTI non-reserved strict-non-reserved non-reserved
ANY reserved non-reserved reserved
ARCHIVE non-reserved non-reserved non-reserved
ARRAY non-reserved non-reserved reserved
AS reserved non-reserved reserved
ASC non-reserved non-reserved non-reserved
AT non-reserved non-reserved reserved
AUTHORIZATION reserved non-reserved reserved
BETWEEN non-reserved non-reserved reserved
BOTH reserved non-reserved reserved
BUCKET non-reserved non-reserved non-reserved
BUCKETS non-reserved non-reserved non-reserved
BY non-reserved non-reserved reserved
CACHE non-reserved non-reserved non-reserved
CASCADE non-reserved non-reserved non-reserved
CASE reserved non-reserved reserved
CAST reserved non-reserved reserved
CHANGE non-reserved non-reserved non-reserved
CHECK reserved non-reserved reserved
CLEAR non-reserved non-reserved non-reserved
CLUSTER non-reserved non-reserved non-reserved
CLUSTERED non-reserved non-reserved non-reserved
CODEGEN non-reserved non-reserved non-reserved
COLLATE reserved non-reserved reserved
COLLECTION non-reserved non-reserved non-reserved
COLUMN reserved non-reserved reserved
COLUMNS non-reserved non-reserved non-reserved
COMMENT non-reserved non-reserved non-reserved
COMMIT non-reserved non-reserved reserved
COMPACT non-reserved non-reserved non-reserved
COMPACTIONS non-reserved non-reserved non-reserved
COMPUTE non-reserved non-reserved non-reserved
CONCATENATE non-reserved non-reserved non-reserved
CONSTRAINT reserved non-reserved reserved
COST non-reserved non-reserved non-reserved
CREATE reserved non-reserved reserved
CROSS reserved strict-non-reserved reserved
CUBE non-reserved non-reserved reserved
CURRENT non-reserved non-reserved reserved
CURRENT_DATE reserved non-reserved reserved
CURRENT_TIME reserved non-reserved reserved
CURRENT_TIMESTAMP reserved non-reserved reserved
CURRENT_USER reserved non-reserved reserved
DATA non-reserved non-reserved non-reserved
DATABASE non-reserved non-reserved non-reserved
DATABASES non-reserved non-reserved non-reserved
DAY reserved non-reserved reserved
DBPROPERTIES non-reserved non-reserved non-reserved
DEFINED non-reserved non-reserved non-reserved
DELETE non-reserved non-reserved reserved
DELIMITED non-reserved non-reserved non-reserved
DESC non-reserved non-reserved non-reserved
DESCRIBE non-reserved non-reserved reserved
DFS non-reserved non-reserved non-reserved
DIRECTORIES non-reserved non-reserved non-reserved
DIRECTORY non-reserved non-reserved non-reserved
DISTINCT reserved non-reserved reserved
DISTRIBUTE non-reserved non-reserved non-reserved
DIV non-reserved non-reserved not a keyword
DROP non-reserved non-reserved reserved
ELSE reserved non-reserved reserved
END reserved non-reserved reserved
ESCAPE reserved non-reserved reserved
ESCAPED non-reserved non-reserved non-reserved
EXCEPT reserved strict-non-reserved reserved
EXCHANGE non-reserved non-reserved non-reserved
EXISTS non-reserved non-reserved reserved
EXPLAIN non-reserved non-reserved non-reserved
EXPORT non-reserved non-reserved non-reserved
EXTENDED non-reserved non-reserved non-reserved
EXTERNAL non-reserved non-reserved reserved
EXTRACT non-reserved non-reserved reserved
FALSE reserved non-reserved reserved
FETCH reserved non-reserved reserved
FIELDS non-reserved non-reserved non-reserved
FILTER reserved non-reserved reserved
FILEFORMAT non-reserved non-reserved non-reserved
FIRST non-reserved non-reserved non-reserved
FOLLOWING non-reserved non-reserved non-reserved
FOR reserved non-reserved reserved
FOREIGN reserved non-reserved reserved
FORMAT non-reserved non-reserved non-reserved
FORMATTED non-reserved non-reserved non-reserved
FROM reserved non-reserved reserved
FULL reserved strict-non-reserved reserved
FUNCTION non-reserved non-reserved reserved
FUNCTIONS non-reserved non-reserved non-reserved
GLOBAL non-reserved non-reserved reserved
GRANT reserved non-reserved reserved
GROUP reserved non-reserved reserved
GROUPING non-reserved non-reserved reserved
HAVING reserved non-reserved reserved
HOUR reserved non-reserved reserved
IF non-reserved non-reserved not a keyword
IGNORE non-reserved non-reserved non-reserved
IMPORT non-reserved non-reserved non-reserved
IN reserved non-reserved reserved
INDEX non-reserved non-reserved non-reserved
INDEXES non-reserved non-reserved non-reserved
INNER reserved strict-non-reserved reserved
INPATH non-reserved non-reserved non-reserved
INPUTFORMAT non-reserved non-reserved non-reserved
INSERT non-reserved non-reserved reserved
INTERSECT reserved strict-non-reserved reserved
INTERVAL non-reserved non-reserved reserved
INTO reserved non-reserved reserved
IS reserved non-reserved reserved
ITEMS non-reserved non-reserved non-reserved
JOIN reserved strict-non-reserved reserved
KEYS non-reserved non-reserved non-reserved
LAST non-reserved non-reserved non-reserved
LATERAL non-reserved non-reserved reserved
LAZY non-reserved non-reserved non-reserved
LEADING reserved non-reserved reserved
LEFT reserved strict-non-reserved reserved
LIKE non-reserved non-reserved reserved
LIMIT non-reserved non-reserved non-reserved
LINES non-reserved non-reserved non-reserved
LIST non-reserved non-reserved non-reserved
LOAD non-reserved non-reserved non-reserved
LOCAL non-reserved non-reserved reserved
LOCATION non-reserved non-reserved non-reserved
LOCK non-reserved non-reserved non-reserved
LOCKS non-reserved non-reserved non-reserved
LOGICAL non-reserved non-reserved non-reserved
MACRO non-reserved non-reserved non-reserved
MAP non-reserved non-reserved non-reserved
MATCHED non-reserved non-reserved non-reserved
MERGE non-reserved non-reserved non-reserved
MINUS non-reserved strict-non-reserved non-reserved
MINUTE reserved non-reserved reserved
MONTH reserved non-reserved reserved
MSCK non-reserved non-reserved non-reserved
NAMESPACE non-reserved non-reserved non-reserved
NAMESPACES non-reserved non-reserved non-reserved
NATURAL reserved strict-non-reserved reserved
NO non-reserved non-reserved reserved
NOT reserved non-reserved reserved
NULL reserved non-reserved reserved
NULLS non-reserved non-reserved non-reserved
OF non-reserved non-reserved reserved
ON reserved strict-non-reserved reserved
ONLY reserved non-reserved reserved
OPTION non-reserved non-reserved non-reserved
OPTIONS non-reserved non-reserved non-reserved
OR reserved non-reserved reserved
ORDER reserved non-reserved reserved
OUT non-reserved non-reserved reserved
OUTER reserved non-reserved reserved
OUTPUTFORMAT non-reserved non-reserved non-reserved
OVER non-reserved non-reserved non-reserved
OVERLAPS reserved non-reserved reserved
OVERLAY non-reserved non-reserved non-reserved
OVERWRITE non-reserved non-reserved non-reserved
PARTITION non-reserved non-reserved reserved
PARTITIONED non-reserved non-reserved non-reserved
PARTITIONS non-reserved non-reserved non-reserved
PERCENT non-reserved non-reserved non-reserved
PIVOT non-reserved non-reserved non-reserved
PLACING non-reserved non-reserved non-reserved
POSITION non-reserved non-reserved reserved
PRECEDING non-reserved non-reserved non-reserved
PRIMARY reserved non-reserved reserved
PRINCIPALS non-reserved non-reserved non-reserved
PROPERTIES non-reserved non-reserved non-reserved
PURGE non-reserved non-reserved non-reserved
QUERY non-reserved non-reserved non-reserved
RANGE non-reserved non-reserved reserved
RECORDREADER non-reserved non-reserved non-reserved
RECORDWRITER non-reserved non-reserved non-reserved
RECOVER non-reserved non-reserved non-reserved
REDUCE non-reserved non-reserved non-reserved
REFERENCES reserved non-reserved reserved
REFRESH non-reserved non-reserved non-reserved
REGEXP non-reserved non-reserved not a keyword
RENAME non-reserved non-reserved non-reserved
REPAIR non-reserved non-reserved non-reserved
REPLACE non-reserved non-reserved non-reserved
RESET non-reserved non-reserved non-reserved
RESTRICT non-reserved non-reserved non-reserved
REVOKE non-reserved non-reserved reserved
RIGHT reserved strict-non-reserved reserved
RLIKE non-reserved non-reserved non-reserved
ROLE non-reserved non-reserved non-reserved
ROLES non-reserved non-reserved non-reserved
ROLLBACK non-reserved non-reserved reserved
ROLLUP non-reserved non-reserved reserved
ROW non-reserved non-reserved reserved
ROWS non-reserved non-reserved reserved
SCHEMA non-reserved non-reserved non-reserved
SCHEMAS non-reserved non-reserved not a keyword
SECOND reserved non-reserved reserved
SELECT reserved non-reserved reserved
SEMI non-reserved strict-non-reserved non-reserved
SEPARATED non-reserved non-reserved non-reserved
SERDE non-reserved non-reserved non-reserved
SERDEPROPERTIES non-reserved non-reserved non-reserved
SESSION_USER reserved non-reserved reserved
SET non-reserved non-reserved reserved
SETS non-reserved non-reserved non-reserved
SHOW non-reserved non-reserved non-reserved
SKEWED non-reserved non-reserved non-reserved
SOME reserved non-reserved reserved
SORT non-reserved non-reserved non-reserved
SORTED non-reserved non-reserved non-reserved
START non-reserved non-reserved reserved
STATISTICS non-reserved non-reserved non-reserved
STORED non-reserved non-reserved non-reserved
STRATIFY non-reserved non-reserved non-reserved
STRUCT non-reserved non-reserved non-reserved
SUBSTR non-reserved non-reserved non-reserved
SUBSTRING non-reserved non-reserved non-reserved
TABLE reserved non-reserved reserved
TABLES non-reserved non-reserved non-reserved
TABLESAMPLE non-reserved non-reserved reserved
TBLPROPERTIES non-reserved non-reserved non-reserved
TEMP non-reserved non-reserved not a keyword
TEMPORARY non-reserved non-reserved non-reserved
TERMINATED non-reserved non-reserved non-reserved
THEN reserved non-reserved reserved
TIME reserved non-reserved reserved
TO reserved non-reserved reserved
TOUCH non-reserved non-reserved non-reserved
TRAILING reserved non-reserved reserved
TRANSACTION non-reserved non-reserved non-reserved
TRANSACTIONS non-reserved non-reserved non-reserved
TRANSFORM non-reserved non-reserved non-reserved
TRIM non-reserved non-reserved non-reserved
TRUE non-reserved non-reserved reserved
TRUNCATE non-reserved non-reserved reserved
TYPE non-reserved non-reserved non-reserved
UNARCHIVE non-reserved non-reserved non-reserved
UNBOUNDED non-reserved non-reserved non-reserved
UNCACHE non-reserved non-reserved non-reserved
UNION reserved strict-non-reserved reserved
UNIQUE reserved non-reserved reserved
UNKNOWN reserved non-reserved reserved
UNLOCK non-reserved non-reserved non-reserved
UNSET non-reserved non-reserved non-reserved
UPDATE non-reserved non-reserved reserved
USE non-reserved non-reserved non-reserved
USER reserved non-reserved reserved
USING reserved strict-non-reserved reserved
VALUES non-reserved non-reserved reserved
VIEW non-reserved non-reserved non-reserved
VIEWS non-reserved non-reserved non-reserved
WHEN reserved non-reserved reserved
WHERE reserved non-reserved reserved
WINDOW non-reserved non-reserved reserved
WITH reserved non-reserved reserved
YEAR reserved non-reserved reserved
ZONE non-reserved non-reserved non-reserved