from_json function

Applies to: Databricks SQL and Databricks Runtime

Returns a struct value parsed from jsonStr using schema.

Syntax

from_json(jsonStr, schema [, options])

Arguments

  • jsonStr: A STRING expression specifying a JSON document.

  • schema: A STRING expression or an invocation of the schema_of_json function.

  • options: An optional MAP<STRING,STRING> literal specifying directives.

Prior to Databricks Runtime 12.2, schema must be a literal.
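
The schema argument need not be spelled out by hand; it can be produced by schema_of_json on a representative document. A minimal sketch; the inferred schema and the displayed output may vary by runtime version:

> SELECT from_json('{"a":1, "b":[1,2]}', schema_of_json('{"a":1, "b":[1,2]}'));
 {1,[1,2]}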

Returns

A struct with field names and types matching the schema definition.

jsonStr should be well-formed with respect to schema and options. schema must be defined as comma-separated pairs of column name and data type, as used in, for example, CREATE TABLE.
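
For instance, nested fields can be declared with STRUCT types, just as in a CREATE TABLE column list. A minimal sketch; the exact output formatting may vary by version:

> SELECT from_json('{"a":1, "b":{"c":"x"}}', 'a INT, b STRUCT<c: STRING>');
 {1,{x}}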

options, if provided, can be any of the following:

  • primitivesAsString (default false): infers all primitive values as a string type.

  • prefersDecimal (default false): infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.

  • allowComments (default false): ignores Java and C++ style comments in JSON records.

  • allowUnquotedFieldNames (default false): allows unquoted JSON field names.

  • allowSingleQuotes (default true): allows single quotes in addition to double quotes.

  • allowNumericLeadingZeros (default false): allows leading zeros in numbers (for example, 00012).

  • allowBackslashEscapingAnyCharacter (default false): allows quoting of all characters using the backslash quoting mechanism.

  • allowUnquotedControlChars (default false): allows JSON strings to contain unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters).

  • mode (default PERMISSIVE): allows a mode for dealing with corrupt records during parsing (see the first sketch after this list).

    • PERMISSIVE: when it meets a corrupted record, puts the malformed string into a field configured by columnNameOfCorruptRecord, and sets malformed fields to null. To keep corrupt records, you can set a string type field named columnNameOfCorruptRecord in a user-defined schema. If the schema does not have the field, corrupt records are dropped during parsing. When inferring a schema, it implicitly adds a columnNameOfCorruptRecord field to the output schema.

    • FAILFAST: throws an exception when it meets corrupted records.

  • columnNameOfCorruptRecord (default is the value specified in spark.sql.columnNameOfCorruptRecord): allows renaming the new field that holds the malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord.

  • dateFormat (default yyyy-MM-dd): sets the string that indicates a date format. Custom date formats follow the formats at Datetime patterns. This applies to date type.

  • timestampFormat (default yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]): sets the string that indicates a timestamp format. Custom timestamp formats follow the formats at Datetime patterns. This applies to timestamp type.

  • multiLine (default false): parses one record, which may span multiple lines, per file.

  • encoding (by default it is not set): forcibly sets one of the standard basic or extended encodings for the JSON files, for example UTF-16BE or UTF-32LE. If the encoding is not specified and multiLine is set to true, it is detected automatically.

  • lineSep (default covers all \r, \r\n and \n): defines the line separator that should be used for parsing.

  • samplingRatio (default 1.0): defines the fraction of input JSON objects used for schema inference.

  • dropFieldIfAllNull (default false): whether to ignore columns of all null values or empty arrays/structs during schema inference.

  • locale (default is en-US): sets a locale as a language tag in IETF BCP 47 format. For instance, this is used while parsing dates and timestamps.

  • allowNonNumericNumbers (default true): allows the JSON parser to recognize a set of not-a-number (NaN) tokens as legal floating-point values (see the second sketch after this list):

    • +INF for positive infinity, as well as the aliases +Infinity and Infinity.

    • -INF for negative infinity, alias -Infinity.

    • NaN for other not-a-numbers, such as the result of division by zero.

  • readerCaseSensitive (default true): specifies the case sensitivity behavior when rescuedDataColumn is enabled. If true, rescues the data columns whose names differ by case from the schema; otherwise, reads the data in a case-insensitive manner. Available in Databricks SQL and Databricks Runtime 13.3 and above.
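
As referenced in the mode entry above, a minimal sketch of PERMISSIVE handling of a malformed record. The schema includes a STRING field matching columnNameOfCorruptRecord (here the default _corrupt_record is assumed), so the malformed input is preserved while the typed field becomes null; exact output formatting may vary by version:

> SELECT from_json('{"a": oops}', 'a INT, _corrupt_record STRING', map('mode', 'PERMISSIVE'));
 {null,{"a": oops}}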
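
Similarly, a sketch for allowNonNumericNumbers, which is enabled by default, so NaN tokens parse as legal floating-point values:

> SELECT from_json('{"x": NaN}', 'x DOUBLE');
 {NaN}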

Examples

> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
 {1,0.8}

> SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
 {2015-08-26 00:00:00}
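
A further sketch along the same lines, using dateFormat to parse a DATE (assumed output; formatting may vary by version):

> SELECT from_json('{"d":"26/08/2015"}', 'd DATE', map('dateFormat', 'dd/MM/yyyy'));
 {2015-08-26}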