from_json function
Applies to: Databricks SQL and Databricks Runtime
Returns a struct value parsed from jsonStr according to schema.
Syntax
from_json(jsonStr, schema [, options])
Arguments
- jsonStr: A STRING expression specifying a JSON document.
- schema: A STRING expression or invocation of the schema_of_json function.
- options: An optional MAP<STRING,STRING> literal specifying directives.
jsonStr should be well-formed with respect to schema and options.
The schema must be defined as comma-separated column name and data type pairs, similar to the format used in CREATE TABLE. Prior to Databricks Runtime 12.2, schema must be a literal.
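For illustration, a minimal sketch of both schema forms; the second relies on schema_of_json producing a constant schema string:

> SELECT from_json('{"a":1, "b":"x"}', 'a INT, b STRING');
{"a":1,"b":"x"}
> SELECT from_json('{"a":1, "b":"x"}', schema_of_json('{"a":0, "b":""}'));
{"a":1,"b":"x"}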
Alternatively, you can use from_json with Lakeflow Declarative Pipelines to automatically infer and evolve the schema by setting schema to NULL and specifying a schemaLocationKey. For examples, see Infer and evolve the schema using from_json in Lakeflow Declarative Pipelines.
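The following is a hedged sketch only; value, raw_events, and the key name events are hypothetical, and the exact pipeline syntax is covered in the linked examples:

> SELECT from_json(value, NULL, map('schemaLocationKey', 'events')) FROM STREAM(raw_events);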
Note
options, if provided, can be any of the following:
- primitivesAsString (default false): infers all primitive values as a string type.
- prefersDecimal (default false): infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.
- allowComments (default false): ignores Java and C++ style comments in JSON records.
- allowUnquotedFieldNames (default false): allows unquoted JSON field names.
- allowSingleQuotes (default true): allows single quotes in addition to double quotes.
- allowNumericLeadingZeros (default false): allows leading zeros in numbers (for example, 00012).
- allowBackslashEscapingAnyCharacter (default false): allows quoting of all characters using the backslash quoting mechanism.
- allowUnquotedControlChars (default false): allows JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters).
- mode (default PERMISSIVE): allows a mode for dealing with corrupt records during parsing (see the first sketch after this list).
  - PERMISSIVE: when it meets a corrupted record, puts the malformed string into a field configured by columnNameOfCorruptRecord, and sets malformed fields to null. To keep corrupt records, you can set a string type field named columnNameOfCorruptRecord in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. When inferring a schema, it implicitly adds a columnNameOfCorruptRecord field in the output schema.
  - FAILFAST: throws an exception when it meets corrupted records.
- columnNameOfCorruptRecord (default is the value specified in spark.sql.columnNameOfCorruptRecord): allows renaming the new field that holds the malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord.
- dateFormat (default yyyy-MM-dd): sets the string that indicates a date format. Custom date formats follow the formats at Datetime patterns. This applies to date type.
- timestampFormat (default yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]): sets the string that indicates a timestamp format. Custom date formats follow the formats at Datetime patterns. This applies to timestamp type.
- multiLine (default false): parses one record, which may span multiple lines, per file.
- encoding (by default it is not set): allows forcibly setting one of the standard basic or extended encodings for the JSON files, for example UTF-16BE or UTF-32LE. If the encoding is not specified and multiLine is set to true, it is detected automatically.
- lineSep (default covers all of \r, \r\n, and \n): defines the line separator that should be used for parsing.
- samplingRatio (default 1.0): defines the fraction of input JSON objects used for schema inference.
- dropFieldIfAllNull (default false): whether to ignore columns of all null values or empty array/struct during schema inference.
- locale (default en-US): sets a locale as a language tag in IETF BCP 47 format. For instance, this is used while parsing dates and timestamps.
- allowNonNumericNumbers (default true): allows the JSON parser to recognize a set of not-a-number (NaN) tokens as legal floating point number values (see the second sketch after this list):
  - +INF for positive infinity, as well as the aliases +Infinity and Infinity.
  - -INF for negative infinity, alias -Infinity.
  - NaN for other not-a-numbers, like the result of division by zero.
- readerCaseSensitive (default true): specifies the case sensitivity behavior when rescuedDataColumn is enabled. If true, rescue the data columns whose names differ by case from the schema; otherwise, read the data in a case-insensitive manner. Available in Databricks SQL and Databricks Runtime 13.3 LTS and above.
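As referenced in the mode option above, a minimal sketch of corrupt-record handling, assuming the default corrupt-record column name _corrupt_record (taken from spark.sql.columnNameOfCorruptRecord):

-- PERMISSIVE (default) keeps the malformed document in the corrupt-record column
> SELECT from_json('{"a":', 'a INT, _corrupt_record STRING', map('mode', 'PERMISSIVE'));
{"a":null,"_corrupt_record":"{\"a\":"}
-- FAILFAST raises an error for the same input instead of returning a row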
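And, as referenced in the allowNonNumericNumbers option, a sketch of parsing non-numeric tokens; the option is spelled out here even though true is already the default:

> SELECT from_json('{"x":NaN, "y":-INF}', 'x DOUBLE, y DOUBLE', map('allowNonNumericNumbers', 'true'));
{"x":NaN,"y":-Infinity}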
Returns
A struct with field names and types matching the schema definition.
Examples
SQL
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
{"a":1,"b":0.8}
-- The column name must match the case of the JSON field
> SELECT from_json('{"a":1}', 'A INT');
{"A":null}
> SELECT from_json('{"datetime":"26/08/2015"}', 'datetime Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"datetime":2015-08-26 00:00:00}
-- Disambiguate field names with different cases
> SELECT cast(from_json('{"a":1, "A":0.8}', 'a INT, A DOUBLE') AS STRUCT<a: INT, b: DOUBLE>);
 {"a":1, "b":0.8}