from_json function
Applies to: Databricks SQL and Databricks Runtime
Returns a struct value parsed from jsonStr according to schema.
Syntax
from_json(jsonStr, schema [, options])
Arguments
- jsonStr: A STRING expression specifying a JSON document.
- schema: A STRING expression or invocation of the schema_of_json function.
- options: An optional MAP<STRING, STRING> literal specifying directives.
jsonStr should be well-formed with respect to schema and options.
The schema must be defined as comma-separated column name and data type pairs, similar to the format used in CREATE TABLE. Prior to Databricks Runtime 12.2, schema must be a literal.
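For example, a nested schema uses the same DDL syntax (the field names here are illustrative):

> SELECT from_json(
    '{"id":7, "tags":["x","y"], "loc":{"lat":1.5, "lon":2.5}}',
    'id INT, tags ARRAY<STRING>, loc STRUCT<lat: DOUBLE, lon: DOUBLE>');
{"id":7,"tags":["x","y"],"loc":{"lat":1.5,"lon":2.5}}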
Alternatively, you can use from_json with Lakeflow Declarative Pipelines to automatically infer and evolve the schema by setting schema to NULL and specifying a schemaLocationKey. For examples, see Infer and evolve the schema using from_json in Lakeflow Declarative Pipelines.
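A minimal sketch of that pattern, assuming a streaming source raw_events with a STRING column payload (both names are hypothetical; see the linked article for the full workflow):

> CREATE OR REFRESH STREAMING TABLE parsed_events AS
  SELECT from_json(payload, NULL, map('schemaLocationKey', 'payload')) AS parsed
  FROM STREAM(raw_events);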
options, if provided, can be any of the following:
- primitivesAsString (default false): infers all primitive values as a string type.
- prefersDecimal (default false): infers all floating-point values as a decimal type. If the values do not fit in decimal, then it infers them as doubles.
- allowComments (default false): ignores Java and C++ style comments in JSON records.
- allowUnquotedFieldNames (default false): allows unquoted JSON field names.
- allowSingleQuotes (default true): allows single quotes in addition to double quotes.
- allowNumericLeadingZeros (default false): allows leading zeros in numbers (for example, 00012).
- allowBackslashEscapingAnyCharacter (default false): allows quoting of all characters using the backslash quoting mechanism.
- allowUnquotedControlChars (default false): allows JSON strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters).
- mode (default PERMISSIVE): sets the mode for dealing with corrupt records during parsing (see the sketch after this list).
  - PERMISSIVE: when it meets a corrupted record, puts the malformed string into a field configured by columnNameOfCorruptRecord, and sets malformed fields to null. To keep corrupt records, you can set a string type field named columnNameOfCorruptRecord in a user-defined schema. If a schema does not have the field, corrupt records are dropped during parsing. When inferring a schema, it implicitly adds a columnNameOfCorruptRecord field to the output schema.
  - FAILFAST: throws an exception when it meets corrupted records.
- columnNameOfCorruptRecord (default is the value specified in spark.sql.columnNameOfCorruptRecord): allows renaming the new field that holds the malformed string created by PERMISSIVE mode. This overrides spark.sql.columnNameOfCorruptRecord.
- dateFormat (default yyyy-MM-dd): sets the string that indicates a date format. Custom date formats follow the formats at Datetime patterns. This applies to date type.
- timestampFormat (default yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]): sets the string that indicates a timestamp format. Custom date formats follow the formats at Datetime patterns. This applies to timestamp type.
- multiLine (default false): parses one record, which may span multiple lines, per file.
- encoding (not set by default): forcibly sets one of the standard basic or extended encodings for the JSON files, for example UTF-16BE or UTF-32LE. If the encoding is not specified and multiLine is set to true, it is detected automatically.
- lineSep (default covers all of \r, \r\n, and \n): defines the line separator that should be used for parsing.
- samplingRatio (default 1.0): defines the fraction of input JSON objects used for schema inference.
- dropFieldIfAllNull (default false): whether to ignore columns of all null values or empty arrays/structs during schema inference.
- locale (default en-US): sets a locale as a language tag in IETF BCP 47 format. For instance, this is used while parsing dates and timestamps.
- allowNonNumericNumbers (default true): allows the JSON parser to recognize a set of not-a-number (NaN) tokens as legal floating number values:
  - +INF for positive infinity, as well as the aliases +Infinity and Infinity.
  - -INF for negative infinity, alias -Infinity.
  - NaN for other not-a-numbers, like the result of division by zero.
- readerCaseSensitive (default true): specifies the case sensitivity behavior when rescuedDataColumn is enabled. If true, rescues the data columns whose names differ by case from the schema; otherwise, reads the data in a case-insensitive manner. Available in Databricks SQL and Databricks Runtime 13.3 LTS and above.
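The two parsing modes can be sketched as follows; the result lines are approximate, and _corrupt_record is the default name taken from spark.sql.columnNameOfCorruptRecord:

> SELECT from_json('{"a":1,', 'a INT, _corrupt_record STRING', map('mode', 'PERMISSIVE'));
{"a":null,"_corrupt_record":{"a":1,}
-- The same malformed input throws a parsing error under FAILFAST
> SELECT from_json('{"a":1,', 'a INT', map('mode', 'FAILFAST'));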
Returns
A struct with field names and types matching the schema definition.
Examples
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
{"a":1,"b":0.8}
-- The column name must match the case of the JSON field
> SELECT from_json('{"a":1}', 'A INT');
{"A":null}
> SELECT from_json('{"datetime":"26/08/2015"}', 'datetime Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"datetime":2015-08-26 00:00:00}
-- Disambiguate field names with different cases
> SELECT cast(from_json('{"a":1, "A":0.8}', 'a INT, A DOUBLE') AS STRUCT<a: INT, b: DOUBLE>);
{"a":1, "b":0.8}