JSON path expression

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

A JSON path expression is used to extract values from a JSON string using the : operator

Syntax

{ { identifier | [ field ] | [ * ] | [ index ] }
  [ . identifier | [ field ] | [ * ] | [ index ] ] [...] }

The brackets surrounding field, * and index are actual brackets and not indicating an optional syntax.

Parameters

  • identifier: A case insensitive identifier of a JSON field.

  • [ field ]: A bracketed case sensitive STRING literal identifying a JSON field.

  • [ * ]: Identifying all elements in a JSON array.

  • [ index ]: An integer literal identifying a specific element in a 0-based JSON array.

Returns

A STRING.

When a JSON field exists with an un-delimited null value, you will receive a SQL NULL value for that column, not a null text value.

You can use :: operator to cast values to basic data types.

Use the from_json function to cast nested results into more complex data types, such as arrays or structs.

Notes

You can use an un-delimited identifier to refer to a JSON field if the name does not contain spaces, or special characters, and there is no field of the same name in different case.

Use a delimited identifier if there is no field of the same name in different case.

The [ field ] notation can always be used, but requires you to exactly match the case of the field.

If Databricks SQL cannot uniquely identify a field an error is returned. If no match is found for any field Databricks SQL returns NULL.

Examples

The following examples use the data created with the statement in Example data.

Extract using identifier and delimiters

> SELECT raw:owner, raw:OWNER, raw:['owner'], raw:['OWNER'] FROM store_data;
  amy  amy  amy  NULL

-- Use backticks to escape special characters. References are case insensitive when you use backticks.
-- Use brackets to make them case sensitive.
> SELECT raw:`zip code`, raw:`Zip Code`, raw:['fb:testid'] FROM store_data;
  94025      94025      1234

Extract nested fields

-- Use dot notation
> SELECT raw:store.bicycle FROM store_data;
  '{ "price":19.95, "color":"red" }'

-- Use brackets
> SELECT raw:['store']['bicycle'] FROM store_data;
  '{ "price":19.95, "color":"red" }'

Extract values from arrays

-- Index elements
> SELECT raw:store.fruit[0], raw:store.fruit[1] FROM store_data;
  '{ "weight":8, "type":"apple" }'  '{ "weight":9, "type":"pear" }'

-- Extract subfields from arrays
> SELECT raw:store.book[*].isbn FROM store_data;
  '[ null, "0-553-21311-3", "0-395-19395-8" ]'

-- Access arrays within arrays or structs within arrays
> SELECT raw:store.basket[*],
         raw:store.basket[*][0] first_of_baskets,
         raw:store.basket[0][*] first_basket,
         raw:store.basket[*][*] all_elements_flattened,
         raw:store.basket[0][2].b subfield
  FROM store_data;
  basket                       first_of_baskets   first_basket          all_elements_flattened            subfield
 ---------------------------- ------------------ --------------------- --------------------------------- ----------
  [                            [                  [                     [1,2,{"b":"y","a":"x"},3,4,5,6]   y
    [1,2,{"b":"y","a":"x"}],     1,                 1,
    [3,4],                       3,                 2,
    [5,6]                        5                  {"b":"y","a":"x"}
  ]                            ]                  ]

NULL behavior

> SELECT '{"key":null}':key IS NULL sql_null, '{"key":"null"}':key IS NULL;
  true          false

Cast values

-- price is returned as a double, not a string
> SELECT raw:store.bicycle.price::double FROM store_data
  19.95

-- use from_json to cast into more complex types
> SELECT from_json(raw:store.bicycle, 'price double, color string') bicycle FROM store_data
  '{ "price":19.95, "color":"red" }'

-- the column returned is an array of string arrays
> SELECT from_json(raw:store.basket[*], 'array<array<string>>') baskets FROM store_data
  '[
    ["1","2","{\"b\":\"y\",\"a\":\"x\"}]",
    ["3","4"],
    ["5","6"]
  ]'

Example data

CREATE TABLE store_data AS SELECT
'{
   "store":{
      "fruit": [
        {"weight":8,"type":"apple"},
        {"weight":9,"type":"pear"}
      ],
      "basket":[
        [1,2,{"b":"y","a":"x"}],
        [3,4],
        [5,6]
      ],
      "book":[
        {
          "author":"Nigel Rees",
          "title":"Sayings of the Century",
          "category":"reference",
          "price":8.95
        },
        {
          "author":"Herman Melville",
          "title":"Moby Dick",
          "category":"fiction",
          "price":8.99,
          "isbn":"0-553-21311-3"
        },
        {
          "author":"J. R. R. Tolkien",
          "title":"The Lord of the Rings",
          "category":"fiction",
          "reader":[
            {"age":25,"name":"bob"},
            {"age":26,"name":"jack"}
          ],
          "price":22.99,
          "isbn":"0-395-19395-8"
        }
      ],
      "bicycle":{
        "price":19.95,
        "color":"red"
      }
    },
    "owner":"amy",
    "zip code":"94025",
    "fb:testid":"1234"
 }' as raw