
Higher-Order and Lambda Functions: Explore Complex and Structured Data in SQL

This tutorial walks you through four higher-order functions. While the accompanying in-depth blog explains the concepts and motivations for why handling complex data types such as arrays is important in SQL, and why the pre-existing ways of doing so are inefficient and cumbersome, this tutorial shows how to use higher-order functions in SQL to process structured data and arrays in IoT device events. They come in handy in particular if you enjoy functional programming and can quickly and efficiently write a lambda expression as part of these higher-order SQL functions.

This tutorial explores four functions and the wide range of uses you can put them to when processing and transforming array types:

  • transform()
  • filter()
  • exists()
  • aggregate()

The takeaway from this short tutorial is that there are myriad ways to slice and dice nested JSON structures with Spark SQL utility functions. These dedicated higher-order functions are primarily suited to manipulating arrays in Spark SQL, making the code easier to write and more concise when processing table values that contain arrays or nested arrays.
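To give a flavor of the syntax before touching any data, here is a minimal sketch of how a lambda expression is embedded in these functions. It assumes Spark 2.4 or later, where these built-ins are available, and the array literals are purely illustrative rather than part of the tutorial's dataset:

%sql select transform(array(1, 2, 3), x -> x + 1) as incremented,
            exists(array(1, 2, 3), x -> x > 2) as has_large_value

Each function takes an array (a column or, as here, a literal) plus a lambda of the form argument -> expression; aggregate() additionally takes a starting value and a two-argument lambda such as (acc, x) -> acc + x.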

Let's create a simple JSON schema with attributes and values, with at least two attributes that are arrays, namely temp and c02_level.

from pyspark.sql.functions import *
from pyspark.sql.types import *
 
schema = StructType() \
          .add("dc_id", StringType()) \
          .add("source", MapType(StringType(), StructType() \
                        .add("description", StringType()) \
                        .add("ip", StringType()) \
                        .add("id", IntegerType()) \
                        .add("temp", ArrayType(IntegerType())) \
                        .add("c02_level", ArrayType(IntegerType())) \
                        .add("geo", StructType() \
                              .add("lat", DoubleType()) \
                              .add("long", DoubleType()))))
 

This helper Python function converts a JSON string into a Spark DataFrame.

# Convenience function for turning JSON strings into DataFrames.
def jsonToDataFrame(json, schema=None):
  # SparkSessions are available with Spark 2.0+
  reader = spark.read
  if schema:
    # Apply the user-supplied schema instead of inferring one
    reader = reader.schema(schema)
  return reader.json(sc.parallelize([json]))
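As a quick, hypothetical sanity check of the helper, you can call it without a schema and let Spark infer one from the JSON string itself; the tiny payload below is illustrative only and not part of the tutorial's dataset.

# Hypothetical example: with no schema argument, Spark infers the schema
# from the JSON string itself.
inferredDF = jsonToDataFrame("""{"device": "sensor-test", "temp": [20, 21, 22]}""")
inferredDF.printSchema()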

Using the schema above, create a complex JSON structure and turn it into a DataFrame. Displaying the DataFrame gives us two columns: a key (dc_id) and a value (source), which is a JSON string with an embedded nested structure.

dataDF = jsonToDataFrame( """{
 
    "dc_id": "dc-101",
    "source": {
        "sensor-igauge": {
        "id": 10,
        "ip": "68.28.91.22",
        "description": "Sensor attached to the container ceilings",
        "temp":[35,35,35,36,35,35,32,35,30,35,32,35],
        "c02_level": [1475,1476,1473],
        "geo": {"lat":38.00, "long":97.00}                        
      },
      "sensor-ipad": {
        "id": 13,
        "ip": "67.185.72.1",
        "description": "Sensor ipad attached to carbon cylinders",
        "temp": [45,45,45,46,45,45,42,35,40,45,42,45],
        "c02_level": [1370,1371,1372],
        "geo": {"lat":47.41, "long":-122.00}
      },
      "sensor-inest": {
        "id": 8,
        "ip": "208.109.163.218",
        "description": "Sensor attached to the factory ceilings",
        "temp": [40,40,40,40,40,43,42,40,40,45,42,45],
        "c02_level": [1346,1345, 1343],
        "geo": {"lat":33.61, "long":-111.89}
      },
      "sensor-istick": {
        "id": 5,
        "ip": "204.116.105.67",
        "description": "Sensor embedded in exhaust pipes in the ceilings",
        "temp":[30,30,30,30,40,43,42,40,40,35,42,35],
        "c02_level": [1574,1570, 1576],
        "geo": {"lat":35.93, "long":-85.46}
      }
    }
  }""", schema)
 
display(dataDF)
  
 
dc_id  | source
dc-101 | {"sensor-igauge": {"description": "Sensor attached to the container ceilings", "ip": "68.28.91.22", "id": 10, "temp": [35, 35, 35, 36, 35, 35, 32, 35, 30, 35, 32, 35], "c02_level": [1475, 1476, 1473], "geo": {"lat": 38, "long": 97}}, "sensor-ipad": {"description": "Sensor ipad attached to carbon cylinders", "ip": "67.185.72.1", "id": 13, "temp": [45, 45, 45, 46, 45, 45, 42, 35, 40, 45, 42, 45], "c02_level": [1370, 1371, 1372], "geo": {"lat": 47.41, "long": -122}}, "sensor-inest": {"description": "Sensor attached to the factory ceilings", "ip": "208.109.163.218", "id": 8, "temp": [40, 40, 40, 40, 40, 43, 42, 40, 40, 45, 42, 45], "c02_level": [1346, 1345, 1343], "geo": {"lat": 33.61, "long": -111.89}}, "sensor-istick": {"description": "Sensor embedded in exhaust pipes in the ceilings", "ip": "204.116.105.67", "id": 5, "temp": [30, 30, 30, 30, 40, 43, 42, 40, 40, 35, 42, 35], "c02_level": [1574, 1570, 1576], "geo": {"lat": 35.93, "long": -85.46}}}

Showing all 1 rows.

By examining its schema, you can see that the DataFrame schema reflects the schema defined above, where two of its elements are arrays of integers.

dataDF.printSchema()
root
 |-- dc_id: string (nullable = true)
 |-- source: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- description: string (nullable = true)
 |    |    |-- ip: string (nullable = true)
 |    |    |-- id: integer (nullable = true)
 |    |    |-- temp: array (nullable = true)
 |    |    |    |-- element: integer (containsNull = true)
 |    |    |-- c02_level: array (nullable = true)
 |    |    |    |-- element: integer (containsNull = true)
 |    |    |-- geo: struct (nullable = true)
 |    |    |    |-- lat: double (nullable = true)
 |    |    |    |-- long: double (nullable = true)

Use explode() to break out each entry of the source map into its own row, with separate key and value columns.

explodedDF = dataDF.select("dc_id", explode("source"))
display(explodedDF)
 
dc_id  | key           | value
dc-101 | sensor-igauge | {"description": "Sensor attached to the container ceilings", "ip": "68.28.91.22", "id": 10, "temp": [35, 35, 35, 36, 35, 35, 32, 35, 30, 35, 32, 35], "c02_level": [1475, 1476, 1473], "geo": {"lat": 38, "long": 97}}
dc-101 | sensor-ipad   | {"description": "Sensor ipad attached to carbon cylinders", "ip": "67.185.72.1", "id": 13, "temp": [45, 45, 45, 46, 45, 45, 42, 35, 40, 45, 42, 45], "c02_level": [1370, 1371, 1372], "geo": {"lat": 47.41, "long": -122}}
dc-101 | sensor-inest  | {"description": "Sensor attached to the factory ceilings", "ip": "208.109.163.218", "id": 8, "temp": [40, 40, 40, 40, 40, 43, 42, 40, 40, 45, 42, 45], "c02_level": [1346, 1345, 1343], "geo": {"lat": 33.61, "long": -111.89}}
dc-101 | sensor-istick | {"description": "Sensor embedded in exhaust pipes in the ceilings", "ip": "204.116.105.67", "id": 5, "temp": [30, 30, 30, 30, 40, 43, 42, 40, 40, 35, 42, 35], "c02_level": [1574, 1570, 1576], "geo": {"lat": 35.93, "long": -85.46}}

Showing all 4 rows.

Now you can work with the value column, which is a struct, to extract individual fields by using their names.

#
# Use dotted names ("value.ip") or col("value.<field>").alias(...) to extract
# individual fields from the exploded value struct.
#
devicesDataDF = explodedDF.select("dc_id", "key", \
                        "value.ip", \
                        col("value.id").alias("device_id"), \
                        col("value.c02_level").alias("c02_levels"), \
                        "value.temp")
display(devicesDataDF)
 
dc_id  | key           | ip              | device_id | c02_levels         | temp
dc-101 | sensor-igauge | 68.28.91.22     | 10        | [1475, 1476, 1473] | [35, 35, 35, 36, 35, 35, 32, 35, 30, 35, 32, 35]
dc-101 | sensor-ipad   | 67.185.72.1     | 13        | [1370, 1371, 1372] | [45, 45, 45, 46, 45, 45, 42, 35, 40, 45, 42, 45]
dc-101 | sensor-inest  | 208.109.163.218 | 8         | [1346, 1345, 1343] | [40, 40, 40, 40, 40, 43, 42, 40, 40, 45, 42, 45]
dc-101 | sensor-istick | 204.116.105.67  | 5         | [1574, 1570, 1576] | [30, 30, 30, 30, 40, 43, 42, 40, 40, 35, 42, 35]

Showing all 4 rows.

As a sanity check, let's ensure that the schema declared above was preserved while exploding and extracting the individual data items.

devicesDataDF.printSchema()
root
 |-- dc_id: string (nullable = true)
 |-- key: string (nullable = false)
 |-- ip: string (nullable = true)
 |-- device_id: integer (nullable = true)
 |-- c02_levels: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- temp: array (nullable = true)
 |    |-- element: integer (containsNull = true)

Now, since this tutorial is less about the DataFrame API and more about higher-order functions and lambdas in SQL, create a temporary table or view so you can start using the higher-order SQL functions mentioned above.

devicesDataDF.createOrReplaceTempView("data_center_iot_devices")

The temporary view has the same columns as your DataFrame and reflects its schema; a few example queries using the higher-order functions follow at the end of this section.

%sql select * from data_center_iot_devices
 
dc_id  | key           | ip              | device_id | c02_levels         | temp
dc-101 | sensor-igauge | 68.28.91.22     | 10        | [1475, 1476, 1473] | [35, 35, 35, 36, 35, 35, 32, 35, 30, 35, 32, 35]
dc-101 | sensor-ipad   | 67.185.72.1     | 13        | [1370, 1371, 1372] | [45, 45, 45, 46, 45, 45, 42, 35, 40, 45, 42, 45]
dc-101 | sensor-inest  | 208.109.163.218 | 8         | [1346, 1345, 1343] | [40, 40, 40, 40, 40, 43, 42, 40, 40, 45, 42, 45]
dc-101 | sensor-istick | 204.116.105.67  | 5         | [1574, 1570, 1576] | [30, 30, 30, 30, 40, 43, 42, 40, 40, 35, 42, 35]

Showing all 4 rows.
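
With the temporary view in place, the four higher-order functions can be applied directly to the temp and c02_levels array columns. The query below is only a sketch of the kind of expressions the rest of the tutorial builds on, assuming Spark 2.4 or later; the aliases and the particular lambdas (a Celsius-to-Fahrenheit conversion, a 1400 ppm threshold, a check for readings above 40, and a running sum) are illustrative choices.

%sql select key,
       transform(temp, t -> ((t * 9) / 5) + 32) as temp_fahrenheit,  -- apply the lambda to every element
       filter(c02_levels, c -> c > 1400) as high_c02_levels,         -- keep only elements matching the predicate
       exists(temp, t -> t > 40) as has_hot_reading,                 -- true if any element satisfies the predicate
       aggregate(temp, 0, (acc, t) -> acc + t) as temp_sum           -- fold the array into a single value
  from data_center_iot_devices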