Higher-order functions

Databricks provides dedicated primitives for manipulating arrays in Apache Spark SQL. These primitives make working with arrays more concise and remove much of the boilerplate code that array processing otherwise requires. They revolve around two functional programming constructs: higher-order functions and anonymous (lambda) functions, which work together so that you can define array-manipulating functions directly in SQL. A higher-order function takes an array and defines how the array is processed and what the result of the computation is. It delegates the processing of each item in the array to a lambda function.
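The division of labor between a higher-order function and a lambda function can be sketched in plain Python. This is an illustrative analogy only, not the Spark implementation: the helper names `transform`, `keep_if`, and `aggregate` are hypothetical Python stand-ins defined here, and the comments show the Spark SQL expressions they are meant to mirror (`transform`, `filter`, and `aggregate` are Spark SQL higher-order functions, and `x -> x + 1` is the SQL lambda syntax).

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def transform(array: List[T], fn: Callable[[T], U]) -> List[U]:
    """Higher-order function: owns the iteration and builds the result array.

    The per-item logic is delegated to fn (the lambda).
    """
    return [fn(item) for item in array]


def keep_if(array: List[T], predicate: Callable[[T], bool]) -> List[T]:
    """Higher-order function: keeps only the items for which the lambda is true."""
    return [item for item in array if predicate(item)]


def aggregate(array: List[T], start: U, merge: Callable[[U, T], U]) -> U:
    """Higher-order function: folds the array into one value using the lambda."""
    return reduce(merge, array, start)


values = [1, 2, 3, 4]

# Mirrors Spark SQL: transform(values, x -> x + 1)
incremented = transform(values, lambda x: x + 1)

# Mirrors Spark SQL: filter(values, x -> x % 2 = 0)
evens = keep_if(values, lambda x: x % 2 == 0)

# Mirrors Spark SQL: aggregate(values, 0, (acc, x) -> acc + x)
total = aggregate(values, 0, lambda acc, x: acc + x)

print(incremented, evens, total)  # [2, 3, 4, 5] [2, 4] 10
```

In each call, the helper decides how the array is traversed and what shape the result takes, while the lambda supplies only the per-item logic. That is the same split the SQL primitives make.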

Introduction to higher-order functions notebook


Higher-order functions tutorial Python notebook


Apache Spark built-in functions

Apache Spark has built-in functions for manipulating complex types (for example, array types), including higher-order functions.

The following notebook illustrates Apache Spark built-in functions.

Apache Spark built-in functions notebook
