Higher-order functions

Databricks provides dedicated primitives for manipulating arrays in Apache Spark SQL. These primitives make array code more concise and remove the need for large amounts of boilerplate. They revolve around two functional programming constructs: higher-order functions and anonymous (lambda) functions. Used together, the two let you define functions that manipulate arrays directly in SQL.

Introduction

A higher-order function takes an array, implements how that array is traversed, and determines the result of the computation. It delegates the processing of each item in the array to a lambda function that you supply.
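The division of labor described above can be illustrated with a minimal plain-Python sketch. This is an analogy only, not the Spark API: the helpers `transform`, `filter_array`, and `aggregate` are hypothetical stand-ins written for this example, loosely modeled on the Spark SQL functions `transform`, `filter`, and `aggregate`. Each helper owns the traversal and the form of the result, while the lambda decides what happens to each item.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

# NOTE: these helpers are illustrative assumptions, not Spark functions.
# They only mimic the behavior of Spark SQL's transform/filter/aggregate.


def transform(items: List[T], fn: Callable[[T], U]) -> List[U]:
    """Walk the array and return a new array of fn(item) for each item."""
    return [fn(item) for item in items]


def filter_array(items: List[T], predicate: Callable[[T], bool]) -> List[T]:
    """Walk the array and keep only the items for which predicate is true."""
    return [item for item in items if predicate(item)]


def aggregate(items: List[T], start: U, merge: Callable[[U, T], U]) -> U:
    """Fold the array into a single value, starting from `start`."""
    return reduce(merge, items, start)


values = [1, 2, 3, 4]

# The lambda only says what to do with one item (or one accumulator step).
incremented = transform(values, lambda x: x + 1)
evens = filter_array(values, lambda x: x % 2 == 0)
total = aggregate(values, 0, lambda acc, x: acc + x)

print(incremented, evens, total)
```

In Spark SQL the corresponding expressions would look roughly like `transform(values, x -> x + 1)`, `filter(values, x -> x % 2 = 0)`, and `aggregate(values, 0, (acc, x) -> acc + x)`, where `values` is assumed to be an array column.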

The following notebooks introduce you to these functions.

Higher-order functions tutorial Python notebook

Introduction to higher-order functions notebook

Apache Spark built-in functions

Apache Spark provides built-in functions, including higher-order functions, for manipulating complex types such as arrays.

The following notebook illustrates Apache Spark built-in functions.

Apache Spark built-in functions notebook
