arrays_overlap

Returns a boolean column indicating whether the input arrays have common non-null elements. The result is true if the arrays share at least one non-null element; null if they share no elements, both arrays are non-empty, and at least one of them contains a null element; and false otherwise. If either input array is itself null, the result is null.
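
The rules above can be modeled in plain Python as a rough sketch. The helper name `arrays_overlap_model` is hypothetical and not part of PySpark, and Python's `None` stands in for SQL NULL. It only mirrors the behavior described on this page; it is not how Spark implements the function.

```python
from typing import Optional, Sequence


def arrays_overlap_model(
    a1: Optional[Sequence[Optional[str]]],
    a2: Optional[Sequence[Optional[str]]],
) -> Optional[bool]:
    """Hypothetical reference model of the three-valued result (True / False / None)."""
    # A null array on either side gives a null result (compare Example 3).
    if a1 is None or a2 is None:
        return None

    # Any shared non-null element gives True, regardless of nulls elsewhere.
    if any(x is not None and x in a1 for x in a2):
        return True

    # No shared element: the result is null only if both arrays are
    # non-empty and at least one of them contains a null element.
    has_null = any(x is None for x in a1) or any(x is None for x in a2)
    if len(a1) > 0 and len(a2) > 0 and has_null:
        return None

    return False


if __name__ == "__main__":
    # Rows taken from the examples on this page.
    print(arrays_overlap_model(["a", "b"], ["b", "c"]))    # True
    print(arrays_overlap_model(["a"], ["b", "c"]))         # False
    print(arrays_overlap_model(["a", None], ["b", None]))  # None
    print(arrays_overlap_model(None, ["b", "c"]))          # None
```

Note that a shared non-null element takes priority: under this model `["a", None]` and `["a"]` give True, not null.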

Syntax

Python
from pyspark.sql import functions as sf

sf.arrays_overlap(a1, a2)

Parameters

Parameter | Type | Description
a1 | pyspark.sql.Column or str | The name of the column that contains the first array.
a2 | pyspark.sql.Column or str | The name of the column that contains the second array.

Returns

pyspark.sql.Column: A new Column of Boolean type, where each value indicates whether the corresponding arrays from the input columns contain any common non-null elements. A value can also be null.

Examples

Example 1: Basic usage of arrays_overlap function.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b"], ["b", "c"]), (["a"], ["b", "c"])], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
Output
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|               false|
+--------------------+

Example 2: Usage of arrays_overlap function with arrays containing null elements.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", None], ["b", None]), (["a"], ["b", "c"])], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
Output
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|               false|
+--------------------+

Example 3: Usage of arrays_overlap function with arrays that are null.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(None, ["b", "c"]), (["a"], None)], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
Output
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                NULL|
|                NULL|
+--------------------+

Example 4: Usage of arrays_overlap on arrays with identical elements.

Python
from pyspark.sql import functions as sf
df = spark.createDataFrame([(["a", "b"], ["a", "b"]), (["a"], ["a"])], ['x', 'y'])
df.select(sf.arrays_overlap(df.x, df.y)).show()
Output
+--------------------+
|arrays_overlap(x, y)|
+--------------------+
|                true|
|                true|
+--------------------+