
filter

Returns an array containing the elements of the given array for which a predicate holds. Supports Spark Connect.

For the corresponding Databricks SQL function, see filter function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.filter(col=<col>, f=<f>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or str | Name of column or expression. |
| f | function | A function that returns a Boolean expression. It can take one of two forms: unary, (x: Column) -> Column, or binary, (x: Column, i: Column) -> Column, where the second argument is the 0-based index of the element. |
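The difference between the unary and binary forms of f can be illustrated without a Spark session. The following is a minimal plain-Python sketch of the semantics described above, not the Spark implementation: `filter_model` is a hypothetical helper introduced only for illustration, it works on ordinary Python lists rather than Column expressions, and it does not model null handling.

```python
import inspect
from typing import Any, Callable, List


def filter_model(values: List[Any], f: Callable[..., bool]) -> List[Any]:
    """Plain-Python model of the filter semantics (hypothetical helper, not a Spark API)."""
    # Choose between the unary and binary form by counting f's parameters.
    arity = len(inspect.signature(f).parameters)
    if arity == 1:
        # Unary form: f receives only the element.
        return [x for x in values if f(x)]
    if arity == 2:
        # Binary form: f also receives the 0-based index of the element.
        return [x for i, x in enumerate(values) if f(x, i)]
    raise ValueError("f must take one or two arguments")


values = ["a", "b", "c", "d"]

# Unary predicate: keep every element except "b".
unary_result = filter_model(values, lambda x: x != "b")

# Binary predicate: keep elements at even 0-based positions.
binary_result = filter_model(values, lambda x, i: i % 2 == 0)

print(unary_result)   # ['a', 'c', 'd']
print(binary_result)  # ['a', 'c']
```

Going by the parameter description, the corresponding Spark call for the binary case would presumably take the form dbf.filter("values", lambda x, i: i % 2 == 0), with x and i arriving as Column expressions rather than Python values.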

Returns

pyspark.sql.Column: an array containing only the elements of the input array for which the function f evaluated to True.

Examples

Python
from pyspark.databricks.sql import functions as dbf

df = spark.createDataFrame(
    [(1, ["2018-09-20", "2019-02-03", "2019-07-01", "2020-06-01"])],
    ("key", "values")
)

def after_second_quarter(x):
    return dbf.month(dbf.to_date(x)) > 6

df.select(
    dbf.filter("values", after_second_quarter).alias("after_second_quarter")
).show(truncate=False)
Output
+------------------------+
|after_second_quarter    |
+------------------------+
|[2018-09-20, 2019-07-01]|
+------------------------+