matching operations

The %in% operator tests each value in a column against a vector of values and returns a Boolean Column object that is TRUE where the value equals any element of the vector and FALSE otherwise. It is commonly used together with filter.

Syntax:

  • SparkDataFrame$col %in% values
  • "col in (values)" (SQL expression string, for use with filter)

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • col: Column in SparkDataFrame
  • values: Vector, values to be matched against

Output:

  • Column Object
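
Example:

library(SparkR)
# Start a Spark session, or reuse the one already running (e.g. in a notebook)
sparkR.session()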

# Create SparkDataFrame
df <- createDataFrame(mtcars)
head(df)
# Match df$carb for values 1 and 4
# Column with boolean values returned
head(select(df, df$carb %in% c(1,4)))
# Use filter() with %in%
# Keep only rows where df$carb is 1 or 4 (the same condition as in the select() call above)
filtered <- filter(df, df$carb %in% c(1,4))
collect(filtered)
# Alternative Syntax using SQL statement strings
filtered2 <- filter(df, "carb in (1,4)")
collect(filtered2)
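
Because %in% matches a row when its value equals any element of the vector, it should behave like a chain of equality tests joined with | (logical OR). The following is a minimal sketch of that equivalence, assuming a working local Spark installation; the variable names matched, either and neither are illustrative only, and the complement is expressed with a SQL expression string rather than a Column operator.

```r
library(SparkR)

# Assumption: a local Spark installation is available; sparkR.session()
# reuses an existing session or starts a new one.
sparkR.session()

df <- createDataFrame(mtcars)

# %in% with a vector of values ...
matched <- filter(df, df$carb %in% c(1, 4))

# ... selects the same rows as equality tests chained with | (logical OR)
either <- filter(df, df$carb == 1 | df$carb == 4)

# Complement of the match, written as a SQL expression string
neither <- filter(df, "carb NOT IN (1, 4)")

# The two matching forms agree, and matched plus unmatched rows cover the data
count(matched) == count(either)
count(matched) + count(neither) == count(df)
```

For more than a handful of values the %in% form is usually easier to read and maintain than a long chain of == comparisons.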