matching operations

The %in% operator matches the values of a column against a vector of values and returns a Boolean Column object that is TRUE where the value matches and FALSE where it does not. It is commonly used together with filter().

Syntax:

  • DataFrame$col %in% values
  • "col in (values)" (SQL expression string, passed to filter)

Parameters:

  • DataFrame: Any SparkR DataFrame
  • col: Column in SparkR DataFrame
  • values: Vector, values to be matched against

Output:

  • Column Object

Example:

# Create SparkR DataFrame
df <- createDataFrame(sqlContext, mtcars)
head(df)
# Match df$carb for values 1 and 4
# Column with boolean values returned
head(select(df, df$carb %in% c(1,4)))
# Use filter() with %in%
# Keep rows where df$carb is 1 or 4 (same condition as the select() call above)
filtered <- filter(df, df$carb %in% c(1,4))
collect(filtered)
# Alternative syntax using a SQL expression string
filtered2 <- filter(df, "carb in (1,4)")
collect(filtered2)
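
For comparison, the same matching logic can be tried locally with base R's %in% on an ordinary vector, with no Spark session needed. This is only a sketch for checking intuition: it uses the built-in mtcars data frame directly, the names carb_values, matches, and local_filtered are illustrative, and it assumes the SparkR Column version behaves the same way for non-missing values.

```r
# Base R only: no sqlContext or SparkR DataFrame is involved here.
carb_values <- mtcars$carb

# One TRUE/FALSE entry per row: TRUE where carb is 1 or 4
matches <- carb_values %in% c(1, 4)
head(matches)

# Local counterpart of filter(df, df$carb %in% c(1,4))
local_filtered <- mtcars[matches, ]
unique(local_filtered$carb)
```

The logical vector plays the role that the Boolean Column plays in SparkR: select() exposes it as a column, while filter() uses it to keep only the matching rows.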