intersect

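intersect returns a new SparkDataFrame containing only the rows that appear in both SparkDataFrames. Rows are compared on all columns, columns are matched by position, and each common row appears only once in the result. This is equivalent to the INTERSECT operator in SQL.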
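SQL INTERSECT compares rows on every column, matches columns by position, and returns each common row once. Those semantics can be illustrated on local data.frames without a Spark cluster. The base-R sketch below is an illustration only, not how SparkR implements intersect: localIntersect is a hypothetical helper that emulates the behaviour with merge() (an inner join on all columns) followed by unique().

```r
# Hypothetical helper (not part of SparkR): emulates SQL INTERSECT
# semantics on local data.frames, for illustration only.
localIntersect <- function(x, y) {
  # Set operations match columns by position, so both inputs need equal width
  stopifnot(ncol(x) == ncol(y))
  # Give y the same column names as x so merge() can join on every column
  names(y) <- names(x)
  # An inner join on all columns keeps only rows present in both inputs
  common <- merge(x, y, by = names(x))
  # unique() drops duplicate rows, mirroring INTERSECT's distinct result
  result <- unique(common)
  rownames(result) <- NULL
  result
}

newHires <- data.frame(name = c("Thomas", "George", "George", "John"),
                       surname = c("Smith", "Williams", "Brown", "Taylor"),
                       stringsAsFactors = FALSE)
salesTeam <- data.frame(name = c("Lucas", "Bill", "George"),
                        surname = c("Martin", "Clark", "Williams"),
                        stringsAsFactors = FALSE)

print(localIntersect(newHires, salesTeam))
```

Because merge() joins every matching pair of rows, a row duplicated in either input would appear several times after the join; the unique() step is what makes the sketch behave like INTERSECT rather than a plain join.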

Syntax:

  • intersect(df1, df2)

Parameters:

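  • df1: a SparkDataFrame
  • df2: a SparkDataFrame with the same number of columns as df1 and compatible column types (columns are matched by position, not by name)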

Output:

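  • A SparkDataFrame containing the distinct rows found in both df1 and df2

Example:

library(SparkR)
# Start a Spark session, or reuse the one that is already running
sparkR.session()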

newHires <- data.frame(name = c("Thomas", "George", "George", "John"),
                       surname = c("Smith", "Williams", "Brown", "Taylor"),
                       stringsAsFactors = FALSE)
salesTeam <- data.frame(name = c("Lucas", "Bill", "George"),
                        surname = c("Martin", "Clark", "Williams"),
                        stringsAsFactors = FALSE)

# Convert the local data.frames to SparkDataFrames
newHiresDF <- createDataFrame(newHires)
salesTeamDF <- createDataFrame(salesTeam)

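# Inspect both inputs
head(newHiresDF)
head(salesTeamDF)

# Use intersect() to find rows that occur in both SparkDataFrames
newSalesHire <- intersect(newHiresDF, salesTeamDF)

# Only ("George", "Williams") appears in both inputs
head(newSalesHire)

# intersect() returns a SparkDataFrame, not a local data.frame
str(newSalesHire)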