unionAll

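unionAll returns a new SparkR DataFrame containing the rows of both input DataFrames. It is equivalent to the UNION ALL operator in SQL, so duplicate rows are kept rather than removed. As in SQL, columns are matched by position rather than by name, so both DataFrames should have the same number of columns in the same order.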

Syntax:

  • unionAll(df1, df2)

Parameters:

  • df1: Any SparkR DataFrame
  • df2: Any SparkR DataFrame

Output:

  • SparkR DataFrame

Example:

# Create 2 SparkR DataFrames
smallDF <- createDataFrame(sqlContext, data.frame(name = c("Mouse", "Rabbit", "Bird"),
                                                count = c(3, 5, 4)))
bigDF <- createDataFrame(sqlContext, data.frame(name = c("Elephant", "Buffalo", "Bird"),
                                                count = c(1, 2, 4)))
head(smallDF)
head(bigDF)

# Combine the 2 DataFrames.
unionDF <- unionAll(smallDF, bigDF)

# Count the number of rows. No duplicate rows are removed, so we should get 6 rows (3 + 3).
count(unionDF)
head(unionDF)
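
Because unionAll keeps duplicates, the ("Bird", 4) row appears twice in unionDF. If SQL UNION semantics are wanted instead, one possible approach is to apply distinct to the result. The sketch below continues the session above and assumes unionDF still exists; distinctDF is a name introduced here for illustration.

```r
# Continues the example above: unionDF was created with unionAll(smallDF, bigDF).
# distinctDF is an illustrative name, not part of the original example.
distinctDF <- distinct(unionDF)

# The two identical ("Bird", 4) rows collapse into one, leaving 5 rows.
count(distinctDF)
head(distinctDF)
```

Note that distinct compares entire rows, so rows that share a name but differ in count would both be kept.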

To combine more than 2 DataFrames in a single call, you can use rbind.
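
As a minimal sketch of the rbind approach, assuming a SparkR version in which rbind accepts DataFrames and the same sqlContext session as the example above: mediumDF and allDF are hypothetical names added for illustration, and smallDF and bigDF are the DataFrames created earlier.

```r
# mediumDF is a hypothetical third DataFrame with the same columns as smallDF and bigDF.
mediumDF <- createDataFrame(sqlContext, data.frame(name = c("Dog", "Goat", "Bird"),
                                                   count = c(2, 6, 4)))

# Stack all three DataFrames by rows; like unionAll, duplicates are kept.
allDF <- rbind(smallDF, mediumDF, bigDF)

# 3 + 3 + 3 rows, so we should get 9 rows.
count(allDF)
head(allDF, 9)
```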