collect

collect returns all elements of a SparkDataFrame as a local R data.frame.

Note:

Ensure that your SparkDataFrame is small enough to fit on the driver, which is a single machine, or you will encounter an error that says that your results exceed spark.driver.maxResultSize (4.0GB). Also note that this function involves transferring data from the Spark workers to the driver and then serializing it into the R process, so it can be slow for large results.
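One way to stay under that limit is to check the size of the SparkDataFrame on the cluster and collect only a bounded subset when it is large. The following is a minimal sketch, not a prescribed pattern: it assumes a Spark session is available, and the names n_rows, max_rows, and local_df, as well as the threshold of 100 rows, are hypothetical and chosen only for illustration.

```r
library(SparkR)

# Assumption: sparkR.session() gets the existing session or creates one.
sparkR.session()

df <- createDataFrame(iris)

# Count rows on the cluster before pulling anything to the driver.
n_rows <- count(df)

# Hypothetical threshold, chosen only for illustration.
max_rows <- 100

# Collect everything only when the result is small;
# otherwise collect a bounded subset with limit().
if (n_rows <= max_rows) {
  local_df <- collect(df)
} else {
  local_df <- collect(limit(df, max_rows))
}

nrow(local_df)
```

A row count is only a rough proxy for result size, since wide rows or large string values can exceed the limit with few rows.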

Syntax:

  • collect(SparkDataFrame, stringsAsFactors)

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • stringsAsFactors: Optional boolean flag indicating whether string columns should be converted to R factors in the local data.frame. Default is FALSE.

Output:

  • local R data.frame
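
Example:

library(SparkR)

# Get or create the Spark session (already available in most notebook environments)
sparkR.session()

# Create a SparkDataFrame from the built-in iris data set
df <- createDataFrame(iris)
str(df)

# Collect df into a local R data.frame
# The Species column is converted to a factor
local_df <- collect(df, stringsAsFactors = TRUE)
str(local_df)

# View the first six rows of the local R data.frame
local_df[1:6, ]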