collect

collect returns all rows of a SparkR DataFrame to the driver as a local R data frame.

Note:

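collect moves the whole DataFrame to the driver, not to a worker, so make sure the result fits in driver memory. If the serialized results are larger than spark.driver.maxResultSize (4.0 GB in this environment; the open-source Spark default is 1g), the job fails with an error naming that setting.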
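One way to stay under this limit is to count rows first and, when the DataFrame is large, trim it on the cluster with limit() before collecting. The sketch below assumes Spark 2.0+ (where sparkR.session() and createDataFrame(iris) replace the sqlContext form). The max_rows cutoff is a hypothetical illustration value, not a Spark setting, and a row count is only a rough proxy for the serialized size that spark.driver.maxResultSize actually measures.

```r
# Sketch, assuming Spark 2.0+ with SparkR installed.
library(SparkR)
sparkR.session()

df <- createDataFrame(iris)

# Hypothetical cutoff for illustration; it is not a Spark configuration key.
max_rows <- 10000

# count() runs a Spark job and returns only the row count to the driver.
n <- count(df)

if (n <= max_rows) {
  # Small enough: bring every row back as a local data.frame.
  local <- collect(df)
} else {
  # Too large: keep only max_rows rows on the cluster, then collect those.
  local <- collect(limit(df, max_rows))
}

str(local)
```

For a quick look at a few rows, take(df, 10) or head(df, 10) also return a small local data.frame without collecting everything.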

Syntax:

  • collect(DataFrame, stringsAsFactors = FALSE)

Parameters:

  • DataFrame: Any SparkR DataFrame.
  • stringsAsFactors: Optional logical value. If TRUE, string columns are converted to R factors in the returned data frame. Defaults to FALSE.
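To see the effect of stringsAsFactors, the sketch below collects the same DataFrame twice, again assuming Spark 2.0+. SparkR stores the iris Species factor as a string column, so it comes back as a character vector by default and as a factor only when stringsAsFactors = TRUE.

```r
# Sketch, assuming Spark 2.0+ with SparkR installed.
library(SparkR)
sparkR.session()

df <- createDataFrame(iris)

# Default (stringsAsFactors = FALSE): string columns stay character vectors.
as_chars <- collect(df)

# stringsAsFactors = TRUE: string columns are converted to factors.
as_factors <- collect(df, stringsAsFactors = TRUE)

# Compare the resulting column types.
class(as_chars$Species)
class(as_factors$Species)
levels(as_factors$Species)
```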

Output:

  • local R data frame
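
Example:

# Create a SparkR DataFrame from the built-in iris data set
# (Spark 1.x form; on Spark 2.0+ call createDataFrame(iris) instead)
df <- createDataFrame(sqlContext, iris)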
str(df)
# Collect df into a local R data frame
# stringsAsFactors = TRUE converts the string Species column to a factor
local <- collect(df, stringsAsFactors = TRUE)
str(local)
# View a few rows from local R data frame
local[1:6,]