createDataFrame

createDataFrame creates SparkR DataFrames from local R data frames.

SparkR’s distributed DataFrame implementation supports operations such as selection, filtering, and aggregation on large datasets.
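As a rough illustration of the operations mentioned above, the sketch below applies a selection, a filter, and an aggregation to the faithful dataset. It assumes a SparkR 1.x session in which sqlContext already exists (as in a Databricks notebook), and the column choices and the threshold of 50 are picked only for illustration.

```r
# Assumption: SparkR 1.x with an existing `sqlContext` (e.g. a Databricks notebook)
library(SparkR)

# Build a SparkR DataFrame from R's built-in faithful dataset
df <- createDataFrame(sqlContext, faithful)

# Selection: keep only the eruptions column
head(select(df, df$eruptions))

# Filtering: keep rows with a waiting time under 50 minutes
head(filter(df, df$waiting < 50))

# Aggregation: count the rows for each waiting time
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))
```

Each call returns a new SparkR DataFrame; head brings only the first few rows back to the local R session.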

Syntax:

  • createDataFrame(sqlContext, localdf)

Parameters:

  • sqlContext: SQLContext. Databricks notebooks create this for you; do not recreate it.
  • localdf: local R data frame

Output:

  • SparkR DataFrame

# Create a SparkR DataFrame using the faithful dataset from R
df <- createDataFrame(sqlContext, faithful)

# Display the first rows of the DataFrame
head(df)

# Create a local R data frame
localdf <- data.frame(customer = c("James", "Peter", "Jane", "James"),
                      amount = c(5, 5, 6, 5))
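# Inspect the structure of the local data frame
str(localdf)

# Convert to a SparkR DataFrame
sparkdf <- createDataFrame(sqlContext, localdf)
# Inspect the structure of the resulting SparkR DataFrame
str(sparkdf)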