cache persists the DataFrame with the default storage level (MEMORY_ONLY).
Read the `programming guide <http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence>`__ to find out more about the different storage levels. To choose a particular storage level, use persist instead.
- DataFrame: Any SparkR DataFrame
- SparkR DataFrame
# Create SparkR DataFrame using the faithful dataset from R
df <- createDataFrame(sqlContext, faithful)

# Displays the content of the DataFrame to stdout
head(df)
# Cache df in memory
cache(df)
We can perform DataFrame operations on the cached DataFrame. Caching is lazy, so the data is only stored once an action runs. To see how much of your data was cached, run an action on the DataFrame and then check the Storage tab of the Spark UI.
# Perform DataFrame operations on the cached df
# Here we use the action collect()
collect(df)
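The description above mentions persist as the way to pick a specific storage level. The following is a minimal sketch of that alternative, not a definitive recipe. It assumes the same sqlContext used in the example above is available, and df2 is a hypothetical name introduced here so that the already cached df is left alone. The storage level string "MEMORY_AND_DISK" is one of the levels described in the programming guide.

```r
# Sketch only: assumes a running SparkR session with `sqlContext` defined,
# as in the example above. `df2` is a hypothetical name used for illustration.
df2 <- createDataFrame(sqlContext, faithful)

# Persist with an explicit storage level instead of the default used by cache()
persist(df2, "MEMORY_AND_DISK")

# Caching is lazy, so run an action to materialize the persisted data
count(df2)

# Release the persisted data once it is no longer needed
unpersist(df2)
```

After the count() action runs, df2 should appear under Spark UI > Storage with the chosen storage level, and it should disappear again after unpersist().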