cache persists the SparkDataFrame with the default storage level (MEMORY_ONLY).
- SparkDataFrame: Any SparkDataFrame
require(SparkR)

# Create SparkDataFrame using the faithful dataset from R
df <- createDataFrame(faithful)

# Displays the content of the SparkDataFrame to stdout
head(df)

# Cache df in memory
cache(df)
We can perform SparkDataFrame operations on the cached SparkDataFrame. Caching is lazy: the data is only materialized the first time an action runs. To see how much of your data was cached, run an action on the SparkDataFrame and then check the Storage tab in the Spark UI.
# Perform SparkDataFrame operations on cached df
# Here we use the action collect()
collect(df)