persist persists the DataFrame with the specified storage level.
Read the programming guide (http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence) to find out more about the different storage levels and how to choose between them.
Syntax:
- persist(DataFrame, "storageLevel")

Parameters:
- DataFrame: Any SparkR DataFrame
- storageLevel: String, the storage level at which to persist the DataFrame, for example "MEMORY_ONLY", "MEMORY_AND_DISK", or "DISK_ONLY".

Output:
- SparkR DataFrame
# Create a SparkR DataFrame from R's built-in faithful dataset
df <- createDataFrame(sqlContext, faithful)
# Display the first rows of the DataFrame
head(df)
# Cache df in memory only (note: cache() on a DataFrame defaults to MEMORY_AND_DISK, not MEMORY_ONLY)
persist(df, "MEMORY_ONLY")
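Persisting is lazy, so nothing is cached until an action runs. To see how much of your data was cached, perform a DataFrame action (for example, count() or collect()) and then check the Storage tab of the Spark UI.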
# Perform DataFrame operations on the cached df
collect(df)
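As a further sketch, the following assumes a running SparkR session that provides `sqlContext`, as in the example above (the SparkR 1.x-style `createDataFrame(sqlContext, ...)` call). It persists a fresh DataFrame at a level that can spill to disk, triggers the cache with an action, and then releases it with `unpersist()`. A new DataFrame is created rather than reusing `df`, because asking Spark to persist data that is already cached does not change its storage level.

```r
library(SparkR)

# Assumption: `sqlContext` already exists, as in the example above
df2 <- createDataFrame(sqlContext, faithful)

# Keep partitions in memory, spilling to disk if memory runs short
df2 <- persist(df2, "MEMORY_AND_DISK")

# persist() is lazy: an action such as count() materializes the cache
n <- count(df2)
print(n)

# Release the cached data once it is no longer needed
unpersist(df2)
```

Calling `unpersist()` when you are done frees executor memory and disk for other cached data; `df2` itself remains usable and is simply recomputed on the next action.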