persist

persist persists the SparkDataFrame with the specified storage level.

Note: Read the programming guide (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence) to find out more about the different storage levels and how to choose between them.

Syntax:

  • persist(SparkDataFrame, "storageLevel")

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • storageLevel: String, the storage level at which to persist the SparkDataFrame, for example "MEMORY_ONLY", "MEMORY_AND_DISK", or "DISK_ONLY".

Output:

  • SparkDataFrame

Example:

require(SparkR)

# Create SparkR DataFrame using the faithful dataset from R
df <- createDataFrame(faithful)

# Display the first rows of the SparkDataFrame on stdout
head(df)

# Cache df in memory only (this gives the same result as using cache())
persist(df, "MEMORY_ONLY")

persist is lazy: nothing is cached until an action runs. To see how much of your data was cached, run a SparkDataFrame action and check Spark UI > Storage.

# Perform SparkDataFrame operations on cached df
collect(df)
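
The other storage levels listed under Parameters are used in the same way. The following is a minimal sketch rather than part of the original example. It assumes a local Spark installation that sparkR.session() can start. It also calls unpersist() before choosing a new level, on the assumption that Spark keeps an already-persisted SparkDataFrame at its existing storage level.

```r
library(SparkR)

# Assumption: a local Spark installation is available. Notebooks that already
# provide a Spark session can skip this call.
sparkR.session()

df <- createDataFrame(faithful)

# Persist in memory only, then run an action so the data is actually cached
persist(df, "MEMORY_ONLY")
count(df)

# Drop the cached data before switching to a different storage level
unpersist(df)

# Persist again, this time allowing partitions that do not fit in memory
# to be stored on disk
persist(df, "MEMORY_AND_DISK")
n <- count(df)
```

After the final count(), the Storage tab of the Spark UI should list the SparkDataFrame under the new storage level.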