cache

cache persists the SparkDataFrame with the default storage level (MEMORY_ONLY).

Note: Read the programming guide (http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence) to find out more about the different storage levels. To choose a particular storage level, use persist.

Syntax:

  • cache(SparkDataFrame)

Parameters:

  • SparkDataFrame: Any SparkDataFrame

Output:

  • SparkDataFrame

require(SparkR)

# Create SparkDataFrame using the faithful dataset from R
df <- createDataFrame(faithful)

# Display the first rows of the SparkDataFrame
head(df)

# Cache df in memory
cache(df)

We can perform SparkDataFrame operations on the cached SparkDataFrame. Caching is lazy, so the data is stored only once an action runs. To see how much of your data was cached, perform a SparkDataFrame action and check Spark UI > Storage.

# Perform SparkDataFrame operations on cached df
# Here we use the action collect()
collect(df)
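
The note above points to persist for choosing a storage level. The following is a minimal sketch of that workflow, not a definitive recipe. It assumes that no Spark session is running yet (hence the sparkR.session() call, which you can skip in a notebook that already provides a session), and the MEMORY_AND_DISK level is picked only for illustration. The variable names n and level are hypothetical.

```r
library(SparkR)

# Assumption: no Spark session is active yet; skip this call if your
# environment (for example, a notebook) already provides one.
sparkR.session()

# Create a SparkDataFrame using the faithful dataset from R
df <- createDataFrame(faithful)

# Request a specific storage level instead of the default used by cache().
# Assumption: MEMORY_AND_DISK is chosen here only as an example.
df <- persist(df, "MEMORY_AND_DISK")

# Persisting is lazy, so run an action to materialize the stored data
n <- count(df)

# Inspect the storage level currently assigned to df
level <- storageLevel(df)
print(level)

# Release the stored data once it is no longer needed
df <- unpersist(df)
```

After the count() action has run, the SparkDataFrame should appear under Spark UI > Storage until unpersist is called.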