sample

sample returns a random subset of the rows of a SparkDataFrame, drawn with or without replacement. When no seed is supplied, a random seed is used.

Syntax:

  • sample(SparkDataFrame, withReplacement, fraction)

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • withReplacement: Boolean value indicating whether sampling is done with replacement
  • fraction: Numeric, approximate fraction of the rows to return
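
The fraction is best read as an expected proportion rather than an exact one. The sketch below uses base R only (no Spark session) and assumes that sampling without replacement keeps each row independently with probability fraction; that is an assumption about the sampling scheme, not something this page states. The names n_rows, keep, and counts are illustrative.

```r
# Base R only: simulate keeping each row with probability `fraction`.
set.seed(1)                     # fixed seed so the simulation is repeatable

n_rows   <- nrow(quakes)        # quakes ships with R's datasets package
fraction <- 0.1

# One simulated sample: a logical flag per row, TRUE means the row is kept
keep <- runif(n_rows) < fraction
sum(keep)                       # size of this one simulated sample

# Repeat 200 times to see how the sample size varies from run to run
counts <- replicate(200, sum(runif(n_rows) < fraction))
range(counts)                   # smallest and largest simulated sample sizes
mean(counts)                    # average size, close to fraction * n_rows
```

Under this assumption the sample size varies around fraction * n_rows from run to run, which is why a row count on a sample should be treated as approximate.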

Output:

  • SparkDataFrame

require(SparkR)

# Create SparkDataFrame
df <- createDataFrame(quakes)

# Count number of rows in df
count(df)

# Create a 0.1 sample of df, without replacement
# No seed is passed, so a random seed is used
subsetDF <- sample(df, FALSE, 0.1)

# Count number of rows in subsetDF
# The count is approximately 0.1 of the rows in df, not exactly
count(subsetDF)

# Show the first rows of the sample
head(subsetDF)
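
Because a random seed is used by default, repeated calls generally return different rows. A minimal sketch of a repeatable sample follows. It assumes your SparkR version accepts a seed as the fourth positional argument of sample, which the syntax above does not list, so check the reference for your version. The value 42 and the names seed, sampleA, and sampleB are illustrative.

```r
library(SparkR)
sparkR.session()   # starts or reuses a session; may be unnecessary where one already exists

df <- createDataFrame(quakes)

# Assumption: the fourth positional argument is the seed
seed <- 42
sampleA <- sample(df, FALSE, 0.1, seed)
sampleB <- sample(df, FALSE, 0.1, seed)

# With the same seed and unchanged input data, the two counts are expected to match
count(sampleA) == count(sampleB)
```

Fixing the seed helps when a result has to be reproduced later, for example in a test or a report. Reproducibility also depends on the input data and its partitioning staying the same between runs.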