subset

subset returns a new SparkDataFrame containing the rows that satisfy the row filter condition, restricted to the selected columns.

Syntax:

  • subset(df, rowExpr, cols)
  • df[rowExpr, cols]

Parameters:

  • df: Any SparkDataFrame
  • rowExpr: Logical condition for row filtering
  • cols: Column or list of columns to include in the subset (optional; if omitted, all columns are returned)

Output:

  • SparkDataFrame

# Create SparkDataFrame
df <- createDataFrame(mtcars)
head(df)

# Subset df for rows where df$carb > 1
# Include only columns 1 & 2
subsetDF <- subset(df, df$carb > 1, 1:2)
head(subsetDF)

# Subset df for rows where df$gear == 3 or df$gear == 4
# Include only columns 1, 2, 3
subsetDF2 <- subset(df, df$gear %in% c(3, 4), c(1, 2, 3))
head(subsetDF2)

# If no cols parameter is specified, all columns will be returned
# Select rows where df$cyl == 4 or df$cyl == 6
head(subset(df, df$cyl %in% c(4, 6)))
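
The cols argument can also be given as column names, and row conditions can be combined with & (and) or | (or). The sketch below is illustrative only: it assumes SparkR is installed and a Spark session can be started (the sparkR.session() call is there just to keep the snippet self-contained), and the variable names fourCyl and fourCyl2 are made up for this example. It also shows that the same result can be written as filter followed by select.

```r
library(SparkR)
sparkR.session()  # assumption: skip this where a session already exists

df <- createDataFrame(mtcars)

# Rows where cyl == 4 and mpg > 30, selecting columns by name
fourCyl <- subset(df, df$cyl == 4 & df$mpg > 30, c("mpg", "cyl", "hp"))
head(fourCyl)

# The same rows and columns, written as filter() followed by select()
fourCyl2 <- select(filter(df, df$cyl == 4 & df$mpg > 30), "mpg", "cyl", "hp")
head(fourCyl2)
```

Selecting by name tends to be more robust than selecting by position, since it keeps working if the column order of the source data changes.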

We can also obtain SparkDataFrame subsets with R-like syntax.

# Return subset of df for rows where df$cyl == 6
# Include all columns
head(df[df$cyl == 6, ])
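
The bracket form also accepts the cols argument listed in the syntax, and the row expression can be left empty to keep every row. A short sketch under the same assumptions as the examples on this page (SparkR installed, a Spark session available; sixCyl and mpgCyl are illustrative names, not part of the API):

```r
library(SparkR)
sparkR.session()  # assumption: skip this where a session already exists

df <- createDataFrame(mtcars)

# Rows where cyl == 6, keeping only the mpg and cyl columns
sixCyl <- df[df$cyl == 6, c("mpg", "cyl")]
head(sixCyl)

# Leave the row expression empty to keep all rows and select columns only
mpgCyl <- df[, c("mpg", "cyl")]
head(mpgCyl)
```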