describe

describe computes summary statistics for the numeric columns of a DataFrame. If no columns are given, statistics are returned for all numeric columns.

Statistics returned: count, mean, stddev, min, max

Syntax:

  • describe(df)
  • describe(df, "colName", ...)

Parameters:

  • df: Any SparkR DataFrame
  • colName: String, name of a column in df; more than one may be given

Output:

  • SparkR DataFrame

Example:

# Create SparkR DataFrame
df <- createDataFrame(sqlContext, mtcars)
head(df)
# Compute statistics for all numerical columns
collect(describe(df))
# Compute statistics for only mpg and disp columns
collect(describe(df, "mpg", "disp"))
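
To sanity-check the output on a small dataset, the same five statistics can be computed locally in base R, without Spark. This is only a rough sketch: local_describe is a hypothetical helper, not part of SparkR, and it assumes that stddev means the sample standard deviation (what base R's sd computes), which may not match every Spark version.

```r
# Local cross-check in base R (no Spark needed); mtcars ships with R.
# local_describe is a hypothetical helper, not a SparkR function.
local_describe <- function(x) {
  x <- x[!is.na(x)]  # drop missing values before summarising
  c(count  = length(x),
    mean   = mean(x),
    stddev = sd(x),   # assumption: sample standard deviation
    min    = min(x),
    max    = max(x))
}

# One column of statistics per requested column, one row per statistic
sapply(mtcars[, c("mpg", "disp")], local_describe)
```

The result is a numeric matrix with one row per statistic, which can be compared by eye against collect(describe(df, "mpg", "disp")).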