describe

describe computes summary statistics for the numeric columns of a SparkDataFrame. If no column names are given, statistics are returned for all numeric columns.

Statistics returned: count, mean, stddev, min, max
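
To make the five statistics concrete, here is a minimal base R sketch that computes the same quantities for one column of a local data.frame. It does not call SparkR, and it assumes that count ignores missing values and that stddev is the sample standard deviation (n - 1 denominator); the name local_stats is only illustrative.

```r
# Base R sketch of the five statistics for a single local column.
x <- mtcars$mpg

local_stats <- data.frame(
  summary = c("count", "mean", "stddev", "min", "max"),
  mpg = c(
    sum(!is.na(x)),        # count: number of non-missing values (assumed)
    mean(x, na.rm = TRUE), # mean: arithmetic mean
    sd(x, na.rm = TRUE),   # stddev: sample standard deviation (assumed)
    min(x, na.rm = TRUE),  # min: smallest value
    max(x, na.rm = TRUE)   # max: largest value
  )
)

print(local_stats)
```

Laying the result out with one row per statistic and one column per input column is meant to mirror how a summary table of this kind is usually read.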

Syntax:

  • describe(df)
  • describe(df, "colName", ...)

Parameters:

  • df: Any SparkDataFrame
  • colName: String, name of a column in the SparkDataFrame

Output:

  • SparkDataFrame

Example:

# Load SparkR and make sure a Spark session is available
library(SparkR)
sparkR.session()

# Create SparkDataFrame
df <- createDataFrame(mtcars)
head(df)

# Compute statistics for all numeric columns
collect(describe(df))

# Compute statistics for only the mpg and disp columns
collect(describe(df, "mpg", "disp"))
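
Once the result has been collected, a single statistic can be looked up from the local data.frame. The sketch below assumes the collected result has a summary column holding the statistic labels and that the statistic values may come back as character strings; both are assumptions about the output layout, so check them against your own output first. The names stats and mpg_mean are illustrative.

```r
library(SparkR)
sparkR.session()  # assumption: reuses an existing session or starts a local one

df <- createDataFrame(mtcars)
stats <- collect(describe(df, "mpg", "disp"))

# Convert the statistic columns to numeric in case they were returned as strings
stats$mpg <- as.numeric(stats$mpg)
stats$disp <- as.numeric(stats$disp)

# Look up one statistic by its label (assumes a column named "summary")
mpg_mean <- stats[stats$summary == "mean", "mpg"]
print(mpg_mean)
```

Collecting first keeps the lookup in ordinary R, which is reasonable here because the summary table has only a handful of rows.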