crosstab

crosstab computes a pairwise frequency table, also known as a contingency table, of two given columns.

Note:

The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned. Pairs that have no occurrences will have zero as their counts.
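As a rough pre-check against the limits above, the number of distinct values per column can be counted before calling crosstab. The sketch below does this in base R on the local mtcars data used later on this page. The helper name n_distinct_values is hypothetical and not part of SparkR; for data that exists only in Spark, an equivalent count would have to be computed on the SparkDataFrame itself.

```r
# Hypothetical helper (not part of SparkR): count the distinct values
# in one column of a local R data frame.
n_distinct_values <- function(local_df, col) {
  length(unique(local_df[[col]]))
}

limit <- 1e4  # per-column limit stated in the note above

n_cyl  <- n_distinct_values(mtcars, "cyl")
n_gear <- n_distinct_values(mtcars, "gear")

# Stop early if either column has too many distinct values for crosstab.
stopifnot(n_cyl < limit, n_gear < limit)

# Upper bound on the number of (cyl, gear) pairs in the resulting table.
max_pairs <- n_cyl * n_gear
print(max_pairs)
```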

Syntax:

  • crosstab(SparkDataFrame, col1, col2)

Parameters:

  • SparkDataFrame: Any SparkDataFrame
  • col1: String, the name of a column in SparkDataFrame
  • col2: String, the name of a column in SparkDataFrame

Output:

  • Local R Data Frame

require(SparkR)

# Create SparkDataFrame
df <- createDataFrame(mtcars)
head(df)

SparkR’s crosstab is similar to the table function in base R.

# Create contingency table with df$cyl and df$gear
# Note that a local R data frame is returned
crosstab(df, "cyl", "gear")
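
For comparison, the base R table function mentioned above can build the same contingency table directly from the local mtcars data, with no Spark session required. The sketch below is illustrative only. Its second step reshapes the result into a data frame with one row per cyl value and one column per gear value, which is assumed here to approximate the layout crosstab returns. The first column name cyl_gear follows Spark's col1_col2 naming convention for that column, and local_tab and local_df are names chosen for this example.

```r
# Base R counterpart of the crosstab call, computed on the local data.
local_tab <- table(mtcars$cyl, mtcars$gear, dnn = c("cyl", "gear"))
print(local_tab)

# Reshape the table into a data frame: one row per cyl value,
# one column per gear value.
local_df <- as.data.frame.matrix(local_tab)

# Move the cyl values from the row names into a leading column.
# The name "cyl_gear" is an assumption mirroring Spark's col1_col2 naming.
local_df <- cbind(cyl_gear = rownames(local_df), local_df)
rownames(local_df) <- NULL
print(local_df)
```

Pairs that never occur together, such as 8 cylinders with 4 gears in mtcars, appear with a count of zero, which matches the behavior described in the note at the top of this page.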