crosstab computes a pairwise frequency table of two given columns, also known as a contingency table.
The number of distinct values in each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned. Pairs that have no occurrences have a count of zero.
- crosstab(SparkDataFrame, col1, col2)
- SparkDataFrame: Any SparkDataFrame
- col1: String, any column in SparkDataFrame
- col2: String, any column in SparkDataFrame
- Returns: a local R data frame
library(SparkR)
# Start or reuse a Spark session (assumed here; skip if one is already running)
sparkR.session()

# Create SparkDataFrame
df <- createDataFrame(mtcars)
head(df)
SparkR’s crosstab is similar to the table function in base R.
# Create contingency table with df$cyl and df$gear
# Note that a local R data frame is returned
crosstab(df, "cyl", "gear")
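
For comparison, the same contingency table can be sketched with the base R table function, which needs no Spark session. This is only an illustration of the counts to expect: the variable name ct is made up for this example, and the layout differs from what crosstab returns (table gives a table object with row and column names, while crosstab returns a local R data frame).

```r
# Base R only; mtcars ships with R's built-in datasets package.
# Rows are the distinct values of cyl, columns the distinct values of gear.
ct <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(ct)

# Pairs that never occur still appear, with a count of zero.
# No car in mtcars has 8 cylinders and 4 gears:
ct["8", "4"]

# Every row of mtcars is counted exactly once.
sum(ct) == nrow(mtcars)
```

The zero entry for 8 cylinders and 4 gears is the base R counterpart of the rule that pairs with no occurrences get a count of zero.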