crosstab computes a pair-wise frequency table of the given columns, also known as a contingency table.
The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero pair frequencies will be returned. Pairs that have no occurrences will have zero as their counts.
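To make these limits concrete, here is a minimal local sketch of the quantities they refer to. It uses base R on the built-in mtcars dataset rather than a SparkR DataFrame, so it only illustrates what is being counted; the variable names are illustrative and not part of the SparkR API.

```r
# Local illustration with base R; the actual limits apply to the SparkR DataFrame.
distinct_cyl  <- length(unique(mtcars$cyl))   # distinct values in the first column
distinct_gear <- length(unique(mtcars$gear))  # distinct values in the second column

# Pairs (cyl, gear) that occur at least once, i.e. the non-zero pair frequencies.
nonzero_pairs <- nrow(unique(mtcars[, c("cyl", "gear")]))

# Check the documented limits: fewer than 1e4 distinct values per column,
# and at most 1e6 non-zero pairs.
stopifnot(distinct_cyl < 1e4, distinct_gear < 1e4, nonzero_pairs <= 1e6)
```

For a dataset this small the checks pass trivially; they matter for high-cardinality columns such as IDs or timestamps.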
- Syntax: crosstab(DataFrame, col1, col2)
- DataFrame: Any SparkR DataFrame
- col1: String, the name of any column in DataFrame
- col2: String, the name of any column in DataFrame
- Returns: a local R data frame
# Create SparkR DataFrame
df <- createDataFrame(sqlContext, mtcars)
head(df)
SparkR’s crosstab is similar to the table function in base R. In SparkR, however, the table function has been overridden to convert an existing Spark SQL table into a DataFrame, so it does not return a contingency table as you might expect.
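For comparison, base R's table produces the same kind of contingency table on a local data frame. This is a minimal sketch, assuming the built-in mtcars dataset. It calls base::table explicitly so that the call does not resolve to SparkR's table when SparkR is attached.

```r
# Local contingency table of cylinders (rows) by gears (columns).
# base::table is called explicitly to avoid SparkR's version of table().
tab <- base::table(cyl = mtcars$cyl, gear = mtcars$gear)
print(tab)

# Pairs that never occur get a zero count, e.g. 8 cylinders with 4 gears.
tab["8", "4"]
```

This runs entirely in local R and does not touch Spark. For data held in a SparkR DataFrame, use crosstab instead.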
# Create contingency table with df$cyl and df$gear
# Note that a local R data frame is returned
crosstab(df, "cyl", "gear")