crosstab (DataFrameStatFunctions)

Computes a pair-wise frequency table of the given columns, also known as a contingency table. The first column of the result contains the distinct values of col1, and the remaining column names are the distinct values of col2. The first column is named $col1_$col2, that is, the two input column names joined by an underscore (for example, c1_c2). Pairs with no occurrences have a count of zero. DataFrame.crosstab and DataFrameStatFunctions.crosstab are aliases of each other.

Syntax

crosstab(col1, col2)

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| col1 | str | The name of the first column. Distinct items make up the first column of each row. |
| col2 | str | The name of the second column. Distinct items make up the column names of the resulting DataFrame. |

Returns

DataFrame

Examples

Python
df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
df.stat.crosstab("c1", "c2").sort("c1_c2").show()
# +-----+---+---+---+
# |c1_c2| 10| 11|  8|
# +-----+---+---+---+
# |    1|  0|  2|  0|
# |    3|  1|  0|  0|
# |    4|  0|  0|  2|
# +-----+---+---+---+
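
To make the semantics described at the top of this page concrete, the following is a minimal plain-Python sketch of a contingency table over the same input pairs. It is an illustration only and not how Spark computes the result: the helper name crosstab_sketch is hypothetical, values are converted to strings, and rows and columns are sorted as strings purely to give the sketch a stable order.

```python
from collections import Counter


def crosstab_sketch(pairs, col1_name, col2_name):
    """Illustrative contingency table in plain Python (hypothetical helper, not a Spark API)."""
    # Count how often each (col1 value, col2 value) pair occurs, keyed by string labels.
    counts = Counter((str(a), str(b)) for a, b in pairs)

    # Distinct col1 values label the rows; distinct col2 values become column names.
    # Sorting as strings is a choice made by this sketch for a deterministic order.
    row_labels = sorted({a for a, _ in counts})
    col_labels = sorted({b for _, b in counts})

    # The first column is named "<col1>_<col2>".
    header = [f"{col1_name}_{col2_name}"] + col_labels

    # A Counter returns 0 for missing keys, so pairs that never occur get a count of zero.
    body = [[a] + [counts[(a, b)] for b in col_labels] for a in row_labels]
    return header, body


header, body = crosstab_sketch(
    [(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], "c1", "c2"
)
print(header)  # ['c1_c2', '10', '11', '8']
for row in body:
    print(row)
# ['1', 0, 2, 0]
# ['3', 1, 0, 0]
# ['4', 0, 0, 2]
```

The counts match the table shown above: the pair (1, 11) occurs twice, (3, 10) once, (4, 8) twice, and every other combination is zero.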