box

Creates a box-and-whisker plot from DataFrame columns.

A box plot graphically depicts groups of numerical data through their quartiles. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, each whisker reaches no more than 1.5 × IQR (IQR = Q3 - Q1) from its edge of the box and ends at the farthest data point within that interval. Data points beyond the whiskers are treated as outliers and plotted as separate dots.
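The whisker rule described above can be sketched in plain Python. This is only an illustration of the arithmetic, not how `box` computes its statistics: the helper name `box_stats`, the `whisker_factor` argument, and the choice of exact quartiles with linear interpolation (`method="inclusive"`) are assumptions made for this example, and the values Spark reports may differ slightly because it uses approximate statistics.

```python
from statistics import quantiles


def box_stats(values, whisker_factor=1.5):
    """Hypothetical helper: quartiles, whisker ends, and outliers for a list of numbers."""
    data = sorted(values)
    # Exact quartiles with linear interpolation (an assumption for this sketch).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers may reach at most whisker_factor * IQR beyond the box edges.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Each whisker ends at the farthest data point inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Everything outside the fences is drawn as a separate dot.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


stats = box_stats([55, 60, 65, 70, 75, 15, 90, 150])
print(stats)
```

For this input, Q1 is 58.75 and Q3 is 78.75, so the IQR is 20 and the fences sit at 28.75 and 108.75. The whiskers therefore end at 55 and 90, and the values 15 and 150 are outliers.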

Syntax

box(column=None, **kwargs)

Parameters

| Parameter | Type | Description |
|---|---|---|
| column | str or list of str, optional | Column name or list of names to use for creating the box plot. If None (default), all numeric columns are used. |
| **kwargs | optional | Additional keyword arguments. Supports precision: a float used to compute approximate statistics for the box plot. Default: 0.01. Use smaller values for more precise statistics. |

Returns

plotly.graph_objs.Figure

Examples

Python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One row per student: name, math score, English score.
data = [
    ("A", 50, 55),
    ("B", 55, 60),
    ("C", 60, 65),
    ("D", 65, 70),
    ("E", 70, 75),
    ("F", 10, 15),
    ("G", 85, 90),
    ("H", 5, 150),
]
columns = ["student", "math_score", "english_score"]
df = spark.createDataFrame(data, columns)

# No column is given, so all numeric columns (math_score, english_score) are plotted.
df.plot.box()
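The example above relies on the defaults. As a minimal sketch of the two documented options, the helper below forwards an explicit column selection and a `precision` value to `box`. The name `box_figure` and its signature are hypothetical and exist only for this illustration; it assumes `df` is a DataFrame that exposes `plot.box` as described on this page.

```python
def box_figure(df, columns=None, precision=0.01):
    """Hypothetical helper: build a box plot for selected columns.

    columns: a column name, a list of names, or None for all numeric columns.
    precision: passed through as the precision keyword argument;
               smaller values give more precise statistics.
    """
    return df.plot.box(column=columns, precision=precision)
```

With the `df` from the example, `box_figure(df, "math_score")` would plot only the math scores, and `box_figure(df, ["math_score", "english_score"], precision=0.001)` would plot both columns with tighter statistics than the default of 0.01.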