PySparkPlotAccessor class

Accessor that exposes plotting methods on a PySpark DataFrame through `df.plot`.

Syntax

Python
# Call the accessor directly
df.plot(kind="line", ...)

# Use a dedicated method
df.plot.line(...)
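The two call styles above are meant to be interchangeable. As a rough illustration of how a callable accessor can route `kind` to a dedicated method, here is a minimal plain-Python sketch. `PlotAccessorSketch` and its dictionary return values are hypothetical stand-ins, not PySpark's implementation, and only two plot kinds are included.

```python
class PlotAccessorSketch:
    """Hypothetical stand-in for a plot accessor; not PySpark's actual class."""

    # Only two kinds are implemented in this sketch.
    _KINDS = ("line", "bar")

    def __init__(self, rows):
        self._rows = rows

    def __call__(self, kind, **kwargs):
        # plot(kind="line", ...) looks up the method named by `kind`.
        if kind not in self._KINDS:
            raise ValueError(f"Unsupported plot kind: {kind!r}")
        return getattr(self, kind)(**kwargs)

    def line(self, x=None, y=None, **kwargs):
        # A real accessor would build a figure; the sketch returns a description.
        return {"kind": "line", "x": x, "y": y, **kwargs}

    def bar(self, x=None, y=None, **kwargs):
        return {"kind": "bar", "x": x, "y": y, **kwargs}


plot = PlotAccessorSketch([("A", 10), ("B", 30), ("C", 20)])

# Both call styles reach the same method with the same arguments.
via_call = plot(kind="line", x="category", y="int_val")
via_method = plot.line(x="category", y="int_val")
same = via_call == via_method
```

In this sketch the dedicated methods hold the logic and `__call__` is only a dispatcher, which keeps the two entry points from drifting apart.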

Methods

Method	Description
`area(x, y, **kwargs)`	Draws a stacked area plot.
`bar(x, y, **kwargs)`	Draws a vertical bar plot.
`barh(x, y, **kwargs)`	Draws a horizontal bar plot.
`box(column, **kwargs)`	Draws a box-and-whisker plot from DataFrame columns.
`hist(column, bins, **kwargs)`	Draws a histogram of the DataFrame columns.
`kde(bw_method, column, ind, **kwargs)`	Generates a kernel density estimate (KDE) plot using Gaussian kernels.
`line(x, y, **kwargs)`	Plots DataFrame columns as lines.
`pie(x, y, **kwargs)`	Generates a pie plot.
`scatter(x, y, **kwargs)`	Creates a scatter plot.

Examples

Line plot

Python
data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
columns = ["category", "int_val", "float_val"]
df = spark.createDataFrame(data, columns)
df.plot.line(x="category", y="int_val")

Bar plot

Python
data = [("A", 10, 1.5), ("B", 30, 2.5), ("C", 20, 3.5)]
columns = ["category", "int_val", "float_val"]
df = spark.createDataFrame(data, columns)
df.plot.bar(x="category", y="int_val")

Scatter plot

Python
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
df.plot.scatter(x="length", y="width")

Area plot

Python
from datetime import datetime

data = [
    (3, 5, 20, datetime(2018, 1, 31)),
    (2, 5, 42, datetime(2018, 2, 28)),
    (3, 6, 28, datetime(2018, 3, 31)),
    (9, 12, 62, datetime(2018, 4, 30)),
]
columns = ["sales", "signups", "visits", "date"]
df = spark.createDataFrame(data, columns)
df.plot.area(x="date", y=["sales", "signups", "visits"])

Box plot

Python
data = [
    ("A", 50, 55), ("B", 55, 60), ("C", 60, 65),
    ("D", 65, 70), ("E", 70, 75), ("F", 10, 15),
]
columns = ["student", "math_score", "english_score"]
df = spark.createDataFrame(data, columns)
# No column is specified here; the numeric columns are assumed to be plotted
df.plot.box()

KDE plot

Python
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
df.plot.kde(bw_method=0.3, ind=100)
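To make the "Gaussian kernels" wording concrete, the following plain-Python sketch evaluates a kernel density estimate for the `length` values from the example. It assumes that `bw_method=0.3` acts directly as the kernel bandwidth and that `ind=100` means 100 equally spaced evaluation points. The source confirms neither, and the evaluation range of 3.0 to 9.0 is chosen only for illustration. The helper names `gaussian_kde` and `linspace` are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth, points):
    """Evaluate a Gaussian kernel density estimate at each value in `points`."""
    n = len(samples)
    # Normalizing constant so the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - s) / bandwidth) ** 2) for s in samples)
        for p in points
    ]


def linspace(start, stop, num):
    """Return `num` equally spaced values from `start` to `stop` inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]   # "length" column from the example
grid = linspace(3.0, 9.0, 100)         # assumed evaluation range, 100 points
density = gaussian_kde(lengths, bandwidth=0.3, points=grid)
```

Each sample contributes one Gaussian bump centered on its value. A smaller bandwidth gives a spikier curve, and a larger one smooths the bumps together.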

Histogram

Python
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)
df.plot.hist(bins=4)
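As a rough guide to what `bins=4` means for the data above, this plain-Python sketch counts the `length` values in four equal-width bins spanning the minimum to the maximum. Equal-width binning over that range is an assumption here, not something the source states, and `equal_width_histogram` is a hypothetical helper rather than part of PySpark.

```python
def equal_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    width = (hi - lo) / bins
    # Left edges of each bin, plus the maximum as the final right edge.
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for v in values:
        # The maximum value falls in the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


lengths = [5.1, 4.9, 7.0, 6.4, 5.9]   # "length" column from the example
edges, counts = equal_width_histogram(lengths, bins=4)
# counts == [2, 1, 1, 1]
```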
