kde
Generates a Kernel Density Estimate (KDE) plot using Gaussian kernels.
In statistics, kernel density estimation is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
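As a rough illustration of the idea, the sketch below computes a Gaussian kernel density estimate in plain Python. The helper name `gaussian_kde` is hypothetical and is not part of the PySpark API. It assumes the bandwidth is used directly as the standard deviation of each Gaussian kernel, which may differ from how `kde` handles `bw_method` internally.

```python
import math

def gaussian_kde(samples, bandwidth, points):
    """Evaluate a Gaussian kernel density estimate at each of `points`.

    Hypothetical helper for illustration only; `bandwidth` is assumed to be
    the standard deviation of each Gaussian kernel.
    """
    n = len(samples)
    # Normalizing constant so the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for x in points:
        # Sum one Gaussian bump per sample, each centered on that sample.
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        densities.append(norm * total)
    return densities

samples = [5.1, 4.9, 7.0, 6.4, 5.9]
points = [4.0 + 0.5 * i for i in range(9)]  # 4.0, 4.5, ..., 8.0
density = gaussian_kde(samples, bandwidth=0.3, points=points)
```

A smaller bandwidth produces a spikier curve that follows individual samples, while a larger one smooths them into a broader shape.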
Syntax
kde(bw_method, column=None, ind=None, **kwargs)
Parameters
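| Parameter | Type | Description |
|---|---|---|
| `bw_method` | int or float | The method used to calculate the estimator bandwidth. See KernelDensity in PySpark for more information. |
| `column` | str or list of str, optional | Column name or list of names to use for creating the KDE plot. If `None` (default), all numeric columns are used. |
| `ind` | list of float, NumPy array, or int, optional | Evaluation points for the estimated PDF. If `None` (default), 1000 equally spaced points are used. If `ind` is a NumPy array, the KDE is evaluated at the given points. If `ind` is an integer, that many equally spaced points are used. |
| `**kwargs` | optional | Additional keyword arguments. |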
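The way `ind` selects evaluation points can be sketched in plain Python. The helper `resolve_ind` below is hypothetical and is not part of PySpark. It assumes that an integer requests that many equally spaced points and that an explicit sequence is used as given. The 1000-point default and the padding of half the data range on each side are also assumptions, borrowed from the convention pandas uses for its KDE plots.

```python
def resolve_ind(values, ind=None, default_points=1000):
    """Return the x positions at which a density would be evaluated.

    Hypothetical helper; the padding rule and default count are assumptions.
    """
    # An explicit sequence of points is used as given.
    if ind is not None and not isinstance(ind, int):
        return [float(p) for p in ind]

    count = default_points if ind is None else ind
    lo, hi = min(values), max(values)
    pad = 0.5 * (hi - lo)  # assumed padding beyond the observed range
    start, stop = lo - pad, hi + pad
    if count == 1:
        return [start]
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]

values = [5.1, 4.9, 7.0, 6.4, 5.9]
grid = resolve_ind(values, ind=100)        # 100 equally spaced points
custom = resolve_ind(values, ind=[5, 6, 7])  # explicit evaluation points
```

Passing an explicit sequence is useful when several plots need to share the same x positions for comparison.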
Returns
plotly.graph_objs.Figure
Examples
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Small sample dataset; all three columns are numeric.
data = [(5.1, 3.5, 0), (4.9, 3.0, 0), (7.0, 3.2, 1), (6.4, 3.2, 1), (5.9, 3.0, 2)]
columns = ["length", "width", "species"]
df = spark.createDataFrame(data, columns)

# Plot the KDE with a bandwidth of 0.3, evaluated at 100 points.
df.plot.kde(bw_method=0.3, ind=100)