Boxplot visualizations in Databricks notebooks

The boxplot visualization shows a distribution summary of numerical data, optionally grouped by category. Using a boxplot visualization, you can quickly compare value ranges across categories and see the locality, spread, and skewness of each group of values through its quartiles. Within each box, the darker line marks the median, and the box itself spans the interquartile range. For more information about interpreting boxplot visualizations, see the Box plot article on Wikipedia.
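As a rough illustration of the statistics a box summarizes, the following sketch computes the quartiles, interquartile range, minimum, and maximum for one group of values using only the Python standard library. The function name `box_summary` and the sample prices are hypothetical, and the "inclusive" quartile method is an assumption: the visualization may interpolate quartiles differently, so the numbers it draws can differ slightly.

```python
from statistics import quantiles


def box_summary(values):
    """Return the statistics a boxplot typically draws for one group.

    Hypothetical helper for illustration only; the quartile method used
    by the visualization itself is not documented here.
    """
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "iqr": q3 - q1,  # interquartile range: the height of the box
    }


# Made-up sample values for a single category.
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
summary = box_summary(prices)
print(summary)
```

A wide gap between `q1` and `q3` indicates a large spread, and a median that sits closer to one edge of the box suggests skew in that group.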

The result set being visualized must include a numerical column and can include one or more grouping columns.
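To make the expected shape concrete, here is a minimal sketch of a result with one grouping column and one numerical column, and of how its rows split into the per-category groups that each become a box. The rows, the shipping method names, and the helper `group_values` are made up for illustration and do not come from a real dataset.

```python
from collections import defaultdict

# Hypothetical rows: (grouping column, numerical column).
rows = [
    ("AIR", 120.0),
    ("AIR", 95.5),
    ("SHIP", 40.0),
    ("SHIP", 55.25),
    ("RAIL", 70.0),
]


def group_values(rows):
    """Collect the numerical values under each category, in row order."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return dict(groups)


grouped = group_values(rows)
for method, prices in grouped.items():
    # Each category yields one box; its range runs from min to max.
    print(method, min(prices), max(prices))
```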

The following boxplot visualization shows the range of prices for each shipping method.

Example boxplot visualization

This article covers the options for boxplot visualizations.

General

  • Horizontal Chart: If selected, the boxes, which show the distribution of the numeric values, are plotted along the x-axis.

  • Y Column: Y-axis values. For a vertical chart, choose a number column. For a horizontal chart, choose a categorical column.

  • X Column: X-axis values. For a vertical chart, choose a categorical column. For a horizontal chart, choose a number column.

  • Group By: Additional columns to group by, after the default grouping is applied. By default, results are grouped by the X-axis. If Horizontal Chart is selected, results are grouped by the Y-axis instead.

  • Legend Placement: Where to place the legend.

  • Legend Items Order: Either Normal or Reversed.

  • Show All Points: Whether to show each X-axis value separately or to show the range of values using a box.

  • Missing and NULL Values: Whether to hide missing or NULL values or to convert them to 0 and show them in the visualization.
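The choice for Missing and NULL Values can change the statistics a box displays. The following sketch, which uses made-up values and hypothetical helper names, compares the median of one group when NULLs are dropped with the median when they are converted to 0. It only models the two behaviors as described above and is not the visualization's actual implementation.

```python
from statistics import median


def hide_missing(values):
    """Drop NULL (None) entries so they do not affect the statistics."""
    return [v for v in values if v is not None]


def missing_as_zero(values):
    """Replace NULL (None) entries with 0 so they count as data points."""
    return [0 if v is None else v for v in values]


# Made-up sample with two NULL entries.
prices = [20.0, None, 30.0, 40.0, None]

median_hidden = median(hide_missing(prices))
median_zeroed = median(missing_as_zero(prices))
print(median_hidden, median_zeroed)
```

In this sample, converting NULLs to 0 pulls the median down from 30.0 to 20.0, so hiding them is usually the safer choice when a NULL means "unknown" and not "zero".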

Y Axis

  • Scale: Whether to auto-detect the scale units.

  • Name: Specify a display name for the Y-axis column if different from the column name.

  • Sort Values: Whether to sort values, regardless of the query.

  • Reverse Order: Whether to reverse the sort order.

  • Show Labels: Whether to show Y-axis labels.

  • Hide Axis: Whether to hide the Y-axis labels and line.

X Axis

  • Scale: Whether to auto-detect the scale units.

  • Name: Specify a display name for the X-axis column if different from the column name.

  • Min Value: Reduce visual clutter by showing values only above a minimum.

  • Max Value: Reduce visual clutter by showing values only below a maximum.

Series

Optionally override the display name for one or more series in the legend.

Colors

Optionally override the default color for one or more series.

Data Labels

Optionally override formatting options for one or more series.

Temporarily hide or show only a series

To hide a series in a visualization, click the series in the legend. To show the series again, click it again in the legend.

To show only a single series, double-click the series in the legend. To show other series, click each one.