Visualization types

This article outlines the types of visualizations available to use in Databricks notebooks and in Databricks SQL, and shows you how to create an example of each visualization type.

Note

To learn about the visualization types available for AI/BI dashboards, see Dashboard visualization types.

Bar chart

Bar charts represent the change in metrics over time or to show proportionality, similar to a pie chart.

Note

Bar charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Bar chart example

Configuration values: For this bar chart visualization, the following values were set:

  • X column:

    • Dataset column: o_orderdate

    • Date level: Months

  • Y columns:

    • Dataset column: o_totalprice

    • Aggregation type: Sum

  • Group by (dataset column): o_orderpriority

  • Stacking: Stack

  • X axis name (override default value): Order month

  • Y axis name (override default value): Total price

Configuration options: For bar chart configuration options, see chart configuration options.

SQL query: For this bar chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders

Line chart

Line charts present the change in one or more metrics over time.

Note

Line charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Line chart example

Configuration values: For this line chart visualization, the following values were set:

  • X column:

    • Dataset column: o_orderdate

    • Date level: Years

  • Y columns:

    • Dataset column: o_totalprice

    • Aggregation type: Average

  • Group by (dataset column): o_orderpriority

  • X axis name (override default value): Order year

  • Y axis name (override default value): Average price

Configuration options: For line chart configuration options, see chart configuration options.

SQL query: For this line chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders

Area chart

Area charts combine the line and bar chart to show how one or more groups’ numeric values change over the progression of a second variable, typically that of time. They are often used to show sales funnel changes through time.

Note

Area charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Area chart example

Configuration values: For this area chart visualization, the following values were set:

  • X column:

    • Dataset column: o_orderdate

    • Date level: Years

  • Y columns:

    • Dataset column: o_totalprice

    • Aggregation type: Sum

  • Group by (dataset column): o_orderpriority

  • Stacking: Stack

  • X axis name (override default value): Order year

  • Y axis name (override default value): Total price

Configuration options: For area chart configuration options, see chart configuration options.

SQL query: For this area chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders

Pie charts

Pie charts show proportionality between metrics. They are not meant for conveying time series data.

Note

Pie charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Pie chart example

Configuration values: For this pie chart visualization, the following values were set:

  • X column (dataset column): o_orderpriority

  • Y columns:

    • Dataset column: o_totalprice

    • Aggregation type: Sum

  • Label (override default value): Total price

Configuration options: For pie chart configuration options, see chart configuration options.

SQL query: For this pie chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders

Histogram charts

A histogram plots the frequency that a given value occurs in a dataset. A histogram helps you to understand whether a dataset has values that are clustered around a small number of ranges or are more spread out. A histogram is displayed as a bar chart in which you control the number of distinct bars (also called bins).

Note

Histogram charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Histogram chart example

Configuration values: For this histogram chart visualization, the following values were set:

  • X column (dataset column): o_totalprice

  • Number of bins: 20

  • X axis name (override default value): Total price

Configuration options: For histogram chart configuration options, see histogram chart configuration options.

SQL query: For this histogram chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders

Heatmap chart

Heatmap charts blend features of bar charts, stacking, and bubble charts allowing you to visualize numerical data using colors. A common color palette for a heatmap shows the highest values using warmer colors, like orange or red, and the lowest values using cooler colors, like blue or purple.

For example, consider the following heatmap that visualizes the most frequently occurring distances of taxi rides on each day and groups the results by the day of the week, distance, and the total fare.

Note

Heatmap charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Heatmap example

Configuration values: For this heatmap chart visualization, the following values were set:

  • X column (dataset column): o_orderpriority

  • Y columns (dataset column): o_orderstatus

  • Color column:

    • Dataset column: o_totalprice

    • Aggregation type: Average

  • X axis name (override default value): Order priority

  • Y axis name(override default value): Order status

  • Color scheme (override default value): YIGnBu

Configuration options: For heatmap configuration options, see heatmap chart configuation options.

SQL query: For this heatmap chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders

Scatter chart

Scatter visualizations are commonly used to show the relationship between two numerical variables. Additionally, a third dimension can be encoded with color to show how the numerical variables are different across groups.

Note

Scatter charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Scatter example

Configuration values: For this scatter chart visualization, the following values were set:

  • X column (dataset column): l_quantity

  • Y column (dataset column): l_extendedprice

  • Group by (dataset column): l_returnflag

  • X axis name (override default value): Quantity

  • Y axis name (override default value): Extended price

Configuration options: For scatter chart configuration options, see chart configuration options.

SQL query: For this scatter chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.lineitem

Bubble chart

Bubble charts are scatter charts where the size of each point marker reflects a relevant metric.

Note

Bubble charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Bubble example

Configuration values: For this bubble chart visualization, the following values were set:

  • X (dataset column): l_quantity

  • Y columns (dataset column): l_extendedprice

  • Group by (dataset column): l-returnflag

  • Bubble size column (dataset column): l_tax

  • Bubble size coefficient: 20

  • X axis name (override default value): Quantity

  • Y axis name (override default value): Extended price

Configuration options: For bubble chart configuration options, see chart configuration options.

SQL query: For this bubble chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.lineitem

Box chart

The box chart visualization shows the distribution summary of numerical data, optionally grouped by category. Using a box chart visualization, you can quickly compare the value ranges across categories and visualize the locality, spread and skewness groups of the values through their quartiles. In each box, the darker line shows the interquartile range. For more information about interpreting box plot visualizations, see the Box chart article on Wikipedia.

Note

Box charts only support aggregation for up to 64,000 rows. If a dataset is larger than 64,000 rows, data will be truncated.

Box chart example

Configuration values: For this box chart visualization, the following values were set:

  • X column (dataset column): l-returnflag

  • Y columns (dataset column): l_extendedprice

  • Group by (dataset column): l_shipmode

  • X axis name (override default value): Return flag1

  • Y axis name (override default value): Extended price

Configuration options: For box chart configuration options, see box chart configuation options.

SQL query: For this box chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.lineitem

Combo chart

Combo charts combine line and bar charts to present the changes over time with proportionality.

Note

Combo charts support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set.

Combo example

Configuration values: For this combo chart visualization, the following values were set:

  • X column (dataset column): l_shipdate

  • Y columns:

    • First dataset column: l_extendedprice

    • Aggregation type: average

    • Second dataset column: l_quantity

    • Aggregation type: average

  • X axis name (override default value): Ship date

  • Left Y axis name (override default value): Quantity

  • Right Y axis name (override default value): Average price

  • Series:

    • Order1 (dataset column): AVG(l_extendedprice)

    • Y axis: right

    • Type: Line

    • Order2 (dataset column): AVG(l_quantity)

    • Y axis: left

    • Type: Bar

Configuration options: For combo chart configuration options, see chart configuration options.

SQL query: For this combo chart visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.lineitem

Cohort analysis

A cohort analysis examines the outcomes of predetermined groups, called cohorts, as they progress through a set of stages. The cohort visualization only aggregates over dates (it allows for monthly aggregations). It does not do any other aggregations of data within the result set. All other aggregations are done within the query itself.

Cohort example

Configuration values: For this cohort visualization, the following values were set:

  • Date (bucket) (database column): cohort_month

  • Stage (database column): months

  • Bucket population size (database column): size

  • Stage value (database column): active

  • Time interval: monthly

Configuration options: For cohort configuration options, see cohort chart configuation options.

SQL query: For this cohort visualization, the following SQL query was used to generate the data set.

-- match each customer with its cohort by month
with cohort_dates as (
  SELECT o_custkey, min(date_trunc('month', o_orderdate)) as cohort_month
  FROM samples.tpch.orders
  GROUP BY 1
),
-- find the size of each cohort
cohort_size as (
  SELECT cohort_month, count(distinct o_custkey) as size
  FROM cohort_dates
  GROUP BY 1
)
-- for each cohort and month thereafter, find the number of active customers
SELECT
  cohort_dates.cohort_month,
  ceil(months_between(date_trunc('month', samples.tpch.orders.o_orderdate), cohort_dates.cohort_month)) as months,
  count(distinct samples.tpch.orders.o_custkey) as active,
  first(size) as size
FROM samples.tpch.orders
  left join cohort_dates on samples.tpch.orders.o_custkey = cohort_dates.o_custkey
  left join cohort_size on cohort_dates.cohort_month = cohort_size.cohort_month
WHERE datediff(date_trunc('month', samples.tpch.orders.o_orderdate), cohort_dates.cohort_month) != 0
GROUP BY 1, 2
ORDER BY 1, 2

Counter display

Counters display a single value prominently, with an option to compare them against a target value. To use counters, specify which row of data to display on the counter visualization for the Value Column and Target Column.

Note

Counter only supports aggregation for up to 64,000 rows. If a dataset is larger than 64,000 rows, data will be truncated.

Counter example

Configuration values: For this counter visualization, the following values were set:

  • Value column

    • Dataset column: avg(o_totalprice)

    • Row: 1

  • Target column:

    • Dataset column: avg(o_totalprice)

    • Row: 2

  • Format target value: Enable

SQL query: For this counter visualization, the following SQL query was used to generate the data set.

select o_orderdate, avg(o_totalprice)
from samples.tpch.orders
GROUP BY 1
ORDER BY 1 DESC

Funnel visualization

The funnel visualization helps analyze the change in a metric at different stages. To use the funnel, specify a step and a value column.

Note

Funnel only supports aggregation for up to 64,000 rows. If a dataset is larger than 64,000 rows, data will be truncated.

Funnel example

Configuration values: For this funnel visualization, the following values were set:

  • Step column (dataset column): o_orderstatus

  • Value column (dataset column): Revenue

SQL query: For this funnel visualization, the following SQL query was used to generate the data set.

SELECT o_orderstatus, sum(o_totalprice) as Revenue
FROM samples.tpch.orders
GROUP BY 1

Choropleth map visualization

In choropleth visualizations, geographic localities, such as countries or states, are colored according to the aggregate values of each key column. The query must return geographic locations by name.

Note

Choropleth visualizations do not do any aggregations of data within the result set. All aggregations must be computed within the query itself.

Map choropleth example

Configuration values: For this choropleth visualization, the following values were set:

  • Map (dataset column): Countries

  • Geographic column (dataset column): Nation

  • Geographic type: Short name

  • Value column (dataset column): revenue

  • Clustering mode: equidistant

Configuration options: For choropleth configuration options, see choropleth configuation options.

SQL query: For this choropleth visualization, the following SQL query was used to generate the data set.

SELECT
initcap(n_name) as Country,
sum(c_acctbal)
FROM samples.tpch.customer
join samples.tpch.nation where n_nationkey = c_nationkey
GROUP BY 1

Marker map visualization

In marker visualizations, a marker is placed at a set of coordinates on the map. The query result must return latitude and longitude pairs.

Note

Marker does not do any aggregations of data within the result set. All aggregations must be computed within the query itself.

Map marker example

This marker example is generated from a dataset that includes both latitude and longitude values - which are not available in the Databricks sample datasets. For choropleth configuration options, see marker configuration options.

Pivot table visualization

A pivot table visualization aggregates records from a query result into a new tabular display. It’s similar to PIVOT or GROUP BY statements in SQL. You configure the pivot table visualization with drag-and-drop fields.

Note

Pivot tables support backend aggregations, providing support for queries returning more than 64K rows of data without truncation of the result set. However, Pivot table (legacy) only support aggregation for up to 64,000 rows. If a dataset is larger than 64,000 rows, data will be truncated.

Pivot table example

Configuration values: For this pivot table visualization, the following values were set:

  • Select rows (dataset column): l_retkurnflag

  • Select columns (dataset column): l_shipmode

  • Cell

    • Dataset column: l_quantity

    • Aggregation type: Sum

SQL query: For this pivot table visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.lineitem

Sankey

A sankey diagram visualizes the flow from one set of values to another.

Note

Sankey visualizations do not do any aggregations of data within the result set. All aggregations must be computed within the query itself.

Sankey example

SQL query: For this Sankey visualization, the following SQL query was used to generate the data set.

SELECT pickup_zip as stage1, dropoff_zip as stage2, sum(fare_amount) as value
FROM samples.nyctaxi.trips
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 10

Sunburst sequence

A sunburst diagram helps visualize hierarchical data using concentric circles.

Note

Sunburst sequence does not do any aggregations of data within the result set. All aggregations must be computed within the query itself.

Sunburst example

SQL query: For this sunburst visualization, the following SQL query was used to generate the data set.

SELECT pickup_zip as stage1, dropoff_zip as stage2, sum(fare_amount) as value
FROM samples.nyctaxi.trips
GROUP BY 1, 2
ORDER BY 3 DESC
LIMIT 10

Table

The table visualization displays data in a standard table, but with the ability to manually reorder, hide, and format the data. See Table options.

Note

Table visualizations do not do any aggregations of data within the result set. All aggregations must be computed within the query itself.

For table configuration options, see table configuration options.

Word cloud

A word cloud visually represents the frequency a word occurs in the data.

Note

Word cloud only supports aggregation for up to 64,000 rows. If a dataset is larger than 64,000 rows, data will be truncated.

Word cloud example

Configuration values: For this word cloud visualization, the following values were set: test

  • Words column (dataset column): o_comment

  • Words Length Limit: 5

  • Frequencies limit: 2

SQL query: For this word cloud visualization, the following SQL query was used to generate the data set.

select * from samples.tpch.orders