Databricks visualizations that use X and Y axes are called charts. There are eight different types of charts. Because the types are similar, you can often switch seamlessly between them to find the one that best conveys your meaning. In the following animation, all eight types were built from the same result:
Your table should have at least two columns: one column of values for the X axis and one or more columns of values for the Y axis. It can also return values for trace grouping, displaying error bars, and bubble sizes.
The charts in the preceding animation were all produced from the following tabular result:
Start by setting your X and Y axis values. The visualization preview updates automatically; you don’t need to save the visualization to see how a change affects its appearance.
If the Y-Axis field is a numeric type, set the drop-down next to Y Axis to the type of aggregation to use: Sum (the default), Average, Count, Count Distinct, Max, Min, or Median. If the Y-Axis field is a string type, set the drop-down next to the desired aggregation: Count or Count Distinct.
Tabs in the visualization editor screen give you fine-grained control over the rest of chart.
Use the X Axis and Y Axis tabs to modify the axes ranges and labels.
Use the Series tab to change your data aliases, z-index behavior, assign traces between the left- and right- Y axes. It also lets you combine different trace forms on one chart, like in the following chart.
Use the Colors tab to change the appearance of the traces on your charts.
Use the Data Labels to configure what displays when you hover your mouse over a chart.
Use the Group by setting to generate multiple traces against the same X and Y axes. This setting groups records into distinct traces instead of drawing one line.
As shown in the following example, the grouping column is used to sort
(x,y) pairs together.
Use the Group by column for melted data sets. Use multiple Y-columns for pivoted data sets. It’s possible to use a Group by column in a visualization with multiple Y-columns.
You can “stack” Y axis values on top of one another. The name is borrowed from stacked bar charts, but it can be useful with area charts as well. The following image shows the same data, unstacked on the left and stacked on the right.
Each Y axis value is displayed as the sum of itself and the Y values “beneath” it.
Stacking and grouping are related. You won’t stack data unless you have also grouped it.
You can use the Series tab to control the order in which traces are stacked. Stacking is available only for line, bar, and area charts.
For certain chart types, you can add error bars around your data points by using values from your table.
Error bars are always symmetrical. The distance above and below a given
(x,y)pair is always the same.
Errors are the same color as their target trace.
Errors are shown for all traces or no traces. You cannot configure them to appear on some traces and not others.
The values in your errors column are charted on the same axis as their associated trace. This means your error values must be absolute. You cannot, for example, have errors expressed in percentages for Y values expressed in hundreds.
Errors are not aggregated when you stack records; an error bar will be shown for each trace. You can work around this by only providing non-zero error values for those records where the error should be displayed prominently. In the preceding example a flat error bar is shown at every trace point, but only the
Paid trace error bars have any length.
Each chart type is useful for certain kinds of presentation. You can mix and match multiple types on the same chart as needed:
Line: presents change in one or more metrics over time.
Bar: presents change in metrics over time or to show proportionality, like a pie chart. You can combine bar charts using stacking.
Combo: combines line and bar charts to present changes over time with proportionality.
Area: combines the line and bar chart to show how one or more groups’ numeric values change over the progression of a second variable, typically that of time. They are often used to show sales funnel changes through time. You can combine area charts using stacking.
Pie: shows proportionality between metrics. They are not meant for conveying time series data.
Scatter: excels at showing many groups of data points. A scatter chart is just like a line plot, but without the connecting lines. A scatter chart is more precise but less useful for time series data.
Scatter charts are necessary for visualizations where some groups appear just once. The line chart does not display singleton values because it can only show data where two or more points are present. One option is to force singletons into scatter type on the Series tab of the visualization editor while keeping other traces in line type.
Scatter charts do not support aggregation in the visualization.
Bubble: are scatter charts where the size of each point marker reflects a relevant metric.
Bubble charts do not support aggregation in the visualization.
Heatmap: blend features of bar charts, stacking, and bubble charts. You can choose from several built-in color schemes. A heatmap cannot be grouped since the entire chart is one trace.
Box: automatically show the distribution of data points across grouped categories.
Box charts do not support aggregation in the visualization.
Temporarily hide or show only a series
To hide a series in a visualization, click the series in the legend. To show the series again, click it again in the legend.
To show only a single series, double-click the series in the legend. To show other series, click each one.
Multiple records per X-axis value
Databricks SQL can make some obscure shapes if your query returns two or more rows with the same X axis value. This often happens in SQL if you unintentionally
JOIN a table with a one-to-many relationship.
In this example a vertical line is drawn because there are two records for 1 January. You can resolve this by filtering out the doubled entries on the X axis or revise the query to include the grouping field, as shown in the following image.
Unordered X-axis records
Databricks SQL automatically determines most common X axis scales: timestamps, linear, and logarithms. If it can’t parse your X column into an ordered series, it falls back to treating each X value as a “category”. This can have mixed results:
If you see shapes you don’t expect you can check whether your X axis has been sorted on the X Axis tab. Toggle the Sort Values option. If Sort Values is disabled, Databricks SQL retains the ordering of the source query.
These two charts come from the same base data. The only difference is whether or not Databricks SQL sorted the X axis values.