Visualizations

Databricks supports a number of visualizations out of the box. All notebooks, regardless of their language, support Databricks visualization using the display function. The display function includes support for visualizing image data types.

Additionally, all Databricks programming language notebooks (Python, Scala, R) support interactive HTML graphics using JavaScript libraries such as D3; you can pass any HTML, CSS, or JavaScript code to the displayHTML function to render its results. See Embed static images in notebooks and HTML, D3, and SVG in Notebooks for more information.
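For example, here is a minimal sketch of calling displayHTML from a Python cell; the markup shown is purely illustrative:

displayHTML("""
<h3>Hello from displayHTML</h3>
<svg width="120" height="120">
  <circle cx="60" cy="60" r="50" fill="steelblue" />
</svg>
""")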

display function

The easiest way to create a visualization in Databricks is to call display(<dataframe-name>). For example, suppose you have a DataFrame diamonds_color created from a diamonds dataset, grouped by diamond color with the average price computed. When you call

display(diamonds_color)

A table of diamond color versus average price displays.


Click the bar chart icon to display a chart of the same information.


Note

If you see OK with no rendering after calling the display function, most likely the DataFrame or collection you passed in is empty.

You can click the down arrow next to the bar chart icon to choose another chart type, and click Plot Options... to configure the chart.


If you register a DataFrame as a table, you can also query it with SQL to create visualizations; see Visualizations in SQL below.

display function for image types

display renders columns containing image data types as rich HTML.

For clusters running Databricks Runtime 4.1 and above, display attempts to render image thumbnails for DataFrame columns matching Spark’s schema for images. Thumbnail rendering works for any images successfully read in through Spark’s readImages function; a short sketch follows the list below. For image values generated through other means, Databricks supports the rendering of 1, 3, or 4 channel images (where each channel consists of a single byte), with the following constraints:

  • One-channel images: mode field must be equal to 0. height, width, and nChannels fields must accurately describe the binary image data in the data field.
  • Three-channel images: mode field must be equal to 16. height, width, and nChannels fields must accurately describe the binary image data in the data field. The data field must contain pixel data in three-byte chunks, with the channel ordering (blue, green, red) for each pixel.
  • Four-channel images: mode field must be equal to 24. height, width, and nChannels fields must accurately describe the binary image data in the data field. The data field must contain pixel data in four-byte chunks, with the channel ordering (blue, green, red, alpha) for each pixel.
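
As a minimal sketch, assuming a cluster on Databricks Runtime 4.1 or above and a hypothetical folder of images at /path/to/images, you can read the images with Spark’s readImages function and pass the resulting DataFrame to display:

from pyspark.ml.image import ImageSchema

# Read a folder of images into a DataFrame that follows Spark's image schema
# ("/path/to/images" is a placeholder path).
image_df = ImageSchema.readImages("/path/to/images")

display(image_df)  # the image column renders as thumbnails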

Visualizations in Python

To create a visualization in Python, call display(<dataframe-name>).

dataPath = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv"
diamonds = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(dataPath) # Read the diamonds dataset and create a DataFrame

diamonds_color = diamonds.groupBy("color").avg("price") # Group by color and compute the average price
display(diamonds_color)

You can also display matplotlib and ggplot figures in Databricks. For a demonstration, see Matplotlib and ggplot in Python Notebooks.
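
As a minimal sketch, assuming matplotlib is available on the cluster and using the diamonds_color DataFrame from the example above, you can convert the small aggregated result to pandas, plot it, and pass the figure to display (on some runtime versions the figure also renders inline automatically):

import matplotlib.pyplot as plt

pdf = diamonds_color.orderBy("color").toPandas()  # small aggregated result, safe to collect
fig, ax = plt.subplots()
ax.bar(pdf["color"], pdf["avg(price)"])           # "avg(price)" is the column produced by avg("price")
ax.set_xlabel("color")
ax.set_ylabel("average price")

display(fig)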

Visualizations in R

In addition to the Databricks visualizations, R notebooks can use any R visualization package. The R notebook will capture the resulting plot as a .png and display it inline.

Here’s an example using the default (base) graphics library:

fit <- lm(Petal.Length ~., data = iris)
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)

Using ggplot:

library(ggplot2)
ggplot(diamonds, aes(carat, price, color = color, group = 1)) + geom_point(alpha = 0.3) + stat_smooth()

Using Lattice:

library(lattice)
xyplot(price ~ carat | cut, diamonds, scales = list(log = TRUE), type = c("p", "g", "smooth"), ylab = "Log price")

You can also install and use other plotting libraries.

install.packages("DandEFA", repos = "http://cran.us.r-project.org")
library(DandEFA)
data(timss2011)
timss2011 <- na.omit(timss2011)
dandpal <- rev(rainbow(100, start = 0, end = 0.2))
facl <- factload(timss2011,nfac=5,method="prax",cormeth="spearman")
dandelion(facl,bound=0,mcex=c(1,1.2),palet=dandpal)
facl <- factload(timss2011,nfac=8,method="mle",cormeth="pearson")
dandelion(facl,bound=0,mcex=c(1,1.2),palet=dandpal)

Visualizations in Scala

The easiest way to perform plotting in Scala is to use the built-in Databricks visualization modules and the display method. For example:

case class MyCaseClass(key: String, group: String, value: Int)

// Create a small Dataset of case-class rows and pass it to display
val dataframe = sc.parallelize(Array(
  MyCaseClass("f", "consonants", 1),
  MyCaseClass("g", "consonants", 2),
  MyCaseClass("h", "consonants", 3),
  MyCaseClass("i", "vowels", 4),
  MyCaseClass("j", "consonants", 5)
)).toDS()

display(dataframe)

Visualizations in SQL

When you execute a SQL query whose results you would like to visualize, Databricks automatically extracts some of the data and displays it as a table.

For example, after creating a DataFrame in Scala, you could register it as a temporary table:

diamonds.createOrReplaceTempView("diamonds_table")

Then query the registered table with SQL, for example in a cell that uses the %sql magic command:

select color, price from diamonds_table

Databricks automatically displays the color and price columns in a table. From there you can select different styles of visualization.