Visualizations

Databricks supports a number of visualizations out of the box. All notebooks, regardless of their language, support Databricks visualizations. In addition, Python, Scala, and R notebooks support interactive HTML graphics built with JavaScript libraries such as D3: pass any HTML, CSS, or JavaScript code to the displayHTML() function to render it. See HTML, D3 & SVG for more information.
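As a minimal sketch of the displayHTML() pattern, the snippet below builds a small inline SVG bar chart as a string in Python; inside a Databricks notebook you would pass the resulting string to displayHTML() to render it. The data and bar sizing here are purely illustrative.

```python
# Build a tiny inline SVG bar chart as an HTML string. In a Databricks
# notebook, passing this string to displayHTML() renders it inline.
values = {"a": 1, "b": 2, "c": 3}  # illustrative data

bars = "".join(
    '<rect x="0" y="{y}" width="{w}" height="18" fill="steelblue"></rect>'
    .format(y=i * 22, w=v * 40)  # one bar per value, width scaled by 40px
    for i, v in enumerate(values.values())
)
html = '<svg width="200" height="80">{bars}</svg>'.format(bars=bars)

# displayHTML(html)  # uncomment inside a Databricks notebook
```

Any valid HTML, CSS, or JavaScript can be assembled this way, which is how libraries like D3 are typically embedded.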

Visualizations in Python

To create a visualization in Python, call display() on your DataFrame. The result starts as a table; clicking the chart button lets you switch to a chart and configure it. You'll find an example below.

from pyspark.sql import Row

array = [Row(key="a", group="vowels", value=1),
         Row(key="b", group="consonants", value=2),
         Row(key="c", group="consonants", value=3),
         Row(key="d", group="consonants", value=4),
         Row(key="e", group="vowels", value=5)]
dataframe = spark.createDataFrame(array)

display(dataframe)

If you register a DataFrame as a table (see Databases and Tables), you can also query it with SQL, as described in Visualizations in SQL.

You can also display matplotlib and ggplot figures from Python in Databricks. See Matplotlib and ggplot for a demonstration.
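As a brief sketch of the matplotlib route, the example below builds a figure with the standard pyplot API; in a Databricks Python notebook, passing the figure to display() renders it inline. The backend call and the data are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use; a notebook configures this for you
import matplotlib.pyplot as plt

# A simple line plot built with the standard pyplot object API.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# In a Databricks Python notebook: display(fig)
```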

Visualizations in R

In addition to the Databricks visualizations, R notebooks can use any R visualization package. The R notebook will capture the resulting plot as a .png and display it inline.

Here’s an example using base R’s built-in plotting:

fit <- lm(Petal.Length ~., data = iris)
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
plot(fit)

Using ggplot:

library(ggplot2)
ggplot(diamonds, aes(carat, price, color = color, group = 1)) +
  geom_point(alpha = 0.3) +
  stat_smooth()

Using Lattice:

library(lattice)
xyplot(price ~ carat | cut, diamonds,
       scales = list(log = TRUE),
       type = c("p", "g", "smooth"),
       ylab = "Log price")

You can also install and use other plotting libraries as you wish.

install.packages("DandEFA", repos = "http://cran.us.r-project.org")
library(DandEFA)
data(timss2011)
timss2011 <- na.omit(timss2011)
dandpal <- rev(rainbow(100, start = 0, end = 0.2))
facl <- factload(timss2011, nfac = 5, method = "prax", cormeth = "spearman")
dandelion(facl, bound = 0, mcex = c(1, 1.2), palet = dandpal)
facl <- factload(timss2011, nfac = 8, method = "mle", cormeth = "pearson")
dandelion(facl, bound = 0, mcex = c(1, 1.2), palet = dandpal)

Visualizations in Scala

The easiest way to plot in Scala is to use the built-in Databricks visualizations with the display command. For example:

case class MyCaseClass(key: String, group: String, value: Int)

val dataframe = sc.parallelize(Array(
  MyCaseClass("f", "consonants", 1),
  MyCaseClass("g", "consonants", 2),
  MyCaseClass("h", "consonants", 3),
  MyCaseClass("i", "vowels", 4),
  MyCaseClass("j", "consonants", 5)
)).toDS()

display(dataframe)

If you register a DataFrame as a table (see Databases and Tables), you can also query it with SQL, as described in Visualizations in SQL.

Visualizations in SQL

Execute the SQL you would like to visualize, and Databricks automatically displays a sample of the result. From there you can select the relevant columns to create different styles of visualization.

For example, after creating the above DataFrame in Scala, you could register it as a temporary view:

// Spark 2.0 and later
dataframe.createOrReplaceTempView("someTableName")
// Spark 1.x
dataframe.registerTempTable("someTableName")

Then query it with SQL:

SELECT * FROM someTableName

This lets you use the built-in Databricks visualizations on the query result.