Use sparklyr in Databricks R notebooks

This notebook shows how to use sparklyr in Databricks notebooks.

Load sparklyr package

library(sparklyr)

Create a sparklyr connection

Use "databricks" as the connection method in spark_connect(). No additional parameters to spark_connect() are required. You do not need to call spark_install() as Spark is already installed on the Databricks cluster.

Note that sc is a special name for the sparklyr connection: when you use that variable name, the notebook automatically displays Spark progress bars and built-in Spark UI viewers.

sc <- spark_connect(method = "databricks")
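As a quick sanity check (not part of the original notebook), you can confirm that the connection is live by asking the cluster for its Spark version:

# Returns the Spark version running on the attached cluster
spark_version(sc)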

Use sparklyr and dplyr APIs

After setting up the sparklyr connection, you can use the sparklyr API. You can combine sparklyr with dplyr or with Spark MLlib.
If you use an extension package that includes third-party JARs, you may need to install those JARs as libraries in your workspace.

library(dplyr)
iris_tbl <- copy_to(sc, iris)
src_tbls(sc)
iris_tbl %>% count()
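Since MLlib is mentioned above, here is a minimal sketch (not in the original notebook) of fitting a Spark ML model on the same Spark DataFrame with sparklyr's ml_* functions; the model and formula are illustrative choices:

# Illustrative example: fit a linear regression with Spark MLlib via sparklyr
fit <- ml_linear_regression(iris_tbl, Sepal_Length ~ Sepal_Width + Petal_Length)
summary(fit)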

Aggregate and visualize data

# Change the default plot height
options(repr.plot.height = 600)
iris_summary <- iris_tbl %>%
  mutate(Sepal_Width = ROUND(Sepal_Width * 2) / 2) %>% # Bucketize Sepal_Width; ROUND is passed through to Spark SQL
  group_by(Species, Sepal_Width) %>%
  summarize(count = n(), Sepal_Length_Mean = mean(Sepal_Length), stdev = sd(Sepal_Length)) %>%
  collect()
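Because dplyr verbs on a Spark table are translated to Spark SQL lazily, functions that sparklyr does not recognize, such as ROUND above, are passed through to Spark SQL verbatim. To inspect the SQL a query generates, you can use dplyr's show_query() (a quick aside, not in the original notebook):

# Inspect the Spark SQL generated for the bucketizing step
iris_tbl %>%
  mutate(Sepal_Width = ROUND(Sepal_Width * 2) / 2) %>%
  show_query()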
library(ggplot2)

ggplot(iris_summary, aes(Sepal_Width, Sepal_Length_Mean, color = Species)) + 
  geom_line(size = 1.2) +
  geom_errorbar(aes(ymin = Sepal_Length_Mean - stdev, ymax = Sepal_Length_Mean + stdev), width = 0.05) +
  geom_text(aes(label = count), vjust = -0.2, hjust = 1.2, color = "black") +
  theme(legend.position = "top")
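When you are done, you can close the sparklyr connection (optional on Databricks, where the cluster manages the Spark lifecycle):

# Disconnect the sparklyr session
spark_disconnect(sc)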