Shiny on hosted RStudio Server

Shiny is an R package, available on CRAN, used to build interactive R applications and dashboards. You can use Shiny inside RStudio Server hosted on Databricks clusters. You can also develop, host, and share Shiny applications directly from a Databricks notebook. See Share Shiny app URL.

To get started with Shiny, see the Shiny tutorials.

This article describes how to run Shiny applications on RStudio on Databricks and use Apache Spark inside Shiny applications.

Requirements

Important

With RStudio Server Pro, you must disable proxied authentication. Make sure auth-proxy=1 is not present inside /etc/rstudio/rserver.conf.
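
One way to check this setting from R is sketched below. The helper name `has_proxied_auth` is hypothetical and not part of RStudio or Databricks. The sketch assumes that the file holds one `key=value` setting per line and that lines starting with `#` are comments.

```r
# Hypothetical helper: returns TRUE if the config file contains an active
# (uncommented) auth-proxy=1 setting, FALSE otherwise.
has_proxied_auth <- function(conf_path = "/etc/rstudio/rserver.conf") {
  if (!file.exists(conf_path)) {
    return(FALSE)  # no config file, so the setting cannot be present
  }
  lines <- trimws(readLines(conf_path, warn = FALSE))
  lines <- lines[!startsWith(lines, "#")]  # assumption: "#" marks a comment
  any(grepl("^auth-proxy[[:space:]]*=[[:space:]]*1$", lines))
}
```

If `has_proxied_auth()` returns `TRUE`, remove the setting from the file before you launch Shiny apps.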

Get started with Shiny

  1. Open RStudio on Databricks.

  2. In RStudio, import the Shiny package and run the example app 01_hello as follows:

    > library(shiny)
    > runExample("01_hello")
    
    Listening on http://127.0.0.1:3203
    

    A new window appears, displaying the Shiny application.

    First Shiny app

Run a Shiny app from an R script

To run a Shiny app from an R script, open the R script in the RStudio editor and click the Run App button on the top right.

Shiny run App
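
For reference, a minimal app script might look like the following sketch. The input and output IDs (`n` and `hist`) and the variable name `app` are illustrative assumptions, not names taken from this article.

```r
library(shiny)

# UI: one slider and one plot (the IDs "n" and "hist" are illustrative)
ui <- fluidPage(
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal sample", xlab = "Value")
  })
}

# Build the app object. Many scripts end with a bare shinyApp(...) call
# instead; the assignment here only avoids launching the app on auto-print.
app <- shinyApp(ui = ui, server = server)
```

The script keeps the same structure as the larger examples in this article: a UI definition, a server function, and a final `shinyApp()` call that combines them.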

Use Apache Spark inside Shiny apps

You can use Apache Spark when developing Shiny applications on Databricks. You can interact with Spark using both SparkR and sparklyr. You need at least one worker to launch Spark tasks.

The following example launches Spark jobs through sparklyr, after also starting a SparkR session. It uses the ggplot2 diamonds dataset to plot the price of diamonds by carat. You can change the carat range with the slider at the top of the application, and the range of the plot’s x-axis changes accordingly.

library(shiny)
library(SparkR)
library(sparklyr)
library(dplyr)
library(ggplot2)
sparkR.session()

sc <- spark_connect(method = "databricks")
diamonds_tbl <- spark_read_csv(sc, path = "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")

# Define the UI
ui <- fluidPage(
  sliderInput("carat", "Select Carat Range:",
              min = 0, max = 5, value = c(0, 5), step = 0.01),
  plotOutput('plot')
)

# Define the server code
server <- function(input, output) {
  output$plot <- renderPlot({
    # Select diamonds in the carat range, then collect the result into R for plotting
    df <- diamonds_tbl %>%
      dplyr::select("carat", "price") %>%
      dplyr::filter(carat >= !!input$carat[[1]], carat <= !!input$carat[[2]]) %>%
      dplyr::collect()

    # Scatter plot with smoothed means
    ggplot(df, aes(carat, price)) +
      geom_point(alpha = 1/2) +
      geom_smooth() +
      scale_size_area(max_size = 2) +
      ggtitle("Price vs. Carat")
  })
}

# Return a Shiny app object
shinyApp(ui = ui, server = server)

Spark Shiny app
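
The `!!` operator in the filter above evaluates `input$carat[[1]]` and `input$carat[[2]]` in R first, so that plain numbers are injected into the dplyr expression instead of references to the Shiny `input` object. The sketch below runs the same pipeline on a local copy of the data, so you can try the filter without a cluster. The names `diamonds_local` and `carat_range` are illustrative stand-ins for the Spark table and the slider value.

```r
library(dplyr)

# Local stand-in for the Spark table (assumption: same columns as diamonds.csv)
diamonds_local <- ggplot2::diamonds

# Stand-in for input$carat, the two-element value of the range slider
carat_range <- c(0.5, 1.0)

# Same select/filter steps as in the server function above
filtered <- diamonds_local %>%
  dplyr::select("carat", "price") %>%
  dplyr::filter(carat >= !!carat_range[[1]], carat <= !!carat_range[[2]])

# Smallest and largest carat values that passed the filter
range(filtered$carat)
```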

Frequently asked questions (FAQ)

How do I install Shiny on Databricks Runtime 6.4 Extended Support and Databricks Runtime 5.5 LTS?

Install the Shiny package as a Databricks library on the cluster. Running install.packages("shiny") in the RStudio console or using the RStudio package manager may not work.

Why is my Shiny app grayed out after some time?

If there is no interaction with the Shiny app, the connection to the app closes after about 10 minutes.

To reconnect, refresh the Shiny app page. The dashboard state resets.

Why does my Shiny viewer window disappear after a while?

If the Shiny viewer window disappears after idling for several minutes, it is due to the same timeout as the “gray out” scenario.

My app crashes immediately after launching, but the code appears to be correct. What’s going on?

There is a 20 MB limit on the total amount of data that can be displayed in a Shiny app on Databricks. If the application’s total data size exceeds this limit, it will crash immediately after launching. To avoid this, Databricks recommends reducing the data size, for example by downsampling the displayed data or reducing the resolution of images.
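
As an illustration of downsampling, the following sketch caps a data frame at a fixed number of rows before it is displayed. The helper name `downsample` and the 10,000-row default are assumptions made for this example. `object.size()` reports the in-memory size in R, which is only a rough proxy for the data size that the limit applies to.

```r
# Hypothetical helper: keep at most max_rows rows, chosen uniformly at random
# and returned in their original order.
downsample <- function(df, max_rows = 10000) {
  if (nrow(df) <= max_rows) {
    return(df)  # already small enough
  }
  keep <- sort(sample.int(nrow(df), max_rows))
  df[keep, , drop = FALSE]
}

# Example: shrink a 1,000,000-row data frame before plotting it
big <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
small <- downsample(big)

# Compare the approximate in-memory sizes
format(object.size(big), units = "MB")
format(object.size(small), units = "MB")
```

For data that lives in Spark, it is usually cheaper to filter or aggregate on the cluster first and collect only the reduced result into R.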

Why do long Spark jobs never return?

This is also caused by the idle timeout. A Spark job that runs longer than the timeout described above cannot render its result, because the connection closes before the job returns.

How can I avoid the timeout?

  • There is a workaround suggested in this issue thread. The workaround sends heartbeats to keep the websocket alive when the app is idle. However, if the app is blocked by a long-running computation, this workaround does not work.

  • Shiny does not support long-running tasks. A Shiny blog post recommends using promises and futures to run long tasks asynchronously and keep the app unblocked. Here is an example that uses heartbeats to keep the Shiny app alive and runs a long-running Spark job in a future construct.

    # Write an app that uses Spark to access data on Databricks
    # First, install the following packages:
    install.packages("future")
    install.packages("promises")
    
    library(shiny)
    library(promises)
    library(future)
    plan(multisession)
    
    HEARTBEAT_INTERVAL_MILLIS = 1000  # 1 second
    
    # Define the long Spark job here
    run_spark <- function(x) {
      # Environment setting
      library("SparkR", lib.loc = "/databricks/spark/R/lib")
      sparkR.session()
    
      irisDF <- createDataFrame(iris)
      collect(irisDF)
      Sys.sleep(3)
      x + 1
    }
    
    run_spark_sparklyr <- function(x) {
      # Environment setting
      library(sparklyr)
      library(dplyr)
      library("SparkR", lib.loc = "/databricks/spark/R/lib")
      sparkR.session()
      sc <- spark_connect(method = "databricks")
    
      iris_tbl <- copy_to(sc, iris, overwrite = TRUE)
      collect(iris_tbl)
      x + 1
    }
    
    ui <- fluidPage(
      sidebarLayout(
        # Display heartbeat
        sidebarPanel(textOutput("keep_alive")),
    
        # Display the Input and Output of the Spark job
        mainPanel(
          numericInput('num', label = 'Input', value = 1),
          actionButton('submit', 'Submit'),
          textOutput('value')
        )
      )
    )
    server <- function(input, output) {
      #### Heartbeat ####
      # Define reactive variable
      cnt <- reactiveVal(0)
      # Define time dependent trigger
      autoInvalidate <- reactiveTimer(HEARTBEAT_INTERVAL_MILLIS)
      # Time dependent change of variable
      observeEvent(autoInvalidate(), {  cnt(cnt() + 1)  })
      # Render print
      output$keep_alive <- renderPrint(cnt())
    
      #### Spark job ####
      result <- reactiveVal() # the result of the spark job
      busy <- reactiveVal(0)  # whether the spark job is running
      # Launch a spark job in a future when actionButton is clicked
      observeEvent(input$submit, {
        if (busy() != 0) {
          showNotification("Already running Spark job...")
          return(NULL)
        }
        showNotification("Launching a new Spark job...")
        # input$num must be read outside the future
        input_x <- input$num
        fut <- future({ run_spark(input_x) }) %...>% result()
        # Or: fut <- future({ run_spark_sparklyr(input_x) }) %...>% result()
        busy(1)
        # Catch exceptions and notify the user
        fut <- catch(fut, function(e) {
          result(NULL)
          cat(e$message)
          showNotification(e$message)
        })
        fut <- finally(fut, function() { busy(0) })
        # Return something other than the promise so shiny remains responsive
        NULL
      })
      # When the spark job returns, render the value
      output$value <- renderPrint(result())
    }
    shinyApp(ui = ui, server = server)
    

How can I develop a Shiny application that can be published to a Shiny server and access data on Databricks?

While you can access data naturally using SparkR or sparklyr during development and testing on Databricks, after a Shiny application is published to a stand-alone hosting service, it cannot directly access the data and tables on Databricks.

To enable your application to function outside Databricks, you must rewrite how you access data. There are a few options:

Databricks recommends that you work with your Databricks solutions team to find the best approach for your existing data and analytics architecture.

How can I save the Shiny applications that I develop on Databricks?

You can either save your application code on DBFS through the FUSE mount or check your code into version control.
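
A minimal sketch of the first option follows. The helper name `save_app` and the example paths are placeholders. The sketch assumes that DBFS is exposed to the cluster's local file system under `/dbfs`.

```r
# Hypothetical helper: copy an app directory into a destination directory
# and return the path of the copy.
save_app <- function(app_dir, dest_root) {
  dir.create(dest_root, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(app_dir, dest_root, recursive = TRUE, overwrite = TRUE)
  if (!all(ok)) {
    stop("Failed to copy ", app_dir, " to ", dest_root)
  }
  file.path(dest_root, basename(app_dir))
}

# Example call (both paths are placeholders):
# save_app("~/my_shiny_app", "/dbfs/shiny_apps")
```

To restore the app on another cluster, copy the directory back from DBFS the same way and open it in RStudio.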

Can I develop a Shiny application inside a Databricks notebook?

Yes, you can develop a Shiny application inside a Databricks notebook. For more details, see Use Shiny inside Databricks notebooks.