Databricks SDK for R

Note

This article covers the Databricks SDK for R by Databricks Labs, which is in an Experimental state. To provide feedback, ask questions, and report issues, use the Issues tab in the Databricks SDK for R repository in GitHub.

In this article, you learn how to automate operations in Databricks workspaces and related resources with the Databricks SDK for R. This article supplements the Databricks SDK for R documentation.

Note

The Databricks SDK for R does not support the automation of operations in Databricks accounts. To call account-level operations, use a different Databricks SDK, for example:

Before you begin

Before you begin to use the Databricks SDK for R, your development machine must have:

  • A Databricks personal access token for the target Databricks workspace that you want to automate.

    Note

    The Databricks SDK for R supports Databricks personal access token authentication only.

  • R, and optionally an R-compatible integrated development environment (IDE). Databricks recommends RStudio Desktop and uses it in this article’s instructions.

Get started with the Databricks SDK for R

  1. Make your Databricks workspace URL and personal access token available to your R project’s scripts. For example, you can add the following to an R project’s .Renviron file. Replace <your-workspace-url> with your workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com. Replace <your-personal-access-token> with your Databricks personal access token, for example dapi12345678901234567890123456789012.

    DATABRICKS_HOST=<your-workspace-url>
    DATABRICKS_TOKEN=<your-personal-access-token>
    

    For additional ways to provide your Databricks workspace URL and personal access token, see Authentication in the Databricks SDK for R repository in GitHub.

    Important

    Do not add .Renviron files to version control systems, as this risks exposing sensitive information such as Databricks personal access tokens.

  2. Install the Databricks SDK for R package. For example, in RStudio Desktop, in the Console view (View > Move Focus to Console), run the following commands, one at a time:

    install.packages("devtools")
    library(devtools)
    install_github("databrickslabs/databricks-sdk-r")
    

    Note

    The Databricks SDK for R package is not available on CRAN.

  3. Add code to reference the Databricks SDK for R and to list all of the clusters in your Databricks workspace. For example, in a project’s main.r file, the code might be as follows:

    require(databricks)
    
    client <- DatabricksClient()
    
    clustersList(client)[, "cluster_name"]
    
  4. Run your script. For example, in RStuidio Desktop, in the script editor with the a project’s main.r file active, click Source > Source or Source with Echo.

  5. The list of clusters appears. For example, in RStudio Desktop, this is in the Console view.

Code examples

The following code examples demonstrate how to use the Databricks SDK for R to create and delete clusters, and create jobs.

Create a cluster

This code example creates a cluster with the specified Databricks Runtime version and cluster node type. This cluster has one worker, and the cluster automatically terminates after 15 minutes of idle time.

require(databricks)

client <- DatabricksClient()

response <- clustersCreate(
  client = client,
  cluster_name = "my-cluster",
  spark_version = "12.2.x-scala2.12",
  node_type_id = "i3.xlarge",
  autotermination_minutes = 15,
  num_workers = 1
)

# Get the workspace URL to be used in the following results message.
get_client_debug <- strsplit(client$debug_string(), split = "host=")
get_host <- strsplit(get_client_debug[[1]][2], split = ",")
host <- get_host[[1]][1]

# Make sure the workspace URL ends with a forward slash.
if (endsWith(host, "/")) {
} else {
  host <- paste(host, "/", sep = "")
}

print(paste(
  "View the cluster at ",
  host,
  "#setting/clusters/",
  response$cluster_id,
  "/configuration",
  sep = "")
)

Permanently delete a cluster

This code example permanently deletes the cluster with the specified cluster ID from the workspace.

require(databricks)

client <- DatabricksClient()

cluster_id <- readline("ID of the cluster to delete (for example, 1234-567890-ab123cd4):")

clustersPermanentDelete(client, cluster_id)

Create a job

This code example creates a Databricks job that can be used to run the specified notebook on the specified cluster. As this code runs, it gets the existing notebook’s path, the existing cluster ID, and related job settings from the user at the console.

require(databricks)

client <- DatabricksClient()

job_name <- readline("Some short name for the job (for example, my-job):")
description <- readline("Some short description for the job (for example, My job):")
existing_cluster_id <- readline("ID of the existing cluster in the workspace to run the job on (for example, 1234-567890-ab123cd4):")
notebook_path <- readline("Workspace path of the notebook to run (for example, /Users/someone@example.com/my-notebook):")
task_key <- readline("Some key to apply to the job's tasks (for example, my-key):")

print("Attempting to create the job. Please wait...")

notebook_task <- list(
  notebook_path = notebook_path,
  source = "WORKSPACE"
)

job_task <- list(
  task_key = task_key,
  description = description,
  existing_cluster_id = existing_cluster_id,
  notebook_task = notebook_task
)

response <- jobsCreate(
  client,
  name = job_name,
  tasks = list(job_task)
)

# Get the workspace URL to be used in the following results message.
get_client_debug <- strsplit(client$debug_string(), split = "host=")
get_host <- strsplit(get_client_debug[[1]][2], split = ",")
host <- get_host[[1]][1]

# Make sure the workspace URL ends with a forward slash.
if (endsWith(host, "/")) {
} else {
  host <- paste(host, "/", sep = "")
}

print(paste(
  "View the job at ",
  host,
  "#job/",
  response$job_id,
  sep = "")
)