Databricks SDK for Go

In this article, you learn how to automate operations in Databricks accounts, workspaces, and related resources with the Databricks SDK for Go.

Experimental

The Databricks SDK for Go is in an Experimental state. To provide feedback, ask questions, and report issues, use the Issues tab in the Databricks SDK for Go repository in GitHub.

During the Experimental period, Databricks is actively working on stabilizing the Databricks SDK for Go’s interfaces. API clients for all services are generated from specification files that are synchronized from the Databricks platform. You are highly encouraged to pin the exact version that you want to use in your go.mod file and to read the CHANGELOG, where Databricks documents the changes in each version. Some interfaces are more stable than others. For interfaces that are not yet nightly tested, Databricks may make minor, documented backward-incompatible changes, such as correcting a field’s mapping from int to int64 or renaming methods or type names for consistency.

Before you begin

Before you begin to use the Databricks SDK for Go, your development machine must have:

  • Go installed.

  • An existing Go code project.

  • Databricks authentication configured.

Get started with the Databricks SDK for Go

  1. On your development machine with Go already installed, an existing Go code project already created, and Databricks authentication configured, create a go.mod file to track your Go code’s dependencies by running the go mod init command, for example:

    go mod init sample
    
  2. Take a dependency on the Databricks SDK for Go package by running the go mod edit -require command:

    go mod edit -require github.com/databricks/databricks-sdk-go@v0.1.1
    

    Your go.mod file should now look like this:

    module sample
    
    go 1.18
    
    require github.com/databricks/databricks-sdk-go v0.1.1
    
  3. Within your project, create a Go code file that imports the Databricks SDK for Go. The following example, in a file named main.go, lists all the clusters in your Databricks workspace:

    package main
    
    import (
      "context"
    
      "github.com/databricks/databricks-sdk-go"
      "github.com/databricks/databricks-sdk-go/service/clusters"
    )
    
    func main() {
      w := databricks.Must(databricks.NewWorkspaceClient())
      all, err := w.Clusters.ListAll(context.Background(), clusters.List{})
      if err != nil {
        panic(err)
      }
      for _, c := range all {
        println(c.ClusterName)
      }
    }
    
  4. Add any missing module dependencies by running the go mod tidy command:

    go mod tidy
    

    Note

    If you get the error go: warning: "all" matched no packages, you forgot to add a Go code file that imports the Databricks SDK for Go.

  5. Grab copies of all packages needed to support builds and tests of packages in your main module by running the go mod vendor command:

    go mod vendor
    
  6. Set up your development machine for Databricks authentication.

  7. Run your Go code file, assuming a file named main.go, by running the go run command:

    go run main.go
    

    Note

    Because the preceding call to w := databricks.Must(databricks.NewWorkspaceClient()) does not set *databricks.Config as an argument, the Databricks SDK for Go uses its default process for trying to perform Databricks authentication. To override this default behavior, see Authenticate the Databricks SDK for Go with your Databricks account or workspace.

Authenticate the Databricks SDK for Go with your Databricks account or workspace

To run Databricks automation commands within a Databricks account or workspace, you must first authenticate the Databricks SDK for Go with your Databricks account or workspace at run time. The Databricks SDK for Go attempts to authenticate by using each of the following authentication methods by default, in the following order, until it succeeds. If the Databricks SDK for Go cannot successfully authenticate with any of these methods, it returns the error panic: failed request: default auth: cannot configure default credentials and stops running. You can override this default behavior, as described in the following sections.

  1. Databricks personal access token authentication.

  2. Databricks basic (username and password) authentication.

For each authentication method, the Databricks SDK for Go looks for authentication credentials in the following locations by default, in the following order, until it finds a set that it can use.

  1. Hard-coded fields in *databricks.Config.

  2. Environment variables.

  3. Databricks configuration profiles.

The following sections describe each of these authentication methods and the locations for each method.
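If you prefer to handle an authentication failure yourself instead of letting databricks.Must panic, you can check the error that databricks.NewWorkspaceClient returns directly. A minimal sketch (the messages printed here are illustrative):

```go
package main

import (
  "fmt"
  "os"

  "github.com/databricks/databricks-sdk-go"
)

func main() {
  // NewWorkspaceClient returns an error rather than panicking, so you can
  // report a misconfiguration and exit cleanly instead of using databricks.Must.
  w, err := databricks.NewWorkspaceClient()
  if err != nil {
    fmt.Fprintln(os.Stderr, "Cannot configure Databricks authentication:", err)
    os.Exit(1)
  }
  fmt.Println("Authenticated against", w.Config.Host)
}
```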

Token authentication

By default, the Databricks SDK for Go first tries to authenticate with Databricks by using Databricks personal access token authentication.

You can explicitly instruct the Databricks SDK for Go to begin performing Databricks token authentication by setting Credentials to config.PatCredentials in *databricks.Config, for example:

w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
  Credentials: config.PatCredentials{}, // import "github.com/databricks/databricks-sdk-go/config"
}))

To perform Databricks personal access token authentication, you must provide the Databricks SDK for Go with the following:

  • A Databricks host URL.

  • A Databricks personal access token.

  • A Databricks account ID, if you are calling Databricks account-level operations.

The Databricks SDK for Go first checks to see if the Host, Token, and possibly AccountID fields are set in *databricks.Config, for example:

w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
  Host:        "https://...",
  Token:       "dapi...",
  Credentials: config.PatCredentials{}, // import "github.com/databricks/databricks-sdk-go/config"
}))

Warning

Databricks strongly discourages hard-coding information such as tokens and account IDs into *databricks.Config, as this sensitive information can be exposed in plain text through version control systems. Databricks recommends that you use environment variables or Databricks configuration profiles that you set on your development machine instead.

The Databricks SDK for Go next checks to see if the following environment variables are set:

  • DATABRICKS_HOST

  • DATABRICKS_TOKEN

  • DATABRICKS_ACCOUNT_ID, if DATABRICKS_HOST is also set to https://accounts.cloud.databricks.com

Note

If the preceding environment variables are set, the Databricks SDK for Go might return the error panic: validate: more than one authorization method configured and stop running, if the following are also present on your development machine:

  • A Databricks configuration profile named DEFAULT in your .databrickscfg file.

  • The environment variables DATABRICKS_USERNAME and DATABRICKS_PASSWORD.

To address this error, either remove the DEFAULT profile or the environment variables DATABRICKS_USERNAME and DATABRICKS_PASSWORD from your development machine, and then try running your code again.

If the preceding environment variables are not set, the Databricks SDK for Go next checks to see if a Databricks configuration profile named DEFAULT is set in your .databrickscfg file. This profile must contain the following fields:

  • host

  • token

  • account_id, if host is also set to https://accounts.cloud.databricks.com
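For example, a .databrickscfg file for Databricks personal access token authentication might look like the following (the values shown are placeholders):

```ini
[DEFAULT]
host  = https://<your-workspace-instance>
token = dapi...

[ACCOUNTS]
host       = https://accounts.cloud.databricks.com
token      = dapi...
account_id = <your-account-id>
```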

You can provide an alternate profile name, for example:

w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
  Profile:     "<alternate-profile-name>",
  Credentials: config.PatCredentials{}, // import "github.com/databricks/databricks-sdk-go/config"
}))

Note

If a Databricks configuration profile named DEFAULT is set in your .databrickscfg file, the Databricks SDK for Go might return the error panic: validate: more than one authorization method configured and stop running, if the following are also present on your development machine:

  • The environment variables DATABRICKS_USERNAME and DATABRICKS_PASSWORD.

To address this error, either remove the DEFAULT profile or the environment variables DATABRICKS_USERNAME and DATABRICKS_PASSWORD from your development machine, and try running your code again.

If the Databricks SDK for Go cannot successfully authenticate through Databricks personal access token authentication, by default it moves on to try authenticating with Databricks basic (username and password) authentication.

Databricks basic (username and password) authentication

To perform basic authentication, you must provide the Databricks SDK for Go with the following:

  • A Databricks host URL.

  • A Databricks username and password.

  • A Databricks account ID, if you are calling Databricks account-level operations.

You can instruct the Databricks SDK for Go to begin performing basic authentication by setting Credentials to config.BasicCredentials in *databricks.Config, for example:

w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
  Credentials: config.BasicCredentials{}, // import "github.com/databricks/databricks-sdk-go/config"
}))

The Databricks SDK for Go first checks to see if the Host, Username, and Password fields, and possibly also the AccountID field, are set in *databricks.Config.

Warning

Databricks strongly discourages hard-coding information such as usernames, passwords, and account IDs into *databricks.Config, as this sensitive information can be exposed in plain text through version control systems. Databricks recommends that you use environment variables or Databricks configuration profiles that you set on your development machine instead.

The Databricks SDK for Go next checks to see if the following environment variables are set:

  • DATABRICKS_HOST

  • DATABRICKS_USERNAME

  • DATABRICKS_PASSWORD

  • DATABRICKS_ACCOUNT_ID, if DATABRICKS_HOST is also set to https://accounts.cloud.databricks.com

Note

If the preceding environment variables are set, the Databricks SDK for Go might return the error panic: validate: more than one authorization method configured and stop running, if the following are also present on your development machine:

  • A Databricks configuration profile named DEFAULT in your .databrickscfg file.

  • The environment variable DATABRICKS_TOKEN.

To address this error, either remove the DEFAULT profile or the environment variable DATABRICKS_TOKEN from your development machine, and then try running your code again.

If the preceding environment variables are not set, the Databricks SDK for Go next checks to see if a Databricks configuration profile named DEFAULT is set in your .databrickscfg file. This profile must contain the following fields:

  • host

  • username

  • password

  • account_id, if host is also set to https://accounts.cloud.databricks.com
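For example, a .databrickscfg profile for basic authentication might look like the following (the values shown are placeholders):

```ini
[DEFAULT]
host     = https://<your-workspace-instance>
username = <your-username>
password = <your-password>
```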

You can provide an alternate profile name, for example:

w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
  Profile:     "<alternate-profile-name>",
  Credentials: config.BasicCredentials{}, // import "github.com/databricks/databricks-sdk-go/config"
}))

Note

If a Databricks configuration profile named DEFAULT is set in your .databrickscfg file, the Databricks SDK for Go might return the error panic: validate: more than one authorization method configured and stop running, if the following is also present on your development machine:

  • The environment variable DATABRICKS_TOKEN.

To address this error, either remove the DEFAULT profile or the environment variable DATABRICKS_TOKEN from your development machine, and then try running your code again.

Examples

The following code examples demonstrate how to use the Databricks SDK for Go to create and delete clusters and to run jobs. These code examples use the Databricks SDK for Go’s default Databricks authentication process.

For additional code examples, see the examples folder in the Databricks SDK for Go repository in GitHub.

Create a cluster

This code example creates a cluster with the latest available Databricks Runtime Long Term Support (LTS) version and the smallest available cluster node type with a local disk. This cluster has one worker, and the cluster will automatically terminate after 15 minutes of idle time. The CreateAndWait method call causes the code to pause until the new cluster is running in the workspace.

package main

import (
  "context"
  "fmt"

  "github.com/databricks/databricks-sdk-go"
  "github.com/databricks/databricks-sdk-go/service/clusters"
)

func main() {
  const clusterName            = "my-cluster"
  const autoTerminationMinutes = 15
  const numWorkers             = 1

  w   := databricks.Must(databricks.NewWorkspaceClient())
  ctx := context.Background()

  // Get the full list of available Spark versions to choose from.
  sparkVersions, err := w.Clusters.SparkVersions(ctx)

  if err != nil {
    panic(err)
  }

  // Choose the latest Long Term Support (LTS) version.
  latestLTS, err := sparkVersions.Select(clusters.SparkVersionRequest{
    Latest:          true,
    LongTermSupport: true,
  })

  if err != nil {
    panic(err)
  }

  // Get the list of available cluster node types to choose from.
  nodeTypes, err := w.Clusters.ListNodeTypes(ctx)

  if err != nil {
    panic(err)
  }

  // Choose the smallest available cluster node type.
  smallestWithLocalDisk, err := nodeTypes.Smallest(clusters.NodeTypeRequest{
    LocalDisk: true,
  })

  if err != nil {
    panic(err)
  }

  fmt.Println("Now attempting to create the cluster, please wait...")

  runningCluster, err := w.Clusters.CreateAndWait(ctx, clusters.CreateCluster{
    ClusterName:            clusterName,
    SparkVersion:           latestLTS,
    NodeTypeId:             smallestWithLocalDisk,
    AutoterminationMinutes: autoTerminationMinutes,
    NumWorkers:             numWorkers,
  })

  if err != nil {
    panic(err)
  }

  switch runningCluster.State {
  case clusters.StateRunning:
    fmt.Printf("The cluster is now ready at %s#setting/clusters/%s/configuration\n",
      w.Config.Host,
      runningCluster.ClusterId,
    )
  default:
    fmt.Printf("The cluster is not running or failed to create: %s\n", runningCluster.StateMessage)
  }

  // Output:
  //
  // Now attempting to create the cluster, please wait...
  // The cluster is now ready at <workspace-host>#setting/clusters/<cluster-id>/configuration
}

Permanently delete a cluster

This code example permanently deletes the cluster with the specified cluster ID from the workspace.

package main

import (
  "context"

  "github.com/databricks/databricks-sdk-go"
  "github.com/databricks/databricks-sdk-go/service/clusters"
)

func main() {
  // Replace with your cluster's ID.
  const clusterId = "1234-567890-ab123cd4"

  w   := databricks.Must(databricks.NewWorkspaceClient())
  ctx := context.Background()

  err := w.Clusters.PermanentDelete(ctx, clusters.PermanentDeleteCluster{
    ClusterId: clusterId,
  })

  if err != nil {
    panic(err)
  }
}

Run a job

This code example creates a Databricks job that runs the specified notebook on the specified cluster. As the code runs, it gets the existing notebook’s path, the existing cluster ID, and related job settings from the user at the terminal. The RunNowAndWait method call causes the code to pause until the new job has finished running in the workspace.

package main

import (
  "bufio"
  "context"
  "fmt"
  "os"
  "strings"

  "github.com/databricks/databricks-sdk-go"
  "github.com/databricks/databricks-sdk-go/service/jobs"
)

func main() {
  w   := databricks.Must(databricks.NewWorkspaceClient())
  ctx := context.Background()

  nt := jobs.NotebookTask{
    NotebookPath: askFor("Workspace path of the notebook to run:"),
  }

  jobToRun, err := w.Jobs.Create(ctx, jobs.CreateJob{
    Name: askFor("Some short name for the job:"),
    Tasks: []jobs.JobTaskSettings{
      {
        Description:       askFor("Some short description for the job:"),
        TaskKey:           askFor("Some key to apply to the job's tasks:"),
        ExistingClusterId: askFor("ID of the existing cluster in the workspace to run the job on:"),
        NotebookTask:      &nt,
      },
    },
  })

  if err != nil {
    panic(err)
  }

  fmt.Printf("Now attempting to run the job at %s/#job/%d, please wait...\n",
    w.Config.Host,
    jobToRun.JobId,
  )

  runningJob, err := w.Jobs.RunNowAndWait(ctx, jobs.RunNow{
    JobId: jobToRun.JobId,
  })

  if err != nil {
    panic(err)
  }

  fmt.Printf("View the job run results at %s/#job/%d/run/%d\n",
    w.Config.Host,
    runningJob.JobId,
    runningJob.RunId,
  )

  // Output:
  //
  // Now attempting to run the job at <workspace-host>/#job/<job-id>, please wait...
  // View the job run results at <workspace-host>/#job/<job-id>/run/<run-id>
}

// Get job settings from the user.
func askFor(prompt string) string {
  var s string
  r := bufio.NewReader(os.Stdin)
  for {
    fmt.Fprint(os.Stdout, prompt+" ")
    s, _ = r.ReadString('\n')
    if s != "" {
      break
    }
  }
  return strings.TrimSpace(s)
}

Additional resources

For more information, see: