Databricks SDK for Java

Experimental

The Databricks SDK for Java is in an Experimental state. To provide feedback, ask questions, and report issues, use the Issues tab in the Databricks SDK for Java repository in GitHub.

During the Experimental period, Databricks is actively working on stabilizing the Databricks SDK for Java’s interfaces. API clients for all services are generated from specification files that are synchronized from the main platform. You are highly encouraged to install a specific version of the Databricks SDK for Python package and read the changelog where Databricks documents the changes. Databricks may have minor documented backward-incompatible changes, such as renaming the methods or some class names to bring more consistency.

In this article, you learn how to automate operations in Databricks accounts, workspaces, and related resources with the Databricks SDK for Java.

Before you begin

Before you begin to use the Databricks SDK for Java, your development machine must have:

  • Databricks authentication configured.

  • A Java Development Kit (JDK) that is compatible with Java 8 or higher. Continuous integration (CI) testing with the Databricks SDK for Java is compatible with Java versions 8, 11, 17, and 20.

  • A Java-compatible integrated development environment (IDE). Databricks recommends IntelliJ IDEA.

Get started with the Databricks SDK for Java

  1. In your project’s pom.xml file, instruct your build system to take a dependency on the Databricks SDK for Java. To do this, add the following <dependency> to the pom.xml file’s existing <dependencies> section. If the <dependencies> section does not already exist within the pom.xml file, you must also add the <dependencies> parent element to the pom.xml file.

    For example, to open your project’s pom.xml file in IntelliJ IDEA, click View > Tool Windows > Project, and then double-click to open <your-project-name> > src > pom.xml.

    <dependencies>
      <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>databricks-sdk-java</artifactId>
        <version>0.0.1</version>
      </dependency>
    </dependencies>
    

    Note

    Be sure to replace 0.0.1 with the latest version of the Databricks SDK for Java. You can find the latest version in the Maven central repository.

  2. Instruct your project to take the declared dependency on the Databricks SDK for Java. For example, in IntelliJ IDEA, in your project’s Project tool window, click your project’s root node, and then click Maven > Reload Project.

  3. Add code to import the Databricks SDK for Java and to list all of the clusters in your Databricks workspace. For example, in a project’s Main.java file, the code might be as follows:

    import com.databricks.sdk.WorkspaceClient;
    import com.databricks.sdk.service.compute.ClusterInfo;
    import com.databricks.sdk.service.compute.ListClustersRequest;
    
    public class Main {
      public static void main(String[] args) {
        WorkspaceClient w = new WorkspaceClient();
    
        for (ClusterInfo c : w.clusters().list(new ListClustersRequest())) {
          System.out.println(c.getClusterName());
        }
      }
    }
    

    Note

    By not setting any arguments in the preceding call to WorkspaceClient w = new WorkspaceClient(), the Databricks SDK for Java uses its default process for trying to perform Databricks authentication. To override this default behavior, see the following authentication section.

  4. Build your project. For example, to do this in IntelliJ IDEA, from the main menu, click Build > Build Project.

  5. Run your main file. For example, to do this in IntelliJ IDEA for a project’s Main.java file, from the main menu, click Run > Run ‘Main’.

  6. The list of clusters appears. For example, in IntelliJ IDEA, this is in the Run tool window. To display this tool window, from the main menu, click View > Tool Windows > Run.

Authenticate the Databricks SDK for Java with your Databricks account or workspace

The Databricks SDK for Java implements the Databricks client unified authentication standard, a consolidated and consistent architectural and programmatic approach to authentication. This approach helps make setting up and automating authentication with Databricks more centralized and predictable. It enables you to configure Databricks authentication once and then use that configuration across multiple Databricks tools and SDKs without further authentication configuration changes. For more information, including more complete code examples in Java, see Databricks client unified authentication.

Some of the available coding patterns to initialize Databricks authentication with the Databricks SDK for Java include:

  • Use Databricks default authentication by doing one of the following:

    • Create or identify a custom Databricks configuration profile with the required fields for the target Databricks authentication type. Then set the DATABRICKS_CONFIG_PROFILE environment variable to the name of the custom configuration profile.

    • Set the required environment variables for the target Databricks authentication type.

    import com.databricks.sdk.WorkspaceClient;
    // ...
    WorkspaceClient w = new WorkspaceClient();
    // ...
    
  • Hard-coding the required fields is supported but not recommended, as it risks exposing sensitive information in your code, such as Databricks personal access tokens. The following example hard-codes Databricks host and access token values for Databricks token authentication:

    import com.databricks.sdk.WorkspaceClient;
    import com.databricks.sdk.core.DatabricksConfig;
    // ...
    DatabricksConfig cfg = new DatabricksConfig()
      .setHost("https://...")
      .setToken("...");
    WorkspaceClient w = new WorkspaceClient(cfg);
    // ...
    

Code examples

The following code examples demonstrate how to use the Databricks SDK for Java to create and delete clusters, create jobs, and list account-level groups. These code examples use the Databricks SDK for Java’s default Databricks authentication process.

For additional code examples, see the examples folder in the Databricks SDK for Java repository in GitHub.

Create a cluster

This code example creates a cluster with the specified Databricks Runtime version and cluster node type. This cluster has one worker, and the cluster will automatically terminate after 15 minutes of idle time.

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.compute.CreateCluster;
import com.databricks.sdk.service.compute.CreateClusterResponse;

public class Main {
  public static void main(String[] args) {
    WorkspaceClient w = new WorkspaceClient();

    CreateClusterResponse c = w.clusters().create(
      new CreateCluster()
        .setClusterName("my-cluster")
        .setSparkVersion("12.2.x-scala2.12")
        .setNodeTypeId("i3.xlarge")
        .setAutoterminationMinutes(15L)
        .setNumWorkers(1L)
    ).getResponse();

    System.out.println("View the cluster at " +
      w.config().getHost() +
      "#setting/clusters/" +
      c.getClusterId() +
      "/configuration\n");
  }
}

Permanently delete a cluster

This code example permanently deletes the cluster with the specified cluster ID from the workspace.

import com.databricks.sdk.WorkspaceClient;
import java.util.Scanner;

public class Main {
  public static void main(String[] args) {
    System.out.println("ID of cluster to delete (for example, 1234-567890-ab123cd4):");

    Scanner in = new Scanner(System.in);
    String c_id = in.nextLine();
    WorkspaceClient w = new WorkspaceClient();

    w.clusters().permanentDelete(c_id);
  }
}

Create a job

This code example creates a Databricks job that can be used to run the specified notebook on the specified cluster. As this code runs, it gets the existing notebook’s path, the existing cluster ID, and related job settings from the user at the terminal.

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.jobs.JobTaskSettings;
import com.databricks.sdk.service.jobs.NotebookTask;
import com.databricks.sdk.service.jobs.NotebookTaskSource;
import com.databricks.sdk.service.jobs.CreateResponse;
import com.databricks.sdk.service.jobs.CreateJob;

import java.util.Scanner;
import java.util.Map;
import java.util.Collection;
import java.util.Arrays;

public class Main {
  public static void main(String[] args) {
    System.out.println("Some short name for the job (for example, my-job):");
    Scanner in = new Scanner(System.in);
    String jobName = in.nextLine();

    System.out.println("Some short description for the job (for example, My job):");
    String description = in.nextLine();

    System.out.println("ID of the existing cluster in the workspace to run the job on (for example, 1234-567890-ab123cd4):");
    String existingClusterId = in.nextLine();

    System.out.println("Workspace path of the notebook to run (for example, /Users/someone@example.com/my-notebook):");
    String notebookPath = in.nextLine();

    System.out.println("Some key to apply to the job's tasks (for example, my-key): ");
    String taskKey = in.nextLine();

    System.out.println("Attempting to create the job. Please wait...");

    WorkspaceClient w = new WorkspaceClient();

    Map<String, String> map = Map.of("", "");

    Collection<JobTaskSettings> tasks = Arrays.asList(new JobTaskSettings()
      .setDescription(description)
      .setExistingClusterId(existingClusterId)
      .setNotebookTask(new NotebookTask()
        .setBaseParameters(map)
        .setNotebookPath(notebookPath)
        .setSource(NotebookTaskSource.WORKSPACE))
      .setTaskKey(taskKey)
    );

    CreateResponse j = w.jobs().create(new CreateJob()
      .setName(jobName)
      .setTasks(tasks)
    );

    System.out.println("View  the job at " +
      w.config().getHost() +
      "/#job/" +
      j.getJobId()
    );
  }
}

List account-level groups

This code example lists the display names for all of the available groups within the Databricks account.

import com.databricks.sdk.AccountClient;
import com.databricks.sdk.core.DatabricksConfig;
import com.databricks.sdk.service.iam.Group;
import com.databricks.sdk.service.iam.ListAccountGroupsRequest;

public class Main {
  public static void main(String[] args) {
    AccountClient a = new AccountClient();

    for (Group g : a.groups().list((new ListAccountGroupsRequest()))) {
      System.out.println(g.getDisplayName());
    }
  }
}