Use IntelliJ IDEA with Databricks Connect for Scala

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

This article covers how to use Databricks Connect for Scala and IntelliJ IDEA with the Scala plugin. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Databricks clusters. See What is Databricks Connect?.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

Follow these instructions to use Databricks Connect and IntelliJ IDEA with the Scala plugin to create, run, and debug a sample Scala sbt project. These instructions were tested with IntelliJ IDEA Community Edition 2023.3.6. If you use a different version or edition of IntelliJ IDEA, the steps might differ.

  1. Make sure that the Java Development Kit (JDK) is installed locally. Databricks recommends that your local JDK version match the version of the JDK on your Databricks cluster.

  2. Start IntelliJ IDEA.

  3. Click File > New > Project.

  4. For Name, enter a meaningful name for your project.

  5. For Location, click the folder icon, and complete the on-screen directions to specify the path to your new Scala project.

  6. For Language, click Scala.

  7. For Build system, click sbt.

  8. In the JDK drop-down list, select an existing JDK installation on your development machine that matches the JDK version on your cluster, or select Download JDK and follow the on-screen instructions to download a matching JDK.

    Note

    Choosing a JDK installation that is above or below the JDK version on your cluster might produce unexpected results, or your code might not run at all.
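
    If you're not sure which JDK version your cluster uses, you can print it from a Scala notebook attached to the cluster, for example:

    // Run in a notebook cell on the target cluster.
    println(System.getProperty("java.version"))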

  9. In the sbt drop-down list, select the latest version.

  10. In the Scala drop-down list, select the version of Scala that matches the Scala version on your cluster.

    Note

    Choosing a Scala version that is below or above the Scala version on your cluster might produce unexpected results, or your code might not run at all.
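
    Likewise, you can confirm the cluster's Scala version from a Scala notebook attached to the cluster:

    // Run in a notebook cell on the target cluster.
    println(scala.util.Properties.versionString)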

  11. For Package prefix, enter a package prefix for your project’s sources, for example org.example.application.

  12. Make sure the Add sample code box is checked.

  13. Click Create.

  14. Add the Databricks Connect package: with your new Scala project open, in your Project tool window (View > Tool Windows > Project), open the build.sbt file at the root of project-name.

  15. Add the following code to the end of the build.sbt file, which declares your project’s dependency on a specific version of the Databricks Connect library for Scala:

    libraryDependencies += "com.databricks" % "databricks-connect" % "14.3.1"
    

    Replace 14.3.1 with the version of the Databricks Connect library that matches the Databricks Runtime version on your cluster. You can find the Databricks Connect library version numbers in the Maven Central Repository.
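
    For reference, after this change a complete build.sbt might look like the following sketch. The exact contents that IntelliJ IDEA generates vary with the name and Scala version you chose during project creation; the values below are examples only (for instance, a 2.12.x Scala version to match a Databricks Runtime 14.3 LTS cluster):

    ThisBuild / version := "0.1.0-SNAPSHOT"
    ThisBuild / scalaVersion := "2.12.15"

    lazy val root = (project in file("."))
      .settings(
        name := "demo-project"
      )

    libraryDependencies += "com.databricks" % "databricks-connect" % "14.3.1"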

  16. Click the Load sbt changes notification icon to update your Scala project with the new library location and dependency.

  17. Wait until the sbt progress indicator at the bottom of the IDE disappears. The sbt load process might take a few minutes to complete.

  18. Add code: in your Project tool window, open the file named Main.scala, in project-name > src > main > scala.

  19. Replace any existing code in the file with the following code and then save the file:

    package org.example.application
    
    import com.databricks.connect.DatabricksSession
    import org.apache.spark.sql.SparkSession
    
    object Main {
      def main(args: Array[String]): Unit = {
        // Create a Spark session that targets the remote Databricks cluster.
        val spark = DatabricksSession.builder().remote().getOrCreate()

        // The table read and row limit run on the cluster; show() prints the
        // first 5 rows locally.
        val df = spark.read.table("samples.nyctaxi.trips")
        df.limit(5).show()
      }
    }
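
    With no arguments, the remote() call above reads the connection details from the DATABRICKS_* environment variables or from the DEFAULT profile in your .databrickscfg file. If you would rather set the connection properties in code, a sketch like the following can be used instead; the host, token, and cluster ID values are placeholders for your own:

    // Placeholder values; substitute your workspace URL, a personal access
    // token, and the ID of your target cluster.
    val spark = DatabricksSession.builder()
      .host("https://<workspace-instance-name>")
      .token("<personal-access-token>")
      .clusterId("<cluster-id>")
      .getOrCreate()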
    
  20. Run the code: start the target cluster in your remote Databricks workspace.

  21. After the cluster has started, on the main menu, click Run > Run ‘Main’.

  22. In the Run tool window (View > Tool Windows > Run), in the Main tab, the first 5 rows of the samples.nyctaxi.trips table appear. All of the Scala code runs locally, except for the code involving DataFrame operations, which runs on the cluster in the remote Databricks workspace; the responses are sent back to the local caller.
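
    Because only the DataFrame operations are evaluated remotely, you can mix cluster-side work with plain local Scala. The following sketch (hypothetical additions to Main.main; it assumes the fare_amount column of samples.nyctaxi.trips is a DOUBLE) pushes a projection to the cluster and processes the collected rows locally:

    // select() and limit() are executed on the remote cluster; collect()
    // returns the rows to the local JVM.
    val fares = spark.read.table("samples.nyctaxi.trips")
      .select("fare_amount")
      .limit(100)
      .collect()
      .map(_.getDouble(0)) // plain local Scala from here on

    println(f"Average fare for the first 100 trips: ${fares.sum / fares.length}%.2f")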

  23. Debug the code: start the target cluster in your remote Databricks workspace, if it is not already running.

  24. In the preceding code, click the gutter next to df.limit(5).show() to set a breakpoint.

  25. After the cluster has started, on the main menu, click Run > Debug ‘Main’.

  26. In the Debug tool window (View > Tool Windows > Debug), in the Console tab, click the calculator (Evaluate Expression) icon.

  27. Enter the expression df.schema and click Evaluate to show the DataFrame’s schema.
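
    While paused at the breakpoint, you can evaluate other expressions against the DataFrame in the same way. For example, the following use standard Spark APIs, and each evaluation may trigger a round trip to the cluster:

    df.schema.treeString // the schema as a printable tree
    df.columns.mkString(", ") // the column names as one string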

  28. In the Debug tool window’s sidebar, click the green arrow (Resume Program) icon.

  29. In the Console pane, the first 5 rows of the samples.nyctaxi.trips table appear. All of the Scala code runs locally, except for the code involving DataFrame operations, which runs on the cluster in the remote Databricks workspace; the responses are sent back to the local caller.