Asynchronous queries and interruptions with Databricks Connect for Scala

Note

This article covers Databricks Connect for Databricks Runtime 14.0 and above.

This article describes how to handle asynchronous queries and interruptions with Databricks Connect for Scala. Databricks Connect lets you connect popular IDEs, notebook servers, and custom applications to Databricks clusters. See What is Databricks Connect? For the Python version of this article, see Asynchronous queries and interruptions with Databricks Connect for Python.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

For Databricks Connect for Databricks Runtime 14.0 and above, long-running queries are more resilient to network failures and other interruptions. If the client program is interrupted, or the operating system pauses the process for up to 5 minutes (for example, when a laptop lid is closed), the client reconnects to the query that is still running. Queries can also run for longer than before; previously they were limited to 1 hour.

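Databricks Connect can also interrupt running queries, for example to save costs. In the following example, the main thread tags its queries with addTag, and a background thread cancels every running query that carries that tag after five seconds by calling interruptTag.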

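import com.databricks.connect.DatabricksSession

import scala.util.control.NonFatal

object InterruptTagExample {
  def main(args: Array[String]): Unit = {

    val session = DatabricksSession.builder.getOrCreate()

    // Background thread: waits 5 seconds, then interrupts every running
    // operation in this session that carries the "interrupt-me" tag.
    val t = new Thread {
      override def run(): Unit = {
        Thread.sleep(5000)
        session.interruptTag("interrupt-me")
      }
    }

    // All subsequent DataFrame queries that this thread runs with session will have this tag.
    session.addTag("interrupt-me")

    t.start()

    // Replace this with your own long-running DataFrame query. The aggregation
    // below is only a hypothetical stand-in that scans a very large generated range.
    val df = session.range(0L, 100000000000L).selectExpr("max(hash(id)) AS h")

    try {
      df.show()
    } catch {
      // An interrupted query surfaces as an exception on the client.
      case NonFatal(e) => println(s"Query was interrupted: ${e.getMessage}")
    } finally {
      session.removeTag("interrupt-me")
    }

    t.join()
  }
}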
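The example above cancels a query from a separate thread after a fixed delay. A related pattern, sketched below as an illustration rather than as part of Databricks Connect, runs the blocking query asynchronously on a Scala Future and cancels it only if it misses a deadline, which is one way to cap the cost of a runaway query. The helper QueryDeadline.runWithDeadline is hypothetical; it uses only the Scala standard library and takes the cancellation step as a callback, so any interrupt call can be plugged in.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException, blocking}
import scala.concurrent.duration._

// Hypothetical helper (not a Databricks Connect API): run a blocking query
// asynchronously and cancel it if it does not finish before a deadline.
object QueryDeadline {
  /** Runs `query` on a thread from `ec` and waits up to `deadline` for it.
    * Returns Some(result) if the query finishes in time. If the deadline
    * passes first, calls `cancel` (for example, an interrupt call on the
    * session) and returns None. If the query fails before the deadline,
    * its exception is rethrown to the caller.
    */
  def runWithDeadline[T](deadline: FiniteDuration)(query: => T)(cancel: () => Unit)(
      implicit ec: ExecutionContext): Option[T] = {
    // `blocking` tells the global pool that this task may block for a long
    // time, so the pool can add a thread instead of starving other tasks.
    val running: Future[T] = Future(blocking(query))
    try Some(Await.result(running, deadline))
    catch {
      case _: TimeoutException =>
        // Deadline missed: ask the server to stop the work, give up locally.
        cancel()
        None
    }
  }
}
```

To apply it to Databricks Connect, you might write QueryDeadline.runWithDeadline(10.minutes)(df.collect())(() => session.interruptAll())(ExecutionContext.global). This assumes the session exposes interruptAll(), as the Spark Connect client in Apache Spark 3.5 does; note that it cancels every running operation in the session, not only this query. If you prefer interruptTag, keep in mind that in that client a tag added with addTag applies to operations started by the calling thread, so call addTag inside the query block, which runs on a pool thread.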