Learn about tools and guidance you can use to work with Databricks assets and data and to develop Databricks applications.
You can connect many popular third-party IDEs to a Databricks cluster. This allows you to write code on your local development machine by using the Spark APIs and then run that code as jobs remotely on a Databricks cluster.
These third-party IDEs include:
- sparklyr and RStudio Desktop
- SparkR and RStudio Desktop
- Visual Studio Code
You can use connectors and drivers to connect your code to a Databricks cluster or a Databricks SQL endpoint. These connectors and drivers include:
- The Databricks SQL Connector for Python
- The Databricks ODBC driver
- The Databricks JDBC driver
For additional information about connecting your code through JDBC or ODBC, see the JDBC and ODBC configuration guidance.
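As a minimal sketch of the first option, the Databricks SQL Connector for Python can be used roughly as follows. The environment variable names, the `connection_params` and `run_query` helpers, and the lazy import are assumptions made for illustration; the connector itself is installed separately (`pip install databricks-sql-connector`), and the hostname, HTTP path, and token come from your own cluster or SQL endpoint.

```python
import os
from contextlib import closing


def connection_params(env=os.environ):
    """Collect connector settings from environment variables.

    The variable names below are assumptions for this sketch.
    """
    required = {
        "server_hostname": "DATABRICKS_SERVER_HOSTNAME",
        "http_path": "DATABRICKS_HTTP_PATH",
        "access_token": "DATABRICKS_TOKEN",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in required.items()}


def run_query(query, env=os.environ):
    """Run one SQL statement and return all rows."""
    # Imported lazily so connection_params() works without the package installed.
    from databricks import sql

    # closing() guarantees close() is called on both objects, even on error.
    with closing(sql.connect(**connection_params(env))) as connection:
        with closing(connection.cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

With the three variables set, calling `run_query("SELECT 1 AS probe")` should return a single row if the connection details are valid.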
Databricks also provides the following developer tools.
|Name|Use this tool when you want to…|
|---|---|
|Databricks CLI|Use the command line to work with Data Science & Engineering workspace assets such as cluster policies, clusters, file systems, groups, pools, jobs, libraries, runs, secrets, and tokens.|
|Databricks Utilities|Run Python, R, or Scala code in a notebook to work with credentials, file systems, libraries, and secrets from a Databricks cluster.|
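To give a rough idea of what Databricks Utilities look like in Python, the sketch below wraps two common calls, `dbutils.fs.ls` and `dbutils.secrets.get`. In a notebook, `dbutils` is already defined; here it is passed in as a parameter so the helpers stay explicit. The helper names `summarize_directory` and `read_secret` are hypothetical.

```python
def summarize_directory(dbutils, path):
    """Return (name, size) pairs for the entries that dbutils.fs.ls reports under path."""
    return [(entry.name, entry.size) for entry in dbutils.fs.ls(path)]


def read_secret(dbutils, scope, key):
    """Fetch a secret value from the given secret scope."""
    return dbutils.secrets.get(scope=scope, key=key)
```

In a notebook cell, this would be called as `summarize_directory(dbutils, "/tmp")`, with a path and secret scope that exist in your workspace.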
You can also call the Databricks REST APIs directly.
|Category|Use this API to work with…|
|---|---|
|REST API (latest)|Data Science & Engineering workspace assets such as clusters, global init scripts, groups, pools, jobs, libraries, permissions, secrets, and tokens, by using the latest version of the Databricks REST API.|
|REST API 2.1|Data Science & Engineering workspace assets such as jobs, by using version 2.1 of the Databricks REST API.|
|REST API 2.0|Data Science & Engineering workspace assets such as clusters, global init scripts, groups, pools, jobs, libraries, permissions, secrets, and tokens, by using version 2.0 of the Databricks REST API.|
|REST API 1.2|Command executions and execution contexts by using version 1.2 of the Databricks REST API.|
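The API version appears in the request path, so calling a versioned endpoint can be sketched with the standard library alone. The example below assumes bearer authentication with a personal access token. The workspace URL is a placeholder, and `build_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(host, token, version, endpoint):
    """Build (but do not send) an authenticated GET request for a REST API endpoint."""
    url = f"{host.rstrip('/')}/api/{version}/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def call(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder workspace URL and token; substitute your own values.
HOST = "https://example.cloud.databricks.com"
clusters_request = build_request(HOST, "<personal-access-token>", "2.0", "clusters/list")
jobs_request = build_request(HOST, "<personal-access-token>", "2.1", "jobs/list")
```

Passing either request to `call` sends it and returns the parsed JSON. Keeping the build and send steps separate makes the URL and headers easy to inspect before any network traffic occurs.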
You can use an infrastructure-as-code (IaC) approach to programmatically provision Databricks infrastructure and assets such as workspaces, clusters, cluster policies, pools, jobs, groups, permissions, secrets, tokens, and users. For details, see Databricks Terraform provider.
To manage the lifecycle of Databricks assets and data, you can use continuous integration and delivery (CI/CD), data pipeline, and data engineering tools.
|Area|Use these patterns and best practices when you want to…|
|---|---|
|Continuous integration and delivery on Databricks using Jenkins|Develop a CI/CD pipeline for Databricks that uses Jenkins.|
|Managing dependencies in data pipelines|Manage and schedule a data pipeline that uses Apache Airflow.|
|dbt Core integration with Databricks|Transform data in Databricks by writing SQL SELECT statements on your local development machine. dbt turns these statements into tables and views.|
|dbt Cloud integration with Databricks|Transform data in Databricks by writing SQL SELECT statements in your web browser. dbt turns these statements into tables and views.|
You can use these third-party tools to run SQL commands and scripts and to browse database objects in Databricks.
|Tool|Use this tool when you want to…|
|---|---|
|DataGrip integration with Databricks|Use a query console, schema navigation, smart code completion, and other features to run SQL commands and scripts and to browse database objects in Databricks.|
|DBeaver integration with Databricks|Run SQL commands and browse database objects in Databricks by using this client software application and database administration tool.|
|SQL Workbench/J|Run SQL scripts (either interactively or as a batch) in Databricks by using this SQL query tool.|
You can connect many popular third-party tools to clusters and SQL endpoints to access data in Databricks. For details, see Databricks integrations.