What is data warehousing on Databricks?

The Databricks Lakehouse Platform provides a complete end-to-end data warehousing solution. The Databricks Lakehouse Platform is built on open standards and APIs. The Databricks Lakehouse combines the ACID transactions and data governance of enterprise data warehouses with the flexibility and cost-efficiency of data lakes. Databricks SQL describes the enterprise data warehouse built into the Databricks Lakehouse Platform that provides general compute resources for business analytics. The core offering of Databricks SQL is a SQL warehouse.

What is data modeling on Databricks?

The Databricks Lakehouse Platform organizes data stored with Delta Lake in cloud object storage with familiar relations like database schemas, tables, and views. Databricks recommends a multi-layer approach to validating, cleansing, and transforming data for analytics. For more information, see the medallion architecture.

What is Databricks SQL?

Databricks SQL provides general compute resources for SQL queries, visualizations, and dashboards that are executed against the tables in the lakehouse. Within Databricks SQL, these queries, visualizations, and dashboards are developed and executed using SQL editor.

What is the SQL editor?

Use the built-in SQL editor to explore schemas and to write, share, and reuse queries using familiar SQL syntax. Regularly used SQL code can be saved as snippets for quick reuse, and query results can be cached to keep run times short. Additionally, query updates can be scheduled to automatically refresh, as well as to issue alerts when meaningful changes occur in the data. Databricks SQL also allows analysts to make sense of data through visualizations and drag-and-drop dashboards for quick ad-hoc exploratory analysis.

What are the available warehouse types?

Databricks SQL supports three warehouse types, each with different levels of performance and feature support.

Note

For pricing for each warehouse type and a detailed feature comparison, see Databricks SQL. To learn about the latest Databricks SQL features, see Databricks SQL release notes.

  • Serverless: Supports all features in the pro SQL warehouse type, as well as advanced Databricks SQL performance features. SQL warehouses run in the customer’s Databricks account using serverless compute. See Serverless compute.

    • If serverless compute is enabled for your account, in the UI the default is a serverless SQL warehouse. If your account needs updated terms of use, workspace admins are prompted in the Databricks SQL UI.

    • If your workspace has an AWS instance profile, it might need updates to the trust relationship, depending on how and when it was created.

    Important

    SQL warehouses do not support credential passthrough. Databricks recommends using Unity Catalog for data governance. See What is Unity Catalog?.

  • Pro: Supports additional Databricks SQL performance features (compared to classic) and supports all Databricks SQL functionality.

  • Classic: Supports entry level performance features and a limited set of Databricks SQL functionality.

What are the warehouse type defaults?

For workspaces in regions that support serverless SQL warehouses and satisfy the requirements:

  • Using the UI, the default SQL warehouse type is serverless.

  • Using the SQL Warehouses API with default parameters, the default SQL warehouse type is classic. To use serverless, set the enable_serverless_compute parameter to true and also set warehouse_type to pro. If this workspace used the SQL Warehouses API to create a warehouse between September 1, 2022 and April 30, 2023, and fits the requirements for serverless SQL warehouses, the default remains set to true. To avoid ambiguity, especially for organizations with many workspaces, Databricks recommends that you always set this field.

  • If the workspace uses a legacy external Hive metastore, serverless SQL warehouses are not supported. The default SQL warehouse type is the same as if serverless compute were disabled, which is pro in the UI and classic using the API. Also, contact your Databricks representative to learn more about Unity Catalog or other options.

For workspaces that do not support serverless SQL warehouses:

  • Using the UI, the default SQL warehouse type is pro.

  • Using the SQL Warehouses API with default parameters, the default SQL warehouse type is classic.

Note

If your account needs updated terms of use, workspace admins are prompted in the Databricks SQL UI.

Databricks SQL and third party BI tools

Databricks SQL also supports many third party BI and visualization tools for your Databricks data warehouse solution. See also Databricks and PowerBI and Databricks and Tableau.

Databricks SQL APIs

Databricks SQL also provides a robust API for your programming needs.

Databricks SQL administrative tasks

For information on enabling Databricks SQL, creating and managing SQL warehouses, managing users and data access, and other administrative tasks, see Set up your workspace to use Databricks SQL.

Developer tools

You can use a variety of developer tools to run SQL commands and scripts and to browse database objects in Databricks. See Use a SQL database tool.