Query data

This article shows how to query data in Unity Catalog.

Requirements

  • The compute resource that you use to run the query, whether a notebook, the Databricks SQL editor, or a Data Explorer workflow, must comply with Unity Catalog security requirements.

  • To query data in a table or view, the user must have the USAGE permission on the parent catalog and schema and the SELECT permission on the table or view (see the grant sketch after this list).

    Note

    To read from a view on a cluster with single-user security mode, the user must have SELECT on all referenced tables and views.
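
For reference, the privileges described above can be granted with SQL along the following lines. This is a sketch; the catalog, schema, table, and group names are hypothetical:

-- Hypothetical names; run as a user who is allowed to grant these privileges.
GRANT USAGE ON CATALOG my_catalog TO `analysts`;
GRANT USAGE ON SCHEMA my_catalog.my_schema TO `analysts`;
GRANT SELECT ON TABLE my_catalog.my_schema.my_table TO `analysts`;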

Three-level namespace notation

In Unity Catalog, a table or view is contained within a parent catalog and schema. You can refer to a table or view using two styles of notation. The first uses USE CATALOG and USE SCHEMA statements to specify the catalog and schema:

SQL

USE CATALOG <catalog_name>;
USE SCHEMA <schema_name>;
SELECT * FROM <table_name>;

Python

spark.sql("USE CATALOG <catalog_name>")
spark.sql("USE SCHEMA <schema_name>")

display(spark.table("<table_name>"))

R

library(SparkR)

sql("USE CATALOG <catalog_name>")
sql("USE SCHEMA <schema_name>")

display(tableToDF("<table_name>"))

Scala

spark.sql("USE CATALOG <catalog_name>")
spark.sql("USE SCHEMA <schema_name>")

display(spark.table("<table_name>"))

As an alternative, you can use three-level namespace notation:

SQL

SELECT * FROM <catalog_name>.<schema_name>.<table_name>;

Python

display(spark.table("<catalog_name>.<schema_name>.<table_name>"))

R

library(SparkR)

display(tableToDF("<catalog_name>.<schema_name>.<table_name>"))

Scala

display(spark.table("<catalog_name>.<schema_name>.<table_name>"))

Using three-level namespace notation simplifies querying data in multiple catalogs and schemas.
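
For example, a single query can join tables that live in different catalogs. The following is a sketch; the catalog, schema, table, and column names are hypothetical:

-- Hypothetical names throughout.
SELECT o.order_id, r.region_name
FROM sales_catalog.shop.orders AS o
JOIN reference_catalog.geo.regions AS r
  ON o.region_id = r.region_id;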

You can also use three-level namespace notation for data in the Hive metastore by setting <catalog_name> to hive_metastore.
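
For example, assuming a table named my_table exists in the default schema of the workspace-local Hive metastore:

SELECT * FROM hive_metastore.default.my_table;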

Explore tables and views in Databricks SQL

You can quickly explore tables and views without the need to run a cluster by using Data Explorer.

  1. To open Data Explorer, click Data in the sidebar.

  2. In Data Explorer, select a catalog and a schema to view the tables and views that they contain.

For objects in the Hive metastore, you must have a running SQL warehouse to use Data Explorer.

Select from tables and views

To select from a table or view from a notebook:

  1. Use the sidebar to switch to Data Science & Engineering.

  2. Attach the notebook to a Data Science & Engineering or Databricks Machine Learning cluster that is configured for Unity Catalog.

  3. In the notebook, create a query that references Unity Catalog tables and views. You can use three-level namespace notation to select data in multiple catalogs and schemas, including the workspace-local Hive metastore (see the sketch after the note below).

    Note

    To read from a view on a cluster with single-user security mode, the user must have SELECT on all referenced tables and views.
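
    For example, a notebook cell might join a Unity Catalog table with a legacy Hive metastore table. This is a sketch; the catalog, schema, table, and column names are hypothetical:

    -- Hypothetical names throughout.
    SELECT u.id, u.name, h.legacy_score
    FROM main.default.users AS u
    JOIN hive_metastore.default.user_scores AS h
      ON u.id = h.user_id;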

To select from a table or view from Databricks SQL:

  1. Use the sidebar to switch to Databricks SQL.

  2. Click SQL Editor in the sidebar.

  3. Select a SQL warehouse that is configured for Unity Catalog.

  4. Compose a query. To insert a table or view into the query, select a catalog and schema, then click the name of the table or view to insert.

  5. Click Run.

Select from files

You can explore the contents of data files stored in your cloud tenant before creating tables from them.

Requirements

You must have the READ FILES permission, either on an external location (recommended) that is associated with the path to the files in your cloud tenant, or directly on the storage credential that contains the IAM role authorizing reads from that path. If you have already defined a table on the path, you can access the data through the table’s path if you have the SELECT permission on the table and the USAGE permission on the table’s parent catalog and schema.
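
As a sketch, the grant on an external location looks like the following; the external location name and group name are hypothetical:

-- Hypothetical external location and group names.
GRANT READ FILES ON EXTERNAL LOCATION my_external_location TO `data-engineers`;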

To explore data stored in an external location:

  1. List the files in a cloud storage path:

    SQL

    LIST 's3://<path_to_files>';

    Python

    display(spark.sql("LIST 's3://<path_to_files>'"))

    R

    library(SparkR)

    display(sql("LIST 's3://<path_to_files>'"))

    Scala

    display(spark.sql("LIST 's3://<path_to_files>'"))
    

    If you have the READ FILES permission on the external location associated with the cloud storage path, a list of data files in that location is returned.

  2. Query the data in the files in a given path:

    SQL

    SELECT * FROM <format>.`s3://<path_to_files>`;

    Python

    display(spark.read.load("s3://<path_to_files>"))

    R

    library(SparkR)

    display(loadDF("s3://<path_to_files>"))

    Scala

    display(spark.read.load("s3://<path_to_files>"))
    

    To explore data using a storage credential directly:

    SQL

    SELECT * FROM <format>.`s3://<path_to_files>`
    WITH (CREDENTIAL <storage_credential>);

    Python

    display(spark.sql("SELECT * FROM <format>.`s3://<path_to_files>` "
      "WITH (CREDENTIAL <storage_credential>)"))

    R

    library(SparkR)

    display(sql(paste("SELECT * FROM <format>.`s3://<path_to_files>` ",
      "WITH (CREDENTIAL <storage_credential>)",
      sep = "")))

    Scala

    display(spark.sql("SELECT * FROM <format>.`s3://<path_to_files>` " +
      "WITH (CREDENTIAL <storage_credential>)"))