Databricks provides a suite of tools and products that simplify the discovery of data assets that are accessible through the Databricks Data Intelligence Platform. This article provides an opinionated overview of how you can discover and preview data that has already been configured for access in your workspace.
To connect to data sources, see Connect to data sources.
For information about gaining access to data in the Databricks Marketplace, see What is Databricks Marketplace?.
Topics in this section focus on exploring data objects and data files. If you’re looking for information about working with assets such as notebooks, SQL queries, libraries, and models, see Navigate the workspace.
If you’re looking for guidance on generating summary statistics for datasets or on other tasks associated with exploratory data analysis (EDA), see Exploratory data analysis on Databricks: Tools and techniques.
Data discovery tools on Databricks fall into the following general categories:
AI-assisted insights, summary, and search.
Catalog exploration using the UI.
Programmatic listing and metadata exploration.
Data discovery tools are optimized for data governed by Unity Catalog. Data assets that have not been registered as Unity Catalog objects might not be discoverable using some of these approaches.
Catalog Explorer provides tools for exploring and governing data assets. To open Catalog Explorer, click Catalog in the workspace sidebar. See What is Catalog Explorer?.
Notebooks and the SQL query editor also provide a catalog navigator for exploring database objects. Click the Catalog icon in these interfaces to expand or collapse the catalog navigator without leaving your code editor.
Once you’ve discovered a dataset of interest, you can use the Insights tab to learn how the data is being used in your workspace. See View frequent queries and users of a table.
You can use the SHOW command on all database objects to discover assets registered to Unity Catalog. Use the LIST command, the %fs magic command, or Databricks Utilities to list files.
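The following is a minimal sketch of programmatic listing from a Python notebook, not a definitive implementation. It assumes the spark and dbutils objects that Databricks notebooks predefine. The helper names (quote_ident, show_statements, explore) are hypothetical and were introduced only for this illustration.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def show_statements(catalog: str, schema: str) -> List[str]:
    """Build SHOW statements that walk from catalogs down to tables."""
    cat, sch = quote_ident(catalog), quote_ident(schema)
    return [
        "SHOW CATALOGS",
        f"SHOW SCHEMAS IN {cat}",
        f"SHOW TABLES IN {cat}.{sch}",
    ]


def explore(spark, dbutils, catalog: str, schema: str, path: str) -> None:
    """Run the SHOW statements, then list the files under `path`.

    `spark` and `dbutils` are assumed to be the objects that a
    Databricks notebook provides; they are passed in explicitly here.
    """
    for statement in show_statements(catalog, schema):
        spark.sql(statement).show(truncate=False)
    # dbutils.fs.ls returns entries that expose `path` and `size`.
    for entry in dbutils.fs.ls(path):
        print(entry.path, entry.size)
```

In a notebook, you might call explore(spark, dbutils, "main", "default", "/Volumes/main/default/landing/"). The catalog, schema, and volume path in that call are placeholders. You only see objects and files that you have permission to access.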
You can review comments to learn about the contents of datasets available in your lakehouse. Comments can be set on data objects including catalogs, schemas, tables, and columns. You can view comments in Catalog Explorer or using the DESCRIBE command for an object.
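As a rough illustration, you could collect column comments from DESCRIBE output in Python as sketched below. The sketch assumes the result exposes col_name and comment fields, and that any rows after the column list (such as a partition section) begin with a blank or #-prefixed col_name. Verify both assumptions against your own output. The helper names and the table name are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional


def describe_statement(catalog: str, schema: str, table: str) -> str:
    """Build a DESCRIBE TABLE statement for a three-level table name."""
    parts = ["`" + p.replace("`", "``") + "`" for p in (catalog, schema, table)]
    return "DESCRIBE TABLE " + ".".join(parts)


def column_comments(rows: Iterable[Mapping]) -> Dict[str, Optional[str]]:
    """Map each column name to its comment.

    Assumption: the column list ends at the first row whose col_name
    is empty or starts with '#', so iteration stops there.
    """
    comments: Dict[str, Optional[str]] = {}
    for row in rows:
        name = row["col_name"]
        if not name or name.startswith("#"):
            break
        comments[name] = row["comment"]
    return comments


# In a notebook (assumes the predefined `spark` session):
# rows = spark.sql(describe_statement("main", "default", "trips")).collect()
# print(column_comments(rows))
```

Columns without a comment map to None, which makes it straightforward to spot undocumented columns.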
Catalog Explorer can provide AI-generated comments for tables, which makes it easy for data asset owners to provide a rich overview of datasets. See Add AI-generated comments to a table.
Users can also add comments to tables and other database objects using markdown, which is rendered in Catalog Explorer. See Document data in Catalog Explorer using markdown comments.
You can use the search bar in Databricks to find tables registered to Unity Catalog. You can either perform a keyword search or use semantic search to find datasets or columns that relate to your search query. Search only returns results for tables that you have permission to see. Search reviews table names, column names, table comments, and column comments. See Search for workspace objects.