Data governance with Databricks
This page gives an overview of how to govern data using Unity Catalog in Databricks.
This page focuses on the governance of data. Related security topics, such as the following, are covered in Security and compliance:
- Authentication and access control
- Network configuration
- Data security and encryption
- Privacy and compliance
What is Unity Catalog?
Unity Catalog is a centralized data catalog that provides fine-grained access control for tabular and unstructured data in multiple formats on multiple platforms, along with governance of AI assets like machine learning models. It also includes the tools you need to discover data, track usage, capture lineage, and monitor data quality.
Unity Catalog is open source, supports multiple platforms, and is deeply integrated into Databricks.
The Unity Catalog data governance model
Data governance with Unity Catalog provides the following:
- Data unification: a unified view of all data and AI assets, across platforms, reducing duplication and sprawl.
- Data access control: tools to ensure that data is easy to access, but only for the right users.
- Data discoverability: tools that make it easy to find the data you need.
- Data quality: tools to ensure that data is accurate, complete, consistent, and secure throughout its lifecycle.
- Data collaboration and sharing: the ability to share data securely not just within your organization but across organizational and platform boundaries.
- Auditing: tools that capture who uses the data and how.
This page explains how your organization can meet these needs using Unity Catalog in Databricks.
Data access control
To ensure that users access only the data they should, Unity Catalog provides a hierarchical privilege model that lets you grant users, groups, and service principals access to data and AI assets at every level, from the account down to individual table rows and columns. You can control access to assets stored in dedicated Unity Catalog storage or in other platforms, such as cloud storage or database systems. The key point is that Unity Catalog can expose all of your data to your users from within Databricks, wherever that data lives, while it controls what each user can actually access and tracks their data usage.
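In this hierarchy, a privilege granted on a catalog or schema is inherited by the objects inside it, and reading a table also requires `USE CATALOG` on its catalog and `USE SCHEMA` on its schema. The following Python sketch models that check in memory so the inheritance rule is easy to see. It is a conceptual illustration, not a Databricks API: the `Grants` class, the `can_select` function, and the object and group names are hypothetical. In Databricks you would express the same grants in SQL, for example `GRANT SELECT ON SCHEMA main.sales TO analysts`.

```python
class Grants:
    """In-memory record of privileges granted directly on each securable object."""

    def __init__(self):
        # Maps a securable name such as "main" or "main.sales" to a set of
        # (principal, privilege) pairs granted directly on that object.
        self._grants = {}

    def grant(self, privilege, securable, principal):
        self._grants.setdefault(securable, set()).add((principal, privilege))

    def has(self, principals, privilege, securable):
        """True if any of the caller's principals holds `privilege` directly on `securable`."""
        granted = self._grants.get(securable, set())
        return any((p, privilege) in granted for p in principals)


def ancestors(table_name):
    """Return the table and the securables that contain it: [table, schema, catalog]."""
    catalog, schema, table = table_name.split(".")
    return [f"{catalog}.{schema}.{table}", f"{catalog}.{schema}", catalog]


def can_select(grants, principals, table_name):
    """Model the SELECT check: USE CATALOG, USE SCHEMA, and SELECT, with inheritance."""
    table, schema, catalog = ancestors(table_name)
    # USE CATALOG must be granted on the catalog itself.
    if not grants.has(principals, "USE CATALOG", catalog):
        return False
    # USE SCHEMA can be granted on the schema or inherited from the catalog.
    if not any(grants.has(principals, "USE SCHEMA", s) for s in (schema, catalog)):
        return False
    # SELECT can be granted on the table or inherited from the schema or catalog.
    return any(grants.has(principals, "SELECT", s) for s in (table, schema, catalog))


grants = Grants()
grants.grant("USE CATALOG", "main", "analysts")
grants.grant("USE SCHEMA", "main.sales", "analysts")
grants.grant("SELECT", "main.sales", "analysts")  # inherited by every table in main.sales

alice = {"alice@example.com", "analysts"}  # a user plus the groups she belongs to
print(can_select(grants, alice, "main.sales.orders"))    # True
print(can_select(grants, alice, "main.finance.ledger"))  # False: no USE SCHEMA on main.finance
```

Each caller is represented as a set holding the user and their groups, because a grant to a group applies to its members. The sketch deliberately ignores ownership, `ALL PRIVILEGES`, and securable types other than catalogs, schemas, and tables.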
- Learn about the securable objects that Unity Catalog manages and how to control access to them.
- Learn how to manage identities in the context of Unity Catalog.
- Learn how to control access to table data using row filters and column masks.
- Learn how to control access to cloud storage, external data platforms, and external non-data services using Unity Catalog.
- Learn how Unity Catalog can manage access to your data from external platforms that use the Apache Iceberg or open-source Unity Catalog APIs.
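In Unity Catalog, a row filter and a column mask are both SQL user-defined functions attached to a table: the filter returns a boolean for each row, and the mask returns the value a reader sees in place of the real column value, typically branching on the caller's group membership (for example with `is_account_group_member`). The Python sketch below imitates those semantics over in-memory rows to show the effect for two different readers. It does not call Databricks; the `read_table` function, the `admins` group, and the `region` and `ssn` columns are hypothetical.

```python
def region_filter(row, groups):
    """Row filter: admins see every row; everyone else sees only US rows."""
    return "admins" in groups or row["region"] == "US"


def ssn_mask(value, groups):
    """Column mask: only admins see the real SSN."""
    return value if "admins" in groups else "***-**-****"


def read_table(rows, groups):
    """Apply the row filter, then the column mask, as a governed read would."""
    return [
        {**row, "ssn": ssn_mask(row["ssn"], groups)}
        for row in rows
        if region_filter(row, groups)
    ]


rows = [
    {"name": "Ana", "region": "US", "ssn": "123-45-6789"},
    {"name": "Ben", "region": "EU", "ssn": "987-65-4321"},
]

print(read_table(rows, groups={"admins"}))    # both rows, real SSNs
print(read_table(rows, groups={"analysts"}))  # only Ana's row, SSN masked
```

In Databricks SQL, functions like these are attached to a table with `ALTER TABLE ... SET ROW FILTER` and `ALTER TABLE ... ALTER COLUMN ... SET MASK`, so every query against the table applies them automatically.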
Data discoverability
Databricks and Unity Catalog provide the following tools to help users find the data they need:
- Browse and search for data and AI assets using asset names and metadata such as comments and tags.
- Find data and AI assets using catalog browsers that are built into the notebook and SQL query editors. See Navigate the Databricks notebook and file editor and Write queries and explore data in the SQL editor.
- Automatically generate documentation of data and AI assets to assist discoverability.
- Use a UI built into Catalog Explorer to view the most frequent users and queries of any table in Unity Catalog.
- Capture and visualize the way data flows through your organization. For feature and model lineage, see Feature governance and lineage.
- Display relationships for tables that have foreign keys defined.
See also Discover data.
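Lineage is a directed graph from source tables to the tables derived from them, so a common question such as "what is affected downstream if this table changes?" is a graph traversal. The sketch below runs a breadth-first search over lineage edges held in memory. It assumes the edges have already been exported as (source, target) pairs of three-level table names, for example from a lineage system table such as `system.access.table_lineage`; the table names and the `downstream` function are hypothetical.

```python
from collections import defaultdict, deque


def downstream(edges, start):
    """Return every table reachable from `start` by following lineage edges."""
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:  # skip tables already visited (also guards against cycles)
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage edges: (source table, target table).
edges = [
    ("main.raw.orders", "main.silver.orders"),
    ("main.silver.orders", "main.gold.revenue"),
    ("main.silver.orders", "main.gold.top_customers"),
    ("main.raw.customers", "main.gold.top_customers"),
]

print(sorted(downstream(edges, "main.raw.orders")))
# ['main.gold.revenue', 'main.gold.top_customers', 'main.silver.orders']
```

Breadth-first order is a convenient choice here because it visits direct consumers before indirect ones, which matches how impact is usually reviewed.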
Data quality monitoring
Tools for ensuring data quality and data integrity are deeply integrated into Delta Lake, Apache Spark, and Databricks. You can learn about them throughout the Databricks documentation.
Unity Catalog adds the following:
- A data monitoring tool that captures the statistical properties and quality of the data in all of the tables in your account. You can also use it to track the performance of machine learning models and model-serving endpoints by monitoring inference tables that contain model inputs and predictions.
- Label securable objects, such as catalogs, schemas, and tables, with indicators of data quality or lifecycle status. These system tags help organizations enforce governance, improve data discoverability, and increase trust in analytics and AI applications.
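To make "statistical properties" concrete: a monitoring tool summarizes each column with values such as the fraction of missing entries and the range of observed values. The sketch below computes a few such statistics for one column of in-memory rows using only the Python standard library. It is not how Databricks computes its monitoring metrics; the `profile_column` function, the sample rows, and the chosen statistics are illustrative assumptions.

```python
import statistics


def profile_column(rows, column):
    """Summarize one column: row count, null fraction, and min/max/mean of non-null values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_fraction": (len(values) - len(present)) / len(values) if values else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": statistics.fmean(present) if present else None,
    }


rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 40.0},
    {"order_id": 4, "amount": 60.0},
]

print(profile_column(rows, "amount"))
# {'count': 4, 'null_fraction': 0.25, 'min': 20.0, 'max': 60.0, 'mean': 40.0}
```

Computing the same summary on each new batch of data and comparing it with earlier results is the basic idea behind detecting quality regressions, such as a sudden rise in `null_fraction`.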
Data collaboration and sharing
Unity Catalog lets your users collaborate on the same data across all of your account's workspaces in the same region. When you require collaboration across workspace regions, across organizations, and across platforms, Unity Catalog provides the foundation for the following sharing tools.
- A secure data sharing platform that lets you share data and AI assets in Databricks with users outside your organization, whether those users use Databricks or not.
- A Databricks-managed environment where multiple participants on Databricks and non-Databricks platforms can collaborate on projects without sharing underlying data with each other.
- An open forum for exchanging data and AI products. It also provides a private data exchange.
Auditing
Audit logs capture fine-grained details about who accessed a given dataset and the actions they performed. Unity Catalog adds system tables, which are the easiest way to access and query your account's audit logs.
See Audit log reference and Monitor account activity with system tables.
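Once audit records are queryable, for example through the `system.access.audit` system table, a typical question is how often each user performed each action. The sketch below counts (user, action) pairs over a few in-memory records. The record fields loosely mirror the audit system table (`user_identity.email`, `action_name`), but treat the exact field names, the sample action names, and the `count_actions` function as assumptions, and check the audit log reference for the real schema.

```python
from collections import Counter


def count_actions(records):
    """Count how many times each user performed each action."""
    return Counter(
        (record["user_identity"]["email"], record["action_name"]) for record in records
    )


# Hypothetical audit records, shaped loosely like rows of an audit system table.
records = [
    {"user_identity": {"email": "alice@example.com"}, "action_name": "getTable"},
    {"user_identity": {"email": "alice@example.com"}, "action_name": "getTable"},
    {"user_identity": {"email": "bob@example.com"}, "action_name": "updatePermissions"},
]

for (user, action), n in sorted(count_actions(records).items()):
    print(f"{user}\t{action}\t{n}")
```

In a notebook, the same aggregation would typically be a `GROUP BY` query over the audit system table rather than Python over exported rows.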
Legacy Databricks data governance tools
Databricks also provides these legacy governance features. Databricks recommends that you use Unity Catalog instead.
- A legacy data governance model that lets you programmatically grant and revoke access to objects managed by your workspace’s built-in Hive metastore.
- A legacy data governance feature that allows users to authenticate automatically to S3 buckets from Databricks clusters using the identity that they use to log in to Databricks.
Next steps
- Learn more about Unity Catalog: What is Unity Catalog?
- Get started with Unity Catalog: Get started with Unity Catalog
- Review best practices: What is Unity Catalog?