Data governance with Unity Catalog
This guide shows how to manage access to data and AI objects in Databricks. Databricks provides centralized governance for data and AI with Unity Catalog and Delta Sharing. For information on Databricks security, see the Security and compliance guide.
Centralize access control using Unity Catalog
Unity Catalog is a fine-grained governance solution for data and AI on the Databricks platform. It simplifies security and governance by providing a central place to administer and audit access to your data and AI assets.
In most accounts, Unity Catalog is enabled by default when you create a workspace. For details, see Automatic enablement of Unity Catalog.
For a discussion of how to use Unity Catalog effectively, see Unity Catalog best practices.
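As a rough illustration of what centralized access control looks like in practice, the sketch below builds Unity Catalog `GRANT` statements for a group. The catalog, schema, table, and group names (`main`, `sales`, `orders`, `data-analysts`) are hypothetical, as are the helper functions and the small privilege allow-list. The sketch only produces SQL strings. Running them requires a Unity Catalog-enabled workspace and sufficient privileges on the objects.

```python
# Hypothetical helper: builds GRANT statements as strings; it does not execute them.
ALLOWED_PRIVILEGES = {"USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"}
ALLOWED_SECURABLES = {"CATALOG", "SCHEMA", "TABLE"}


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier part, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, securable_type: str,
                    full_name: str, principal: str) -> str:
    """Return a GRANT statement for a dot-separated object name and a principal."""
    if privilege not in ALLOWED_PRIVILEGES:
        raise ValueError(f"unsupported privilege in this sketch: {privilege}")
    if securable_type not in ALLOWED_SECURABLES:
        raise ValueError(f"unsupported securable type in this sketch: {securable_type}")
    quoted_name = ".".join(quote_ident(part) for part in full_name.split("."))
    return (f"GRANT {privilege} ON {securable_type} {quoted_name} "
            f"TO {quote_ident(principal)}")


# Reading a table also requires usage rights on its parent catalog and schema.
statements = [
    grant_statement("USE CATALOG", "CATALOG", "main", "data-analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "main.sales", "data-analysts"),
    grant_statement("SELECT", "TABLE", "main.sales.orders", "data-analysts"),
]

for statement in statements:
    print(statement)
```

In a Databricks notebook, each string could then be passed to `spark.sql(...)`. The same statements can also be run directly in the SQL editor.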
Track data lineage using Unity Catalog
You can use Unity Catalog to capture runtime data lineage across queries in any language executed on a Databricks cluster or SQL warehouse. Lineage is captured down to the column level, and includes notebooks, jobs, and dashboards related to the query. To learn more, see Capture and view data lineage using Unity Catalog.
Discover data using Catalog Explorer
Databricks Catalog Explorer provides a UI to explore and manage data and AI assets, including schemas (databases), tables, volumes (non-tabular data), and registered ML models, along with asset permissions, data owners, external locations, and credentials. You can use the Insights tab in Catalog Explorer to view the most frequent recent queries and users of any table registered in Unity Catalog.
Configure audit logging
Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.
Unity Catalog lets you access and query your account’s operational data, including audit logs, billable usage, and lineage, using system tables (Public Preview).
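A minimal sketch of querying audit logs through system tables follows. It assumes the audit log is exposed as `system.access.audit` with the columns `event_time`, `event_date`, `user_identity.email`, `service_name`, and `action_name`. Because system tables are in preview, verify the table and column names against your workspace before relying on them. The helper name `recent_audit_query` is hypothetical.

```python
def recent_audit_query(days: int, limit: int = 100) -> str:
    """Build a SQL query for the most recent audit events in the last `days` days.

    Table and column names are assumptions; check them in your workspace.
    """
    if days < 1 or limit < 1:
        raise ValueError("days and limit must be positive integers")
    return (
        "SELECT event_time, user_identity.email AS user_email, "
        "service_name, action_name\n"
        "FROM system.access.audit\n"
        f"WHERE event_date >= date_sub(current_date(), {int(days)})\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {int(limit)}"
    )


query = recent_audit_query(days=7, limit=50)
print(query)

# In a Databricks notebook, where a `spark` session is provided, the query
# could be run with: spark.sql(query)
```

Filtering on `event_date` keeps the scan bounded to a recent window, which matters because audit tables can grow large.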
Configure identity
Every good data governance story starts with a strong identity foundation. To learn how to best configure identity in Databricks, see Identity best practices.
Legacy data governance solutions
Databricks also provides these legacy governance models:
Table access control is a legacy data governance model that lets you programmatically grant and revoke access to objects managed by your workspace’s built-in Hive metastore. Databricks recommends that you use Unity Catalog instead of table access control. Unity Catalog simplifies security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account.
IAM role credential passthrough is a legacy data governance feature that lets users authenticate automatically to S3 buckets from Databricks clusters with the identity they use to log in to Databricks. Databricks recommends that you use Unity Catalog instead.