Data governance guide

This guide shows how to manage data and data access in Databricks, which provides centralized governance for data and AI with Unity Catalog and Delta Sharing. For information on Databricks security, see the Security and compliance guide.

Centralize access control using Unity Catalog

Unity Catalog is a fine-grained governance solution for data and AI on the Databricks platform. It simplifies security and governance by providing a central place to administer and audit access to your data and AI assets.
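As a rough illustration of what administering access from a central place can look like, the sketch below builds a SQL GRANT statement for a table addressed by its three-level name (catalog.schema.table). The helper names (quote_name, grant_select) and the catalog, schema, table, and group names are hypothetical and chosen only for this example; the identifier check is a simplifying assumption, not a Databricks rule.

```python
import re

# Hypothetical helpers for illustration; they are not part of any Databricks API.
# Assumption: catalog, schema, and table names are simple unquoted identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_name(name: str) -> str:
    """Wrap a principal name in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_select(catalog: str, schema: str, table: str, principal: str) -> str:
    """Build a GRANT SELECT statement for a three-level table name."""
    for part in (catalog, schema, table):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    full_name = f"{catalog}.{schema}.{table}"
    return f"GRANT SELECT ON TABLE {full_name} TO {quote_name(principal)}"


# Example values are made up for this sketch.
statement = grant_select("main", "sales", "orders", "data-analysts")
print(statement)
```

The resulting string could be run from a notebook or SQL editor attached to Unity Catalog-enabled compute. Reading a table typically also requires the USE CATALOG and USE SCHEMA privileges on its parent objects, which this sketch does not cover.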

For best practices on adopting Unity Catalog, see Unity Catalog best practices.

Track data lineage using Unity Catalog

You can use Unity Catalog to capture runtime data lineage across queries in any language executed on a Databricks cluster or SQL warehouse. Lineage is captured down to the column level, and includes notebooks, workflows, and dashboards related to the query. To learn more, see Capture and view data lineage with Unity Catalog.

Discover data using Catalog Explorer

Databricks Catalog Explorer provides a UI to explore and manage data and AI assets, including schemas (databases), tables, volumes (non-tabular data), and registered ML models, along with asset permissions, data owners, external locations, and credentials. You can use the Insights tab in Catalog Explorer to view the most frequent recent queries and users of any table registered in Unity Catalog.

Share data using Delta Sharing

Delta Sharing is an open protocol developed by Databricks for secure data and AI asset sharing with other organizations, or with other teams within your organization, regardless of which computing platforms they use.
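On the recipient side, the open source Delta Sharing connectors address a shared table with a URL of the form profile-file#share.schema.table. The following is a minimal sketch of assembling that URL; the helper name sharing_table_url, the profile file name, and the share, schema, and table names are all hypothetical, and the validation rules are assumptions made for this example.

```python
def sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a '<profile>#<share>.<schema>.<table>' URL for a shared table."""
    for label, value in (("share", share), ("schema", schema), ("table", table)):
        # Assumption for this sketch: reject empty parts and parts containing
        # the separator characters used by the URL format itself.
        if not value or "." in value or "#" in value:
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"{profile_path}#{share}.{schema}.{table}"


# Example values are made up for this sketch.
url = sharing_table_url("config.share", "retail_share", "sales", "orders")
print(url)

# With the open source Python connector installed (pip install delta-sharing),
# a recipient could then load the table, for example:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

Because the recipient needs only a profile file and a connector, this pattern does not depend on the recipient running Databricks.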

Configure audit logging

Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.

Using system tables (Public Preview), Unity Catalog lets you access and query your account’s operational data, including audit logs, billable usage, and lineage.
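To make the system tables mentioned above more concrete, the sketch below builds a query over the audit log system table for a recent time window. It assumes the table is exposed as system.access.audit with the columns event_time, event_date, user_identity.email, service_name, and action_name; verify these names against the system tables reference for your account before relying on them. The function name recent_audit_query is hypothetical.

```python
from datetime import date, timedelta


def recent_audit_query(days: int, today: date, limit: int = 100) -> str:
    """Build a SQL query for audit events from the last `days` days."""
    if days < 1 or limit < 1:
        raise ValueError("days and limit must be positive")
    since = today - timedelta(days=days)
    # Assumption: table and column names follow the audit log system table schema.
    return (
        "SELECT event_time, user_identity.email, service_name, action_name\n"
        "FROM system.access.audit\n"
        f"WHERE event_date >= DATE '{since.isoformat()}'\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {limit}"
    )


# The date is passed in explicitly so the output is reproducible.
print(recent_audit_query(days=7, today=date(2024, 3, 15)))
```

The generated SQL can be pasted into a SQL editor or passed to spark.sql in a notebook, provided the querying user has been granted access to the system schema.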

Configure identity

Every good data governance story starts with a strong identity foundation. To learn how to best configure identity in Databricks, see Identity best practices.

Legacy data governance solutions

  • Table access control is a legacy data governance model that lets you programmatically grant and revoke access to objects managed by your workspace’s built-in Hive metastore. Databricks recommends that you use Unity Catalog instead of table access control. Unity Catalog simplifies security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account.

  • IAM role credential passthrough is also a legacy data governance feature that allows users to authenticate automatically to S3 buckets from Databricks clusters using the identity that they use to log in to Databricks. Databricks recommends that you use Unity Catalog instead.