Data governance with Unity Catalog
Unity Catalog provides a unified data governance model for the data lakehouse. SAP Databricks administrators can manage permissions for teams and individuals. They manage privileges with access control lists (ACLs), using either the UI or SQL syntax.
This page details specific guidance for data governance in SAP Databricks.
Features
The following data governance features are included in SAP Databricks:
- Unity Catalog
- Access controls
- Lakehouse monitoring
- Receive data through Delta Sharing
- Connect to external data
Unity Catalog
Unity Catalog is a unified governance solution for data and AI assets on Databricks. Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces. To learn more about the Unity Catalog object model, see Database objects in SAP Databricks.
Centralize access control using Unity Catalog
Unity Catalog provides fine-grained governance for data and AI assets on the Databricks platform. It simplifies security and governance by giving you a single place to administer and audit access to those assets.
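As a rough illustration of managing privileges with SQL syntax, the sketch below builds GRANT and REVOKE statements as plain strings. The catalog, schema, table, and group names (`main`, `sales`, `orders`, `analysts`) and the helper functions are hypothetical and not part of any Databricks API. The sketch assumes the usual Unity Catalog pattern in which reading a table requires `USE CATALOG`, `USE SCHEMA`, and `SELECT`.

```python
# Minimal sketch: build Unity Catalog GRANT / REVOKE statements as strings.
# All object and principal names below are hypothetical examples.


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts: str) -> str:
    """Join name parts (catalog, schema, object) into a quoted dotted name."""
    return ".".join(quote_ident(part) for part in parts)


def grant_statement(privilege: str, securable_type: str, name: str, principal: str) -> str:
    """Return a GRANT statement for one privilege on one securable object."""
    return f"GRANT {privilege} ON {securable_type} {name} TO {quote_ident(principal)}"


def revoke_statement(privilege: str, securable_type: str, name: str, principal: str) -> str:
    """Return the matching REVOKE statement."""
    return f"REVOKE {privilege} ON {securable_type} {name} FROM {quote_ident(principal)}"


table = qualified_name("main", "sales", "orders")

# Privileges a group typically needs to read a single table.
statements = [
    grant_statement("USE CATALOG", "CATALOG", qualified_name("main"), "analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", qualified_name("main", "sales"), "analysts"),
    grant_statement("SELECT", "TABLE", table, "analysts"),
]

for statement in statements:
    print(statement)
```

In a Databricks notebook, each string could be passed to `spark.sql(...)` or pasted into the SQL editor. Which privileges your account lets you grant depends on your own role.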
Discover data using Catalog Explorer
Databricks Catalog Explorer provides a UI to explore and manage data and AI assets, including schemas (databases), tables, volumes (non-tabular data), and registered ML models, along with asset permissions, data owners, external locations, and credentials. You can use the Insights tab in Catalog Explorer to view the most frequent recent queries and users of any table registered in Unity Catalog. See What is Catalog Explorer?.
Track data lineage using Unity Catalog
You can use Unity Catalog to capture runtime data lineage across queries in any language executed on a Databricks cluster or SQL warehouse. Lineage is captured down to the column level, and includes notebooks, jobs, and dashboards related to the query. See Capture and view data lineage using Unity Catalog.
Receive data shares through Delta Sharing
Delta Sharing is an open protocol developed by Databricks for secure data and AI asset sharing with other organizations, or with other teams within your organization, regardless of which computing platforms they use. In SAP Databricks, you can receive Delta Shares.
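On Databricks, a received share is commonly exposed by creating a catalog from it. The sketch below builds that statement as a string. The provider, share, and catalog names and the helper functions are hypothetical, and it is an assumption that this SQL path is available in your SAP Databricks environment. The text above states only that shares can be received.

```python
# Minimal sketch: build a statement that mounts a received share as a catalog.
# Provider, share, and catalog names are hypothetical examples.


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def catalog_from_share_statement(catalog: str, provider: str, share: str) -> str:
    """Return a CREATE CATALOG ... USING SHARE statement for a received share."""
    return (
        f"CREATE CATALOG IF NOT EXISTS {quote_ident(catalog)} "
        f"USING SHARE {quote_ident(provider)}.{quote_ident(share)}"
    )


statement = catalog_from_share_statement("partner_data", "acme_provider", "quarterly_sales")
print(statement)
```

Once such a catalog exists, access to its tables would be governed through Unity Catalog in the same way as for other catalogs.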
Lakehouse monitoring
Databricks Lakehouse Monitoring lets you monitor the statistical properties and quality of the data in all of the tables in your account. To learn more, see Monitor data and AI assets with Lakehouse Monitoring. To create a monitor, see Create a monitor using the Databricks UI.
Connect to external data
After you configure external locations in Unity Catalog, you can create external tables and volumes on directories inside those locations. You can then use Unity Catalog to manage user and group access to these tables and volumes. This lets you give specific users or groups access to specific directories and files in the cloud storage bucket.
To help prevent data exfiltration, make external locations read-only.
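The external table and volume workflow described above might look like the following sketch, which builds the SQL statements as strings. The bucket URL, object names, column definitions, group name, and helper functions are all hypothetical. The sketch also assumes that an external location covering the example paths has already been configured.

```python
# Minimal sketch: statements for an external table, an external volume,
# and read access for one group. All names and paths are hypothetical.


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts: str) -> str:
    """Join name parts (catalog, schema, object) into a quoted dotted name."""
    return ".".join(quote_ident(part) for part in parts)


def sql_string(value: str) -> str:
    """Wrap a value in single quotes, rejecting values that contain one."""
    if "'" in value:
        raise ValueError("single quotes are not allowed in this sketch")
    return f"'{value}'"


def external_table_statement(table: str, columns: list[tuple[str, str]], location: str) -> str:
    """Return CREATE TABLE ... LOCATION for a directory in an external location."""
    column_list = ", ".join(f"{quote_ident(name)} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} ({column_list}) LOCATION {sql_string(location)}"


def external_volume_statement(volume: str, location: str) -> str:
    """Return CREATE EXTERNAL VOLUME ... LOCATION for non-tabular files."""
    return f"CREATE EXTERNAL VOLUME {volume} LOCATION {sql_string(location)}"


table = qualified_name("main", "raw", "events")
volume = qualified_name("main", "raw", "landing_files")
group = quote_ident("analysts")

statements = [
    external_table_statement(
        table,
        [("event_id", "BIGINT"), ("payload", "STRING")],
        "s3://example-bucket/events",
    ),
    external_volume_statement(volume, "s3://example-bucket/landing"),
    f"GRANT SELECT ON TABLE {table} TO {group}",
    f"GRANT READ VOLUME ON VOLUME {volume} TO {group}",
]

for statement in statements:
    print(statement)
```

Granting only `SELECT` and `READ VOLUME` keeps the group's access read-only at the table and volume level. This is separate from marking the external location itself read-only.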