Bring your own data lineage
This feature is in Public Preview.
This page describes how to update data lineage to include external assets and workflows that are run outside of Databricks.
Unity Catalog automatically captures runtime data lineage across queries that are run on Databricks. However, you might have workloads that run outside of Databricks (for example, first-mile ETL or last-mile BI). Unity Catalog lets you add external lineage metadata to augment the Databricks data lineage it captures automatically, giving you an end-to-end lineage view in Unity Catalog. This is useful when you want to capture where data came from (for example, Salesforce or MySQL) before it was ingested into Unity Catalog, or where data is being consumed outside of Unity Catalog (for example, Tableau or PowerBI).
The following lineage graph shows an external PostgreSQL table that was ingested into Databricks as a Unity Catalog managed table, with three columns transformed into one release_date column, and then queried using PowerBI.
For general information about data lineage in Databricks, see View data lineage using Unity Catalog.
Requirements
To add external lineage metadata in Unity Catalog, you must have the following privileges, depending on the specific task:
- To create an external metadata securable object in Unity Catalog, you must have the CREATE EXTERNAL METADATA privilege on the metastore.
- To specify lineage relationships between an external metadata object and any other Unity Catalog object, you must have the MODIFY privilege on the external metadata object.
- To specify a downstream lineage relationship to a Unity Catalog object, you must have read privileges on the object (for example, SELECT on a table).
- To specify an upstream lineage relationship to a Unity Catalog object, you must have write privileges on the object (for example, MODIFY on a table).
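As a quick reference, the privilege rules above can be written as a small lookup. This is only a sketch: the privilege names come from the list above, while the function name and return shape are hypothetical and not part of any Databricks API. It covers only the case where the Unity Catalog object is a table.

```python
# Privilege names are taken from the requirements list; the function name
# and the (privilege, object) tuple shape are hypothetical.
def required_privileges(direction: str) -> list:
    """List the privileges needed to link an external metadata object
    to a Unity Catalog table in the given direction."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    # Downstream relationships need read access on the table (SELECT);
    # upstream relationships need write access (MODIFY).
    table_privilege = "SELECT" if direction == "downstream" else "MODIFY"
    return [
        ("MODIFY", "external metadata object"),
        (table_privilege, "table"),
    ]


print(required_privileges("downstream"))
```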
Add external lineage metadata
To add external lineage metadata:
1. Create an external metadata securable object in Unity Catalog.
   This object represents an entity in an external system, such as a dashboard in Tableau.
2. Configure a lineage relationship between the external metadata object and another Unity Catalog object, such as a table, model, path, or other external metadata object.
   When you have created lineage relationships, the external metadata object appears in the lineage graph view.
You can create external metadata objects and configure lineage relationships using the Catalog Explorer UI or a REST API.
Create an external metadata object
To use Catalog Explorer to create an external metadata object:
1. In your Databricks workspace, click Catalog.
2. On the Quick access page, click the External data > button, go to the External Metadata tab, and click Create external metadata.
3. Specify the metadata details.
   Required:
   - Name: Enter a human-readable name that will help Databricks users understand what they are seeing in lineage. The name cannot contain spaces.
   - System type: Select from the list of common external data and BI systems. If you don't find yours, select Custom.
   - Entity type: Enter the type of object, such as "table" or "dashboard."
   Optional:
   - URL: Enter the URL of the object if you want lineage graph viewers to be able to click through to the external asset (for example, a Tableau dashboard).
   - Description
   Advanced:
   - Columns: If you want to do column-level mapping from this external object to another Unity Catalog object, enter column names. Select UI to enter them one at a time or Text Input to enter a comma-delimited list in a single text box.
   - Properties: If there are other properties that you want to track in lineage, enter them as JSON key-value pairs. You can use the UI to enter each key-value pair, or enter a complete JSON object.
4. Click Create.
A dialog gives you the option to view the external metadata object or to create lineage relationships for the object.
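If you prepare these details in a script before entering them (or before sending them through the REST API), a small record type can catch mistakes early. The sketch below is an assumption-laden illustration: the class and field names simply mirror the Catalog Explorer form labels and are not confirmed REST API field names, and the URL and property values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExternalMetadata:
    """Hypothetical record mirroring the Catalog Explorer form fields."""

    name: str
    system_type: str
    entity_type: str
    url: Optional[str] = None
    description: Optional[str] = None
    columns: list[str] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The form requires a name and does not allow spaces in it.
        if not self.name or " " in self.name:
            raise ValueError("name is required and cannot contain spaces")
        # System type and entity type are also required fields.
        if not self.system_type or not self.entity_type:
            raise ValueError("system_type and entity_type are required")


def parse_columns(text: str) -> list[str]:
    """Split a comma-delimited column list, as in the Text Input option."""
    return [part.strip() for part in text.split(",") if part.strip()]


dashboard = ExternalMetadata(
    name="release_dashboard",
    system_type="Custom",
    entity_type="dashboard",
    url="https://bi.example.com/dashboards/release",  # illustrative URL
    columns=parse_columns("title, release_date"),
    properties={"owner_team": "analytics"},  # illustrative property
)
print(dashboard)
```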
Create lineage relationships
To add relationships between an external metadata object and other Unity Catalog objects:
1. Follow the prompt mentioned above or find the existing external metadata object in Catalog Explorer:
   - Click Catalog
   - Click the External data > button
   - Go to the External Metadata tab and select the external metadata object.
   - Click
2. Click Create lineage relationship.
3. Select whether you want to create an upstream or downstream relationship.
4. Enter the Object type that you want to create the relationship to:
   - Table: Select the table using the search dialog.
   - Model: Select the model using the search dialog, and then select the model version.
   - Path: For volumes or external locations, enter the path.
   - External metadata: Select the external metadata object from the drop-down menu.
5. (Optional) Click Advanced to add:
   - Column mappings between the external metadata object and the source or target object.
   - Other metadata as JSON key-value pairs. For example, you can use these to enter the text of the query that created a table from the external metadata object or annotations that explain the external workflow that generated the relationship.
6. Click Create.
You can now see the external lineage relationship in the Lineage tab of the related objects.
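The pieces of a lineage relationship described in the steps above (two endpoints, optional column mappings, optional JSON properties) can be modeled as plain data. The following is a minimal sketch, not the REST API schema: the function names, dictionary keys, and object names such as `mysql_orders` and `main.sales.orders` are all hypothetical.

```python
from typing import Optional

# Object types offered when creating a relationship, per the steps above.
OBJECT_TYPES = {"table", "model", "path", "external_metadata"}


def lineage_object(object_type: str, identifier: str,
                   version: Optional[str] = None) -> dict:
    """Describe one endpoint of a relationship (hypothetical shape)."""
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unsupported object type: {object_type}")
    # Selecting a model also requires selecting a model version.
    if object_type == "model" and version is None:
        raise ValueError("a model reference also needs a model version")
    ref = {"type": object_type, "id": identifier}
    if version is not None:
        ref["version"] = version
    return ref


def lineage_relationship(source: dict, target: dict,
                         column_mappings=(), properties=None) -> dict:
    """Combine two endpoints with optional column mappings and properties."""
    # This feature links an external metadata object to another object,
    # so at least one endpoint must be an external metadata object.
    if "external_metadata" not in (source["type"], target["type"]):
        raise ValueError("one side must be an external metadata object")
    return {
        "source": source,
        "target": target,
        "columns": [{"source": s, "target": t} for s, t in column_mappings],
        "properties": dict(properties or {}),
    }


rel = lineage_relationship(
    source=lineage_object("external_metadata", "mysql_orders"),
    target=lineage_object("table", "main.sales.orders"),
    column_mappings=[("order_id", "order_id")],
    # Query text stored as free-form metadata, as suggested in the steps.
    properties={"query": "INSERT INTO main.sales.orders SELECT order_id FROM mysql_orders"},
)
print(rel)
```

Keeping the "one side must be external" check local means a mistaken table-to-table link fails in the script rather than partway through data entry.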
Frequently asked questions about external lineage
Does Databricks provide any connectors or crawlers to bring in external lineage metadata automatically?
No, external lineage is not captured automatically. You must use the REST API or Catalog Explorer to add external lineage.
Is external lineage that I add recorded in the lineage system table?
No, external lineage that you add using this feature cannot be queried from the lineage system table. You must call the REST API to fetch external lineage programmatically.
Can I specify a lineage relationship between two tables registered in Unity Catalog using this feature?
Not directly. To specify a lineage relationship between two tables that are both registered in Unity Catalog, you must create an external metadata object that sits between them. Specify one table as upstream of the external metadata object and the other as downstream so that the two tables appear connected in the lineage graph.
Can I use this feature to specify multiple levels of external lineage relationships (for example, annotating data that goes through multiple systems before it enters Databricks)?
Yes, you can specify multiple levels of external lineage by creating multiple external metadata securable objects and creating external lineage relationships to each of them.
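To illustrate, a multi-level chain is just a sequence of relationships between consecutive objects. The sketch below models each relationship as a `(source, target)` pair and walks the chain upstream from a Unity Catalog table. The helper names and the object names (`salesforce_accounts`, `mysql_staging_accounts`, `main.crm.accounts`) are hypothetical, and the walk assumes each object has at most one upstream parent.

```python
def chain(*nodes: str) -> list:
    """Turn an ordered list of objects into consecutive (source, target) edges."""
    return list(zip(nodes, nodes[1:]))


def upstream_of(node: str, edges: list) -> list:
    """Walk from a node back to its origin, nearest ancestor first.

    Assumes every node has at most one upstream parent.
    """
    parents = {target: source for source, target in edges}
    ancestors = []
    while node in parents:
        node = parents[node]
        ancestors.append(node)
    return ancestors


# Two external metadata objects feeding one Unity Catalog table.
edges = chain("salesforce_accounts", "mysql_staging_accounts", "main.crm.accounts")
print(edges)
print(upstream_of("main.crm.accounts", edges))
```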
Can I add column-level external lineage using this feature?
Yes, you can add column-level external lineage. You must specify column names when you create the external metadata securable object and specify the source and target column mappings when you configure the external lineage relationship.
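As a sketch of how the two requirements fit together, the helper below checks that every external column used in a mapping was declared on the external metadata object. The function name and the column names `release_year`, `release_month`, and `release_day` are hypothetical; only `release_date` appears on this page. Whether Unity Catalog itself rejects undeclared columns is not stated here, so treat the check as a local convenience.

```python
def validate_column_mappings(declared_columns, mappings, external_side="source"):
    """Check mappings against the columns declared on the external object.

    mappings is a list of (source_column, target_column) pairs;
    external_side says which side of each pair is the external object.
    """
    if external_side not in ("source", "target"):
        raise ValueError("external_side must be 'source' or 'target'")
    index = 0 if external_side == "source" else 1
    unknown = sorted({pair[index] for pair in mappings} - set(declared_columns))
    if unknown:
        raise ValueError(
            f"columns not declared on the external metadata object: {unknown}"
        )
    return [{"source": s, "target": t} for s, t in mappings]


# Three external columns combined into one release_date column.
declared = ["release_year", "release_month", "release_day"]
mappings = validate_column_mappings(
    declared,
    [(column, "release_date") for column in declared],
)
print(mappings)
```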