
External lineage

Unity Catalog automatically captures runtime data lineage across queries that are run on Databricks. However, you might have workloads that run outside of Databricks (for example, first-mile ETL or last-mile BI). Unity Catalog lets you add external lineage metadata to augment the Databricks data lineage it captures automatically, giving you an end-to-end lineage view in Unity Catalog. This is useful when you want to capture where data came from (for example, Salesforce or MySQL) before it was ingested into Unity Catalog or where data is consumed outside of Unity Catalog (for example, Tableau or Power BI).

You can add external lineage in two ways:

The following lineage graph shows two external tables in MySQL and PostgreSQL that were ingested into Databricks as a Unity Catalog managed table, with columns transformed into a release_date column, and then consumed by an external report.

[Figure: A lineage graph showing external upstream tables and a downstream report connected to a Unity Catalog table, with the Create external lineage button in the upper-right corner.]

For general information about data lineage in Databricks, see Data lineage in Unity Catalog.

Requirements

To add external lineage metadata in Unity Catalog, you must have the following privileges, depending on the specific task:

  • To create an external metadata securable object in Unity Catalog, you must have the CREATE EXTERNAL METADATA privilege on the metastore.
  • To specify lineage relationships between an external metadata object and any other Unity Catalog object, you must have the MODIFY privilege on the external metadata object.
  • To specify a downstream lineage relationship to a Unity Catalog object, you must have read privileges on the object (for example, SELECT on a table).
  • To specify an upstream lineage relationship to a Unity Catalog object, you must have write privileges on the object (for example, MODIFY on a table).
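
As a quick planning aid, the requirements above can be written down as a lookup table. This is only a sketch: the task keys and the `privileges_for` helper are hypothetical names invented for illustration, and the privilege descriptions are copied from the list above rather than from any API.

```python
# Hypothetical task keys mapped to (privilege, securable) pairs from the list above.
REQUIRED_PRIVILEGES = {
    "create_external_metadata": ("CREATE EXTERNAL METADATA", "metastore"),
    "link_external_metadata": ("MODIFY", "external metadata object"),
    "downstream_to_uc_object": ("read privilege (for example, SELECT on a table)", "Unity Catalog object"),
    "upstream_to_uc_object": ("write privilege (for example, MODIFY on a table)", "Unity Catalog object"),
}


def privileges_for(tasks):
    """Return the (privilege, securable) pairs needed for the given planned tasks."""
    return [REQUIRED_PRIVILEGES[task] for task in tasks]


# Example: create an object, then link it to a Unity Catalog table.
plan = ["create_external_metadata", "link_external_metadata", "downstream_to_uc_object"]
for privilege, securable in privileges_for(plan):
    print(f"{privilege} on the {securable}")
```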

Add external lineage metadata

To add external lineage metadata:

  1. Create an external metadata securable object in Unity Catalog.

    This object represents an entity in an external system, such as a dashboard in Tableau.

  2. Configure a lineage relationship between the external metadata object and another Unity Catalog object, such as a table, model, path, or other external metadata object.

    When you have created lineage relationships, the external metadata object appears in the lineage graph view.

You can create external metadata objects and configure lineage relationships using the Catalog Explorer UI. To start from an existing lineage graph, click Create external lineage in the upper-right corner of the graph. You can also begin from the External data section in Catalog Explorer, as described in the following sections.

Create an external metadata object

You can create an external metadata object using Catalog Explorer or the External Metadata API.

To use Catalog Explorer to create an external metadata object:

  1. In your Databricks workspace, click Catalog.

  2. Click the External data > button, go to the External Metadata tab, and click Create external metadata.

  3. Specify the metadata details.

    Required:

    • Name: Enter a human-readable name that helps Databricks users understand what they are seeing in lineage. You cannot use spaces.
    • System type: Select from the list of common external data and BI systems. If you don't find yours, select Custom.
    • Entity type: Enter the type of object, such as "table" or "dashboard."

    Optional:

    • URL: Enter the URL of the object if you want lineage graph viewers to be able to click through to the external asset (for example, a Tableau dashboard).
    • Description

    Advanced:

    • Columns: If you want to do column-level mapping from this external object to another Unity Catalog object, enter column names. Select UI to enter them one at a time or Text Input to enter a comma-delimited list in a single text box.
    • Properties: If there are other properties that you want to track in lineage, enter them as JSON key-value pairs. You can use the UI to enter each key-value pair, or enter a complete JSON object.
  4. Click Create.

    A dialog gives you the option to view the external metadata object or to create lineage relationships for the object.
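
If you prepare these details in a script before entering them, the form fields above can be sketched as a small payload builder. The field names (`name`, `system_type`, `entity_type`, `url`, `description`, `columns`, `properties`) mirror the form labels and are assumptions, not a confirmed schema for the External Metadata API. The helper names, the `TABLEAU` value, and the URL are hypothetical placeholders.

```python
import json


def parse_columns(text):
    """Split a comma-delimited column list, dropping blanks and extra whitespace."""
    return [part.strip() for part in text.split(",") if part.strip()]


def build_external_metadata(name, system_type, entity_type, url=None,
                            description=None, columns_text="", properties_json="{}"):
    """Assemble a dict describing an external metadata object (assumed field names)."""
    # The name is required and cannot contain spaces.
    if not name or " " in name:
        raise ValueError("name is required and cannot contain spaces")
    if not system_type or not entity_type:
        raise ValueError("system_type and entity_type are required")

    properties = json.loads(properties_json)
    if not isinstance(properties, dict):
        raise ValueError("properties must be a JSON object of key-value pairs")

    payload = {"name": name, "system_type": system_type, "entity_type": entity_type}
    # Optional and advanced fields are included only when provided.
    if url:
        payload["url"] = url
    if description:
        payload["description"] = description
    columns = parse_columns(columns_text)
    if columns:
        payload["columns"] = columns
    if properties:
        payload["properties"] = properties
    return payload


sales_dashboard = build_external_metadata(
    name="sales_dashboard",
    system_type="TABLEAU",  # assumed value; pick the matching system type in practice
    entity_type="dashboard",
    url="https://example.com/dashboards/sales",  # placeholder URL
    columns_text="region, release_date",
    properties_json='{"owner": "analytics"}',
)
print(json.dumps(sales_dashboard, indent=2))
```

Validating the name and the properties JSON locally catches the most likely input mistakes before you submit anything.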

Create lineage relationships

You can create lineage relationships using Catalog Explorer, the External Lineage API, or the Databricks SDK for Python.

To add relationships between an external metadata object and other Unity Catalog objects:

  1. Follow the prompt in the dialog that appears after you create the object, or find the existing external metadata object in Catalog Explorer:

    1. Click Catalog.
    2. Click the External data > button.
    3. Go to the External Metadata tab and select the external metadata object.
  2. Click Create lineage relationship.

  3. Select whether you want to create an upstream or downstream relationship.

  4. Enter the Object type that you want to create the relationship to:

    • Table: Select the table using the search dialog.
    • Model: Select the model using the search dialog, and then select the model version.
    • Path: For volumes or external locations, enter the path.
    • External metadata: Select the external metadata object from the drop-down menu.
  5. (Optional) Click Advanced to add:

    • Column mappings between the external metadata object and the source or target object.
    • Other metadata as JSON key-value pairs. For example, you can use these to enter the text of the query that created a table from the external metadata object or annotations that explain the external workflow that generated the relationship.
  6. Click Create.

You can now see the external lineage relationship in the Lineage tab of the related objects.
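
The choices made in the steps above can be sketched as a single source/target record. This is an illustrative data structure, not the External Lineage API or SDK request format: the `source`, `target`, `columns`, and `properties` keys and the `build_lineage_relationship` helper are assumptions. It also assumes that "upstream" means the other object feeds the external metadata object and "downstream" means the other object consumes from it.

```python
OBJECT_TYPES = {"table", "model", "path", "external_metadata"}


def build_lineage_relationship(external_name, direction, object_type, object_ref,
                               column_mappings=None, properties=None):
    """Describe one lineage edge as a source/target dict (assumed structure)."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unsupported object type: {object_type}")

    external = {"external_metadata": {"name": external_name}}
    other = {object_type: dict(object_ref)}
    # Assumption: an upstream object is the source and the external object the target;
    # a downstream object is the target and the external object the source.
    if direction == "upstream":
        source, target = other, external
    else:
        source, target = external, other

    relationship = {"source": source, "target": target}
    if column_mappings:
        relationship["columns"] = [
            {"source": src, "target": dst} for src, dst in column_mappings
        ]
    if properties:
        relationship["properties"] = dict(properties)
    return relationship


# A Unity Catalog table feeding an external dashboard, with one column mapping.
edge = build_lineage_relationship(
    external_name="sales_dashboard",
    direction="upstream",
    object_type="table",
    object_ref={"name": "main.sales.orders"},  # hypothetical table name
    column_mappings=[("release_date", "release_date")],
    properties={"note": "refreshed nightly by an external job"},
)
print(edge)
```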

Model external lineage relationships

When you add external lineage manually, use the following patterns to model more complex relationships:

  • Connect two Unity Catalog tables: To specify a lineage relationship between two tables that are both registered in Unity Catalog, create an external metadata object that sits between them. Specify one table as upstream to the external metadata object and the other as downstream so that they appear connected in the lineage graph.
  • Add multiple levels of lineage: To annotate data that passes through multiple systems before it enters Databricks, create multiple external metadata objects and configure external lineage relationships between each of them.
  • Add column-level lineage: Specify column names when you create the external metadata object, then map the source and target columns when you configure the lineage relationship.
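
To check a planned model before you create anything, the first two patterns above can be sketched as a plain edge list. All object names here are hypothetical, and the traversal is an ordinary breadth-first search over your own plan, not a query against Unity Catalog.

```python
from collections import defaultdict, deque

# Each pair is (upstream, downstream). Names are hypothetical.
edges = [
    # Multiple levels: two external hops before the data enters Databricks.
    ("salesforce_accounts", "external_sync_job"),
    ("external_sync_job", "main.crm.accounts"),
    # Connecting two Unity Catalog tables through an external metadata object.
    ("main.crm.accounts", "external_dedupe_job"),
    ("external_dedupe_job", "main.crm.accounts_clean"),
    # Last-mile consumer outside Databricks.
    ("main.crm.accounts_clean", "tableau_accounts_report"),
]


def downstream(edges, start):
    """Return every node reachable from start, in breadth-first order."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream(edges, "salesforce_accounts"))
```

If the traversal from the first external source does not reach the final report, a relationship is missing somewhere in the plan.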

Limitations

  • External lineage is not recorded in the lineage system tables (system.access.table_lineage and system.access.column_lineage).
  • You can create up to 10,000 external metadata objects and 100,000 external lineage relationships per metastore. See Resource limits.
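
If you generate external lineage in bulk, a simple pre-check against the limits above can help. This is a minimal sketch: the constants come from the limits stated above, while the `remaining_capacity` helper and the sample counts are hypothetical, and you would need to supply the current counts yourself.

```python
MAX_EXTERNAL_METADATA_OBJECTS = 10_000
MAX_EXTERNAL_LINEAGE_RELATIONSHIPS = 100_000


def remaining_capacity(current_objects, current_relationships):
    """Return how many more objects and relationships fit under the per-metastore limits."""
    return {
        "objects": max(0, MAX_EXTERNAL_METADATA_OBJECTS - current_objects),
        "relationships": max(0, MAX_EXTERNAL_LINEAGE_RELATIONSHIPS - current_relationships),
    }


# Sample counts for illustration only.
print(remaining_capacity(current_objects=9_500, current_relationships=40_000))
```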

Additional resources