Lineage system tables reference

This page includes a reference for the two lineage system tables. These system tables build on Unity Catalog's data lineage feature, allowing you to programmatically query lineage data to fuel decision making and reports. To access the tables, the schemas must be enabled in your system catalog. For more information, see Enable system tables.

note

Both lineage tables represent a subset of all read/write events, as it is not always possible to capture lineage. Records are only emitted when lineage can be inferred. The data lineage limitations described in Lineage limitations also apply to lineage system tables.

Table lineage table

The table lineage system table includes a record for each read or write event on a Unity Catalog table or path. This includes, but is not limited to, job runs, notebook runs, and dashboards updated with the read or write event.

Table path: This system table is located at system.access.table_lineage.

Table lineage schema

The table lineage system table uses the following schema.

Column name	Data type	Description	Example
`account_id`	string	The ID of the Databricks account.	`7af234db-66d7-4db3-bbf0-956098224879`
`metastore_id`	string	The ID of the Unity Catalog metastore.	`5a31ba44-bbf4-4174-bf33-e1fa078e6765`
`workspace_id`	string	The ID of the workspace	`123456789012345`
`entity_type`	string	The type of entity associated with the lineage record, if any. The value is `NOTEBOOK`, `JOB`, `PIPELINE`, `DASHBOARD_V3`, `DBSQL_DASHBOARD` (legacy dashboard), `DBSQL_QUERY`, or NULL. Note: please see `entity_metadata` column for information on all entities involved in the lineage record.	`JOB`
`entity_id`	string	The ID of the entity associated with the lineage record, or NULL. Note: please see `entity_metadata` column for information on all entities involved in the lineage record.	`417306252667357`
`entity_run_id`	string	The unique run ID of the entity associated with the lineage record, or NULL. Note: please see `entity_metadata` column for information on all entities involved in the lineage record.	`688612062233399`
`source_table_full_name`	string	Three-part name to identify the source table.	`catalog.schema.table`
`source_table_catalog`	string	The catalog of the source table.	`catalog`
`source_table_schema`	string	The schema of the source table.	`schema`
`source_table_name`	string	The name of the source table.	`table`
`source_path`	string	Location in cloud storage of the source table, or the path if it's reading from cloud storage directly.	`s3://mybucket/table1`
`source_type`	string	The type of the source. The value is `TABLE`, `PATH`, `VIEW`, `MATERIALIZED_VIEW`, `METRIC_VIEW`, or `STREAMING_TABLE`.	`TABLE`
`target_table_full_name`	string	Three-part name to identify the target table.	`catalog.schema.table`
`target_table_catalog`	string	The catalog of the target table.	`catalog`
`target_table_schema`	string	The schema of the target table.	`schema`
`target_table_name`	string	The name of the target table.	`table`
`target_path`	string	Location in cloud storage of the target table.	`s3://mybucket/table1`
`target_type`	string	The type of the target. The value is `TABLE`, `PATH`, `VIEW`, `MATERIALIZED_VIEW`, `METRIC_VIEW`, or `STREAMING_TABLE`.	`TABLE`
`created_by`	string	The user who generated this lineage. This can be a Databricks username, a Databricks service principal ID, “System-User”, or `NULL` if the user information cannot be captured.	`crampton.rods@email.com`
`event_time`	timestamp	The timestamp when the lineage was generated. Timezone information is recorded at the end of the value with `+00:00` representing UTC.	`2025-04-20T19:47:21.194+00:00`
`event_date`	date	The date when the lineage was generated. This is a partitioned column.	`2025-04-20`
`record_id`	string	The unique ID of the lineage record. This value is auto-generated and cannot be joined with any tables.	`3c5c8eed-87bb-3aa6-8a86-80d00d48299e`
`event_id`	string	The unique ID for a single lineage event. Multiple rows may share the same `event_id` if they were generated by the same event. This value is auto-generated and cannot be joined with any tables.	`ca123ff3-f3f8-332b-a832-0154a6327353`
`statement_id`	string	The unique ID of a query statement that generated the lineage event. This is a foreign key to join with query history system table. This value is only set when the query is run from a SQL warehouse.	`1234526f-a6ac-475c-8601-e8637f8ee039`
`entity_metadata`	struct	Metadata about the entity responsible for the lineage event.	See Entity metadata

Column lineage table

The column lineage table does not include events that do not have a source. For example, if you insert into a column using explicit values, it is not captured. If you read a column, it is captured whether or not you write the output.

Table path: This system table is located at system.access.column_lineage.

Column lineage schema

The column lineage system table uses the following schema:

Column name	Data type	Description	Example
`account_id`	string	The ID of the Databricks account.	`7af234db-66d7-4db3-bbf0-956098224879`
`metastore_id`	string	The ID of the Unity Catalog metastore.	`5a31ba44-bbf4-4174-bf33-e1fa078e6765`
`workspace_id`	string	The ID of the workspace	`123456789012345`
`entity_type`	string	The type of entity associated with the lineage record, if any. The value is `NOTEBOOK`, `JOB`, `PIPELINE`, `DASHBOARD_V3`, `DBSQL_DASHBOARD` (legacy dashboard), `DBSQL_QUERY`, or NULL. Note: please see `entity_metadata` column for information on all entities involved in the lineage record.	`JOB`
`entity_id`	string	The ID of the entity associated with the lineage record, or NULL. Note: please see `entity_metadata` column for information on all entities involved in the lineage record.	`417306252667357`
`entity_run_id`	string	The unique run ID of the entity associated with the lineage record, or NULL. Note: please see `entity_metadata` column for information on all entities involved in the lineage record.	`688612062233399`
`source_table_full_name`	string	Three-part name to identify the source table.	`catalog.schema.table`
`source_table_catalog`	string	The catalog of the source table.	`catalog`
`source_table_schema`	string	The schema of the source table.	`schema`
`source_table_name`	string	The name of the source table.	`table`
`source_path`	string	Location in cloud storage of the source table, or the path if it's reading from cloud storage directly.	`s3://mybucket/table1`
`source_type`	string	The type of the source. The value is `TABLE`, `PATH`, `VIEW`, `MATERIALIZED_VIEW`, `METRIC_VIEW`, or `STREAMING_TABLE`.	`TABLE`
`source_column_name`	string	The name of the source column.	`date`
`target_table_full_name`	string	Three-part name to identify the target table.	`catalog.schema.table`
`target_table_catalog`	string	The catalog of the target table.	`catalog`
`target_table_schema`	string	The schema of the target table.	`schema`
`target_table_name`	string	The name of the target table.	`table`
`target_path`	string	Location in cloud storage of the target table.	`s3://mybucket/table1`
`target_type`	string	The type of the target. The value is `TABLE`, `PATH`, `VIEW`, `MATERIALIZED_VIEW`, `METRIC_VIEW`, or `STREAMING_TABLE`.	`TABLE`
`target_column_name`	string	The name of the target column.	`date`
`created_by`	string	The user who generated this lineage. This can be a Databricks username, a Databricks service principal ID, “System-User”, or `NULL` if the user information cannot be captured.	`crampton.rods@email.com`
`event_time`	timestamp	The timestamp when the lineage was generated. Timezone information is recorded at the end of the value with `+00:00` representing UTC.	`2025-04-20T19:47:21.194+00:00`
`event_date`	date	The date when the lineage was generated. This is a partitioned column.	`2025-04-20`
`record_id`	string	The unique ID of the lineage record. This value is auto-generated and cannot be joined with any tables.	`3c5c8eed-87bb-3aa6-8a86-80d00d48299e`
`event_id`	string	The unique ID for a single lineage event. Multiple rows may share the same `event_id` if they were generated by the same event. This value is auto-generated and cannot be joined with any tables.	`ca123ff3-f3f8-332b-a832-0154a6327353`
`statement_id`	string	The unique ID of a query statement that generated the lineage event. This is a foreign key to join with query history system table. This value is only set when the query is run from a SQL warehouse.	`1234526f-a6ac-475c-8601-e8637f8ee039`
`entity_metadata`	struct	Metadata about the entity responsible for the lineage event.	See Entity metadata

Reading lineage system tables

To determine whether the event was a read or a write, you can view the source type and the target type.
- Read-only: The source type is not null, but target type is null.
- Write-only: The target type is not null, but the source type is null.
- Read and write: The source type and the target type are not null.

Entity metadata reference

The entity_metadata struct has the following schema:

JSON
job_info:
  job_id: "string"
  job_run_id: "string"
dashboard_id: "string"
legacy_dashboard_id: "string"
notebook_id: "string"
sql_query_id: "string"
dlt_pipeline_info:
  dlt_pipeline_id: "string"
  dlt_update_id: "string"

Databricks records lineage events from Lakeflow Spark Declarative Pipelines, notebooks, jobs, Databricks SQL queries, and dashboards. Events from other entities aren't supported.

Multiple values can be populated depending on the event type. For example, a job running a notebook task would populate both job_info and notebook_id.

If all the values in entity_metadata are null, it means no Databricks entity was involved in the event. For example, it could be the result of a JDBC query or from a user clicking into the Sample Data tab in the Databricks UI.

Lineage system table example

As an example of how lineage is recorded in system tables, here is an example query followed by the lineage records the query creates:

SQL
CREATE OR REPLACE TABLE car_features
AS SELECT *,  in1+in2 as premium_feature_set
FROM car_features_exterior
JOIN car_features_interior
USING(id, model);

The record in system.access.table_lineage would look like:

`entity_type`	`entity_id`	`source_table_name`	`target_table_name`	`created_by`	`event_time`
`NOTEBOOK`	`27080565267`	`car_features_exterior`	`car_features`	`crampton@email.com`	`2023-01-25T16:19:58.908+0000`
`NOTEBOOK`	`27080565267`	`car_features_interior`	`car_features`	`crampton@email.com`	`2023-01-25T16:19:58.908+0000`

The record in system.access.column_lineage would look like:

`entity_type`	`entity_id`	`source_table_name`	`target_table_name`	`source_column_name`	`target_column_name`	`event_time`
`NOTEBOOK`	`27080565267`	`car_features_interior`	`car_features`	`in1`	`premium_feature_set`	`2023-01-25T16:19:58.908+0000`
`NOTEBOOK`	`27080565267`	`car_features_interior`	`car_features`	`in2`	`premium_feature_set`	`2023-01-25T16:19:58.908+0000`

note

Not all lineage columns are shown in above example. For the full schema, see the above lineage schema.

Analyzing table popularity using table lineage table

The table lineage system table can be used to estimate the popularity of tables. For example, the following snippet retrieves the top 100 most frequently queried tables in the last 7 days:

SQL
SELECT
  source_table_full_name,
  COUNT(distinct event_id) AS num_of_queries
FROM
  system.access.table_lineage t
WHERE
  t.event_date > CURRENT_DATE() - INTERVAL 7 DAYS
  AND t.source_table_full_name IS NOT NULL
GROUP BY source_table_full_name
ORDER BY num_of_queries DESC
LIMIT 100;

You can join with the query history system table to obtain additional details about queries run on SQL warehouses or serverless compute, including those served from cache (note that lineage system tables do not emit entries for cached query results). For example, the following snippet retrieves tables with queries exceeding one hour of execution time in the past 7 days:

SQL
SELECT
  t.source_table_full_name,
  COUNT(distinct event_id) AS num_of_queries_over_hour
FROM
  system.query.history h
INNER JOIN system.access.table_lineage t
  ON t.statement_id = h.cache_origin_statement_id
WHERE
  h.total_duration_ms > 3600000
  AND t.event_date > CURRENT_DATE() - INTERVAL 7 DAYS
  AND t.source_table_full_name IS NOT NULL
GROUP BY t.source_table_full_name
ORDER BY num_of_queries_over_hour DESC;

Troubleshooting external table queries

When you reference an external table using its cloud storage path, the associated lineage record only includes the path name and not the table name. As an example, the lineage record for this query would include the path name and not the table name:

SQL
SELECT * FROM delta.`s3://mybucket/table1`;

If you are trying to query lineage records for an external table referenced by path, you must filter the query using source_path or target_path instead of source_table_full_name or target_table_full_name. For example, the following query pulls all lineage records for an external table:

SQL
SELECT *
FROM system.access.table_lineage
WHERE
  source_path = "s3://mybucket/table1" OR
  target_path = "s3://mybucket/table1";

Retrieve lineage records based on external table name

If you don't want to manually retrieve the cloud storage path to find lineage, you can use the following function to get lineage data using the table name. You can also replace system.access.table_lineage with system.access.column_lineage in the function if you want to query column lineage.

Python
def getLineageForTable(table_name):
  table_path = spark.sql(f"describe detail {table_name}").select("location").head()[0]

  df = spark.read.table("system.access.table_lineage")
  return df.where(
    (df.source_table_full_name == table_name)
    | (df.target_table_full_name == table_name)
    | (df.source_path == table_path)
    | (df.target_path == table_path)
  )

Then use the following command to call the function and display lineage records for the external table:

Python
display(getLineageForTable("table_name"))

Table lineage table​

Table lineage schema​

Column lineage table​

Column lineage schema​

Reading lineage system tables​

Entity metadata reference​

Lineage system table example​

Analyzing table popularity using table lineage table​

Troubleshooting external table queries​

Retrieve lineage records based on external table name​