Monitor Delta Live Tables pipelines

This article describes how you can use built-in features in Delta Live Tables for monitoring and observability for pipelines, including data lineage, update history, and data quality reporting.

You can review most monitoring data manually through the pipeline details UI. Some tasks are easier to accomplish by querying the event log metadata. See What is the Delta Live Tables event log?.

What pipeline details are available in the UI?

The pipeline graph displays as soon as an update to a pipeline has successfully started. Arrows represent dependencies between datasets in your pipeline. By default, the pipeline details page shows the most recent update for the pipeline, but you can select older updates from a dropdown menu.

Details displayed include the pipeline ID, source libraries, compute cost, product edition, Databricks Runtime version, and the channel configured for the pipeline.

The Run as user is the pipeline owner, and pipeline updates run with this user’s permissions. To change the run as user, click Permissions and change the pipeline owner.

How can you view dataset details?

Clicking a dataset in the pipeline graph displays details about the dataset. Details include the dataset schema, data quality metrics, and a link back to the source code that defines the dataset.

View update history

To view the history and status of pipeline updates, click the Update history dropdown menu.

To view the graph, details, and events for an update, select the update in the dropdown menu. To return to the latest update, click Show the latest update.

What is the Delta Live Tables event log?

The Delta Live Tables event log contains all information related to a pipeline, including audit logs, data quality checks, pipeline progress, and data lineage. You can use the event log to track, understand, and monitor the state of your data pipelines.

The event log for each pipeline is stored in a Delta table. You can view event log entries in the Delta Live Tables user interface, the Delta Live Tables API, or by directly querying the Delta table. This article focuses on querying the Delta table.

The example notebook includes queries that can be used to explore the Delta Live Tables event log.

Event log location

The event log is stored in /system/events under the storage location. For example, if you have configured your pipeline storage setting as /Users/username/data, the event log is stored in the /Users/username/data/system/events path in DBFS.

If you have not configured the storage setting, the default event log location is /pipelines/<pipeline-id>/system/events in DBFS. For example, if the ID of your pipeline is 91de5e48-35ed-11ec-8d3d-0242ac130003, the storage location is /pipelines/91de5e48-35ed-11ec-8d3d-0242ac130003/system/events.

Event log schema

The following table describes the event log schema. Some of these fields, such as the details field, contain JSON data that must be parsed to perform some queries. Databricks supports the : operator to parse JSON fields. See : (colon sign) operator.

| Field | Description |
| --- | --- |
| id | A unique identifier for the event log record. |
| sequence | A JSON document containing metadata to identify and order events. |
| origin | A JSON document containing metadata for the origin of the event, for example, the cloud provider, region, user_id, or pipeline_id. |
| timestamp | The time the event was recorded. |
| message | A human-readable message describing the event. |
| level | The event type, for example, INFO, WARN, ERROR, or METRICS. |
| error | If an error occurred, details describing the error. |
| details | A JSON document containing structured details of the event. This is the primary field used for analyzing events. |
| event_type | The event type. |
| maturity_level | The stability of the event schema. The possible values are: |

  • STABLE: The schema is stable and will not change.

  • NULL: The schema is stable and will not change. The value may be NULL if the record was created before the maturity_level field was added (release 2022.37).

  • EVOLVING: The schema is not stable and may change.

  • DEPRECATED: The schema is deprecated and the Delta Live Tables runtime may stop producing this event at any time.
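Because EVOLVING and DEPRECATED event schemas can change, monitoring queries you intend to keep running can be restricted to stable events. The following is a minimal sketch (an illustrative query, not part of the official examples), assuming a view named event_log_raw over the event log table as created in the query examples below; the IS NULL check covers records written before the maturity_level field was added:

```sql
-- Keep only events whose schema is stable (or predates maturity_level).
SELECT *
FROM event_log_raw
WHERE maturity_level = 'STABLE' OR maturity_level IS NULL
```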

Event log queries

You can create a view to simplify querying the event log. The following example creates a temporary view called event_log_raw. This view is used in the following examples that query event log records:

CREATE OR REPLACE TEMP VIEW event_log_raw AS SELECT * FROM delta.`<event_log_path>`;

Replace <event_log_path> with the event log location.

Each instance of a pipeline run is called an update. You often want to extract information for the most recent update. Run the following command to find the identifier for the most recent update and save it in the latest_update temporary view:

CREATE OR REPLACE TEMP VIEW latest_update AS SELECT origin.update_id AS id FROM event_log_raw WHERE event_type = 'create_update' ORDER BY timestamp DESC LIMIT 1;
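As a quick check that the two views above are wired up, the following sketch (an illustrative query, not part of the official examples) counts events by type for the most recent update by joining event_log_raw against latest_update:

```sql
-- Count events by type for the most recent pipeline update.
SELECT
  event_type,
  count(*) AS events
FROM
  event_log_raw,
  latest_update
WHERE
  origin.update_id = latest_update.id
GROUP BY
  event_type
```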

Query lineage information from the event log

Events containing information about lineage have the event type flow_definition. The details:flow_definition object contains the output_dataset and input_datasets defining each relationship in the graph.

You can use the following query to extract the input and output datasets to see lineage information:

SELECT
  details:flow_definition.output_dataset as output_dataset,
  details:flow_definition.input_datasets as input_dataset
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'flow_definition'
  AND origin.update_id = latest_update.id

| output_dataset | input_datasets |
| --- | --- |
| customers | null |
| sales_orders_raw | null |
| sales_orders_cleaned | ["customers", "sales_orders_raw"] |
| sales_order_in_la | ["sales_orders_cleaned"] |

Query data quality from the event log

If you define expectations on datasets in your pipeline, the data quality metrics are stored in the details:flow_progress.data_quality.expectations object. Events containing information about data quality have the event type flow_progress. The following example queries the data quality metrics for the last pipeline update:

SELECT
  row_expectations.dataset as dataset,
  row_expectations.name as expectation,
  SUM(row_expectations.passed_records) as passing_records,
  SUM(row_expectations.failed_records) as failing_records
FROM
  (
    SELECT
      explode(
        from_json(
          details :flow_progress :data_quality :expectations,
          "array<struct<name: string, dataset: string, passed_records: int, failed_records: int>>"
        )
      ) row_expectations
    FROM
      event_log_raw,
      latest_update
    WHERE
      event_type = 'flow_progress'
      AND origin.update_id = latest_update.id
  )
GROUP BY
  row_expectations.dataset,
  row_expectations.name

| dataset | expectation | passing_records | failing_records |
| --- | --- | --- | --- |
| sales_orders_cleaned | valid_order_number | 4083 | 0 |
Monitor data backlog by querying the event log

Delta Live Tables tracks how much data is present in the backlog in the details:flow_progress.metrics.backlog_bytes object. Events containing backlog metrics have the event type flow_progress. The following example queries backlog metrics for the last pipeline update:

SELECT
  timestamp,
  Double(details :flow_progress.metrics.backlog_bytes) as backlog
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'flow_progress'
  AND origin.update_id = latest_update.id

Note

The backlog metrics may not be available depending on the pipeline’s data source type and Databricks Runtime version.

Monitor Enhanced Autoscaling events from the event log

The event log captures cluster resizes when Enhanced Autoscaling is enabled in your pipelines. Events containing information about Enhanced Autoscaling have the event type autoscale. The cluster resizing request information is stored in the details:autoscale object. The following example queries the Enhanced Autoscaling cluster resize requests for the last pipeline update:

SELECT
  timestamp,
  Double(
    case
      when details :autoscale.status = 'RESIZING' then details :autoscale.requested_num_executors
      else null
    end
  ) as starting_num_executors,
  Double(
    case
      when details :autoscale.status = 'SUCCEEDED' then details :autoscale.requested_num_executors
      else null
    end
  ) as succeeded_num_executors,
  Double(
    case
      when details :autoscale.status = 'PARTIALLY_SUCCEEDED' then details :autoscale.requested_num_executors
      else null
    end
  ) as partially_succeeded_num_executors,
  Double(
    case
      when details :autoscale.status = 'FAILED' then details :autoscale.requested_num_executors
      else null
    end
  ) as failed_num_executors
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'autoscale'
  AND origin.update_id = latest_update.id

Monitor compute resource utilization

cluster_resources events provide metrics on the number of task slots in the cluster, how much those task slots are utilized, and how many tasks are waiting to be scheduled.

When Enhanced Autoscaling is enabled, cluster_resources events also contain metrics for the autoscaling algorithm, including latest_requested_num_executors, and optimal_num_executors. The events also show the status of the algorithm as different states such as CLUSTER_AT_DESIRED_SIZE, SCALE_UP_IN_PROGRESS_WAITING_FOR_EXECUTORS, and BLOCKED_FROM_SCALING_DOWN_BY_CONFIGURATION. This information can be viewed in conjunction with the autoscaling events to provide an overall picture of Enhanced Autoscaling.
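The algorithm state described above can also be inspected on its own. The following sketch (an illustrative query, not part of the official examples) tracks the autoscaling state over time for the latest update, assuming the event_log_raw and latest_update views defined earlier:

```sql
-- Track the Enhanced Autoscaling algorithm state over time.
SELECT
  timestamp,
  details:cluster_resources.state AS autoscaling_state
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'cluster_resources'
  AND origin.update_id = latest_update.id
```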

The following example queries the task queue size history for the last pipeline update:

SELECT
  timestamp,
  Double(details :cluster_resources.avg_num_queued_tasks) as queue_size
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'cluster_resources'
  AND origin.update_id = latest_update.id

The following example queries the utilization history for the last pipeline update:

SELECT
  timestamp,
  Double(details :cluster_resources.avg_task_slot_utilization) as utilization
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'cluster_resources'
  AND origin.update_id = latest_update.id

The following example queries the executor count history, accompanied by metrics available only for Enhanced Autoscaling pipelines, including the number of executors requested by the algorithm in the latest request, the optimal number of executors recommended by the algorithm based on the most recent metrics, and the autoscaling algorithm state:

SELECT
  timestamp,
  Double(details :cluster_resources.num_executors) as current_executors,
  Double(details :cluster_resources.latest_requested_num_executors) as latest_requested_num_executors,
  Double(details :cluster_resources.optimal_num_executors) as optimal_num_executors,
  details :cluster_resources.state as autoscaling_state
FROM
  event_log_raw,
  latest_update
WHERE
  event_type = 'cluster_resources'
  AND origin.update_id = latest_update.id

Audit Delta Live Tables pipelines

You can use Delta Live Tables event log records and other Databricks audit logs to get a complete picture of how data is being updated in Delta Live Tables.

Delta Live Tables uses the credentials of the pipeline owner to run updates. You can change the credentials used by updating the pipeline owner. Delta Live Tables records the user for actions on the pipeline, including pipeline creation, edits to configuration, and triggering updates.

See Configure audit logging.

Query user actions in the event log

You can use the event log to audit events, for example, user actions. Events containing information about user actions have the event type user_action.

The following query creates a temporary view you can use to query the event log. See Event log location to learn how to find the value to substitute for <event_log_path>.

CREATE OR REPLACE TEMP VIEW event_log_raw AS SELECT * FROM delta.`<event_log_path>`;

Information about the action is stored in the user_action object in the details field. Use the following query to construct an audit log of user events:

SELECT timestamp, details:user_action:action, details:user_action:user_name FROM event_log_raw WHERE event_type = 'user_action'

| timestamp | action | user_name |
| --- | --- | --- |
| 2021-05-20T19:36:03.517+0000 | START | user@company.com |
| 2021-05-20T19:35:59.913+0000 | CREATE | user@company.com |
| 2021-05-27T00:35:51.971+0000 | START | user@company.com |

Runtime information

You can view runtime information for a pipeline update, for example, the Databricks Runtime version for the update:

SELECT details:create_update:runtime_version:dbr_version FROM event_log_raw WHERE event_type = 'create_update'

| dbr_version |
| --- |
| 11.0 |

Example: query event log notebook

Querying the Delta Live Tables event log
