Monitor managed ingestion pipeline cost

✓ SaaS connectors ✓ Database connectors

Learn how to use the system.billing.usage table to monitor managed ingestion pipeline costs and track usage. The queries on this page help you understand spending patterns, identify high-cost pipelines, and attribute costs to specific data sources.

How to read Lakeflow Connect usage data

Users with permissions to access system table data can view and query their account billing logs for managed ingestion at system.billing.usage. Every billing record includes columns that attribute the usage amount to specific resources, identities, and products involved.

Managed ingestion pipeline usage in Lakeflow Connect is tracked using the following billing parameters:

| Parameter | Description |
| --- | --- |
| billing_origin_product | Set to LAKEFLOW_CONNECT for all managed connector usage. |
| usage_type | Set to COMPUTE_TIME for pipeline processing. |
| usage_unit | Recorded in MILLISECOND (raw compute time) and DBU (pricing unit). |

The usage_metadata column includes a struct with information about the pipeline resources:

| Field | Description |
| --- | --- |
| dlt_pipeline_id | The unique identifier for the ingestion pipeline. |
| uc_table_catalog, uc_table_schema, uc_table_name | The destination table information (catalog, schema, and table name). |

For a complete reference of the usage table, see Billable usage system table reference.
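
To see how these columns fit together, the following minimal sketch returns a sample of recent Lakeflow Connect billing records with the attribution fields described above (adjust the LIMIT and filters to taste):

SQL
-- Inspect a sample of raw billing records for managed ingestion
SELECT
  usage_date,
  usage_type,
  usage_unit,
  usage_quantity,
  usage_metadata.dlt_pipeline_id,
  usage_metadata.uc_table_catalog,
  usage_metadata.uc_table_schema,
  usage_metadata.uc_table_name
FROM system.billing.usage
WHERE billing_origin_product = 'LAKEFLOW_CONNECT'
ORDER BY usage_date DESC
LIMIT 10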

Pipeline maintenance charges

Managed ingestion pipelines incur maintenance charges similar to those for Lakeflow Spark Declarative Pipelines. These charges cover pipeline infrastructure, metadata management, and change tracking between pipeline runs. Maintenance usage isn't labeled separately in the billing data, so you distinguish it from data processing costs by analyzing usage patterns over time; the sample query under "What is the maintenance cost vs. data processing cost for my pipelines?" shows one approach.

Operationalize billing data

Databricks recommends using AI/BI dashboards to monitor costs with system table billing data. You can create a new dashboard, or account admins can import a pre-built, customizable cost monitoring dashboard. See Usage dashboards.

You can also add alerts to your queries to help you stay informed about usage data. See Create an alert.
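
For example, the following sketch returns yesterday's total DBU consumption for managed ingestion; you might attach an alert to a query like this that fires when daily_dbus exceeds a threshold of your choosing (the threshold itself is an assumption you tune to your workload):

SQL
-- Yesterday's total Lakeflow Connect DBUs; alert when this exceeds your chosen threshold
SELECT
  SUM(usage_quantity) AS daily_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_date = DATE_SUB(CURRENT_DATE, 1)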

Sample queries

The following queries provide examples of how you can use the system.billing.usage table data to gain insights into your managed ingestion pipeline usage.

How much did my pipelines consume this month?

This query returns the total DBU consumption, by day, for all managed ingestion pipelines in the current month. Use it to track overall spending on managed connectors.

SQL
SELECT
  usage_date,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  -- Sum only DBU records; usage is also recorded in MILLISECOND
  AND usage_unit = 'DBU'
  AND MONTH(usage_date) = MONTH(NOW())
  AND YEAR(usage_date) = YEAR(NOW())
GROUP BY usage_date
ORDER BY usage_date DESC

Which pipelines consumed the most DBUs?

This query identifies your most expensive managed ingestion pipelines by joining usage data with pipeline metadata. Use this to prioritize optimization efforts.

SQL
WITH ranked_pipelines AS (
  SELECT
    u.usage_metadata.dlt_pipeline_id AS pipeline_id,
    p.name AS pipeline_name,
    SUM(u.usage_quantity) AS total_dbus,
    COUNT(DISTINCT u.usage_date) AS days_active
  FROM system.billing.usage u
  JOIN system.lakeflow.pipelines p
    ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
  WHERE
    u.billing_origin_product = 'LAKEFLOW_CONNECT'
    AND u.usage_unit = 'DBU'
    AND u.usage_date >= DATE_SUB(NOW(), 30)
  GROUP BY pipeline_id, pipeline_name
)
SELECT
  pipeline_name,
  pipeline_id,
  total_dbus,
  days_active,
  ROUND(total_dbus / days_active, 2) AS avg_daily_dbus
FROM ranked_pipelines
ORDER BY total_dbus DESC
LIMIT 20

What is the cost trend for a specific pipeline?

This query shows the daily usage pattern for a specific pipeline. Replace :dlt_pipeline_id with your pipeline ID, which you can find on the Pipeline Details tab in the Lakeflow Spark Declarative Pipelines UI.

SQL
SELECT
  usage_date,
  SUM(usage_quantity) AS daily_dbus,
  COUNT(*) AS usage_events
FROM system.billing.usage
WHERE
  usage_metadata.dlt_pipeline_id = :dlt_pipeline_id
  AND billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_date >= DATE_SUB(NOW(), 90)
GROUP BY usage_date
ORDER BY usage_date ASC
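
If you don't have the pipeline ID handy, you can look it up by name instead. A minimal sketch using the system.lakeflow.pipelines table (replace :pipeline_name with your pipeline's display name):

SQL
-- Look up a pipeline ID by its display name
SELECT
  pipeline_id,
  name
FROM system.lakeflow.pipelines
WHERE name = :pipeline_name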

Which source types are most expensive?

This query breaks down managed ingestion pipeline costs by connector. Use this to understand which data sources drive the most usage.

SQL
SELECT
  p.connector_type AS source_type,
  COUNT(DISTINCT p.pipeline_id) AS pipeline_count,
  SUM(u.usage_quantity) AS total_dbus,
  ROUND(SUM(u.usage_quantity) / COUNT(DISTINCT p.pipeline_id), 2) AS avg_dbus_per_pipeline
FROM system.billing.usage u
JOIN system.lakeflow.pipelines p
  ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
WHERE
  u.billing_origin_product = 'LAKEFLOW_CONNECT'
  AND u.usage_unit = 'DBU'
  AND u.usage_date >= DATE_SUB(NOW(), 30)
GROUP BY source_type
ORDER BY total_dbus DESC

How much usage can be attributed to pipelines with a specific budget tag?

This query shows usage for managed ingestion pipelines tagged with a specific budget policy. Budget policy tags allow you to track costs across multiple pipelines for chargeback or cost allocation purposes. Replace :key and :value with your custom tag key and value.

Budget policy tagging for managed ingestion pipelines is API-only. For more information about budget policies, see Attribute usage with serverless budget policies.

SQL
SELECT
  custom_tags[:key] AS tag_value,
  usage_date,
  SUM(usage_quantity) AS daily_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND custom_tags[:key] = :value
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY tag_value, usage_date
ORDER BY usage_date DESC
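
For example, if your pipelines carry a (hypothetical) tag with key cost_center and value fin-ops, set :key to 'cost_center' and :value to 'fin-ops' to see daily DBUs for every pipeline that shares that tag.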

What is the maintenance cost vs. data processing cost for my pipelines?

The following query separates pipeline maintenance charges from data processing costs. Maintenance charges occur even when pipelines aren't actively ingesting data, covering infrastructure and change tracking.

The query uses a threshold of 0.1 DBU per hour to distinguish between maintenance and processing costs. Adjust this threshold based on your pipeline characteristics.

SQL
WITH hourly_usage AS (
  SELECT
    usage_metadata.dlt_pipeline_id AS pipeline_id,
    DATE_TRUNC('hour', usage_start_time) AS usage_hour,
    SUM(usage_quantity) AS hourly_dbus
  FROM system.billing.usage
  WHERE
    billing_origin_product = 'LAKEFLOW_CONNECT'
    AND usage_unit = 'DBU'
    AND usage_date >= DATE_SUB(NOW(), 30)
  GROUP BY pipeline_id, usage_hour
)
SELECT
  pipeline_id,
  -- Hours above the threshold are treated as active data processing
  SUM(CASE WHEN hourly_dbus > 0.1 THEN hourly_dbus ELSE 0 END) AS processing_dbus,
  -- Hours at or below the threshold are treated as background maintenance
  SUM(CASE WHEN hourly_dbus <= 0.1 THEN hourly_dbus ELSE 0 END) AS maintenance_dbus,
  SUM(hourly_dbus) AS total_dbus
FROM hourly_usage
GROUP BY pipeline_id
ORDER BY total_dbus DESC

Show me pipelines where costs are growing month-over-month

This query calculates the growth rate in managed ingestion pipeline usage between the previous 30-day window and the most recent 30 days. Use it to identify pipelines with increasing costs that might need optimization or a data volume review.

SQL
SELECT
  after.pipeline_id,
  after.pipeline_name,
  before_dbus,
  after_dbus,
  ROUND(((after_dbus - before_dbus) / NULLIF(before_dbus, 0) * 100), 2) AS growth_rate
FROM
  (
    SELECT
      u.usage_metadata.dlt_pipeline_id AS pipeline_id,
      p.name AS pipeline_name,
      SUM(u.usage_quantity) AS before_dbus
    FROM system.billing.usage u
    JOIN system.lakeflow.pipelines p
      ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
    WHERE
      u.billing_origin_product = 'LAKEFLOW_CONNECT'
      AND u.usage_unit = 'DBU'
      -- Days 60 through 31 before today; the exclusive upper bound avoids double-counting day 30
      AND u.usage_date >= DATE_SUB(NOW(), 60)
      AND u.usage_date < DATE_SUB(NOW(), 30)
    GROUP BY pipeline_id, pipeline_name
  ) AS before
JOIN
  (
    SELECT
      u.usage_metadata.dlt_pipeline_id AS pipeline_id,
      p.name AS pipeline_name,
      SUM(u.usage_quantity) AS after_dbus
    FROM system.billing.usage u
    JOIN system.lakeflow.pipelines p
      ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
    WHERE
      u.billing_origin_product = 'LAKEFLOW_CONNECT'
      AND u.usage_unit = 'DBU'
      -- The most recent 30 days
      AND u.usage_date >= DATE_SUB(NOW(), 30)
    GROUP BY pipeline_id, pipeline_name
  ) AS after
  ON before.pipeline_id = after.pipeline_id
WHERE after_dbus > before_dbus
ORDER BY growth_rate DESC

Calculate the dollar cost for usage in the previous month

This query joins usage data with list prices to calculate the approximate dollar cost for managed ingestion pipelines. The actual cost might differ slightly based on your account pricing tier and entitlements.

SQL
SELECT
  u.usage_date AS usage_day,
  SUM(u.usage_quantity * lp.pricing.effective_list.default) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON lp.sku_name = u.sku_name
WHERE
  u.billing_origin_product = 'LAKEFLOW_CONNECT'
  AND u.usage_unit = 'DBU'
  -- Apply the list price that was in effect when the usage occurred
  AND u.usage_end_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_end_time < lp.price_end_time)
  -- Restrict to the previous calendar month
  AND u.usage_date >= ADD_MONTHS(DATE_TRUNC('month', CURRENT_DATE), -1)
  AND u.usage_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY usage_day
ORDER BY usage_day ASC

Which destination catalogs and schemas have the highest ingestion costs?

This query aggregates managed ingestion pipeline costs by destination catalog and schema. Use this to understand where ingested data is being stored and which destinations have the highest associated ingestion costs.

SQL
SELECT
  usage_metadata.uc_table_catalog AS catalog_name,
  usage_metadata.uc_table_schema AS schema_name,
  COUNT(DISTINCT usage_metadata.dlt_pipeline_id) AS pipeline_count,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_metadata.uc_table_catalog IS NOT NULL
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY catalog_name, schema_name
ORDER BY total_dbus DESC
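
To drill down from schemas to individual destination tables, a variant of the same query can also group on uc_table_name (a sketch; the top-20 limit is arbitrary):

SQL
-- Table-level drill-down of ingestion costs over the last 30 days
SELECT
  usage_metadata.uc_table_catalog AS catalog_name,
  usage_metadata.uc_table_schema AS schema_name,
  usage_metadata.uc_table_name AS table_name,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_metadata.uc_table_name IS NOT NULL
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY catalog_name, schema_name, table_name
ORDER BY total_dbus DESC
LIMIT 20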

Additional resources