Monitor managed ingestion pipeline cost

✓ SaaS connectors ✓ Database connectors

Learn how to use the system.billing.usage table to monitor managed ingestion pipeline costs and track usage. The queries on this page help you understand spending patterns, identify high-cost pipelines, and attribute costs to specific data sources.

How to read Lakeflow Connect usage data

Users with permissions to access system table data can view and query their account billing logs for managed ingestion at system.billing.usage. Every billing record includes columns that attribute the usage amount to specific resources, identities, and products involved.

Managed ingestion pipeline usage in Lakeflow Connect is tracked using the following billing parameters:

| Parameter | Description |
| --- | --- |
| billing_origin_product | Set to LAKEFLOW_CONNECT for all managed connector usage. |
| usage_type | Set to COMPUTE_TIME for pipeline processing. |
| usage_unit | Recorded in MILLISECOND (raw compute time) and DBU (pricing unit). |

The usage_metadata column includes a struct with information about the pipeline resources:

| Field | Description |
| --- | --- |
| dlt_pipeline_id | The unique identifier for the ingestion pipeline. |
| uc_table_catalog, uc_table_schema, uc_table_name | The destination table information (catalog, schema, and table name). |

For a complete reference of the usage table, see Billable usage system table reference.
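
To see how these columns fit together, the following minimal sketch returns a sample of recent Lakeflow Connect billing records with the attribution fields described above (adjust the LIMIT and filters to taste):

SQL
-- Inspect a sample of raw billing records for managed ingestion
SELECT
  usage_date,
  usage_type,
  usage_unit,
  usage_quantity,
  usage_metadata.dlt_pipeline_id,
  usage_metadata.uc_table_catalog,
  usage_metadata.uc_table_schema,
  usage_metadata.uc_table_name
FROM system.billing.usage
WHERE billing_origin_product = 'LAKEFLOW_CONNECT'
ORDER BY usage_date DESC
LIMIT 10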

Pipeline maintenance charges

Managed ingestion pipelines incur maintenance charges similar to those for Lakeflow Spark Declarative Pipelines. These charges cover pipeline infrastructure, metadata management, and change tracking between pipeline runs. Maintenance usage isn't labeled separately in the billing data, so you distinguish it from data processing costs by analyzing usage patterns over time; the sample query under "What is the maintenance cost vs. data processing cost for my pipelines?" shows one approach.

Operationalize billing data

Databricks recommends using AI/BI dashboards to monitor costs with system table billing data. You can create a new dashboard, or account admins can import a pre-built, customizable cost monitoring dashboard. See Usage dashboards.

You can also add alerts to your queries to help you stay informed about usage data. See Create an alert.
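
For example, the following sketch returns yesterday's total DBU consumption for managed ingestion; you might attach an alert to a query like this that fires when daily_dbus exceeds a threshold of your choosing (the threshold itself is an assumption you tune to your workload):

SQL
-- Yesterday's total Lakeflow Connect DBUs; alert when this exceeds your chosen threshold
SELECT
  SUM(usage_quantity) AS daily_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_date = DATE_SUB(CURRENT_DATE, 1)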

Sample queries

The following queries provide examples of how you can use the system.billing.usage table data to gain insights into your managed ingestion pipeline usage.

How much did my pipelines consume this month?

This query returns the total DBU consumption, by day, for all managed ingestion pipelines in the current month. Use it to track overall spending on managed connectors.

SQL
SELECT
  usage_date,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  -- Sum only DBU records; usage is also recorded in MILLISECOND
  AND usage_unit = 'DBU'
  AND MONTH(usage_date) = MONTH(NOW())
  AND YEAR(usage_date) = YEAR(NOW())
GROUP BY usage_date
ORDER BY usage_date DESC

Which pipelines consumed the most DBUs?

This query identifies your most expensive managed ingestion pipelines by joining usage data with pipeline metadata. Use this to prioritize optimization efforts.

SQL
WITH ranked_pipelines AS (
  SELECT
    u.usage_metadata.dlt_pipeline_id AS pipeline_id,
    p.name AS pipeline_name,
    SUM(u.usage_quantity) AS total_dbus,
    COUNT(DISTINCT u.usage_date) AS days_active
  FROM system.billing.usage u
  JOIN system.lakeflow.pipelines p
    ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
  WHERE
    u.billing_origin_product = 'LAKEFLOW_CONNECT'
    AND u.usage_unit = 'DBU'
    AND u.usage_date >= DATE_SUB(NOW(), 30)
  GROUP BY pipeline_id, pipeline_name
)
SELECT
  pipeline_name,
  pipeline_id,
  total_dbus,
  days_active,
  ROUND(total_dbus / days_active, 2) AS avg_daily_dbus
FROM ranked_pipelines
ORDER BY total_dbus DESC
LIMIT 20

What is the cost trend for a specific pipeline?

This query shows the daily usage pattern for a specific pipeline. Replace :dlt_pipeline_id with your pipeline ID, which you can find on the Pipeline Details tab in the Lakeflow Spark Declarative Pipelines UI.

SQL
SELECT
  usage_date,
  SUM(usage_quantity) AS daily_dbus,
  COUNT(*) AS usage_events
FROM system.billing.usage
WHERE
  usage_metadata.dlt_pipeline_id = :dlt_pipeline_id
  AND billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_date >= DATE_SUB(NOW(), 90)
GROUP BY usage_date
ORDER BY usage_date ASC
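
If you don't have the pipeline ID handy, you can look it up by name instead. A minimal sketch using the system.lakeflow.pipelines table (replace :pipeline_name with your pipeline's display name):

SQL
-- Look up a pipeline ID by its display name
SELECT
  pipeline_id,
  name
FROM system.lakeflow.pipelines
WHERE name = :pipeline_name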

Which source types are most expensive?

This query breaks down managed ingestion pipeline costs by connector. Use this to understand which data sources drive the most usage.

SQL
SELECT
  p.connector_type AS source_type,
  COUNT(DISTINCT p.pipeline_id) AS pipeline_count,
  SUM(u.usage_quantity) AS total_dbus,
  ROUND(SUM(u.usage_quantity) / COUNT(DISTINCT p.pipeline_id), 2) AS avg_dbus_per_pipeline
FROM system.billing.usage u
JOIN system.lakeflow.pipelines p
  ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
WHERE
  u.billing_origin_product = 'LAKEFLOW_CONNECT'
  AND u.usage_unit = 'DBU'
  AND u.usage_date >= DATE_SUB(NOW(), 30)
GROUP BY source_type
ORDER BY total_dbus DESC

How much usage can be attributed to pipelines with a specific budget tag?

This query shows usage for managed ingestion pipelines tagged with a specific budget policy. Budget policy tags allow you to track costs across multiple pipelines for chargeback or cost allocation purposes. Replace :key and :value with your custom tag key and value.

Budget policy tagging for managed ingestion pipelines is API-only. For more information about budget policies, see Attribute usage with serverless budget policies.

SQL
SELECT
  custom_tags[:key] AS tag_value,
  usage_date,
  SUM(usage_quantity) AS daily_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND custom_tags[:key] = :value
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY tag_value, usage_date
ORDER BY usage_date DESC
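
For example, if your pipelines carry a (hypothetical) tag with key cost_center and value fin-ops, set :key to 'cost_center' and :value to 'fin-ops' to see daily DBUs for every pipeline that shares that tag.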

What is the maintenance cost vs. data processing cost for my pipelines?

The following query separates pipeline maintenance charges from data processing costs. Maintenance charges occur even when pipelines aren't actively ingesting data, covering infrastructure and change tracking.

The query uses a threshold of 0.1 DBU per hour to distinguish between maintenance and processing costs. Adjust this threshold based on your pipeline characteristics.

SQL
WITH hourly_usage AS (
  SELECT
    usage_metadata.dlt_pipeline_id AS pipeline_id,
    DATE_TRUNC('hour', usage_start_time) AS usage_hour,
    SUM(usage_quantity) AS hourly_dbus
  FROM system.billing.usage
  WHERE
    billing_origin_product = 'LAKEFLOW_CONNECT'
    AND usage_unit = 'DBU'
    AND usage_date >= DATE_SUB(NOW(), 30)
  GROUP BY pipeline_id, usage_hour
)
SELECT
  pipeline_id,
  -- Hours above the threshold are treated as active data processing
  SUM(CASE WHEN hourly_dbus > 0.1 THEN hourly_dbus ELSE 0 END) AS processing_dbus,
  -- Hours at or below the threshold are treated as background maintenance
  SUM(CASE WHEN hourly_dbus <= 0.1 THEN hourly_dbus ELSE 0 END) AS maintenance_dbus,
  SUM(hourly_dbus) AS total_dbus
FROM hourly_usage
GROUP BY pipeline_id
ORDER BY total_dbus DESC

Show me pipelines where costs are growing month-over-month

This query calculates the growth rate in managed ingestion pipeline usage between the previous 30-day window and the most recent 30 days. Use it to identify pipelines with increasing costs that might need optimization or a data volume review.

SQL
SELECT
  after.pipeline_id,
  after.pipeline_name,
  before_dbus,
  after_dbus,
  ROUND(((after_dbus - before_dbus) / NULLIF(before_dbus, 0) * 100), 2) AS growth_rate
FROM
  (
    SELECT
      u.usage_metadata.dlt_pipeline_id AS pipeline_id,
      p.name AS pipeline_name,
      SUM(u.usage_quantity) AS before_dbus
    FROM system.billing.usage u
    JOIN system.lakeflow.pipelines p
      ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
    WHERE
      u.billing_origin_product = 'LAKEFLOW_CONNECT'
      AND u.usage_unit = 'DBU'
      -- Days 60 through 31 before today; the exclusive upper bound avoids double-counting day 30
      AND u.usage_date >= DATE_SUB(NOW(), 60)
      AND u.usage_date < DATE_SUB(NOW(), 30)
    GROUP BY pipeline_id, pipeline_name
  ) AS before
JOIN
  (
    SELECT
      u.usage_metadata.dlt_pipeline_id AS pipeline_id,
      p.name AS pipeline_name,
      SUM(u.usage_quantity) AS after_dbus
    FROM system.billing.usage u
    JOIN system.lakeflow.pipelines p
      ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
    WHERE
      u.billing_origin_product = 'LAKEFLOW_CONNECT'
      AND u.usage_unit = 'DBU'
      -- The most recent 30 days
      AND u.usage_date >= DATE_SUB(NOW(), 30)
    GROUP BY pipeline_id, pipeline_name
  ) AS after
  ON before.pipeline_id = after.pipeline_id
WHERE after_dbus > before_dbus
ORDER BY growth_rate DESC

Calculate the dollar cost for usage in the previous month

This query joins usage data with list prices to calculate the approximate dollar cost for managed ingestion pipelines. The actual cost might differ slightly based on your account pricing tier and entitlements.

SQL
SELECT
  u.usage_date AS usage_day,
  SUM(u.usage_quantity * lp.pricing.effective_list.default) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON lp.sku_name = u.sku_name
WHERE
  u.billing_origin_product = 'LAKEFLOW_CONNECT'
  AND u.usage_unit = 'DBU'
  -- Apply the list price that was in effect when the usage occurred
  AND u.usage_end_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_end_time < lp.price_end_time)
  -- Restrict to the previous calendar month
  AND u.usage_date >= ADD_MONTHS(DATE_TRUNC('month', CURRENT_DATE), -1)
  AND u.usage_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY usage_day
ORDER BY usage_day ASC

Which destination catalogs and schemas have the highest ingestion costs?

This query aggregates managed ingestion pipeline costs by destination catalog and schema. Use this to understand where ingested data is being stored and which destinations have the highest associated ingestion costs.

SQL
SELECT
  usage_metadata.uc_table_catalog AS catalog_name,
  usage_metadata.uc_table_schema AS schema_name,
  COUNT(DISTINCT usage_metadata.dlt_pipeline_id) AS pipeline_count,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_metadata.uc_table_catalog IS NOT NULL
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY catalog_name, schema_name
ORDER BY total_dbus DESC
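
To drill down from schemas to individual destination tables, a variant of the same query can also group on uc_table_name (a sketch; the top-20 limit is arbitrary):

SQL
-- Table-level drill-down of ingestion costs over the last 30 days
SELECT
  usage_metadata.uc_table_catalog AS catalog_name,
  usage_metadata.uc_table_schema AS schema_name,
  usage_metadata.uc_table_name AS table_name,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_unit = 'DBU'
  AND usage_metadata.uc_table_name IS NOT NULL
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY catalog_name, schema_name, table_name
ORDER BY total_dbus DESC
LIMIT 20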

Additional resources