Monitor managed ingestion pipeline costs
This page applies to SaaS connectors and database connectors.
Learn how to use the system.billing.usage table to monitor managed ingestion pipeline costs and track usage. The queries on this page help you understand spending patterns, identify high-cost pipelines, and attribute costs to specific data sources.
How to read Lakeflow Connect usage data
Users with permissions to access system table data can view and query their account billing logs for managed ingestion at system.billing.usage. Every billing record includes columns that attribute the usage amount to specific resources, identities, and products involved.
Managed ingestion pipeline usage in Lakeflow Connect is tracked using the following billing parameters:
| Parameter | Description |
|---|---|
| billing_origin_product | Set to LAKEFLOW_CONNECT. |
| product_features.is_serverless | Set to true. |
| usage_quantity | Recorded in DBUs. |
The usage_metadata column includes a struct with information about the pipeline resources:
| Field | Description |
|---|---|
| dlt_pipeline_id | The unique identifier for the ingestion pipeline. |
| uc_table_catalog, uc_table_schema, uc_table_name | The destination table information. |
For a complete reference of the usage table, see Billable usage system table reference.
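For example, the following query is a minimal sketch that previews recent Lakeflow Connect billing records and the attribution columns described above. Adjust the selected columns, time window, and limit to fit your needs.
-- Preview the most recent managed ingestion billing records.
SELECT
  usage_date,
  sku_name,
  usage_quantity,
  usage_metadata.dlt_pipeline_id,
  usage_metadata.uc_table_catalog,
  usage_metadata.uc_table_schema,
  custom_tags
FROM system.billing.usage
WHERE billing_origin_product = 'LAKEFLOW_CONNECT'
ORDER BY usage_start_time DESC
LIMIT 10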
Pipeline maintenance charges
Managed ingestion pipelines incur maintenance charges similar to Lakeflow Spark Declarative Pipelines. These maintenance costs cover pipeline infrastructure, metadata management, and change tracking between pipeline runs. You can distinguish between data processing costs and maintenance costs by analyzing usage patterns over time.
Operationalize billing data
Databricks recommends using AI/BI dashboards to create cost monitoring dashboards using system table billing data. You can create a new dashboard, or account admins can import a pre-built, customizable cost monitoring dashboard. See Usage dashboards.
You can also add alerts to your queries to help you stay informed about usage data. See Create an alert.
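For example, an alert can be backed by a simple query like the following sketch, which returns the previous day's total managed ingestion DBUs. The column name is illustrative; configure the alert condition to trigger when the value exceeds a threshold that makes sense for your workloads.
-- Total managed ingestion DBUs for the previous calendar day.
SELECT
  SUM(usage_quantity) AS dbus_previous_day
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND usage_date = DATE_SUB(CURRENT_DATE, 1)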
Sample queries
The following queries provide examples of how you can use the system.billing.usage table data to gain insights into your managed ingestion pipeline usage.
How much did my pipelines consume this month?
This query returns the total DBU consumption for all managed ingestion pipelines in the current month. Use this to track overall spending on managed connectors.
SELECT
usage_date,
SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
billing_origin_product = 'LAKEFLOW_CONNECT'
AND MONTH(usage_date) = MONTH(NOW())
AND YEAR(usage_date) = YEAR(NOW())
GROUP BY usage_date
ORDER BY usage_date DESC
Which pipelines consumed the most DBUs?
This query identifies your most expensive managed ingestion pipelines by joining usage data with pipeline metadata. Use this to prioritize optimization efforts.
WITH ranked_pipelines AS (
SELECT
u.usage_metadata.dlt_pipeline_id AS pipeline_id,
p.name AS pipeline_name,
SUM(u.usage_quantity) AS total_dbus,
COUNT(DISTINCT u.usage_date) AS days_active
FROM system.billing.usage u
JOIN system.lakeflow.pipelines p
ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
WHERE
u.billing_origin_product = 'LAKEFLOW_CONNECT'
AND u.usage_date >= DATE_SUB(NOW(), 30)
GROUP BY pipeline_id, pipeline_name
)
SELECT
pipeline_name,
pipeline_id,
total_dbus,
days_active,
ROUND(total_dbus / days_active, 2) AS avg_daily_dbus
FROM ranked_pipelines
ORDER BY total_dbus DESC
LIMIT 20
What is the cost trend for a specific pipeline?
This query shows the daily usage pattern for a specific pipeline. Replace :dlt_pipeline_id with your pipeline ID, which you can find on the Pipeline Details tab in the Lakeflow Spark Declarative Pipelines UI.
SELECT
usage_date,
SUM(usage_quantity) AS daily_dbus,
COUNT(*) AS usage_events
FROM system.billing.usage
WHERE
usage_metadata.dlt_pipeline_id = :dlt_pipeline_id
AND billing_origin_product = 'LAKEFLOW_CONNECT'
AND usage_date >= DATE_SUB(NOW(), 90)
GROUP BY usage_date
ORDER BY usage_date ASC
Which source types are most expensive?
This query breaks down managed ingestion pipeline costs by connector. Use this to understand which data sources drive the most usage.
SELECT
p.connector_type AS source_type,
COUNT(DISTINCT p.pipeline_id) AS pipeline_count,
SUM(u.usage_quantity) AS total_dbus,
ROUND(SUM(u.usage_quantity) / COUNT(DISTINCT p.pipeline_id), 2) AS avg_dbus_per_pipeline
FROM system.billing.usage u
JOIN system.lakeflow.pipelines p
ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
WHERE
u.billing_origin_product = 'LAKEFLOW_CONNECT'
AND u.usage_date >= DATE_SUB(NOW(), 30)
GROUP BY source_type
ORDER BY total_dbus DESC
How much usage can be attributed to pipelines with a specific budget tag?
This query shows usage for managed ingestion pipelines tagged with a specific budget policy. Budget policy tags allow you to track costs across multiple pipelines for chargeback or cost allocation purposes. Replace :key and :value with your custom tag key and value.
Budget policy tagging for managed ingestion pipelines is API-only. For more information about budget policies, see Attribute usage with serverless budget policies.
SELECT
custom_tags[:key] AS tag_value,
usage_date,
SUM(usage_quantity) AS daily_dbus
FROM system.billing.usage
WHERE
billing_origin_product = 'LAKEFLOW_CONNECT'
AND custom_tags[:key] = :value
AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY tag_value, usage_date
ORDER BY usage_date DESC
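To compare usage across every value of a tag key instead of filtering to a single value, a variation like the following sketch groups by the tag value using the same table and columns:
-- Last 30 days of managed ingestion DBUs, broken down by each value of the tag key.
SELECT
  custom_tags[:key] AS tag_value,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
  billing_origin_product = 'LAKEFLOW_CONNECT'
  AND custom_tags[:key] IS NOT NULL
  AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY tag_value
ORDER BY total_dbus DESC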
What is the maintenance cost vs. data processing cost for my pipelines?
The following query separates pipeline maintenance charges from data processing costs. Maintenance charges occur even when pipelines aren't actively ingesting data, covering infrastructure and change tracking.
The query uses a threshold of 0.1 DBU per hour to distinguish between maintenance and processing costs. Adjust this threshold based on your pipeline characteristics.
WITH hourly_usage AS (
SELECT
usage_metadata.dlt_pipeline_id AS pipeline_id,
DATE_TRUNC('hour', usage_start_time) AS usage_hour,
SUM(usage_quantity) AS hourly_dbus
FROM system.billing.usage
WHERE
billing_origin_product = 'LAKEFLOW_CONNECT'
AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY pipeline_id, usage_hour
)
SELECT
pipeline_id,
SUM(CASE WHEN hourly_dbus > 0.1 THEN hourly_dbus ELSE 0 END) AS processing_dbus,
SUM(CASE WHEN hourly_dbus <= 0.1 THEN hourly_dbus ELSE 0 END) AS maintenance_dbus,
SUM(hourly_dbus) AS total_dbus
FROM hourly_usage
GROUP BY pipeline_id
ORDER BY total_dbus DESC
Show me pipelines where costs are growing month-over-month
This query calculates the growth rate for managed ingestion pipeline usage between two months. Use this to identify pipelines with increasing costs that might need optimization or data volume review.
SELECT
after.pipeline_id,
after.pipeline_name,
before_dbus,
after_dbus,
ROUND(((after_dbus - before_dbus) / NULLIF(before_dbus, 0) * 100), 2) AS growth_rate
FROM
(
SELECT
u.usage_metadata.dlt_pipeline_id AS pipeline_id,
p.name AS pipeline_name,
SUM(u.usage_quantity) AS before_dbus
FROM system.billing.usage u
JOIN system.lakeflow.pipelines p
ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
WHERE
u.billing_origin_product = 'LAKEFLOW_CONNECT'
AND u.usage_date BETWEEN DATE_SUB(NOW(), 60) AND DATE_SUB(NOW(), 30)
GROUP BY pipeline_id, pipeline_name
) AS before
JOIN
(
SELECT
u.usage_metadata.dlt_pipeline_id AS pipeline_id,
p.name AS pipeline_name,
SUM(u.usage_quantity) AS after_dbus
FROM system.billing.usage u
JOIN system.lakeflow.pipelines p
ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
WHERE
u.billing_origin_product = 'LAKEFLOW_CONNECT'
AND u.usage_date >= DATE_SUB(NOW(), 30)
GROUP BY pipeline_id, pipeline_name
) AS after
ON before.pipeline_id = after.pipeline_id
WHERE after_dbus > before_dbus
ORDER BY growth_rate DESC
Calculate the dollar cost for usage in the previous month
This query joins usage data with list prices to calculate the approximate dollar cost for managed ingestion pipelines. The actual cost might differ slightly based on your account pricing tier and entitlements.
SELECT
DATE_TRUNC('day', u.usage_date) AS usage_day,
SUM(u.usage_quantity * lp.pricing.effective_list.default) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices lp
ON lp.sku_name = u.sku_name
WHERE
u.billing_origin_product = 'LAKEFLOW_CONNECT'
AND u.usage_end_time >= lp.price_start_time
AND (lp.price_end_time IS NULL OR u.usage_end_time < lp.price_end_time)
AND u.usage_date >= ADD_MONTHS(DATE_TRUNC('month', CURRENT_DATE), -1)
AND u.usage_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY usage_day
ORDER BY usage_day ASC
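To attribute the estimated dollar cost to individual pipelines rather than days, a sketch like the following reuses the same price join and groups by pipeline ID instead:
-- Estimated list-price cost per pipeline for the previous calendar month.
SELECT
  u.usage_metadata.dlt_pipeline_id AS pipeline_id,
  SUM(u.usage_quantity * lp.pricing.effective_list.default) AS estimated_cost
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON lp.sku_name = u.sku_name
WHERE
  u.billing_origin_product = 'LAKEFLOW_CONNECT'
  AND u.usage_end_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_end_time < lp.price_end_time)
  AND u.usage_date >= ADD_MONTHS(DATE_TRUNC('month', CURRENT_DATE), -1)
  AND u.usage_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY pipeline_id
ORDER BY estimated_cost DESC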
Which destination catalogs and schemas have the highest ingestion costs?
This query aggregates managed ingestion pipeline costs by destination catalog and schema. Use this to understand where ingested data is being stored and which destinations have the highest associated ingestion costs.
SELECT
usage_metadata.uc_table_catalog AS catalog_name,
usage_metadata.uc_table_schema AS schema_name,
COUNT(DISTINCT usage_metadata.dlt_pipeline_id) AS pipeline_count,
SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE
billing_origin_product = 'LAKEFLOW_CONNECT'
AND usage_metadata.uc_table_catalog IS NOT NULL
AND usage_date >= DATE_SUB(NOW(), 30)
GROUP BY catalog_name, schema_name
ORDER BY total_dbus DESC