Pular para o conteúdo principal

Exemplos de consultas para atividade de data SQL warehouse

Utilize estas consultas SQL de exemplo com tabelas do sistema para monitorar o desempenho, o uso e os custos SQL warehouse . Modifique as consultas para que se adaptem às necessidades da sua organização. Adicione um alerta para ser notificado sobre valores inesperados.

Requisitos

Tabelas para monitoramento SQL warehouse

Tabela do sistema

Descrição

system.compute.warehouse_events

Rastreia eventos de início, parada, escalonamento para cima e escalonamento para baixo no armazém.

system.compute.warehouses

Contém um instantâneo das configurações do armazém.

system.query.history

Registra detalhes sobre cada consulta executada no banco de dados SQL .

system.billing.usage

Contém os registros de faturamento de toda a utilização do Databricks.

Exemplo: Utilização do armazém

Utilize as seguintes consultas para entender como seu data warehouse está sendo usado, incluindo quais consultas, usuários e aplicativos geram mais atividade.

Encontre as consultas mais lentas em um data warehouse.

SQL
SELECT
statement_id,
executed_by,
statement_type,
execution_status,
total_duration_ms,
execution_duration_ms,
compilation_duration_ms,
waiting_at_capacity_duration_ms,
read_rows,
produced_rows,
start_time,
statement_text
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 1 DAY
ORDER BY
total_duration_ms DESC
LIMIT 50

Analise as tendências de desempenho das consultas ao longo do tempo.

SQL
SELECT
DATE(start_time) AS query_date,
COUNT(*) AS total_queries,
COUNT(CASE WHEN execution_status = 'FINISHED' THEN 1 END) AS successful_queries,
COUNT(CASE WHEN execution_status = 'FAILED' THEN 1 END) AS failed_queries,
ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms,
ROUND(PERCENTILE(total_duration_ms, 0.5), 0) AS p50_duration_ms,
ROUND(PERCENTILE(total_duration_ms, 0.95), 0) AS p95_duration_ms,
ROUND(AVG(waiting_at_capacity_duration_ms), 0) AS avg_queue_wait_ms
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 30 DAY
GROUP BY
DATE(start_time)
ORDER BY
query_date DESC

Encontre os usuários mais ativos em um armazém.

SQL
SELECT
executed_by,
COUNT(*) AS query_count,
ROUND(SUM(total_duration_ms) / 1000 / 60, 2) AS total_duration_minutes,
ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
executed_by
ORDER BY
query_count DESC

Encontre os melhores aplicativos para clientes

SQL
SELECT
client_application,
CASE
WHEN query_source.job_info.job_id IS NOT NULL THEN 'Job'
WHEN query_source.dashboard_id IS NOT NULL THEN 'Dashboard'
WHEN query_source.legacy_dashboard_id IS NOT NULL THEN 'Legacy Dashboard'
WHEN query_source.alert_id IS NOT NULL THEN 'Alert'
WHEN query_source.notebook_id IS NOT NULL THEN 'Notebook'
WHEN query_source.genie_space_id IS NOT NULL THEN 'Genie Space'
WHEN query_source.sql_query_id IS NOT NULL THEN 'SQL Editor'
ELSE 'Other'
END AS source_type,
COUNT(*) AS query_count,
ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
client_application,
source_type
ORDER BY
query_count DESC

Monitorar consultas com falha

SQL
SELECT
DATE(start_time) AS failure_date,
execution_status,
error_message,
COUNT(*) AS failure_count,
COLLECT_SET(executed_by) AS affected_users
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND execution_status IN ('FAILED', 'CANCELED')
AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
DATE(start_time),
execution_status,
error_message
ORDER BY
failure_date DESC,
failure_count DESC

Exemplo: Dimensionamento de armazém

Utilize as seguintes consultas para determinar se o tamanho do seu armazém está correto. Consultas em espera na capacidade máxima sugerem que você precisa aumentar max_clusters. Consultas com uso excessivo de disco sugerem que você precisa aumentar o tamanho do data warehouse.

Identificar consultas em espera que atingiram a capacidade máxima

Consultas com valores altos waiting_at_capacity_duration_ms estão passando tempo em fila em vez de serem executadas. Considere aumentar a configuração do armazém max_clusters para permitir que o armazém seja escalado.

SQL
SELECT
statement_id,
executed_by,
total_duration_ms,
waiting_at_capacity_duration_ms,
execution_duration_ms,
start_time,
statement_text
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 7 DAY
AND waiting_at_capacity_duration_ms > 0
ORDER BY
waiting_at_capacity_duration_ms DESC
LIMIT 50

Identificar consultas com uso excessivo de disco

O estouro de disco ocorre quando uma consulta requer mais memória do que a disponível. Considere aumentar o tamanho do data warehouse para fornecer mais memória às consultas. O excesso de dados coletados geralmente significa que as consultas precisam ser otimizadas ou que o tamanho do data warehouse é muito pequeno para a carga de trabalho.

SQL
SELECT
statement_id,
executed_by,
spilled_local_bytes / (1024 * 1024) AS spilled_mb,
read_bytes / (1024 * 1024) AS read_mb,
total_duration_ms,
start_time,
statement_text
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 7 DAY
AND spilled_local_bytes > 0
ORDER BY
spilled_local_bytes DESC
LIMIT 50

Exemplo: Custos de armazém

Utilize as seguintes consultas para compreender e acompanhar os custos associados ao seu data warehouse SQL .

Monitorar o custo do armazém por dia

SQL
SELECT
usage_date,
sku_name,
ROUND(SUM(usage_quantity), 2) AS total_dbus,
ROUND(SUM(usage_quantity * list_prices.pricing.default), 2) AS estimated_list_cost
FROM
system.billing.usage
LEFT JOIN system.billing.list_prices ON usage.sku_name = list_prices.sku_name
AND price_end_time IS NULL
WHERE
usage_metadata.warehouse_id = '<warehouse-id>'
AND usage_date >= NOW() - INTERVAL 30 DAY
GROUP BY
usage_date,
sku_name
ORDER BY
usage_date DESC

Correlacione eventos do armazém com o volume de consultas.

Esta consulta ajuda você a entender a relação entre eventos de dimensionamento de armazém e a atividade de consulta para identificar oportunidades de otimização de custos.

SQL
WITH hourly_events AS (
SELECT
DATE_TRUNC('hour', event_time) AS event_hour,
warehouse_id,
MAX(cluster_count) AS max_clusters,
COLLECT_SET(event_type) AS event_types
FROM
system.compute.warehouse_events
WHERE
warehouse_id = '<warehouse-id>'
AND event_time >= NOW() - INTERVAL 7 DAY
GROUP BY
DATE_TRUNC('hour', event_time),
warehouse_id
),
hourly_queries AS (
SELECT
DATE_TRUNC('hour', start_time) AS query_hour,
COUNT(*) AS query_count,
ROUND(AVG(total_duration_ms), 0) AS avg_duration_ms,
ROUND(AVG(waiting_at_capacity_duration_ms), 0) AS avg_queue_wait_ms
FROM
system.query.history
WHERE
compute.warehouse_id = '<warehouse-id>'
AND start_time >= NOW() - INTERVAL 7 DAY
GROUP BY
DATE_TRUNC('hour', start_time)
)
SELECT
COALESCE(e.event_hour, q.query_hour) AS hour,
q.query_count,
q.avg_duration_ms,
q.avg_queue_wait_ms,
e.max_clusters,
e.event_types
FROM
hourly_events e
FULL OUTER JOIN hourly_queries q ON e.event_hour = q.query_hour
ORDER BY
hour DESC