Zerobus Ingest system table reference
This article is a reference for the zerobus system tables, which track Zerobus Ingest activity in your workspace. These tables include your account records from all workspaces in your same region. To see records from another region, you must view the tables from a workspace deployed in that region.
Requirements
- To access these system tables, users must either:
- Be both a metastore admin and a workspace admin, or
- Have
USEandSELECTpermissions on the system schemas. See Grant access to system tables.
Available Zerobus Ingest tables
All Zerobus Ingest related system tables live in the system.lakeflow schema. This includes the following tables:
Table | Description | Supports streaming | Free retention period | Includes global or regional data |
|---|---|---|---|---|
Zerobus_stream (Beta) | Tracks all streams created using Zerobus Ingest | Yes | 365 days | Regional |
Zerobus_ingest (Beta) | Tracks all data ingested into Delta tables using Zerobus Ingest | Yes | 365 days | Regional |
Zerobus stream table schema reference
The zerobus_stream table stores all data related to stream events incurred by Zerobus Ingest usage. This includes events that occur when a stream is created, when an error occurs during the stream's lifetime, and when a stream is closed.
Table path: system.lakeflow.zerobus_stream
Column name | Data type | Description | Example |
|---|---|---|---|
| string | The ID of the account responsible for the stream |
|
| string | The ID of the workspace responsible for the stream |
|
| string | The ID of the stream |
|
| timestamp | The timestamp of when the event occurred |
|
| string | The ID of the producer that is responsible for the stream creation | This column is not currently supported. |
| timestamp | The timestamp of when the stream was opened |
|
| timestamp | The timestamp of when the stream was closed |
|
| string | The ID of the table that is being ingested into |
|
| string | The fully qualified name of the table that is being ingested into |
|
| string | The protocol used to ingest data. Possible values are |
|
| string | The format of data used on the stream. Possible values are |
|
| array | Array of errors that have occurred during the stream's lifetime | See Error reference |
Example queries for the stream table
The following query returns the number of currently opened streams for a specified table:
SELECT COUNT(stream.stream_id)
FROM system.lakeflow.zerobus_stream AS stream
WHERE stream.table_id=:table_id AND stream.closed_time IS NULL AND stream.opened_time > CURRENT_TIMESTAMP - INTERVAL '15 minutes'
The following query returns the IDs of all streams that incurred an error:
SELECT DISTINCT(stream.stream_id)
FROM system.lakeflow.zerobus_stream AS stream
WHERE size(stream.errors) > 0
Zerobus ingest table schema reference
The zerobus_ingest table stores all data related to records ingested using Zerobus Ingest. This data is not record-level granular. The data is aggregated by batches written to the Delta table, identified using the commit_version.
Table path: system.lakeflow.zerobus_ingest
Column name | Data type | Description | Example |
|---|---|---|---|
| string | ID of the account that owns the given workspace |
|
| string | ID of the workspace that the stream is created in |
|
| string | ID of the stream that performed the ingestion |
|
| bigint | Unique version number of the commit to the Delta |
|
| string | ID of the table that is being written into |
|
| string | Fully qualified table name in human readable format |
|
| timestamp | Timestamp of the commit |
|
| bigint | Size of the ingested data in bytes | 3532498 |
| bigint | Number of committed records | 3445 |
| array | Custom tags provided by Zerobus Ingest that can be used for tracking usage origin and other metadata |
|
| array | Array of errors that have occurred during the ingestion | See Error reference |
Example queries for the ingestion table
The following query gets the average insertion rate for every table during a specified time range:
SELECT ingest.table_id AS table_id, SUM(ingest.committed_bytes) / (SECONDS(:end_timestamp - :start_timestamp)) AS average_insertion_rate
FROM system.lakeflow.zerobus_ingest AS ingest
WHERE ingest.commit_time >= :start_timestamp AND ingest.commit_time <= :end_timestamp
GROUP BY ingest.table_id
The following query gets the total amount of data and records ingested into a table during a specified time range:
SELECT SUM(ingest.committed_bytes) as 'Total data', SUM(ingest.committed_records) as 'Total records'
FROM system.lakeflow.zerobus_ingest AS ingest
WHERE ingest.commit_time >= :start_timestamp AND ingest.commit_time <= :end_timestamp AND ingest.table_name = :table_name
Error reference
Both the stream and ingest tables include an errors column, which includes an array of errors that occurred during the stream or ingest, with each element representing a separate error. These array elements are recorded as objects with the following fields:
Field name | Data type | Description |
|---|---|---|
| bigint | Error code |
| string | Full error message |
| timestamp | Timestamp of when the error occurred |
Common join patterns
The following query shows how you can join the stream and ingest tables to return all tables that have been written into from a single workspace.
SELECT DISTINCT(stream.table_id)
FROM system.lakeflow.zerobus_stream AS stream INNER JOIN system.lakeflow.zerobus_ingest AS ingest ON (stream_id)
WHERE stream.workspace_id=:workspace_id