Configure OpenTelemetry (OTLP) clients to send data to Unity Catalog
This feature is in Beta.
Zerobus Ingest includes an OpenTelemetry Protocol (OTLP) endpoint. You can push traces, logs, and metrics directly into Unity Catalog Delta tables using standard OpenTelemetry SDKs and collectors, without custom libraries. This page covers retrieving your endpoint, creating target tables, configuring a service principal, and sending your first telemetry data.
Get your Zerobus Ingest endpoint and Workspace URL
The workspace URL and server endpoint follow these patterns:
- Workspace URL: https://<databricks-instance>.cloud.databricks.com
- Server endpoint: <workspace-id>.zerobus.<region>.cloud.databricks.com
For example:
- Workspace URL: https://dbc-a1b2c3d4-e5f6.cloud.databricks.com
- Server endpoint: 1234567890123456.zerobus.us-west-2.cloud.databricks.com
For more details on finding your workspace ID, URL, and region, see Get your workspace URL and Zerobus Ingest endpoint.
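Using the example values above, the server endpoint can be assembled from the workspace ID and region in a shell. This is a convenience sketch for scripting, not something the service requires:

```shell
# Build the Zerobus Ingest server endpoint from the workspace ID and region.
WORKSPACE_ID="1234567890123456"
REGION="us-west-2"
ZEROBUS_ENDPOINT="${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com"
echo "$ZEROBUS_ENDPOINT"
# Prints: 1234567890123456.zerobus.us-west-2.cloud.databricks.com
```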
Create target tables in Unity Catalog
You must create the target Delta tables before sending data. Each signal type (traces, logs, metrics) requires its own table with a specific schema.
Prerequisites:
- DBR 15.3 or higher: Required to query VARIANT type data.
- (Optional) DBR 17.2 or higher: Required for variant shredding, which improves query performance. For more information, see Optimize performance on the VARIANT data with shredding.
To set up your tables:
- Replace <catalog>.<schema>.<prefix> with your catalog, schema, and desired table name prefix.
- Replace <service-principal-uuid> with your service principal's application ID (UUID). To find it, go to the service principal's Configurations tab in your Databricks workspace.
- Run the script in Databricks SQL.
Spans table
The spans table stores distributed trace data, including timing, status, and attributes for each span.
CREATE TABLE <catalog>.<schema>.<prefix>_otel_spans (
record_id STRING,
time TIMESTAMP,
date DATE,
service_name STRING,
trace_id STRING,
span_id STRING,
trace_state STRING,
parent_span_id STRING,
flags INT,
name STRING,
kind STRING,
start_time_unix_nano LONG,
end_time_unix_nano LONG,
attributes VARIANT,
dropped_attributes_count INT,
events ARRAY<STRUCT<
time_unix_nano: LONG,
name: STRING,
attributes: VARIANT,
dropped_attributes_count: INT
>>,
dropped_events_count INT,
links ARRAY<STRUCT<
trace_id: STRING,
span_id: STRING,
trace_state: STRING,
attributes: VARIANT,
dropped_attributes_count: INT,
flags: INT
>>,
dropped_links_count INT,
status STRUCT<
message: STRING,
code: STRING
>,
resource STRUCT<
attributes: VARIANT,
dropped_attributes_count: INT
>,
resource_schema_url STRING,
instrumentation_scope STRUCT<
name: STRING,
version: STRING,
attributes: VARIANT,
dropped_attributes_count: INT
>,
span_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name, trace_id)
TBLPROPERTIES (
'otel.schemaVersion' = 'v2',
'delta.checkpointPolicy' = 'classic',
'delta.enableVariantShredding' = 'true', -- optional
'delta.feature.variantShredding-preview' = 'supported', -- optional
'delta.feature.variantType-preview' = 'supported' -- optional
);
Logs table
The logs table stores structured log records, including severity, body, and resource attributes.
CREATE TABLE <catalog>.<schema>.<prefix>_otel_logs (
record_id STRING,
time TIMESTAMP,
date DATE,
service_name STRING,
event_name STRING,
trace_id STRING,
span_id STRING,
time_unix_nano LONG,
observed_time_unix_nano LONG,
severity_number STRING,
severity_text STRING,
body VARIANT,
attributes VARIANT,
dropped_attributes_count INT,
flags INT,
resource STRUCT<
attributes: VARIANT,
dropped_attributes_count: INT
>,
resource_schema_url STRING,
instrumentation_scope STRUCT<
name: STRING,
version: STRING,
attributes: VARIANT,
dropped_attributes_count: INT
>,
log_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name)
TBLPROPERTIES (
'otel.schemaVersion' = 'v2',
'delta.checkpointPolicy' = 'classic',
'delta.enableVariantShredding' = 'true', -- optional
'delta.feature.variantShredding-preview' = 'supported', -- optional
'delta.feature.variantType-preview' = 'supported' -- optional
);
Metrics table
The metrics table stores gauge, sum, and histogram measurements along with their associated resource and instrumentation scope attributes.
CREATE TABLE <catalog>.<schema>.<prefix>_otel_metrics (
record_id STRING,
time TIMESTAMP,
date DATE,
service_name STRING,
start_time_unix_nano LONG,
time_unix_nano LONG,
name STRING,
description STRING,
unit STRING,
metric_type STRING,
gauge STRUCT<
value: DOUBLE,
exemplars: ARRAY<STRUCT<
time_unix_nano: LONG,
value: DOUBLE,
span_id: STRING,
trace_id: STRING,
filtered_attributes: VARIANT
>>,
attributes: VARIANT,
flags: INT
>,
sum STRUCT<
value: DOUBLE,
exemplars: ARRAY<STRUCT<
time_unix_nano: LONG,
value: DOUBLE,
span_id: STRING,
trace_id: STRING,
filtered_attributes: VARIANT
>>,
attributes: VARIANT,
flags: INT,
aggregation_temporality: STRING,
is_monotonic: BOOLEAN
>,
histogram STRUCT<
count: LONG,
sum: DOUBLE,
bucket_counts: ARRAY<LONG>,
explicit_bounds: ARRAY<DOUBLE>,
exemplars: ARRAY<STRUCT<
time_unix_nano: LONG,
value: DOUBLE,
span_id: STRING,
trace_id: STRING,
filtered_attributes: VARIANT
>>,
attributes: VARIANT,
flags: INT,
min: DOUBLE,
max: DOUBLE,
aggregation_temporality: STRING
>,
exponential_histogram STRUCT<
attributes: VARIANT,
count: LONG,
sum: DOUBLE,
scale: INT,
zero_count: LONG,
positive_bucket: STRUCT<
offset: INT,
bucket_counts: ARRAY<LONG>
>,
negative_bucket: STRUCT<
offset: INT,
bucket_counts: ARRAY<LONG>
>,
flags: INT,
exemplars: ARRAY<STRUCT<
time_unix_nano: LONG,
value: DOUBLE,
span_id: STRING,
trace_id: STRING,
filtered_attributes: VARIANT
>>,
min: DOUBLE,
max: DOUBLE,
zero_threshold: DOUBLE,
aggregation_temporality: STRING
>,
summary STRUCT<
count: LONG,
sum: DOUBLE,
quantile_values: ARRAY<STRUCT<
quantile: DOUBLE,
value: DOUBLE
>>,
attributes: VARIANT,
flags: INT
>,
metadata VARIANT,
resource STRUCT<
attributes: VARIANT,
dropped_attributes_count: INT
>,
resource_schema_url STRING,
instrumentation_scope STRUCT<
name: STRING,
version: STRING,
attributes: VARIANT,
dropped_attributes_count: INT
>,
metric_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name)
TBLPROPERTIES (
'otel.schemaVersion' = 'v2',
'delta.checkpointPolicy' = 'classic',
'delta.enableVariantShredding' = 'true', -- optional
'delta.feature.variantShredding-preview' = 'supported', -- optional
'delta.feature.variantType-preview' = 'supported' -- optional
);
Create a service principal and grant permissions
Set up a service principal with OAuth credentials and grant it access to your tables. For more information on setting up a service principal, see Authorize service principal access to Databricks with OAuth.
Grant the service principal access to the catalog, schema, and each table. Granting ALL PRIVILEGES is not sufficient. You must explicitly grant MODIFY and SELECT on each table.
GRANT USE CATALOG ON CATALOG <catalog> TO `<service-principal-uuid>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<prefix>_otel_spans TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<prefix>_otel_logs TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<prefix>_otel_metrics TO `<service-principal-uuid>`;
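If you script your setup, the grant statements above can be templated from shell variables rather than edited by hand. The sketch below only prints the SQL for you to run in Databricks SQL; the catalog, schema, prefix, and UUID values are illustrative placeholders:

```shell
# Emit the GRANT statements for a given catalog/schema/prefix and principal.
# Example values; substitute your own before running the output in Databricks SQL.
CATALOG="main"
SCHEMA="default"
TABLE_PREFIX="myapp"
SP_UUID="00000000-0000-0000-0000-000000000000"

echo "GRANT USE CATALOG ON CATALOG ${CATALOG} TO \`${SP_UUID}\`;"
echo "GRANT USE SCHEMA ON SCHEMA ${CATALOG}.${SCHEMA} TO \`${SP_UUID}\`;"
for SIGNAL in spans logs metrics; do
  echo "GRANT MODIFY, SELECT ON TABLE ${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_${SIGNAL} TO \`${SP_UUID}\`;"
done
```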
Configure your exporter
The following examples use OpenTelemetry zero-code instrumentation to automatically collect and forward traces, logs, and metrics to Zerobus Ingest without any code changes to your application. You can also use other OTLP-compatible exporters that support gRPC and custom metadata headers.
Required headers
All OTLP requests must include the following metadata headers:
- x-databricks-zerobus-table-name: The fully qualified Unity Catalog table name in <catalog>.<schema>.<table> format. Each request targets a single table.
- Authorization: OAuth bearer token generated from the service principal credentials.
To generate a static bearer token from your service principal credentials, see Authorize service principal access to Databricks with OAuth. Static tokens expire after one hour. For long-running applications, see OpenTelemetry Collector with automatic token refresh.
Variable setup
Define these variables before running either example:
| Variable | Example |
|---|---|
| DATABRICKS_CLIENT_ID | <your-client-id> |
| DATABRICKS_CLIENT_SECRET | <your-client-secret> |
| WORKSPACE_ID | 1234567890123456 |
| WORKSPACE_URL | dbc-a1b2c3d4-e5f6.cloud.databricks.com |
| REGION | us-west-2 |
| CATALOG | main |
| SCHEMA | default |
| TABLE_PREFIX | myapp |
You can define these variables as environment variables using Bash. For example:
export DATABRICKS_CLIENT_ID="<your-client-id>"
export DATABRICKS_CLIENT_SECRET="<your-client-secret>"
Quick start with static token
The following example uses a static bearer token for each signal type (traces, logs, metrics). Before running this example, generate tokens from your service principal credentials. See Authorize service principal access to Databricks with OAuth.
Use this approach for short-lived or ad-hoc pipelines where managing token refresh is not a concern. Static OAuth tokens expire after one hour and are not suitable for long-running processes. For production workloads, use the OpenTelemetry Collector with automatic token refresh instead.
You must generate a separate token for each signal type. In the authorization_details payload, replace $TABLE_NAME with the full table name for each signal, such as ${TABLE_PREFIX}_otel_spans, ${TABLE_PREFIX}_otel_logs, and ${TABLE_PREFIX}_otel_metrics.
authorization_details=$(cat <<EOF
[{
"type": "unity_catalog_privileges",
"privileges": ["USE CATALOG"],
"object_type": "CATALOG",
"object_full_path": "$CATALOG"
},
{
"type": "unity_catalog_privileges",
"privileges": ["USE SCHEMA"],
"object_type": "SCHEMA",
"object_full_path": "$CATALOG.$SCHEMA"
},
{
"type": "unity_catalog_privileges",
"privileges": ["SELECT", "MODIFY"],
"object_type": "TABLE",
"object_full_path": "$CATALOG.$SCHEMA.$TABLE_NAME"
}]
EOF
)
curl -X POST \
-u "$DATABRICKS_CLIENT_ID:$DATABRICKS_CLIENT_SECRET" \
-d "grant_type=client_credentials" \
-d "scope=all-apis" \
-d "resource=api://databricks/workspaces/$WORKSPACE_ID/zerobusDirectWriteApi" \
--data-urlencode "authorization_details=$authorization_details" \
"https://$WORKSPACE_URL/oidc/v1/token"
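Because each token is scoped to a single table, you repeat the request once per signal. The loop below is one way to build the per-signal authorization_details payloads; it only prints them and does not call the token endpoint, and the fallback catalog, schema, and prefix values are illustrative:

```shell
# Build one authorization_details payload per signal table.
CATALOG="${CATALOG:-main}"            # example fallbacks for illustration
SCHEMA="${SCHEMA:-default}"
TABLE_PREFIX="${TABLE_PREFIX:-myapp}"

for SIGNAL in spans logs metrics; do
  TABLE_NAME="${TABLE_PREFIX}_otel_${SIGNAL}"
  # Print the payload to pass as --data-urlencode "authorization_details=..."
  cat <<EOF
[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"$CATALOG"},
 {"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"$CATALOG.$SCHEMA"},
 {"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"$CATALOG.$SCHEMA.$TABLE_NAME"}]
EOF
done
```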
Run the token request once for each signal's table and save the three returned access tokens as TOKEN_SPANS, TOKEN_LOGS, and TOKEN_METRICS. Then install the auto-instrumentation packages and run your application:
OTEL_SERVICE_NAME="my-service" \
OTEL_EXPORTER_OTLP_PROTOCOL="grpc" \
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com:443" \
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT="https://${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com:443" \
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="https://${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com:443" \
OTEL_EXPORTER_OTLP_TRACES_HEADERS="authorization=Bearer ${TOKEN_SPANS},x-databricks-zerobus-table-name=${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_spans" \
OTEL_EXPORTER_OTLP_LOGS_HEADERS="authorization=Bearer ${TOKEN_LOGS},x-databricks-zerobus-table-name=${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_logs" \
OTEL_EXPORTER_OTLP_METRICS_HEADERS="authorization=Bearer ${TOKEN_METRICS},x-databricks-zerobus-table-name=${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_metrics" \
OTEL_TRACES_EXPORTER="otlp" \
OTEL_METRICS_EXPORTER="otlp" \
OTEL_LOGS_EXPORTER="otlp" \
opentelemetry-instrument python my_app.py
OpenTelemetry Collector with automatic token refresh
Databricks OAuth tokens expire after one hour. Rather than managing token refresh in your application code, deploy an OpenTelemetry Collector as a proxy between your application and Zerobus Ingest. The Collector uses the oauth2clientauthextension to mint a token from your service principal credentials at startup and refresh it automatically before expiry.
This is the recommended approach for long-running and production workloads. Unlike the static token approach, the Collector handles OAuth token acquisition and refresh automatically; your application code requires no changes.
Your application sends plain OTLP to the Collector on localhost:4317 with no authentication. The Collector adds the OAuth token and table header to each request and forwards it to the Zerobus Ingest endpoint.
Collector configuration
Create a collector.yaml file to configure your collector:
extensions:
oauth2client/spans:
client_id: ${env:DATABRICKS_CLIENT_ID}
client_secret: ${env:DATABRICKS_CLIENT_SECRET}
token_url: https://${env:WORKSPACE_URL}/oidc/v1/token
scopes: ['all-apis']
endpoint_params:
resource: 'api://databricks/workspaces/${env:WORKSPACE_ID}/zerobusDirectWriteApi'
authorization_details:
- '[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"${env:CATALOG}"},{"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"${env:CATALOG}.${env:SCHEMA}"},{"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_spans"}]'
oauth2client/logs:
client_id: ${env:DATABRICKS_CLIENT_ID}
client_secret: ${env:DATABRICKS_CLIENT_SECRET}
token_url: https://${env:WORKSPACE_URL}/oidc/v1/token
scopes: ['all-apis']
endpoint_params:
resource: 'api://databricks/workspaces/${env:WORKSPACE_ID}/zerobusDirectWriteApi'
authorization_details:
- '[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"${env:CATALOG}"},{"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"${env:CATALOG}.${env:SCHEMA}"},{"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_logs"}]'
oauth2client/metrics:
client_id: ${env:DATABRICKS_CLIENT_ID}
client_secret: ${env:DATABRICKS_CLIENT_SECRET}
token_url: https://${env:WORKSPACE_URL}/oidc/v1/token
scopes: ['all-apis']
endpoint_params:
resource: 'api://databricks/workspaces/${env:WORKSPACE_ID}/zerobusDirectWriteApi'
authorization_details:
- '[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"${env:CATALOG}"},{"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"${env:CATALOG}.${env:SCHEMA}"},{"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_metrics"}]'
exporters:
otlp/spans:
endpoint: ${env:WORKSPACE_ID}.zerobus.${env:REGION}.cloud.databricks.com:443
auth:
authenticator: oauth2client/spans
headers:
x-databricks-zerobus-table-name: '${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_spans'
otlp/logs:
endpoint: ${env:WORKSPACE_ID}.zerobus.${env:REGION}.cloud.databricks.com:443
auth:
authenticator: oauth2client/logs
headers:
x-databricks-zerobus-table-name: '${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_logs'
otlp/metrics:
endpoint: ${env:WORKSPACE_ID}.zerobus.${env:REGION}.cloud.databricks.com:443
auth:
authenticator: oauth2client/metrics
headers:
x-databricks-zerobus-table-name: '${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_metrics'
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
batch:
timeout: 5s # adjust as needed
send_batch_size: 10 # adjust as needed
service:
extensions: [oauth2client/spans, oauth2client/logs, oauth2client/metrics]
pipelines:
traces:
receivers: [otlp] # adjust as needed
processors: [batch]
exporters: [otlp/spans]
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp/logs]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [otlp/metrics]
Then run the collector:
./otelcol-contrib --config collector.yaml
Instrument your application
Set the required variables before you run this code sample. Then install the auto-instrumentation packages and run your application.
OTEL_SERVICE_NAME=my-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_TRACES_EXPORTER=otlp \
OTEL_METRICS_EXPORTER=otlp \
OTEL_LOGS_EXPORTER=otlp \
opentelemetry-instrument python my_app.py
Next steps
- OpenTelemetry table reference for Zerobus Ingest: Reference for table schemas and data mapping.
- Query OpenTelemetry data: Example queries for exploring your telemetry data in Unity Catalog.
- Zerobus Ingest Error Handling: Troubleshoot common errors and error codes.
- Zerobus Ingest connector limitations: Review throughput and retention limits for Zerobus Ingest.