Configure OpenTelemetry (OTLP) clients to send data to Unity Catalog

Beta

This feature is in Beta.

Zerobus Ingest includes an OpenTelemetry Protocol (OTLP) endpoint. You can push traces, logs, and metrics directly into Unity Catalog Delta tables using standard OpenTelemetry SDKs and collectors, without custom libraries. This page covers retrieving your endpoint, creating target tables, configuring a service principal, and sending your first telemetry data.

Get your Zerobus Ingest endpoint and Workspace URL

The endpoint URL follows this pattern:

  • Workspace URL: https://<databricks-instance>.cloud.databricks.com
  • Server endpoint: <workspace-id>.zerobus.<region>.cloud.databricks.com

For example:

  • Workspace URL: https://dbc-a1b2c3d4-e5f6.cloud.databricks.com
  • Server endpoint: 1234567890123456.zerobus.us-west-2.cloud.databricks.com

For more details on finding your workspace ID, URL, and region, see Get your workspace URL and Zerobus Ingest endpoint.
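Both URLs are derived from the same workspace identifiers. The following sketch shows how they fit together; the instance name, workspace ID, and region below are the placeholder values from the example above, not real endpoints:

```python
# Build the workspace URL and Zerobus Ingest endpoint from workspace identifiers.
# All values below are illustrative placeholders.
workspace_instance = "dbc-a1b2c3d4-e5f6"  # hypothetical instance name
workspace_id = "1234567890123456"         # hypothetical workspace ID
region = "us-west-2"

workspace_url = f"https://{workspace_instance}.cloud.databricks.com"
zerobus_endpoint = f"{workspace_id}.zerobus.{region}.cloud.databricks.com"

print(workspace_url)     # https://dbc-a1b2c3d4-e5f6.cloud.databricks.com
print(zerobus_endpoint)  # 1234567890123456.zerobus.us-west-2.cloud.databricks.com
```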

Create target tables in Unity Catalog

You must create the target Delta tables before sending data. Each signal type (traces, logs, metrics) requires its own table with a specific schema.

To set up your tables:

  1. Replace <catalog>.<schema>.<prefix> with your catalog, schema, and desired table name prefix.
  2. Replace <service-principal-uuid> with your service principal's app ID (UUID). To find it, go to the service principal's Configurations tab in your Databricks workspace.
  3. Run the script in Databricks SQL.

Spans table

The spans table stores distributed trace data, including timing, status, and attributes for each span.

SQL
CREATE TABLE <catalog>.<schema>.<prefix>_otel_spans (
  record_id STRING,
  time TIMESTAMP,
  date DATE,
  service_name STRING,
  trace_id STRING,
  span_id STRING,
  trace_state STRING,
  parent_span_id STRING,
  flags INT,
  name STRING,
  kind STRING,
  start_time_unix_nano LONG,
  end_time_unix_nano LONG,
  attributes VARIANT,
  dropped_attributes_count INT,
  events ARRAY<STRUCT<
    time_unix_nano: LONG,
    name: STRING,
    attributes: VARIANT,
    dropped_attributes_count: INT
  >>,
  dropped_events_count INT,
  links ARRAY<STRUCT<
    trace_id: STRING,
    span_id: STRING,
    trace_state: STRING,
    attributes: VARIANT,
    dropped_attributes_count: INT,
    flags: INT
  >>,
  dropped_links_count INT,
  status STRUCT<
    message: STRING,
    code: STRING
  >,
  resource STRUCT<
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  resource_schema_url STRING,
  instrumentation_scope STRUCT<
    name: STRING,
    version: STRING,
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  span_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name, trace_id)
TBLPROPERTIES (
  'otel.schemaVersion' = 'v2',
  'delta.checkpointPolicy' = 'classic',
  'delta.enableVariantShredding' = 'true', -- optional
  'delta.feature.variantShredding-preview' = 'supported', -- optional
  'delta.feature.variantType-preview' = 'supported' -- optional
);

Logs table

The logs table stores structured log records, including severity, body, and resource attributes.

SQL
CREATE TABLE <catalog>.<schema>.<prefix>_otel_logs (
  record_id STRING,
  time TIMESTAMP,
  date DATE,
  service_name STRING,
  event_name STRING,
  trace_id STRING,
  span_id STRING,
  time_unix_nano LONG,
  observed_time_unix_nano LONG,
  severity_number STRING,
  severity_text STRING,
  body VARIANT,
  attributes VARIANT,
  dropped_attributes_count INT,
  flags INT,
  resource STRUCT<
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  resource_schema_url STRING,
  instrumentation_scope STRUCT<
    name: STRING,
    version: STRING,
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  log_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name)
TBLPROPERTIES (
  'otel.schemaVersion' = 'v2',
  'delta.checkpointPolicy' = 'classic',
  'delta.enableVariantShredding' = 'true', -- optional
  'delta.feature.variantShredding-preview' = 'supported', -- optional
  'delta.feature.variantType-preview' = 'supported' -- optional
);

Metrics table

The metrics table stores gauge, sum, and histogram measurements along with their associated resource and instrumentation scope attributes.

SQL
CREATE TABLE <catalog>.<schema>.<prefix>_otel_metrics (
  record_id STRING,
  time TIMESTAMP,
  date DATE,
  service_name STRING,
  start_time_unix_nano LONG,
  time_unix_nano LONG,
  name STRING,
  description STRING,
  unit STRING,
  metric_type STRING,
  gauge STRUCT<
    value: DOUBLE,
    exemplars: ARRAY<STRUCT<
      time_unix_nano: LONG,
      value: DOUBLE,
      span_id: STRING,
      trace_id: STRING,
      filtered_attributes: VARIANT
    >>,
    attributes: VARIANT,
    flags: INT
  >,
  sum STRUCT<
    value: DOUBLE,
    exemplars: ARRAY<STRUCT<
      time_unix_nano: LONG,
      value: DOUBLE,
      span_id: STRING,
      trace_id: STRING,
      filtered_attributes: VARIANT
    >>,
    attributes: VARIANT,
    flags: INT,
    aggregation_temporality: STRING,
    is_monotonic: BOOLEAN
  >,
  histogram STRUCT<
    count: LONG,
    sum: DOUBLE,
    bucket_counts: ARRAY<LONG>,
    explicit_bounds: ARRAY<DOUBLE>,
    exemplars: ARRAY<STRUCT<
      time_unix_nano: LONG,
      value: DOUBLE,
      span_id: STRING,
      trace_id: STRING,
      filtered_attributes: VARIANT
    >>,
    attributes: VARIANT,
    flags: INT,
    min: DOUBLE,
    max: DOUBLE,
    aggregation_temporality: STRING
  >,
  exponential_histogram STRUCT<
    attributes: VARIANT,
    count: LONG,
    sum: DOUBLE,
    scale: INT,
    zero_count: LONG,
    positive_bucket: STRUCT<
      offset: INT,
      bucket_counts: ARRAY<LONG>
    >,
    negative_bucket: STRUCT<
      offset: INT,
      bucket_counts: ARRAY<LONG>
    >,
    flags: INT,
    exemplars: ARRAY<STRUCT<
      time_unix_nano: LONG,
      value: DOUBLE,
      span_id: STRING,
      trace_id: STRING,
      filtered_attributes: VARIANT
    >>,
    min: DOUBLE,
    max: DOUBLE,
    zero_threshold: DOUBLE,
    aggregation_temporality: STRING
  >,
  summary STRUCT<
    count: LONG,
    sum: DOUBLE,
    quantile_values: ARRAY<STRUCT<
      quantile: DOUBLE,
      value: DOUBLE
    >>,
    attributes: VARIANT,
    flags: INT
  >,
  metadata VARIANT,
  resource STRUCT<
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  resource_schema_url STRING,
  instrumentation_scope STRUCT<
    name: STRING,
    version: STRING,
    attributes: VARIANT,
    dropped_attributes_count: INT
  >,
  metric_schema_url STRING
) USING DELTA
CLUSTER BY (time, service_name)
TBLPROPERTIES (
  'otel.schemaVersion' = 'v2',
  'delta.checkpointPolicy' = 'classic',
  'delta.enableVariantShredding' = 'true', -- optional
  'delta.feature.variantShredding-preview' = 'supported', -- optional
  'delta.feature.variantType-preview' = 'supported' -- optional
);

Create a service principal and grant permissions

Set up a service principal with OAuth credentials and grant it access to your tables. For more information on setting up a service principal, see Authorize service principal access to Databricks with OAuth.

Grant the service principal access to the catalog, schema, and each table. Granting ALL PRIVILEGES is not sufficient. You must explicitly grant MODIFY and SELECT on each table.

SQL
GRANT USE CATALOG ON CATALOG <catalog> TO `<service-principal-uuid>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<prefix>_otel_spans TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<prefix>_otel_logs TO `<service-principal-uuid>`;
GRANT MODIFY, SELECT ON TABLE <catalog>.<schema>.<prefix>_otel_metrics TO `<service-principal-uuid>`;

Configure your exporter

The following examples use OpenTelemetry zero-code instrumentation to automatically collect and forward traces, logs, and metrics to Zerobus Ingest without any code changes to your application. You can also use other OTLP-compatible exporters that support gRPC and custom metadata headers.

Required headers

All OTLP requests must include the following metadata headers:

  • x-databricks-zerobus-table-name: The fully-qualified Unity Catalog table name in <catalog>.<schema>.<table> format. Each request targets a single table.
  • Authorization: OAuth bearer token generated from the service principal credentials.

To generate a static bearer token from your service principal credentials, see Authorize service principal access to Databricks with OAuth. Static tokens expire after one hour. For long-running applications, see OpenTelemetry Collector with automatic token refresh.
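When you set these headers through the OTEL_EXPORTER_OTLP_*_HEADERS environment variables, they must be encoded as comma-separated key=value pairs. A minimal sketch of that encoding; the table name and token below are illustrative placeholders:

```python
# Assemble the two required metadata headers in the comma-separated
# key=value form that OTEL_EXPORTER_OTLP_*_HEADERS expects.
# Both values are placeholders, not real credentials.
table_name = "my_catalog.my_schema.my_prefix_otel_spans"
token = "PLACEHOLDER_OAUTH_TOKEN"  # use a real OAuth bearer token

headers = ",".join([
    f"authorization=Bearer {token}",
    f"x-databricks-zerobus-table-name={table_name}",
])

print(headers)
# authorization=Bearer PLACEHOLDER_OAUTH_TOKEN,x-databricks-zerobus-table-name=my_catalog.my_schema.my_prefix_otel_spans
```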

Variable setup

Define these variables before running either example:

  • DATABRICKS_CLIENT_ID: abc123-... (service principal app ID)
  • DATABRICKS_CLIENT_SECRET: dose1234...
  • WORKSPACE_URL: my-workspace.cloud.databricks.com
  • WORKSPACE_ID: 1234567890123456
  • REGION: us-west-2
  • CATALOG: my_catalog
  • SCHEMA: my_schema
  • TABLE_PREFIX: my_prefix

You can define these variables as environment variables using Bash. For example:

Bash
export DATABRICKS_CLIENT_ID="<your-client-id>"
export DATABRICKS_CLIENT_SECRET="<your-client-secret>"

Quick start with static token

The following example uses a static bearer token for each signal type (traces, logs, metrics). Before running this example, generate tokens from your service principal credentials. See Authorize service principal access to Databricks with OAuth.

Use this approach for short-lived or ad-hoc pipelines where managing token refresh is not a concern. Static OAuth tokens expire after one hour and are not suitable for long-running processes. For production workloads, use the OpenTelemetry Collector with automatic token refresh instead.

You must generate a separate token for each signal type. In the authorization_details payload, replace $TABLE_NAME with the full table name for each signal, such as ${TABLE_PREFIX}_otel_spans, ${TABLE_PREFIX}_otel_logs, and ${TABLE_PREFIX}_otel_metrics.

Bash
authorization_details=$(cat <<EOF
[
  {
    "type": "unity_catalog_privileges",
    "privileges": ["USE CATALOG"],
    "object_type": "CATALOG",
    "object_full_path": "$CATALOG"
  },
  {
    "type": "unity_catalog_privileges",
    "privileges": ["USE SCHEMA"],
    "object_type": "SCHEMA",
    "object_full_path": "$CATALOG.$SCHEMA"
  },
  {
    "type": "unity_catalog_privileges",
    "privileges": ["SELECT", "MODIFY"],
    "object_type": "TABLE",
    "object_full_path": "$CATALOG.$SCHEMA.$TABLE_NAME"
  }
]
EOF
)

curl -X POST \
  -u "$DATABRICKS_CLIENT_ID:$DATABRICKS_CLIENT_SECRET" \
  -d "grant_type=client_credentials" \
  -d "scope=all-apis" \
  -d "resource=api://databricks/workspaces/$WORKSPACE_ID/zerobusDirectWriteApi" \
  --data-urlencode "authorization_details=$authorization_details" \
  "https://$WORKSPACE_URL/oidc/v1/token"

Each request returns one token, so run the request once per signal type and save the three access tokens as TOKEN_SPANS, TOKEN_LOGS, and TOKEN_METRICS. Then install the auto-instrumentation packages and run your application:

Bash
OTEL_SERVICE_NAME="my-service" \
OTEL_EXPORTER_OTLP_PROTOCOL="grpc" \
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com:443" \
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT="https://${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com:443" \
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT="https://${WORKSPACE_ID}.zerobus.${REGION}.cloud.databricks.com:443" \
OTEL_EXPORTER_OTLP_TRACES_HEADERS="authorization=Bearer ${TOKEN_SPANS},x-databricks-zerobus-table-name=${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_spans" \
OTEL_EXPORTER_OTLP_LOGS_HEADERS="authorization=Bearer ${TOKEN_LOGS},x-databricks-zerobus-table-name=${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_logs" \
OTEL_EXPORTER_OTLP_METRICS_HEADERS="authorization=Bearer ${TOKEN_METRICS},x-databricks-zerobus-table-name=${CATALOG}.${SCHEMA}.${TABLE_PREFIX}_otel_metrics" \
OTEL_TRACES_EXPORTER="otlp" \
OTEL_METRICS_EXPORTER="otlp" \
OTEL_LOGS_EXPORTER="otlp" \
opentelemetry-instrument python my_app.py
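The authorization_details payload in the token request above can also be built programmatically, which avoids hand-editing the JSON for each of the three tables. A sketch under the same placeholder catalog, schema, and prefix names used throughout this page:

```python
import json

def authorization_details(catalog: str, schema: str, table: str) -> str:
    """Build the authorization_details JSON for one target table,
    mirroring the three-entry payload sent in the token request."""
    return json.dumps([
        {
            "type": "unity_catalog_privileges",
            "privileges": ["USE CATALOG"],
            "object_type": "CATALOG",
            "object_full_path": catalog,
        },
        {
            "type": "unity_catalog_privileges",
            "privileges": ["USE SCHEMA"],
            "object_type": "SCHEMA",
            "object_full_path": f"{catalog}.{schema}",
        },
        {
            "type": "unity_catalog_privileges",
            "privileges": ["SELECT", "MODIFY"],
            "object_type": "TABLE",
            "object_full_path": f"{catalog}.{schema}.{table}",
        },
    ])

# One payload per signal type, matching the three target tables.
payloads = {
    signal: authorization_details("my_catalog", "my_schema", f"my_prefix_otel_{signal}")
    for signal in ("spans", "logs", "metrics")
}
```

Each resulting string is what you would pass as the authorization_details form field in the token request for that signal type.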

OpenTelemetry Collector with automatic token refresh

Databricks OAuth tokens expire after one hour. Rather than managing token refresh in your application code, deploy an OpenTelemetry Collector as a proxy between your application and Zerobus Ingest. The Collector uses the oauth2clientauthextension to mint a token from your service principal credentials at startup and refresh it automatically before expiry.

This is the recommended approach for long-running and production workloads. Unlike the static token approach, the Collector handles OAuth token acquisition and refresh automatically, and your application code requires no changes.

The Collector sits between your application and Zerobus Ingest. Your application sends plain OTLP to the Collector on localhost:4317 with no authentication. The Collector adds the OAuth token and table header to each request and forwards it to the Zerobus Ingest endpoint.

Collector configuration

Create a collector.yaml file to configure your collector:

YAML
extensions:
  oauth2client/spans:
    client_id: ${env:DATABRICKS_CLIENT_ID}
    client_secret: ${env:DATABRICKS_CLIENT_SECRET}
    token_url: https://${env:WORKSPACE_URL}/oidc/v1/token
    scopes: ['all-apis']
    endpoint_params:
      resource: 'api://databricks/workspaces/${env:WORKSPACE_ID}/zerobusDirectWriteApi'
      authorization_details:
        - '[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"${env:CATALOG}"},{"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"${env:CATALOG}.${env:SCHEMA}"},{"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_spans"}]'

  oauth2client/logs:
    client_id: ${env:DATABRICKS_CLIENT_ID}
    client_secret: ${env:DATABRICKS_CLIENT_SECRET}
    token_url: https://${env:WORKSPACE_URL}/oidc/v1/token
    scopes: ['all-apis']
    endpoint_params:
      resource: 'api://databricks/workspaces/${env:WORKSPACE_ID}/zerobusDirectWriteApi'
      authorization_details:
        - '[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"${env:CATALOG}"},{"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"${env:CATALOG}.${env:SCHEMA}"},{"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_logs"}]'

  oauth2client/metrics:
    client_id: ${env:DATABRICKS_CLIENT_ID}
    client_secret: ${env:DATABRICKS_CLIENT_SECRET}
    token_url: https://${env:WORKSPACE_URL}/oidc/v1/token
    scopes: ['all-apis']
    endpoint_params:
      resource: 'api://databricks/workspaces/${env:WORKSPACE_ID}/zerobusDirectWriteApi'
      authorization_details:
        - '[{"type":"unity_catalog_privileges","privileges":["USE CATALOG"],"object_type":"CATALOG","object_full_path":"${env:CATALOG}"},{"type":"unity_catalog_privileges","privileges":["USE SCHEMA"],"object_type":"SCHEMA","object_full_path":"${env:CATALOG}.${env:SCHEMA}"},{"type":"unity_catalog_privileges","privileges":["SELECT","MODIFY"],"object_type":"TABLE","object_full_path":"${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_metrics"}]'

exporters:
  otlp/spans:
    endpoint: ${env:WORKSPACE_ID}.zerobus.${env:REGION}.cloud.databricks.com:443
    auth:
      authenticator: oauth2client/spans
    headers:
      x-databricks-zerobus-table-name: '${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_spans'

  otlp/logs:
    endpoint: ${env:WORKSPACE_ID}.zerobus.${env:REGION}.cloud.databricks.com:443
    auth:
      authenticator: oauth2client/logs
    headers:
      x-databricks-zerobus-table-name: '${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_logs'

  otlp/metrics:
    endpoint: ${env:WORKSPACE_ID}.zerobus.${env:REGION}.cloud.databricks.com:443
    auth:
      authenticator: oauth2client/metrics
    headers:
      x-databricks-zerobus-table-name: '${env:CATALOG}.${env:SCHEMA}.${env:TABLE_PREFIX}_otel_metrics'

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
    timeout: 5s # adjust as needed
    send_batch_size: 10 # adjust as needed

service:
  extensions: [oauth2client/spans, oauth2client/logs, oauth2client/metrics]
  pipelines:
    traces:
      receivers: [otlp] # adjust as needed
      processors: [batch]
      exporters: [otlp/spans]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/logs]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/metrics]

Then run the collector:

Bash
./otelcol-contrib --config collector.yaml

Instrument your application

Set the required variables before you run this code sample. Then install the auto-instrumentation packages and run your application.

Bash
OTEL_SERVICE_NAME=my-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
OTEL_EXPORTER_OTLP_PROTOCOL=grpc \
OTEL_TRACES_EXPORTER=otlp \
OTEL_METRICS_EXPORTER=otlp \
OTEL_LOGS_EXPORTER=otlp \
opentelemetry-instrument python my_app.py

Next steps