Persist custom model serving data to Unity Catalog

This page shows how to configure endpoint telemetry to persist OpenTelemetry logs, traces, and metrics from your custom model serving endpoints to Unity Catalog tables. Use the persisted telemetry data to perform root cause analysis, monitor endpoint health, and meet compliance requirements with standard SQL queries.

Requirements

Your workspace must be enabled for Unity Catalog. Default storage (Arclight) is not supported.
You must have USE CATALOG, USE SCHEMA, CREATE TABLE, and MODIFY permissions on the destination Unity Catalog catalog and schema where the logs are stored.
An existing custom model serving endpoint or agent serving endpoint, or permissions to create one.
Your workspace must be in a supported region:
- us-east-1
- us-east-2
- us-west-2
- eu-central-1
- ap-southeast-1
- ap-southeast-2
- ap-northeast-1
- ca-central-1
- eu-west-1

Step 1: Instrument your model code

Add instrumentation to your model code to capture telemetry.

Add application logging to your model. Endpoint telemetry automatically captures standard Python logging output. No OpenTelemetry SDK instrumentation is required for basic logging.

Python
import logging

class MyCustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # This log will be persisted to the <prefix>_otel_logs table
        logging.warning("Received inference request")

        try:
            # Your model logic here
            result = model_input * 2
            return result
        except Exception as e:
            # Error logs are also captured with severity 'ERROR'
            logging.error(f"Inference failed: {e}")
            raise e

The root logging level is set to WARNING. See Troubleshooting to change the logging level.

(Optional) Instrument custom metrics and traces with OpenTelemetry. To capture custom metrics and traces beyond basic logging, add OpenTelemetry SDK instrumentation to your model. Expand the following section for a complete example that shows how to create counters, record spans, and attach custom attributes.

Example: Custom metrics, spans, and model logging with OpenTelemetry

note

Due to limitations in model serialization, you must write your model to a separate file before logging to avoid errors, as shown below using %%writefile return_input_model.py.

Python
%%writefile return_input_model.py
import os

import mlflow
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.metrics import get_meter, set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import get_tracer, set_tracer_provider

# ---- OTel initialization (per-worker) ----
resource = Resource.create({
    "worker.pid": str(os.getpid()),
})

otlp_trace_exporter = OTLPSpanExporter()
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(BatchSpanProcessor(otlp_trace_exporter))
set_tracer_provider(tracer_provider)

otlp_metric_exporter = OTLPMetricExporter()
metric_reader = PeriodicExportingMetricReader(otlp_metric_exporter)
meter_provider = MeterProvider(metric_readers=[metric_reader], resource=resource)
set_meter_provider(meter_provider)

_tracer = get_tracer(__name__)
_meter = get_meter(__name__)
_prediction_counter = _meter.create_counter(
    name="prediction_count",
    description="Number of predictions made",
    unit="1"
)


class ReturnInputModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        self.tracer = _tracer
        self.prediction_counter = _prediction_counter

    def predict(self, context, model_input):
        with self.tracer.start_as_current_span("ReturnInputModel.predict") as span:
            span.set_attribute("input_shape", str(model_input.shape))
            span.set_attribute("input_columns", str(list(model_input.columns)))
            self.prediction_counter.add(1)
            return model_input

mlflow.models.set_model(ReturnInputModel())

Log and register the model.

Python
import pandas as pd
import mlflow
from mlflow.models import infer_signature

# Prepare tabular input/output for signature (pyfunc expects DataFrame)
input_df = pd.DataFrame({"inputs": ["hello world"]})
output_df = input_df.copy()  # model returns input unchanged

# Log the model with OpenTelemetry dependencies (using code-based logging to avoid serialization issues)
with mlflow.start_run():
    signature = infer_signature(input_df, output_df)

    model_info = mlflow.pyfunc.log_model(
        name="model",
        python_model="return_input_model.py",
        signature=signature,
        input_example=input_df,
        pip_requirements=[
            "mlflow==3.1",
            "opentelemetry-sdk",
            "opentelemetry-exporter-otlp-proto-http",
        ],
    )

# Register with express deployment environment packing
# Use Unity Catalog name: catalog.schema.model_name
registered = mlflow.register_model(
    model_info.model_uri,
    MODEL_NAME,
    env_pack="databricks_model_serving"
)

Step 2: Prepare the Unity Catalog destination

Before creating your endpoint, ensure you have a catalog and schema ready to receive the telemetry data. Databricks automatically creates the necessary tables in this schema if they do not already exist.

In Catalog Explorer, navigate to the catalog and schema you want to use (for example, my_catalog.observability).

Step 3: Enable endpoint telemetry

You can enable telemetry when creating a new endpoint or add it to an existing one.

New endpoint
Existing endpoint

To enable telemetry in the UI:

Navigate to Serving in the left sidebar.
Click Create serving endpoint.
Expand Advanced options, go to the AI Gateway section, and select Enable inference tables and telemetry.
Unity Catalog location: Select the destination Catalog and Schema prepared in step 2.
(Optional) Table prefix: Enter a prefix for the generated tables. If left blank, there is no prefix. The tables are named <prefix>_otel_logs, <prefix>_otel_spans, and <prefix>_otel_metrics.
Complete the rest of the endpoint configuration (Model selection, Compute settings) and click Create.

To do this with the API:

Enable telemetry using the API

Bash
curl -X POST -H "Authorization: Bearer <your-token>" \
https://<workspace-url>/api/2.0/serving-endpoints \
-d '{
  "name": "my-custom-logging-endpoint",
  "config": {
    "served_entities": [
      {
        "name": "my-model",
        "entity_name": "my-model",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  },
  "telemetry_config": {
    "table_names": {
      "logs_table": "my_catalog.observability.custom_endpoint_logs",
      "metrics_table": "my_catalog.observability.custom_endpoint_metrics",
      "traces_table": "my_catalog.observability.custom_endpoint_spans"
    }
  }
}'

note

Updating triggers a new deployment. Changes take effect once the deployment completes.

To enable telemetry in the UI:

From the endpoint view page, in the AI Gateway section, click Edit AI Gateway, then select Enable inference tables and telemetry.
Unity Catalog location: Select the destination Catalog and Schema prepared in step 2.
(Optional) Table prefix: Enter a prefix for the generated tables. If left blank, there is no prefix. The tables are named <prefix>_otel_logs, <prefix>_otel_spans, and <prefix>_otel_metrics.
Click Update.

Step 4: Verify and query telemetry data

After the endpoint receives traffic, telemetry data streams to the configured Unity Catalog tables.

Go to Catalog Explorer or the SQL Editor.
Locate the table named <prefix>_otel_logs in your configured schema.

Run a query to verify data is flowing:

SQL
SELECT * FROM <catalog>.<schema>.<prefix>_otel_logs
LIMIT 10;

Query telemetry data

The following examples show common queries.

To view the full schema of any telemetry table, run:

SQL
DESCRIBE TABLE <catalog>.<schema>.<prefix>_otel_logs;

Use these columns to filter and correlate telemetry data:

timestamp
severity_text
body
trace_id
span_id
attributes — a map that contains event-specific metadata.

Check for errors in the last hour

SQL
SELECT
  timestamp,
  severity_text,
  body,
  attributes
FROM <catalog>.<schema>.<prefix>_otel_logs
WHERE
  severity_text = 'ERROR'
  AND timestamp > current_timestamp() - INTERVAL 1 HOUR
ORDER BY timestamp DESC;

Troubleshooting

Logs not appearing in table: The root logging level defaults to WARNING to reduce overhead. To capture lower-severity logs, change the level in your model code:

Python
class MyModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        root = logging.getLogger()
        root.setLevel(logging.DEBUG)
        for handler in root.handlers:
            handler.setLevel(logging.DEBUG)

Limitations

The following limits apply to endpoint telemetry:

Schema evolution on the target table is not supported.
Only managed Delta tables are supported. External storage and Arclight default storage are not supported.
The table location must be in the same region as your workspace.
Only table names with ASCII letters, digits, and underscores are supported.
Recreating a target table is not supported.
Only single availability zone (single-az) durability is supported.
Delivery is at-least-once. An acknowledgement from the server means the record is durable and in the Delta table.
Records must be less than 10 MB each.
Requests must be less than 30 MB each.
Log lines must be less than 1 MB each.
Telemetry latency degrades beyond 2500 QPS.
Logs appear in the Unity Catalog table a few seconds after they are emitted.

Requirements​

Step 1: Instrument your model code​

Step 2: Prepare the Unity Catalog destination​

Step 3: Enable endpoint telemetry​

Step 4: Verify and query telemetry data​

Query telemetry data​

Check for errors in the last hour​

Troubleshooting​

Limitations​