Tracing Semantic Kernel

Semantic Kernel is a lightweight, open source SDK that acts as AI middleware for C#, Python, and Java. It abstracts model interactions and composes prompts, functions, and plugins across providers.

MLflow Tracing integrates with Semantic Kernel to automatically instrument kernel operations and capture comprehensive execution traces. No changes to your application logic are required; enable it by calling mlflow.semantic_kernel.autolog().

The integration provides a complete view of:

  • Prompts and completion responses
  • Chat history and messages
  • Latencies
  • Model name and provider
  • Kernel functions and plugins
  • Template variables and arguments
  • Token usage information
  • Any exceptions raised during execution
note

Streaming is not currently traced.

Prerequisites

To use MLflow Tracing with Semantic Kernel, you need to install MLflow and the relevant Semantic Kernel packages.

For development environments, install the full MLflow package with Databricks extras and Semantic Kernel:

Bash
pip install --upgrade "mlflow[databricks]>=3.1" semantic-kernel openai

The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.

note

MLflow 3 is recommended for the best tracing experience.

Before running the examples, you'll need to configure your environment:

For users outside Databricks notebooks: Set your Databricks environment variables:

Bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

For users inside Databricks notebooks: These credentials are automatically set for you.
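
Alternatively, you can configure the connection from within Python instead of environment variables. A minimal sketch, assuming a Databricks workspace; the experiment path is a placeholder:

Python
import mlflow

# Point MLflow at the Databricks tracking server
mlflow.set_tracking_uri("databricks")

# Log traces to a workspace experiment (placeholder path)
mlflow.set_experiment("/Users/your-user@example.com/semantic-kernel-tracing")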

API Keys: Ensure your LLM provider API keys are configured. For production environments, use Mosaic AI Gateway or Databricks secrets for secure API key management rather than hardcoded values.

Bash
export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
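
In a Databricks notebook, for example, you could load the key from a secret scope rather than hardcoding it. A minimal sketch; the scope and key names are placeholders:

Python
import os

# Read the OpenAI key from a Databricks secret scope (placeholder names)
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(
    scope="llm-keys", key="openai-api-key"
)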

Example usage

Semantic Kernel primarily uses async patterns. In notebooks, you can await directly; in scripts, wrap with asyncio.run().

Python
import mlflow

mlflow.semantic_kernel.autolog()
Python
import openai
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

openai_client = openai.AsyncOpenAI()

kernel = Kernel()
kernel.add_service(
    OpenAIChatCompletion(
        service_id="chat-gpt",
        ai_model_id="gpt-4o-mini",
        async_client=openai_client,
    )
)

answer = await kernel.invoke_prompt("Is sushi the best food ever?")
print("AI says:", answer)

Token usage tracking

MLflow 3.2.0+ records token usage per LLM call and aggregates totals in the trace info.

Python
import mlflow

last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

print(trace.info.token_usage)
for span in trace.data.spans:
    usage = span.get_attribute("mlflow.chat.tokenUsage")
    if usage:
        print(span.name, usage)

Disable auto-tracing

Disable Semantic Kernel auto-tracing with mlflow.semantic_kernel.autolog(disable=True), or disable all autologging with mlflow.autolog(disable=True).
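
For example:

Python
import mlflow

# Disable tracing for Semantic Kernel only
mlflow.semantic_kernel.autolog(disable=True)

# Or disable autologging across all MLflow integrations
mlflow.autolog(disable=True)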