Tracing Semantic Kernel
Semantic Kernel is a lightweight, open source SDK that acts as AI middleware for C#, Python, and Java. It abstracts model interactions and composes prompts, functions, and plugins across providers.
MLflow Tracing integrates with Semantic Kernel to automatically instrument kernel callbacks and capture comprehensive execution traces. No changes to your app logic are required; enable it with mlflow.semantic_kernel.autolog().
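Enabling it is a single call, made before your kernel code runs:

```python
import mlflow

# Turn on automatic tracing for all Semantic Kernel operations
mlflow.semantic_kernel.autolog()
```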
The integration provides a complete view of:
- Prompts and completion responses
- Chat history and messages
- Latencies
- Model name and provider
- Kernel functions and plugins
- Template variables and arguments
- Token usage information
- Exceptions, if raised
Streaming is not currently traced.
Prerequisites
To use MLflow Tracing with Semantic Kernel, you need to install MLflow and the relevant Semantic Kernel packages.
For development environments, install the full MLflow package with Databricks extras and Semantic Kernel:

```bash
pip install --upgrade "mlflow[databricks]>=3.1" semantic_kernel openai
```

The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.
For production deployments, install mlflow-tracing and Semantic Kernel:

```bash
pip install --upgrade mlflow-tracing semantic_kernel openai
```

The mlflow-tracing package is optimized for production use.
MLflow 3 is recommended for the best tracing experience.
Before running the examples, you'll need to configure your environment:
For users outside Databricks notebooks: Set your Databricks environment variables:
```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"
```
For users inside Databricks notebooks: These credentials are automatically set for you.
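If you prefer to configure these in code, here is a minimal sketch (the values are placeholders):

```python
import os

# Placeholder values; set these before making any MLflow calls
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
```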
API Keys: Ensure your LLM provider API keys are configured. For production environments, use Mosaic AI Gateway or Databricks secrets instead of hardcoded values for secure API key management.
```bash
export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
```
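For instance, inside a Databricks notebook you can load the key from a secret scope rather than hardcoding it. The scope and key names below are placeholders:

```python
import os

# Hypothetical secret scope ("llm-keys") and key name ("openai-api-key")
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(scope="llm-keys", key="openai-api-key")
```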
Example usage
Semantic Kernel primarily uses async patterns. In notebooks, you can await directly; in scripts, wrap the call with asyncio.run() (see the sketch after the example below).
```python
import mlflow

# Enable MLflow auto-tracing for Semantic Kernel before building the kernel
mlflow.semantic_kernel.autolog()

import openai
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

openai_client = openai.AsyncOpenAI()

kernel = Kernel()
kernel.add_service(
    OpenAIChatCompletion(
        service_id="chat-gpt",
        ai_model_id="gpt-4o-mini",
        async_client=openai_client,
    )
)

# In a notebook, top-level await works directly
answer = await kernel.invoke_prompt("Is sushi the best food ever?")
print("AI says:", answer)
```
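If you are running this from a script instead of a notebook, wrap the invocation in a coroutine and drive it with asyncio.run(). A minimal sketch, reusing the kernel defined above:

```python
import asyncio

async def main():
    # Same call as above, but driven by an explicit event loop
    answer = await kernel.invoke_prompt("Is sushi the best food ever?")
    print("AI says:", answer)

asyncio.run(main())
```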
Token usage tracking
MLflow 3.2.0+ records token usage per LLM call and aggregates totals in the trace info.
```python
import mlflow

# Fetch the trace produced by the most recent invocation
last_trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id=last_trace_id)

# Aggregated totals across all LLM calls in the trace
print(trace.info.token_usage)

# Per-span usage for each individual LLM call
for span in trace.data.spans:
    usage = span.get_attribute("mlflow.chat.tokenUsage")
    if usage:
        print(span.name, usage)
```
Disable auto-tracing
Disable Semantic Kernel auto-tracing with mlflow.semantic_kernel.autolog(disable=True), or disable all autologging with mlflow.autolog(disable=True).
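For example:

```python
import mlflow

# Disable auto-tracing for Semantic Kernel only
mlflow.semantic_kernel.autolog(disable=True)

# Or disable autologging for all supported integrations
mlflow.autolog(disable=True)
```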