MLflow Tracing Integrations

MLflow Tracing is integrated with a wide range of popular Generative AI frameworks and libraries, offering a one-line automatic tracing experience for all of them. This gives you immediate observability into your GenAI applications with minimal setup.

This broad support means you can gain observability without significant code changes, using the tools you already rely on. For custom components or unsupported libraries, MLflow also provides powerful manual tracing APIs.

Automatic tracing captures your application's logic and intermediate steps, such as LLM calls, tool usage, and agent interactions, based on your implementation of the specific library or SDK.

Note

On serverless compute clusters, autologging for GenAI tracing frameworks is not enabled automatically. You must explicitly enable autologging by calling the appropriate mlflow.<library>.autolog() function for each integration you want to trace.

Top integrations at a glance

Here are quick-start examples for some of the most commonly used integrations. Click a tab to see a basic usage example. For detailed prerequisites and more advanced scenarios, visit each integration's dedicated page (linked from the tabs or the list below).

Python
import mlflow
import openai

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Set up MLflow tracking
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/openai-tracing-demo")

openai_client = openai.OpenAI()

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.1,
    max_tokens=100,
)
# View trace in MLflow UI

Full OpenAI integration guide

Python
import mlflow
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.langchain.autolog()

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/langchain-tracing-demo")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7, max_tokens=1000)
prompt = PromptTemplate.from_template("Tell me a joke about {topic}.")
chain = prompt | llm | StrOutputParser()

chain.invoke({"topic": "artificial intelligence"})
# View trace in MLflow UI

Full LangChain integration guide

Python
import mlflow
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.langchain.autolog() # LangGraph uses LangChain's autolog

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/langgraph-tracing-demo")

@tool
def get_weather(city: str):
    """Use this to get weather information."""
    return f"It might be cloudy in {city}"

llm = ChatOpenAI(model="gpt-4o-mini")
graph = create_react_agent(llm, [get_weather])
result = graph.invoke({"messages": [("user", "what is the weather in sf?")]})
# View trace in MLflow UI

Full LangGraph integration guide

Python
import mlflow
import anthropic
import os

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.anthropic.autolog()

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/anthropic-tracing-demo")

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
# View trace in MLflow UI

Full Anthropic integration guide

Python
import mlflow
import dspy

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.dspy.autolog()

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/dspy-tracing-demo")

lm = dspy.LM("openai/gpt-4o-mini") # Assumes OPENAI_API_KEY is set
dspy.configure(lm=lm)

class SimpleSignature(dspy.Signature):
    input_text: str = dspy.InputField()
    output_text: str = dspy.OutputField()

program = dspy.Predict(SimpleSignature)
result = program(input_text="Summarize MLflow Tracing.")
# View trace in MLflow UI

Full DSPy integration guide

Python
import mlflow
import os
from openai import OpenAI # Databricks FMAPI uses OpenAI client

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.openai.autolog() # Traces Databricks FMAPI using OpenAI client

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/databricks-fmapi-tracing")

client = OpenAI(
    api_key=os.environ.get("DATABRICKS_TOKEN"),
    base_url=f"{os.environ.get('DATABRICKS_HOST')}/serving-endpoints"
)
response = client.chat.completions.create(
    model="databricks-llama-4-maverick",
    messages=[{"role": "user", "content": "Key features of MLflow?"}],
)
# View trace in MLflow UI

Full Databricks integration guide

Python
import mlflow
import boto3

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# import os
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.bedrock.autolog()

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/bedrock-tracing-demo")

bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1" # Replace with your region
)
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    # Converse expects each message's content to be a list of content blocks
    messages=[{"role": "user", "content": [{"text": "Hello World in one line."}]}]
)
# View trace in MLflow UI

Full Bedrock integration guide

Python
import mlflow
from autogen import ConversableAgent
import os

# If running this code outside of a Databricks notebook (e.g., locally),
# uncomment and set the following environment variables to point to your Databricks workspace:
# os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
# os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

mlflow.autogen.autolog()

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/autogen-tracing-demo")

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ.get("OPENAI_API_KEY")}]
assistant = ConversableAgent("assistant", llm_config={"config_list": config_list})
user_proxy = ConversableAgent("user_proxy", human_input_mode="NEVER", code_execution_config=False)

user_proxy.initiate_chat(assistant, message="What is 2+2?")
# View trace in MLflow UI

Full AutoGen integration guide

Secure API key management

For production environments, Databricks recommends using AI Gateway or Databricks secrets to manage API keys. AI Gateway is the preferred method and offers additional governance features.

Warning

Never commit API keys directly in your code or notebooks. Always use AI Gateway or Databricks secrets for sensitive credentials.
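One way to back up this rule is a small check that flags lines which appear to embed a key, for example in a pre-commit hook. The sketch below is illustrative only: `find_hardcoded_keys` is a hypothetical helper, and the two regular expressions are assumed key shapes that will not cover every provider.

```python
import re

# Assumed key shapes, for illustration only; extend these for your providers.
KEY_PATTERNS = {
    "openai-style key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "databricks-style token": re.compile(r"dapi[0-9a-f]{32}"),
}


def find_hardcoded_keys(source: str):
    """Return (line_number, label) pairs for lines that look like they embed a key."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in KEY_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

A check like this complements, rather than replaces, storing credentials in AI Gateway or Databricks secrets.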

AI Gateway (Recommended)
Databricks secrets

Databricks recommends AI Gateway for governing and monitoring access to generative AI models.

Create a Foundation Model endpoint configured with AI Gateway:

In your Databricks workspace, go to Serving > Create new endpoint.
Choose an endpoint type and a provider.
Configure the endpoint with your API key.
During endpoint configuration, enable AI Gateway and configure rate limiting, fallbacks, and guardrails as needed.
You can get automatically generated code to start querying the endpoint quickly. Go to Serving > your endpoint > Use > Query. Be sure to add the tracing code:

Python
import mlflow
from openai import OpenAI
import os

# How to get your Databricks token: https://docs.databricks.com/en/dev-tools/auth/pat.html
# DATABRICKS_TOKEN = os.environ.get('DATABRICKS_TOKEN')
# Alternatively in a Databricks notebook you can use this:
DATABRICKS_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

# Enable auto-tracing for OpenAI
mlflow.openai.autolog()

# Set up MLflow tracking (if running outside Databricks)
# If running in a Databricks notebook, these are not needed.
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/my-genai-app")

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="<YOUR_HOST_URL>/serving-endpoints"
)

chat_completion = client.chat.completions.create(
  messages=[
  {
    "role": "system",
    "content": "You are an AI assistant"
  },
  {
    "role": "user",
    "content": "What is MLflow?"
  }
  ],
  model="<YOUR_ENDPOINT_NAME>",
  max_tokens=256
)

print(chat_completion.choices[0].message.content)
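Outside a notebook, where `dbutils` is not available, the host and token usually come from the environment variables used in the earlier examples. The following is a minimal sketch of a hypothetical helper, `serving_client_config`, that fails fast when either variable is missing and builds the `/serving-endpoints` base URL; the helper name and error message are assumptions for illustration.

```python
import os


def serving_client_config(env=None):
    """Return (base_url, token) for an OpenAI-compatible client, or raise."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST", "").strip()
    token = env.get("DATABRICKS_TOKEN", "").strip()

    missing = [
        name
        for name, value in (("DATABRICKS_HOST", host), ("DATABRICKS_TOKEN", token))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    # Tolerate a trailing slash so the URL never contains "//serving-endpoints".
    return f"{host.rstrip('/')}/serving-endpoints", token
```

The returned pair can be passed to the client as `OpenAI(api_key=token, base_url=base_url)`.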

Use Databricks secrets to manage API keys:

Create a secret scope and store your API key:

Python
from databricks.sdk import WorkspaceClient

# Set your secret scope and key names
secret_scope_name = "llm-secrets"  # Choose an appropriate scope name
secret_key_name = "api-key"        # Choose an appropriate key name

# Create the secret scope and store your API key
w = WorkspaceClient()
w.secrets.create_scope(scope=secret_scope_name)
w.secrets.put_secret(
    scope=secret_scope_name,
    key=secret_key_name,
    string_value="your-api-key-here"  # Replace with your actual API key
)

Retrieve and use the secret in your code:

Python
import mlflow
import openai
import os

# Configure your secret scope and key names
secret_scope_name = "llm-secrets"
secret_key_name = "api-key"

# Retrieve the API key from Databricks secrets
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(
    scope=secret_scope_name,
    key=secret_key_name
)

# Enable automatic tracing
mlflow.openai.autolog()

# Use OpenAI client with securely managed API key
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain MLflow Tracing"}],
    max_tokens=100
)

Enabling multiple automatic tracing integrations

Because GenAI applications often combine multiple libraries, MLflow Tracing lets you enable automatic tracing for several integrations at the same time, providing a unified tracing experience.

For example, to enable both LangChain and direct OpenAI tracing:

Python
import mlflow

# Enable MLflow Tracing for both LangChain and OpenAI
mlflow.langchain.autolog()
mlflow.openai.autolog()

# Your code using both LangChain and OpenAI directly...
# ... an example can be found on the Automatic Tracing page ...

MLflow generates a single, cohesive trace that combines the steps from LangChain and from direct OpenAI LLM calls, so you can inspect the complete flow. More examples of combining integrations can be found on the Automatic Tracing page.
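When the list of integrations varies by environment, the `mlflow.<library>.autolog()` calls can be driven from a list of names. This is a best-effort sketch, not an MLflow API: `enable_autolog` is a hypothetical helper that imports each `mlflow.<library>` module by name and records failures instead of raising.

```python
import importlib


def enable_autolog(libraries):
    """Call mlflow.<library>.autolog() for each name.

    Returns a dict mapping each name to None on success or to an error string.
    """
    results = {}
    for name in libraries:
        try:
            module = importlib.import_module(f"mlflow.{name}")
            module.autolog()
            results[name] = None
        except Exception as exc:  # best effort: record the failure and continue
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```

For example, `enable_autolog(["langchain", "openai"])` attempts both integrations and reports any that could not be enabled, such as when the underlying library is not installed.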

Disabling automatic tracing

Automatic tracing for any specific library can be disabled by calling mlflow.<library>.autolog(disable=True). To disable all autologging integrations at once, use mlflow.autolog(disable=True).

Python
import mlflow

# Disable for a specific library
mlflow.openai.autolog(disable=True)

# Disable all autologging
mlflow.autolog(disable=True)
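To switch tracing off only for a section of code, the disable call can be wrapped in a context manager. This is a sketch under stated assumptions: `autolog_paused` is a hypothetical helper, it accepts any `autolog` function that takes a `disable` keyword, and it re-enables with default settings on exit, so custom options are not restored and an integration that was off beforehand ends up on.

```python
from contextlib import contextmanager


@contextmanager
def autolog_paused(autolog):
    """Disable an integration's autolog for the duration of a `with` block."""
    autolog(disable=True)
    try:
        yield
    finally:
        # Re-enable with default settings; custom options are not restored.
        autolog()
```

Usage would look like `with autolog_paused(mlflow.openai.autolog): ...`, with calls inside the block left untraced by that integration.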
