Optimize prompts using custom scorers
This notebook shows how to create custom scorers using MLflow make_judge.
Built-in scorers and judges often do not fit every use case. Use custom scorers or judges to ensure accurate evaluations and, in turn, better optimization results.
The notebook walks you through a Markdown judge that optimizes a prompt so the model generates output that is better formatted as Markdown.
%pip install --upgrade mlflow databricks-sdk dspy openai
dbutils.library.restartPython()
Use MLflow make_judge
MLflow's recently released make_judge lets you create any custom judge for your specific use case.
from mlflow.genai.judges import make_judge
# Create a judge that scores the quality of markdown-formatted output
markdown_output_judge = make_judge(
    name="markdown_quality",
    instructions=(
        "Evaluate if the answer in {{ outputs }} follows markdown formatting, "
        "accurately answers the question in {{ inputs }}, and matches {{ expectations }}. "
        "Rate as high, medium, or low quality."
    ),
    model="databricks:/databricks-claude-sonnet-4-5",
)
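Before wiring the judge into the optimizer, you can sanity-check it by invoking it directly on a single example. The sketch below assumes the judge accepts the same inputs, outputs, and expectations fields referenced in its instructions template; the question and answer values are placeholders, not part of the training data.
# Minimal smoke test of the judge on one hand-written example (placeholder values)
feedback = markdown_output_judge(
    inputs={"question": "What is the capital of France?"},
    outputs="Paris is the capital of France.",
    expectations={"expected_response": "## Paris\n**Paris** is the capital of France."},
)
print(feedback.value)      # "high", "medium", or "low"
print(feedback.rationale)  # the judge's explanation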
Objective function to map feedback
The feedback that the judge provides must be converted into a number the optimizer can use; the optimizer also incorporates the judge's feedback. You need a function that supplies this mapping back to the optimizer.
def feedback_to_score(scores: dict) -> float:
    """Convert feedback values to numerical scores."""
    feedback_value = scores["markdown_quality"]
    # Map categorical feedback to numerical values
    feedback_mapping = {
        "high": 1.0,
        "medium": 0.5,
        "low": 0.0,
    }
    # Handle Feedback objects by accessing .value attribute
    if hasattr(feedback_value, 'value'):
        feedback_str = str(feedback_value.value).lower()
    else:
        feedback_str = str(feedback_value).lower()
    return feedback_mapping.get(feedback_str, 0.0)
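As a quick illustration of the mapping (plain strings here; during optimization the scorer returns Feedback objects, which the hasattr branch above unwraps):
# Spot-check the mapping with plain string values
print(feedback_to_score({"markdown_quality": "high"}))     # 1.0
print(feedback_to_score({"markdown_quality": "medium"}))   # 0.5
print(feedback_to_score({"markdown_quality": "Unknown"}))  # 0.0 (unrecognized values fall back to 0.0)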
Test the model
You can test the model as-is. In the following example, the model does not generate Markdown-formatted output.
import mlflow
import openai
from mlflow.genai.optimize import GepaPromptOptimizer
from databricks.sdk import WorkspaceClient
# Initialize the Databricks workspace client
w = WorkspaceClient()
# Change this to your workspace catalog and schema
catalog = ""
schema = ""
prompt_location = f"{catalog}.{schema}.markdown"
openai_client = w.serving_endpoints.get_open_ai_client()
# Register initial prompt
prompt = mlflow.genai.register_prompt(
    name=prompt_location,
    template="Answer this question: {{question}}",
)

# Define your prediction function
def predict_fn(question: str) -> str:
    prompt = mlflow.genai.load_prompt(f"prompts:/{prompt_location}/1")
    completion = openai_client.chat.completions.create(
        model="databricks-gpt-oss-20b",
        messages=[{"role": "user", "content": prompt.format(question=question)}],
    )
    return completion.choices[0].message.content
from IPython.display import Markdown

output = predict_fn("What is the capital of France?")
# This model returns a list of content blocks; output[1]['text'] holds the answer text
Markdown(output[1]['text'])
Run the optimizer
Some sample data is provided for you.
# Training data with inputs and expected outputs
dataset = [
    {
        # The inputs schema should match the input arguments of the prediction function.
        "inputs": {"question": "What is the capital of France?"},
        "expectations": {"expected_response": """## Paris - Capital of France
**Paris** is the capital and largest city of France, located in the *north-central* region.
### Key Facts:
- **Population**: ~2.2 million (city), ~12 million (metro area)
- **Founded**: 3rd century BC
- **Nickname**: *"City of Light"* (La Ville Lumière)
### Notable Landmarks:
1. **Eiffel Tower** - Iconic iron lattice tower
2. **Louvre Museum** - World's largest art museum
3. **Notre-Dame Cathedral** - Gothic masterpiece
4. **Arc de Triomphe** - Monument honoring French soldiers
> Paris is not only the political center but also a global hub for art, fashion, and culture."""},
    },
    {
        "inputs": {"question": "What is the capital of Germany?"},
        "expectations": {"expected_response": """## Berlin - Capital of Germany
**Berlin** is Germany's capital and largest city, situated in the *northeastern* part of the country.
### Historical Significance:
| Period | Importance |
|--------|------------|
| 1961-1989 | Divided by the **Berlin Wall** |
| 1990 | Reunification capital |
| Present | Political & cultural center |
### Must-See Attractions:
1. **Brandenburg Gate** - Neoclassical monument
2. **Reichstag Building** - Seat of German Parliament
3. **Museum Island** - UNESCO World Heritage site
4. **East Side Gallery** - Open-air gallery on Berlin Wall remnants
> *"Ich bin ein Berliner"* - Famous quote by JFK highlighting Berlin's symbolic importance during the Cold War."""},
    },
    {
        "inputs": {"question": "What is the capital of Japan?"},
        "expectations": {"expected_response": """## Tokyo (東京) - Capital of Japan
**Tokyo** is the capital of Japan and the world's most populous metropolitan area, located on the *eastern coast* of Honshu island.
### Demographics & Economy:
- **Population**: ~14 million (city), ~37 million (Greater Tokyo Area)
- **GDP**: One of the world's largest urban economies
- **Status**: Global financial hub and technology center
### Districts & Landmarks:
1. **Shibuya** - Famous crossing and youth culture
2. **Shinjuku** - Business district with Tokyo Metropolitan Government Building
3. **Asakusa** - Historic area with *Sensō-ji Temple*
4. **Akihabara** - Electronics and anime culture hub
### Cultural Blend:
- Ancient temples ⛩️ alongside futuristic skyscrapers 🏙️
- Traditional tea ceremonies 🍵 and cutting-edge technology 🤖
> Tokyo seamlessly combines **centuries-old traditions** with *ultra-modern innovation*, making it a unique global metropolis."""},
    },
    {
        "inputs": {"question": "What is the capital of Italy?"},
        "expectations": {"expected_response": """## Rome (Roma) - The Eternal City
**Rome** is the capital of Italy, famously known as *"The Eternal City"* (*La Città Eterna*), with over **2,750 years** of history.
### Historical Timeline:
753 BC → Founded (according to legend)
27 BC → Capital of Roman Empire
1871 → Capital of unified Italy
Present → Modern capital with ancient roots
### UNESCO World Heritage Sites:
1. **The Colosseum** - Ancient amphitheater (80 AD)
2. **Roman Forum** - Center of ancient Roman life
3. **Pantheon** - Best-preserved ancient Roman building
4. **Vatican City** - Independent city-state within Rome
- *St. Peter's Basilica*
- *Sistine Chapel* (Michelangelo's ceiling)
### Famous Quote:
> *"All roads lead to Rome"* - Ancient proverb reflecting Rome's historical importance as the center of the Roman Empire
### Cultural Significance:
- Birthplace of **Western civilization**
- Center of the *Catholic Church*
- Home to countless masterpieces of ***Renaissance art and architecture***"""},
    },
]
# Optimize the prompt
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="databricks:/databricks-claude-sonnet-4-5"),
    scorers=[markdown_output_judge],
    aggregation=feedback_to_score,
)

# Use the optimized prompt
optimized_prompt = result.optimized_prompts[0]
print(f"Optimized template: {optimized_prompt.template}")
Review your prompts
Open the link to your MLflow experiment and follow the steps below to see the prompts in your experiment:
- Make sure the experiment type is set to "GenAI apps & agents".
- Go to the prompts tab.
- Click "Select a schema" in the upper-right corner and enter the schema you set above to see your prompt.
Load the new prompt and test again
Review what the prompt looks like, then load it into your prediction function to see how the model behaves differently.
from IPython.display import Markdown
prompt = mlflow.genai.load_prompt(f"prompts:/{prompt_location}/10")
Markdown(prompt.template)
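Hardcoding version 10 assumes you already know which version the optimizer produced. As an alternative sketch, assuming the result object from the optimize_prompts call above is still in scope, you can load the optimized prompt through its URI instead:
# Load the optimized prompt via its URI rather than a hardcoded version number
# (assumes `result` from the optimize_prompts call above is still in scope)
optimized = result.optimized_prompts[0]
prompt = mlflow.genai.load_prompt(optimized.uri)
Markdown(prompt.template)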
from IPython.display import Markdown

def predict_fn(question: str) -> str:
    prompt = mlflow.genai.load_prompt(f"prompts:/{prompt_location}/10")
    completion = openai_client.chat.completions.create(
        model="databricks-gpt-oss-20b",
        # Format the prompt template using PromptVersion.format()
        messages=[{"role": "user", "content": prompt.format(question=question)}],
    )
    return completion.choices[0].message.content
output = predict_fn("What is the capital of France?")
Markdown(output[1]['text'])
Example notebook
The following runnable notebook demonstrates prompt optimization using custom scorers.