
Scorer lifecycle management API reference

Beta

This feature is in Beta.

Production monitoring enables continuous quality assessment of your GenAI applications by automatically running scorers on live traffic. Using MLflow, you can register any scorer (including custom metrics and built-in or custom LLM judges) and manage its lifecycle through registration, activation, updates, and deletion.

Production monitoring includes the following capabilities:

  • Scorer registration and lifecycle management through object-oriented methods.
  • Configurable sampling rates that control the tradeoff between coverage and computational cost.
  • Consistent evaluation using the same scorers in development and production.
  • Continuous quality assessment with monitoring that runs in the background.
  • Assessment results automatically attached as feedback to evaluated traces.
  • Immutable operations that return new scorer instances for cleaner state management.

For information about legacy production monitoring, see Production monitoring API reference (legacy).

Scorer Lifecycle Overview

The new scorer lifecycle provides a clear progression through distinct states:

  1. Unregistered: Scorer function exists locally but is not known to the server
  2. Registered: Scorer is saved in MLflow with a name (use .register())
  3. Active: Scorer is running with a sample rate > 0 (use .start())
  4. Stopped: Scorer is registered but not running (sample rate = 0, use .stop())
  5. Deleted: Scorer is completely removed from the server (use .delete())

All lifecycle operations are immutable - they return new scorer instances rather than modifying the original.
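As an illustration of this immutable pattern, each lifecycle operation can be modeled as returning a fresh instance. The following is a toy sketch, not MLflow's actual Scorer implementation:

```python
from dataclasses import dataclass, replace

# Toy model of the immutable scorer lifecycle (illustration only,
# not MLflow's actual Scorer class): every operation returns a new
# instance instead of mutating the receiver.
@dataclass(frozen=True)
class ToyScorer:
    name: str
    sample_rate: float = 0.0  # 0.0 == registered but stopped

    def start(self, sample_rate: float) -> "ToyScorer":
        return replace(self, sample_rate=sample_rate)

    def stop(self) -> "ToyScorer":
        return replace(self, sample_rate=0.0)

registered = ToyScorer(name="safety_check")  # Registered state
active = registered.start(sample_rate=0.5)   # Active state
stopped = active.stop()                      # Stopped state

# The earlier instances are unchanged:
print(registered.sample_rate, active.sample_rate, stopped.sample_rate)
```

This is why the usage examples below reassign the result of each call (for example, `scorer = scorer.start(...)`) rather than relying on in-place mutation.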

API Reference

Scorer Instance Methods

Scorer.register()

API Reference: Scorer.register

Register a custom scorer function with the server. Used for scorers created with the @scorer decorator.

Python
@scorer
def custom_scorer(outputs):
    return len(str(outputs.get("response", "")))

# Register the custom scorer
my_scorer = custom_scorer.register(name="response_length")

Parameters:

  • name (str): Unique name for the scorer within the experiment. Defaults to the existing name of the scorer.

Returns: New Scorer instance with server registration

Scorer.start()

API Reference: Scorer.start

Begin online evaluation with the specified sampling configuration.

Python
from mlflow.genai.scorers import ScorerSamplingConfig

# Start monitoring with sampling
active_scorer = registered_scorer.start(
    sampling_config=ScorerSamplingConfig(
        sample_rate=0.5,
        filter_string="trace.status = 'OK'"
    ),
)

Parameters:

  • name (str): Name of the scorer. If not provided, defaults to the current name of the scorer.
  • sampling_config (ScorerSamplingConfig): Trace sampling configuration
    • sample_rate (float): Fraction of traces to evaluate (0.0-1.0). Default: 1.0
    • filter_string (str, optional): MLflow-compatible filter for trace selection

Returns: New Scorer instance in active state

Scorer.update()

API Reference: Scorer.update

Modify the sampling configuration of an active scorer. This is an immutable operation.

Python
# Update sampling rate (returns new scorer instance)
updated_scorer = active_scorer.update(
    sampling_config=ScorerSamplingConfig(
        sample_rate=0.8,
    ),
)

# Original scorer remains unchanged
print(f"Original: {active_scorer.sample_rate}") # 0.5
print(f"Updated: {updated_scorer.sample_rate}") # 0.8

Parameters:

  • name (str): Name of the scorer. If not provided, defaults to the current name of the scorer.
  • sampling_config (ScorerSamplingConfig): Trace sampling configuration
    • sample_rate (float): Fraction of traces to evaluate (0.0-1.0). Default: 1.0
    • filter_string (str, optional): MLflow-compatible filter for trace selection

Returns: New Scorer instance with updated configuration

Scorer.stop()

API Reference: Scorer.stop

Stop online evaluation by setting sample rate to 0. Keeps the scorer registered.

Python
# Stop monitoring but keep scorer registered
stopped_scorer = active_scorer.stop()
print(f"Sample rate: {stopped_scorer.sample_rate}") # 0

Parameters:

  • name (str): Name of the scorer. If not provided, defaults to the current name of the scorer.

Returns: New Scorer instance with sample_rate=0

Scorer Registry Functions

mlflow.genai.scorers.get_scorer()

API Reference: get_scorer

Retrieve a registered scorer by name.

Python
from mlflow.genai.scorers import get_scorer

# Get existing scorer by name
existing_scorer = get_scorer(name="safety_monitor")
print(f"Current sample rate: {existing_scorer.sample_rate}")

Parameters:

  • name (str): Name of the registered scorer

Returns: Scorer instance

mlflow.genai.scorers.list_scorers()

API Reference: list_scorers

List all registered scorers for the current experiment.

Python
from mlflow.genai.scorers import list_scorers

# List all registered scorers
all_scorers = list_scorers()
for scorer in all_scorers:
    print(f"Name: {scorer._server_name}")
    print(f"Sample rate: {scorer.sample_rate}")
    print(f"Filter: {scorer.filter_string}")

Returns: List of Scorer instances

mlflow.genai.scorers.delete_scorer()

API Reference: delete_scorer

Delete a registered scorer by name.

Python
from mlflow.genai.scorers import delete_scorer

# Delete existing scorer by name
delete_scorer(name="safety_monitor")

Parameters:

  • name (str): Name of the registered scorer

Returns: None

Scorer Properties

Scorer.sample_rate

Current sampling rate (0.0-1.0). Returns 0 for stopped scorers.

Python
print(f"Sampling {scorer.sample_rate * 100}% of traces")

Scorer.filter_string

Current trace filter string for MLflow trace selection.

Python
print(f"Filter: {scorer.filter_string}")

Configuration Classes

ScorerSamplingConfig

API Reference: ScorerSamplingConfig

Data class that holds sampling configuration for a scorer.

Python
from mlflow.genai.scorers import ScorerSamplingConfig

config = ScorerSamplingConfig(
    sample_rate=0.5,
    filter_string="trace.status = 'OK'"
)

Attributes:

  • sample_rate (float, optional): Sampling rate between 0.0 and 1.0
  • filter_string (str, optional): MLflow trace filter

Usage Patterns

Basic Scorer Lifecycle

Python
from mlflow.genai.scorers import Safety, scorer, ScorerSamplingConfig

# Built-in scorer lifecycle
safety_scorer = Safety().register(name="safety_check")
safety_scorer = safety_scorer.start(
    sampling_config=ScorerSamplingConfig(sample_rate=1.0),
)
safety_scorer = safety_scorer.update(
    sampling_config=ScorerSamplingConfig(sample_rate=0.8),
)
safety_scorer = safety_scorer.stop()
safety_scorer.delete()

# Custom scorer lifecycle
@scorer
def response_length(outputs):
    return len(str(outputs.get("response", "")))

length_scorer = response_length.register(name="length_check")
length_scorer = length_scorer.start(
    sampling_config=ScorerSamplingConfig(sample_rate=0.5),
)

Managing Multiple Scorers

Python
from mlflow.genai.scorers import Safety, Guidelines, ScorerSamplingConfig, list_scorers

# Register multiple scorers
safety_scorer = Safety().register(name="safety")
safety_scorer = safety_scorer.start(
    sampling_config=ScorerSamplingConfig(sample_rate=1.0),
)

guidelines_scorer = Guidelines(
    name="english",
    guidelines=["Response must be in English"]
).register(name="english_check")
guidelines_scorer = guidelines_scorer.start(
    sampling_config=ScorerSamplingConfig(sample_rate=0.5),
)

# List and manage all scorers
all_scorers = list_scorers()
for scorer in all_scorers:
    if scorer.sample_rate > 0:
        print(f"{scorer.name} is active")
    else:
        print(f"{scorer.name} is stopped")

Immutable Updates

Python
from mlflow.genai.scorers import Safety, ScorerSamplingConfig

# Demonstrate immutability
original_scorer = Safety().register(name="safety")
original_scorer = original_scorer.start(
    sampling_config=ScorerSamplingConfig(sample_rate=0.3),
)

# Update returns new instance
updated_scorer = original_scorer.update(
    sampling_config=ScorerSamplingConfig(sample_rate=0.8),
)

# Original remains unchanged
print(f"Original: {original_scorer.sample_rate}")  # 0.3
print(f"Updated: {updated_scorer.sample_rate}")  # 0.8

Metric backfill

backfill_scorers()

Run registered scorers over historical traces in an experiment, optionally constrained to a time range. Returns a job ID for tracking the backfill.

Python
from datetime import datetime

from databricks.agents.scorers import backfill_scorers, BackfillScorerConfig

job_id = backfill_scorers(
    experiment_id="your-experiment-id",
    scorers=[
        BackfillScorerConfig(scorer=safety_scorer, sample_rate=0.8),
        BackfillScorerConfig(scorer=response_length, sample_rate=0.9),
    ],
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 1, 31),
)

Parameters:

All parameters are keyword-only.

  • experiment_id (str, optional): The ID of the experiment to backfill. If not provided, uses the current experiment context
  • scorers (Union[List[BackfillScorerConfig], List[str]], required): Cannot be empty. Either a list of BackfillScorerConfig objects with custom sample rates (if sample_rate is not provided in a BackfillScorerConfig, it defaults to the registered scorer's sample rate), or a list of scorer names (strings), which uses the current sample rates of the experiment's scheduled scorers.
  • start_time (datetime, optional): Start time for backfill evaluation. If not provided, no start time constraint is applied
  • end_time (datetime, optional): End time for backfill evaluation. If not provided, no end time constraint is applied

Returns: Job ID of the created backfill job for status tracking (str)

Best Practices

Scorer State Management

  • Check scorer state before operations using sample_rate property
  • Use immutable pattern - assign results of .start(), .update(), .stop() to variables
  • Understand lifecycle - .stop() preserves registration, .delete() removes entirely

Naming and Organization

  • Use descriptive names that indicate the scorer's purpose
  • Follow naming conventions like "safety_check", "relevance_monitor"
  • Names must be unique within an experiment (maximum 20 scorers per experiment)

Sampling Strategy

  • Critical scorers: Use sample_rate=1.0 for safety and security checks
  • Expensive scorers: Use lower sample rates (0.05-0.2) for complex LLM judges
  • Development scorers: Use moderate rates (0.3-0.5) for iterative improvement
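
These guidelines can be sanity-checked with a quick back-of-envelope estimate. The trace volume and per-evaluation cost below are assumed figures for illustration, not MLflow defaults:

```python
# Rough coverage/cost estimate for picking sample rates.
# daily_traces and cost_per_eval are hypothetical figures.
daily_traces = 10_000
cost_per_eval = 0.002  # assumed cost per LLM-judge call, in dollars

plan = {
    "safety (critical)": 1.0,
    "llm_judge (expensive)": 0.1,
    "length (development)": 0.4,
}

for name, rate in plan.items():
    evaluated = int(daily_traces * rate)
    print(f"{name}: {evaluated} traces/day, ~${evaluated * cost_per_eval:.2f}/day")
```

Estimates like this make the tradeoff concrete: sampling an expensive LLM judge at 0.1 instead of 1.0 cuts its daily evaluation cost by 90% while still surfacing quality trends.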

Metric backfill

  • Start small: Begin with smaller time ranges to estimate job duration and resource usage
  • Appropriate sample rates: Consider cost and time implications of high sample rates

Next steps