
Version Tracking API Reference

Overview

MLflow version tracking enables you to create versioned representations of your GenAI applications using the LoggedModel entity. This page provides the API reference for tracking application versions in MLflow.

Why Version Your GenAI Application?

Reproducibility: Capture or link to the exact code (e.g., Git commit hash) and configurations used for a specific version, ensuring you can always reconstruct it.

Debugging Regressions: Compare a problematic LoggedModel version against a known-good one by examining differences in code, configurations, evaluation results, and traces.

Objective Comparison: Systematically evaluate versions using mlflow.genai.evaluate() to compare metrics like quality scores, cost, and latency side-by-side.

Auditability: Each LoggedModel version serves as an auditable record, linking to specific code and configurations for compliance and incident investigation.

Core Concepts

LoggedModel

A LoggedModel in MLflow represents a specific version of your GenAI application. Each distinct state of your application that you want to evaluate, deploy, or refer back to can be captured as a new LoggedModel.

Key characteristics:

  • Central versioned entity for your GenAI application
  • Captures application state including configuration and parameters
  • Links to external code (typically via Git commit hash)
  • Tracks lifecycle from development through production

Version Tracking Methods

MLflow provides two approaches for version tracking:

  1. set_active_model: Simple version tracking that automatically creates a LoggedModel if needed and links subsequent traces
  2. create_external_model: Full control over version creation with extensive metadata, parameters, and tags

API Reference

set_active_model

Links traces to a specific LoggedModel version. If a model with the given name doesn't exist, it automatically creates one.

Python
def set_active_model(
    name: Optional[str] = None,
    model_id: Optional[str] = None
) -> ActiveModel:

Parameters

Parameter   Type         Required   Description
---------   ----------   --------   -----------
name        str | None   No*        Name of the model. If model doesn't exist, creates a new one
model_id    str | None   No*        ID of an existing LoggedModel

*Either name or model_id must be provided.

Return Value

Returns an ActiveModel object (subclass of LoggedModel) that can be used as a context manager.

Example Usage

Python
import mlflow

# Simple usage - creates model if it doesn't exist
mlflow.set_active_model(name="my-agent-v1.0")

# Use as context manager
with mlflow.set_active_model(name="my-agent-v2.0") as model:
    print(f"Model ID: {model.model_id}")
    # Traces within this context are linked to this model

# Use with existing model ID
mlflow.set_active_model(model_id="existing-model-id")

create_external_model

Creates a new LoggedModel for applications whose code and artifacts are stored outside MLflow (e.g., in Git).

Python
def create_external_model(
    name: Optional[str] = None,
    source_run_id: Optional[str] = None,
    tags: Optional[dict[str, str]] = None,
    params: Optional[dict[str, str]] = None,
    model_type: Optional[str] = None,
    experiment_id: Optional[str] = None,
) -> LoggedModel:

Parameters

Parameter       Type                    Required   Description
-------------   ---------------------   --------   -----------
name            str | None              No         Model name. If not specified, a random name is generated
source_run_id   str | None              No         ID of the associated run. Defaults to active run ID if within a run context
tags            dict[str, str] | None   No         Key-value pairs for organization and filtering
params          dict[str, str] | None   No         Model parameters and configuration (must be strings)
model_type      str | None              No         User-defined type for categorization (e.g., "agent", "rag-system")
experiment_id   str | None              No         Experiment to associate with. Uses active experiment if not specified

Return Value

Returns a LoggedModel object with:

  • model_id: Unique identifier for the model
  • name: The assigned model name
  • experiment_id: Associated experiment ID
  • creation_timestamp: When the model was created
  • status: Model status (always "READY" for external models)
  • tags: Dictionary of tags
  • params: Dictionary of parameters

Example Usage

Python
import mlflow

# Basic usage
model = mlflow.create_external_model(
    name="customer-support-agent-v1.0"
)

# With full metadata
model = mlflow.create_external_model(
    name="recommendation-engine-v2.1",
    model_type="rag-agent",
    params={
        "llm_model": "gpt-4",
        "temperature": "0.7",
        "max_tokens": "1000",
        "retrieval_k": "5"
    },
    tags={
        "team": "ml-platform",
        "environment": "staging",
        "git_commit": "abc123def"
    }
)

# Within a run context
with mlflow.start_run() as run:
    model = mlflow.create_external_model(
        name="my-agent-v3.0",
        source_run_id=run.info.run_id
    )

LoggedModel Class

The LoggedModel class represents a versioned model in MLflow.

Properties

Property                 Type                Description
----------------------   -----------------   -----------
model_id                 str                 Unique identifier for the model
name                     str                 Model name
experiment_id            str                 Associated experiment ID
creation_timestamp       int                 Creation time (milliseconds since epoch)
last_updated_timestamp   int                 Last update time (milliseconds since epoch)
model_type               str | None          User-defined model type
source_run_id            str | None          ID of the run that created this model
status                   LoggedModelStatus   Model status (READY, FAILED_REGISTRATION, etc.)
tags                     dict[str, str]      Dictionary of tags
params                   dict[str, str]      Dictionary of parameters
model_uri                str                 URI for referencing the model (e.g., "models:/model_id")
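
Both timestamp properties are plain integers in milliseconds since the Unix epoch, so they usually need converting before display. A minimal sketch using only the standard library; the helper name `ms_to_datetime` is hypothetical and not part of MLflow, and the sample value merely stands in for a real `creation_timestamp`:

```python
from datetime import datetime, timezone


def ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# Illustrative value standing in for LoggedModel.creation_timestamp
created = ms_to_datetime(1_700_000_000_000)
print(created.isoformat())  # 2023-11-14T22:13:20+00:00
```

Keeping the result timezone-aware avoids ambiguity when versions are compared across machines in different time zones.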

Common Patterns

Version Tracking with Git Integration

Python
import mlflow
import subprocess

# Get current git commit
git_commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()[:8]

# Create versioned model name
model_name = f"my-app-git-{git_commit}"

# Track the version
model = mlflow.create_external_model(
    name=model_name,
    tags={"git_commit": git_commit}
)
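
The pattern above keeps only an 8-character prefix of the commit and raises an exception outside a Git checkout. One possible variant is sketched here, assuming you want the short hash in the name and the full hash in the tag; `read_git_commit` and `version_from_commit` are hypothetical helpers, not MLflow functions:

```python
import subprocess
from typing import Optional


def read_git_commit() -> Optional[str]:
    """Return the full HEAD commit hash, or None if git is missing or this is not a repository."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], stderr=subprocess.DEVNULL
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None
    return out.decode().strip()


def version_from_commit(app_name: str, commit: Optional[str]) -> tuple[str, dict[str, str]]:
    """Build a (model name, tags) pair; the name uses the short hash, the tag keeps the full one."""
    if commit is None:
        # Assumption: an explicit "unknown" marker is preferable to failing outright
        return f"{app_name}-git-unknown", {}
    return f"{app_name}-git-{commit[:8]}", {"git_commit": commit}


name, tags = version_from_commit("my-app", read_git_commit())
# name and tags can then be passed to mlflow.create_external_model(name=name, tags=tags)
```

Storing the full hash in the tag removes any risk of prefix collisions when reconstructing a version later.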

Linking Traces to Versions

Python
import mlflow

# Set active model - all subsequent traces will be linked
mlflow.set_active_model(name="my-agent-v1.0")

# Your application code with tracing
@mlflow.trace
def process_request(query: str):
    # This trace will be automatically linked to my-agent-v1.0
    return f"Processing: {query}"

# Run the application
result = process_request("Hello world")

Production Deployment

In production, set the MLFLOW_ACTIVE_MODEL_ID environment variable instead of calling set_active_model():

Bash
# Set the model ID that traces should be linked to.
# The value is the LoggedModel's model_id, not its name.
export MLFLOW_ACTIVE_MODEL_ID="<your-logged-model-id>"
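
Because the variable is set outside the application, a missing or blank value is easy to overlook. A minimal startup check is sketched here, assuming you want the service to fail fast; `require_active_model_id` is a hypothetical helper that only reads the environment and does not call MLflow, and `m-example` is a placeholder value:

```python
import os
from typing import Mapping, Optional


def require_active_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return MLFLOW_ACTIVE_MODEL_ID, raising if it is unset or blank."""
    env = os.environ if env is None else env
    model_id = env.get("MLFLOW_ACTIVE_MODEL_ID", "").strip()
    if not model_id:
        raise RuntimeError(
            "MLFLOW_ACTIVE_MODEL_ID is not set; traces may not be linked to a version"
        )
    return model_id


# Demonstration with an explicit mapping; in a service, call it with no arguments
print(require_active_model_id({"MLFLOW_ACTIVE_MODEL_ID": "m-example"}))
```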

Best Practices

  1. Use semantic versioning in model names (e.g., "app-v1.2.3")
  2. Include git commits in tags for traceability
  3. Parameters must be strings - convert numbers and booleans
  4. Use model_type to categorize similar applications
  5. Set active model before tracing to ensure proper linkage
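
Practice 1 is easier to enforce with a small check at version-creation time. A rough sketch, assuming names follow the `<app>-v<major>.<minor>.<patch>` shape of the example; the pattern and the `parse_versioned_name` helper are illustrative and not an MLflow convention:

```python
import re

# Assumed naming scheme: lowercase app name, then "-v", then three numeric components
_NAME_PATTERN = re.compile(
    r"^(?P<app>[a-z0-9][a-z0-9-]*)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_versioned_name(name: str) -> tuple[str, tuple[int, int, int]]:
    """Split a name like 'app-v1.2.3' into ('app', (1, 2, 3)); raise ValueError otherwise."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a semantic-version model name: {name!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["app"], version


app, version = parse_versioned_name("app-v1.2.3")
print(app, version)  # app (1, 2, 3)
```

Parsing into integer tuples also gives correct ordering, so `1.10.0` sorts after `1.9.0`, which plain string comparison would get wrong.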

Common Issues

Invalid parameter types:

Python
# Error: Parameters must be strings
# Wrong:
params = {"temperature": 0.7, "max_tokens": 1000}

# Correct:
params = {"temperature": "0.7", "max_tokens": "1000"}
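
When configuration lives in a typed dictionary, converting it by hand is error-prone. One way to do the conversion is sketched below with a hypothetical `stringify_params` helper; rendering booleans in lowercase and nested values as JSON is an assumption made here, not something MLflow requires:

```python
import json
from typing import Any


def stringify_params(config: dict[str, Any]) -> dict[str, str]:
    """Return a copy of config with every value converted to a string."""
    result: dict[str, str] = {}
    for key, value in config.items():
        if isinstance(value, str):
            result[key] = value
        elif isinstance(value, bool):
            # Checked before the generic case so True becomes "true" rather than "True"
            result[key] = "true" if value else "false"
        elif isinstance(value, (dict, list, tuple)):
            # Nested structures are serialized as JSON with stable key order
            result[key] = json.dumps(value, sort_keys=True)
        else:
            result[key] = str(value)
    return result


params = stringify_params({"temperature": 0.7, "max_tokens": 1000, "stream": True})
print(params)  # {'temperature': '0.7', 'max_tokens': '1000', 'stream': 'true'}
```

The resulting dictionary can be passed as the `params` argument of `create_external_model`.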

Next Steps