Get started with MLflow 3
This article gets you started with MLflow 3. It describes how to install MLflow 3, includes several demo notebooks to help you get started, and links to pages that cover the new features of MLflow 3 in more detail.
What is MLflow 3 and how is it different from the existing MLflow version?
MLflow 3 on Databricks delivers state-of-the-art experiment tracking, observability, and performance evaluation for machine learning models, generative AI applications, and agents on the Databricks lakehouse. MLflow 3 introduces significant new capabilities while preserving core tracking concepts, making migration from 2.x quick and simple. Using MLflow 3 on Databricks, you can:
- Centrally track and analyze the performance of your models, AI applications, and agents across all environments, from interactive queries in a development notebook through production batch or real-time serving deployments.
- View and access model metrics and parameters from the model version page in Unity Catalog and from the REST API, across all workspaces and experiments.
- Annotate requests and responses (traces) for comprehensive end-to-end observability of all of your gen AI applications and agents, enabling human experts and automated LLM-as-a-judge techniques to provide rich feedback. You can leverage this feedback to assess and compare the performance of application versions and to build datasets for improving quality.
- Evaluate GenAI applications at scale using the new mlflow.genai.evaluate() API with built-in and custom LLM judges for correctness, relevance, safety, and more, assessing the quality of your GenAI applications during development and production.
- Orchestrate evaluation and deployment workflows using Unity Catalog and access comprehensive status logs for each version of your model, AI application, or agent.
These capabilities simplify and streamline evaluation, deployment, debugging, and monitoring for all of your AI initiatives.
GenAI Observability and Evaluation
MLflow 3 introduces comprehensive GenAI capabilities, combining tracing observability with AI-powered tools that reliably measure GenAI quality, so you can monitor and improve the quality of your applications throughout their lifecycle. You can now use the MLflow Experiment UI for real-time dashboarding and monitoring of traces from production applications, whether deployed on Databricks or externally, annotate production traces with feedback, and build datasets for future iterations.
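For example, here is a minimal tracing sketch using the standard mlflow.trace decorator; the experiment path and application logic are illustrative:

import mlflow

# Traces logged from this code appear in the specified MLflow experiment
mlflow.set_experiment("/Shared/my-genai-app")  # illustrative path

@mlflow.trace
def answer_question(question: str) -> str:
    # Your application or agent logic goes here (illustrative stub)
    return "MLflow 3 adds GenAI tracing and evaluation."

answer_question("What's new in MLflow 3?")

Each call produces a trace capturing inputs, outputs, and latency that you can view, annotate, and monitor in the Experiment UI.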
MLflow 3 provides first-class support for LLM judge and human feedback directly on MLflow Traces using the new Assessments feature. The new mlflow.genai.evaluate() API offers a simpler, more powerful approach to evaluation, integrating LLM judges powered by Agent Evaluation into the MLflow SDK. With support for both pre-built and custom scorers, you can be confident in the quality of your GenAI applications before deployment. Furthermore, the Databricks Agent Evaluation APIs for judges, datasets, and labeling sessions (Review App) are now unified under the mlflow.genai namespace for a seamless experience. For further details, see MLflow 3 for GenAI.
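For example, here is a minimal evaluation sketch using mlflow.genai.evaluate with the built-in RelevanceToQuery and Safety scorers; the dataset and predict function are illustrative:

import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety

# Illustrative evaluation records; "inputs" are passed to predict_fn
eval_data = [
    {"inputs": {"question": "What is MLflow?"}},
    {"inputs": {"question": "How do I log a model?"}},
]

def predict_fn(question: str) -> str:
    # Call your application or agent here (illustrative stub)
    return f"Answer to: {question}"

results = mlflow.genai.evaluate(
    data=eval_data,
    predict_fn=predict_fn,
    scorers=[RelevanceToQuery(), Safety()],
)

The results, including per-record judge assessments, are logged to the active experiment so you can compare quality across application versions.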
Logged Models
Much of the new functionality in MLflow 3 derives from the new concept of a LoggedModel. When developing generative AI applications or agents, developers can create LoggedModels to capture git commits or sets of parameters as objects that can be linked to traces and metrics. For deep learning and classical ML applications, a LoggedModel elevates the concept of a model produced by a training run, establishing it as a dedicated object that tracks the model lifecycle across different training and evaluation runs.
LoggedModels capture metrics, parameters, and traces across phases of development (training and evaluation) and across environments (development, staging, and production). When a LoggedModel is promoted to Unity Catalog as a model version, all performance data from the original LoggedModel becomes visible on the Unity Catalog model version page, providing visibility across all workspaces and experiments. For more details, see Track and compare models using MLflow Logged Models.
Deployment jobs
MLflow 3 also introduces the concept of a deployment job. Deployment jobs use Lakeflow Jobs to manage the model lifecycle, including steps like evaluation, approval, and deployment. These model workflows are governed by Unity Catalog, and all events are saved to an activity log that is available on the model version page in Unity Catalog.
Migrating from MLflow 2.x
Although there are many new features in MLflow 3, the core concepts of experiments and runs, along with their metadata such as parameters, tags, and metrics, all remain the same. Migration from MLflow 2.x to 3.0 is very straightforward and should require minimal code changes in most cases. This section highlights some key differences from MLflow 2.x and what you should be aware of for a seamless transition.
Logging Models
When logging models in MLflow 2.x, the artifact_path parameter is used:
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=python_model,
        ...
    )
In MLflow 3, use name instead, which allows the model to later be searched for by name. The artifact_path parameter is still supported but has been deprecated. Additionally, MLflow no longer requires a run to be active when logging a model, because models are first-class citizens in MLflow 3. You can log a model directly without first starting a run:
mlflow.pyfunc.log_model(
    name="model",
    python_model=python_model,
    ...
)
Model artifacts
In MLflow 2.x, model artifacts are stored as run artifacts under the run's artifact path. In MLflow 3, they are stored under the model's own artifact path instead:
# MLflow 2.x
experiments/
└── <experiment_id>/
    └── <run_id>/
        └── artifacts/
            └── ... # model artifacts are stored here

# MLflow 3
experiments/
└── <experiment_id>/
    └── models/
        └── <model_id>/
            └── artifacts/
                └── ... # model artifacts are stored here
To avoid issues, it is recommended to load models with mlflow.<model-flavor>.load_model using the model URI returned by mlflow.<model-flavor>.log_model. This model URI has the format models:/<model_id> (rather than runs:/<run_id>/<artifact_path> as in MLflow 2.x) and can also be constructed manually if only the model ID is available.
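For example, here is a minimal sketch of logging and loading with the MLflow 3 model URI; python_model is assumed to be defined as in the earlier examples:

model_info = mlflow.pyfunc.log_model(
    name="model",
    python_model=python_model,
)

# model_info.model_uri has the form models:/<model_id>
loaded = mlflow.pyfunc.load_model(model_info.model_uri)

# Equivalently, construct the URI from a known model ID
loaded = mlflow.pyfunc.load_model(f"models:/{model_info.model_id}")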
Model registry
In MLflow 3, the default registry URI is now databricks-uc, meaning the MLflow Model Registry in Unity Catalog is used (see Manage model lifecycle in Unity Catalog for more details). The names of models registered in Unity Catalog have the form <catalog>.<schema>.<model>. When calling APIs that require a registered model name, such as mlflow.register_model, use this full three-level name.
For workspaces that have Unity Catalog enabled and whose default catalog is in Unity Catalog, you can also use <model> as the name, and the default catalog and schema are inferred (no change in behavior from MLflow 2.x). If your workspace has Unity Catalog enabled but its default catalog is not configured to be in Unity Catalog, you must specify the full three-level name.
Databricks recommends using the MLflow Model Registry in Unity Catalog for managing the lifecycle of your models.
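For example, here is a minimal sketch that registers a logged model under a three-level Unity Catalog name; the catalog, schema, and model names are illustrative, and model_info is assumed to come from a log_model call as shown earlier:

import mlflow

mlflow.set_registry_uri("databricks-uc")  # the default in MLflow 3

mlflow.register_model(
    model_uri=model_info.model_uri,
    name="main.default.my_model",  # <catalog>.<schema>.<model>
)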
If you want to continue using the Workspace Model Registry (legacy), use one of the following methods to set the registry URI to databricks:
- Call mlflow.set_registry_uri("databricks").
- Set the environment variable MLFLOW_REGISTRY_URI.
- To set the registry URI environment variable at scale, use init scripts. This requires all-purpose compute.
Other important changes
- MLflow 3 clients can load all runs, models, and traces logged with MLflow 2.x clients. However, the reverse is not necessarily true: models and traces logged with MLflow 3 clients may not be loadable with older 2.x client versions.
- The mlflow.evaluate API has been deprecated. For LLMs or GenAI applications, use the mlflow.genai.evaluate API instead. For traditional ML or deep learning models, use mlflow.models.evaluate, which maintains full compatibility with the original mlflow.evaluate API.
- The run_uuid attribute has been removed from the RunInfo object. Use run_id instead in your code.
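For example, here is a minimal sketch of the RunInfo change:

import mlflow

with mlflow.start_run() as run:
    # MLflow 3: use run_id; the run_uuid attribute no longer exists
    print(run.info.run_id)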
Install MLflow 3
To use MLflow 3, you must update the mlflow package to version 3.0 or above. The following lines of code must be executed each time a notebook is run:
%pip install "mlflow>=3.0" --upgrade
dbutils.library.restartPython()
Example notebooks
The following pages illustrate the MLflow 3 model tracking workflow for traditional ML and deep learning. Each page includes an example notebook.
Next steps
To learn more about the new features of MLflow 3, see the following articles: