Choose where your MLflow data is stored

MLflow tracking servers store and manage your experiment data, runs, and models. Configure your tracking servers to control where your MLflow data is stored and how to access experiments across different environments.

Databricks-hosted tracking server

By default, Databricks provides a managed MLflow tracking server that:

  • Requires no additional setup or configuration
  • Stores experiment data in your workspace
  • Integrates seamlessly with Databricks notebooks and clusters

Set the active experiment

By default, all MLflow runs are logged to the workspace's tracking server under the active experiment. If no experiment is explicitly set, runs are logged to the notebook experiment.

Control where runs are logged in Databricks by setting the active experiment:

Setting an experiment applies to all subsequent runs in the current execution context.

Python
import mlflow

mlflow.set_experiment("/Shared/my-experiment")
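Experiment paths are absolute workspace paths (for example, `/Shared/my-experiment`). As a small illustrative sketch, a helper like the hypothetical `normalize_experiment_path` below (not part of the MLflow API) can sanity-check a path before passing it to `mlflow.set_experiment`:

```python
def normalize_experiment_path(path: str) -> str:
    """Ensure an experiment path is absolute with no trailing slash.

    Hypothetical helper for illustration -- not part of MLflow.
    """
    path = path.strip()
    if not path.startswith("/"):
        # Workspace experiment paths must be absolute, e.g. /Shared/...
        path = "/" + path
    return path.rstrip("/") or "/"

print(normalize_experiment_path("Shared/my-experiment/"))  # /Shared/my-experiment
```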

Set up tracking to a remote MLflow tracking server

You may need to connect to a remote MLflow tracking server: for example, when you are developing locally and want to track against the Databricks-hosted server, or when you want to track to a different MLflow tracking server, such as one in another workspace.

Common scenarios for remote tracking:

  • Cross-workspace tracking: Centralized experiment tracking across multiple workspaces
  • Local development: Develop locally but track experiments in Databricks
  • Remote self-hosted: Custom MLflow infrastructure with specific compliance requirements
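The tracking destination is encoded in the scheme of the tracking URI. As an illustrative sketch (this is not an MLflow API, just scheme inspection), a URI can be classified to see which scenario it corresponds to:

```python
from urllib.parse import urlparse

def describe_tracking_uri(uri: str) -> str:
    """Classify a tracking URI by its scheme.

    Illustrative helper, not part of MLflow: it only inspects the
    scheme to show which kind of server a URI points at.
    """
    scheme = urlparse(uri).scheme
    if scheme == "databricks":
        return "Databricks-hosted tracking server"
    if scheme in ("http", "https"):
        return "remote self-hosted tracking server"
    return "local file store"

print(describe_tracking_uri("databricks://remote-workspace-url"))
```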

Set up the tracking URI and experiment

To log experiments to a remote tracking server, configure both the tracking URI and experiment path:

Python
import mlflow

# Set the tracking URI to the remote server
mlflow.set_tracking_uri("databricks://remote-workspace-url")

# Set the experiment path in the remote server
mlflow.set_experiment("/Shared/centralized-experiments/my-project")

# All subsequent runs will be logged to the remote server
with mlflow.start_run():
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_metric("accuracy", 0.95)

Authentication methods

Remote tracking server connections require proper authentication. Choose between personal access tokens (PATs) and OAuth with service principals.

Use PATs for simple token-based authentication.

Pros: Simple setup, good for development

Cons: User-specific, requires manual token management

Python
import os

import mlflow

# Set the workspace URL and authentication token
os.environ["DATABRICKS_HOST"] = "https://remote-workspace-url"
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"

# Configure remote tracking
mlflow.set_tracking_uri("databricks://remote-workspace-url")
mlflow.set_experiment("/Shared/remote-experiment")
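Because PAT authentication is resolved from the environment, a short preflight check can surface a missing variable before the first tracking call fails. The `databricks_auth_from_env` helper below is an illustrative sketch, not part of MLflow; it assumes credentials are supplied via the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables rather than a CLI profile:

```python
import os

def databricks_auth_from_env() -> dict:
    """Collect PAT credentials from the environment.

    Illustrative helper (not part of MLflow): raises early if the
    variables that token-based authentication relies on are unset.
    """
    host = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    missing = [name for name, value in
               [("DATABRICKS_HOST", host), ("DATABRICKS_TOKEN", token)]
               if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return {"host": host, "token": token}
```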