Choose where your MLflow data is stored
MLflow tracking servers store and manage your experiment data, runs, and models. Configure your tracking servers to control where your MLflow data is stored and how experiments are accessed across different environments.
Databricks-hosted tracking server
By default, Databricks provides a managed MLflow tracking server that:
- Requires no additional setup or configuration
- Stores experiment data in your workspace
- Integrates seamlessly with Databricks notebooks and clusters
Set the active experiment
By default, all MLflow runs are logged to your workspace's tracking server under the active experiment. If no experiment is explicitly set, runs are logged to the notebook experiment.
Control where runs are logged in Databricks by setting the active experiment:
- mlflow.set_experiment()
- mlflow.start_run()
- Environment variables
Set an experiment for all subsequent runs in the current execution.
import mlflow
mlflow.set_experiment("/Shared/my-experiment")
Set the experiment for a specific run.
with mlflow.start_run(experiment_id="12345"):
    mlflow.log_param("learning_rate", 0.01)
Set an experiment for all runs in the environment.
import os
os.environ["MLFLOW_EXPERIMENT_NAME"] = "/Shared/my-experiment"
# or
os.environ["MLFLOW_EXPERIMENT_ID"] = "12345"
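The same environment variables can also be exported in a shell before launching a training script, so the script itself needs no experiment configuration. A sketch (the script name is a hypothetical example):

```shell
# Point all MLflow runs launched from this shell session at one experiment
export MLFLOW_EXPERIMENT_NAME="/Shared/my-experiment"

# Then launch your training code, e.g.:
# python train.py  # hypothetical training script
```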
Set up tracking to a remote MLflow tracking server
You may need to connect to a remote MLflow tracking server. This could be because you are developing locally and want to track against the Databricks-hosted server, or because you want to track to a different MLflow tracking server, such as one in a different workspace.
Common scenarios for remote tracking:
| Scenario | Use case |
|---|---|
| Cross-workspace tracking | Centralized experiment tracking across multiple workspaces |
| Local development | Develop locally but track experiments in Databricks |
| Remote self-hosted | Custom MLflow infrastructure with specific compliance requirements |
Set up the tracking URI and experiment
To log experiments to a remote tracking server, configure both the tracking URI and experiment path:
import mlflow
# Set the tracking URI to the remote server
mlflow.set_tracking_uri("databricks://remote-workspace-url")
# Set the experiment path in the remote server
mlflow.set_experiment("/Shared/centralized-experiments/my-project")
# All subsequent runs will be logged to the remote server
with mlflow.start_run():
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_metric("accuracy", 0.95)
Authentication methods
Remote tracking server connections require proper authentication. Choose between personal access tokens (PATs) and OAuth with service principals.
- PAT
- OAuth (service principal)
Use PATs for simple token-based authentication.
Pros: Simple setup, good for development
Cons: User-specific, requires manual token management
import os
import mlflow
# Set authentication token
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
# Configure remote tracking
mlflow.set_tracking_uri("databricks://remote-workspace-url")
mlflow.set_experiment("/Shared/remote-experiment")
Use OAuth with service principal credentials for automated workflows.
Pros: Better for automation, centralized identity management
Cons: Requires service principal setup and OAuth configuration
Create a service principal. See Manage service principals.
import os
import mlflow
# Set service principal credentials
os.environ["DATABRICKS_CLIENT_ID"] = "your-service-principal-client-id"
os.environ["DATABRICKS_CLIENT_SECRET"] = "your-service-principal-secret"
# Configure remote tracking
mlflow.set_tracking_uri("databricks://remote-workspace-url")
mlflow.set_experiment("/Shared/remote-experiment")