Choose where your MLflow data is stored
MLflow tracking servers store and manage your experiment data, runs, and models. Configure your tracking servers to control where your MLflow data is stored and how experiments are accessed across different environments.
Databricks-hosted tracking server
By default, Databricks provides a managed MLflow tracking server that:
- Requires no additional setup or configuration
- Stores experiment data in your workspace
- Integrates seamlessly with Databricks notebooks and clusters
Set the active experiment
By default, all MLflow runs are logged to your workspace's tracking server under the active experiment. If no experiment is explicitly set, runs are logged to the notebook experiment.
Control where runs are logged in Databricks by setting the active experiment:
- mlflow.set_experiment()
- mlflow.start_run()
- Environment variables
Set an experiment for all subsequent runs in the current execution.
import mlflow
mlflow.set_experiment("/Shared/my-experiment")
Set the experiment for a specific run.
with mlflow.start_run(experiment_id="12345"):
    mlflow.log_param("learning_rate", 0.01)
Set an experiment for all runs in the environment.
import os
os.environ["MLFLOW_EXPERIMENT_NAME"] = "/Shared/my-experiment"
# or
os.environ["MLFLOW_EXPERIMENT_ID"] = "12345"
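The same environment variables can also be exported in a shell before launching a training script, so the script itself needs no experiment configuration. A sketch (the script name is a hypothetical example):

```shell
# Point all MLflow runs launched from this shell session at one experiment
export MLFLOW_EXPERIMENT_NAME="/Shared/my-experiment"

# Then launch your training code, e.g.:
# python train.py  # hypothetical training script
```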
Set up tracking to a remote MLflow tracking server
You may need to connect to a remote MLflow tracking server. This could be because you are developing locally and want to track against the Databricks-hosted server, or because you want to track to a different MLflow tracking server, such as one in a different workspace.
Common scenarios for remote tracking:
| Scenario | Use case |
|---|---|
| Cross-workspace tracking | Centralized experiment tracking across multiple workspaces |
| Local development | Develop locally but track experiments in Databricks |
| Remote self-hosted | Custom MLflow infrastructure with specific compliance requirements |
Set up the tracking URI and experiment
To log experiments to a remote tracking server, configure both the tracking URI and experiment path:
import mlflow
# Set the tracking URI to the remote server
mlflow.set_tracking_uri("databricks://remote-workspace-url")
# Set the experiment path in the remote server
mlflow.set_experiment("/Shared/centralized-experiments/my-project")
# All subsequent runs will be logged to the remote server
with mlflow.start_run():
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_metric("accuracy", 0.95)
Authentication methods
Remote tracking server connections require proper authentication. Choose between personal access tokens (PATs) and OAuth with service principals.
- PAT
- OAuth (service principal)
Use PATs for simple token-based authentication.
Pros: Simple setup, good for development
Cons: User-specific, requires manual token management
import os
import mlflow
# Set authentication token
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
# Configure remote tracking
mlflow.set_tracking_uri("databricks://remote-workspace-url")
mlflow.set_experiment("/Shared/remote-experiment")
Use OAuth with service principal credentials for automated workflows.
Pros: Better for automation, centralized identity management
Cons: Requires service principal setup and OAuth configuration
Create a service principal. See Manage service principals.
import os
import mlflow
# Set service principal credentials
os.environ["DATABRICKS_CLIENT_ID"] = "your-service-principal-client-id"
os.environ["DATABRICKS_CLIENT_SECRET"] = "your-service-principal-secret"
# Configure remote tracking
mlflow.set_tracking_uri("databricks://remote-workspace-url")
mlflow.set_experiment("/Shared/remote-experiment")