MLflow Projects: Run on Databricks


This topic describes features that are in Private Preview.

An MLflow Project is a format for packaging data science code in a reusable and reproducible way. The MLflow Projects component includes an API and command-line tools for running projects, which also integrate with the Tracking component to automatically record the parameters and git commit of your source code for reproducibility. This topic describes how to run an MLflow project remotely on Databricks clusters using the MLflow CLI, which makes it easy to vertically scale your data science code.

To get started with MLflow projects, see the MLflow App Library, a repository of ready-to-run projects aimed at making it easy to incorporate ML functionality into your code.

Run an MLflow project

To run an MLflow project on a Databricks cluster in the default workspace, use the command:

mlflow run <uri> -m databricks --cluster-spec <json-cluster-spec>

<uri> is a Git repository URI for an MLflow project and <json-cluster-spec> is a JSON document containing a cluster specification.

An example cluster specification is:

  "spark_version": "5.0.x-scala2.11",
  "num_workers": 1,
  "node_type_id": "i3.xlarge"


If you are using Databricks Runtime 4.3 or lower, you must specify the following spark_conf in your cluster specification:

  "spark_version": "5.0.x-scala2.11",
  "num_workers": 1,
  "node_type_id": "i3.xlarge",
  "spark_conf": {"spark.databricks.chauffeur.shellCommandTask.enabled": "true"}

You can pass Git credentials using the git-username and git-password arguments or the MLFLOW_GIT_USERNAME and MLFLOW_GIT_PASSWORD environment variables.
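For example, credentials can be supplied through the environment before invoking the run (the values below are placeholders, not real credentials):

```shell
# Placeholder Git credentials -- substitute your own username and
# personal access token before running.
export MLFLOW_GIT_USERNAME=example-user
export MLFLOW_GIT_PASSWORD=example-token

# The MLflow CLI picks these up when fetching the project repository:
#   mlflow run <uri> -m databricks --cluster-spec cluster-spec.json
```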

To run against a Databricks cluster in a non-default workspace, specify databricks://<profile>, where <profile> is a Databricks CLI profile, in the MLFLOW_TRACKING_URI environment variable.
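As a sketch, using a hypothetical profile named my-workspace defined in ~/.databrickscfg:

```shell
# Route MLflow CLI calls to the workspace described by the
# "my-workspace" profile (a hypothetical name) in ~/.databrickscfg.
export MLFLOW_TRACKING_URI=databricks://my-workspace
echo "$MLFLOW_TRACKING_URI"   # prints databricks://my-workspace
```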


The mlflow.start_run() API accepts a source_name argument. This argument is used when you start a run from a file, but is ignored when you run from a Databricks notebook or with the CLI command mlflow run.


This example shows how to run the MLflow tutorial project on a Databricks cluster, view the job run output, and view the run in the MLflow UI.

Run the MLflow tutorial project

The following command runs the MLflow tutorial project, which trains a wine quality model, and records the training parameters and metrics in MLflow experiment 49 in the workspace defined in the CLI profile mlflow:

export MLFLOW_TRACKING_URI=databricks://mlflow
mlflow run <uri> -P alpha=0.1 --experiment-id 49 -m databricks -c cluster-spec.json
=== Fetching project from <uri> into /var/folders/kc/l20y4txd5w3_xrdhw6cnz1080000gp/T/tmp6_rk_mme ===
=== Uploading project to DBFS path /dbfs/mlflow-experiments/49/projects-code/db7ec766f11c6d1fcdb7bf64e7429b4a355712e1a14b5039bc06717539334b1b.tar.gz ===
=== Finished uploading project to /dbfs/mlflow-experiments/49/projects-code/db7ec766f11c6d1fcdb7bf64e7429b4a355712e1a14b5039bc06717539334b1b.tar.gz ===
=== Running entry point main of project on Databricks ===
=== Launched MLflow run as Databricks job run with ID 2372743. Getting run status page URL... ===
=== Check the run's status at https://<databricks-instance>#job/11641/run/1 ===

View the Databricks job run

The Databricks job run output appears at https://<databricks-instance>#job/11641/run/1.


View the experiment in the MLflow UI

To view the experiment in the MLflow UI, go to https://<databricks-instance>/mlflow/#/experiments/49, where the output from running the job is displayed.


Display MLflow run information

To display the details of an MLflow run, click the link in the Date column.


You can navigate back to the Databricks job run page by clicking the Logs link in the Job Output field.