MLlib + Automated MLflow Tracking

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. Databricks Runtime 5.3 and above and Databricks Runtime 5.3 ML and above support automated MLflow Tracking for Apache Spark MLlib model tuning in Python.

When automated MLflow tracking for MLlib is enabled and you run tuning code that uses CrossValidator or TrainValidationSplit, hyperparameters and evaluation metrics are automatically logged in MLflow. Without automated MLflow tracking, you must make explicit API calls to log to MLflow.
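To make the contrast concrete, here is a minimal sketch of tuning code that uses CrossValidator, together with the explicit MLflow calls you would otherwise write yourself. The function names (build_cross_validator, tuning_records, log_tuning_manually), the LogisticRegression estimator, and the grid values are illustrative assumptions, not taken from the notebook. The one-nested-run-per-combination layout is a design choice for this sketch and is not necessarily how automated tracking organizes its runs.

```python
def build_cross_validator():
    """Build a small CrossValidator; fitting it is the kind of tuning code described above."""
    # Imported lazily so the pure-Python helper below works without Spark installed.
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    lr = LogisticRegression(maxIter=10)
    # Hypothetical grid: 2 x 2 = 4 hyperparameter combinations.
    grid = (
        ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1])
        .addGrid(lr.elasticNetParam, [0.0, 0.5])
        .build()
    )
    return CrossValidator(
        estimator=lr,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(),
        numFolds=3,
    )


def tuning_records(param_maps, avg_metrics):
    """Pair each param map with its average metric, keyed by plain parameter names."""
    return [
        ({param.name: value for param, value in param_map.items()}, metric)
        for param_map, metric in zip(param_maps, avg_metrics)
    ]


def log_tuning_manually(param_maps, avg_metrics, metric_name):
    """Explicit MLflow calls: one nested run per hyperparameter combination."""
    import mlflow

    with mlflow.start_run():
        for params, metric in tuning_records(param_maps, avg_metrics):
            with mlflow.start_run(nested=True):
                for key, value in params.items():
                    mlflow.log_param(key, value)
                mlflow.log_metric(metric_name, metric)
```

Assuming a training DataFrame named train_df with features and label columns, usage would look roughly like this: call cv = build_cross_validator(), then cv_model = cv.fit(train_df), then log_tuning_manually(cv.getEstimatorParamMaps(), cv_model.avgMetrics, "areaUnderROC"). With automated tracking enabled, the last call should not be needed.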

Automated MLflow tracking is enabled by default in Databricks Runtime 5.4 and above and Databricks Runtime 5.4 ML and above. To enable automated MLflow tracking on runtime versions earlier than 5.4, set the Spark configuration spark.databricks.mlflow.trackMLlib.enabled to true.
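One way to set this configuration from a Python notebook is sketched below. It assumes that the setting can be applied at the session level through the spark object that Databricks notebooks predefine. If that does not take effect in your environment, setting the same key in the cluster's Spark configuration is the alternative. The helper name enable_mllib_tracking is hypothetical.

```python
# Configuration key named in the paragraph above.
TRACK_MLLIB_KEY = "spark.databricks.mlflow.trackMLlib.enabled"


def enable_mllib_tracking(spark):
    """Set the tracking flag on the given SparkSession and return the stored value."""
    spark.conf.set(TRACK_MLLIB_KEY, "true")
    return spark.conf.get(TRACK_MLLIB_KEY)
```

In a notebook, calling enable_mllib_tracking(spark) before running the tuning code should be enough under the assumption above.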

Here is a notebook that shows automated MLflow tracking in action.

After you perform the actions in the last cell in the notebook, your MLflow UI should display: