Hyperparameter tuning

Databricks Runtime for Machine Learning incorporates Hyperopt, an open source tool that automates the process of model selection and hyperparameter tuning.

Hyperparameter tuning with Ray

Databricks Runtime ML includes Ray, an open-source framework that specializes in parallel compute processing for scaling ML workflows and AI applications. See Use Ray on Databricks.
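As an illustration, the following is a minimal Ray Tune sketch, assuming Ray 2.x is available on the cluster; the objective function, metric name, and parameter names are hypothetical.

```python
from ray import tune

# Hypothetical objective: compute a score for one sampled configuration.
# Returning a dict reports the trial's final metrics to Ray Tune.
def trainable(config):
    score = (config["x"] - 3) ** 2 + config["y"]
    return {"score": score}

tuner = tune.Tuner(
    trainable,
    # Search space: sample `x` uniformly and pick `y` from a discrete set.
    param_space={
        "x": tune.uniform(-10, 10),
        "y": tune.choice([1, 2, 3]),
    },
    # Run 20 trials, minimizing the reported score.
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)

results = tuner.fit()
print(results.get_best_result().config)
```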

Hyperparameter tuning with Hyperopt

Databricks Runtime ML includes Hyperopt, a Python library that facilitates distributed hyperparameter tuning and model selection. With Hyperopt, you can scan a set of Python models while varying algorithms and hyperparameters across spaces that you define. Hyperopt works with distributed ML algorithms such as Apache Spark MLlib and Horovod, as well as with single-machine ML models such as scikit-learn and TensorFlow.

The basic steps when using Hyperopt are:

  1. Define an objective function to minimize. Typically this is the training or validation loss.

  2. Define the hyperparameter search space. Hyperopt provides a conditional search space, which lets you compare different ML algorithms in the same run.

  3. Specify the search algorithm. Hyperopt uses stochastic tuning algorithms that perform a more efficient search of hyperparameter space than a deterministic grid search.

  4. Run the Hyperopt function fmin(). fmin() takes the items you defined in the previous steps and identifies the set of hyperparameters that minimizes the objective function.
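The following sketch ties these four steps together, using a conditional search space to compare two scikit-learn classifiers in a single run; the models, parameter ranges, and trial count are illustrative.

```python
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Step 1: objective to minimize (negative cross-validated accuracy).
def objective(params):
    if params["type"] == "svm":
        clf = SVC(C=params["C"], gamma=params["gamma"])
    else:
        clf = RandomForestClassifier(n_estimators=int(params["n_estimators"]))
    accuracy = cross_val_score(clf, X, y, cv=3).mean()
    return {"loss": -accuracy, "status": STATUS_OK}

# Step 2: conditional search space that compares two algorithms in the same run.
search_space = hp.choice(
    "classifier",
    [
        {"type": "svm",
         "C": hp.lognormal("C", 0, 1.0),
         "gamma": hp.lognormal("gamma", 0, 1.0)},
        {"type": "rf",
         "n_estimators": hp.quniform("n_estimators", 50, 300, 25)},
    ],
)

# Steps 3 and 4: choose the TPE search algorithm and run fmin().
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=32,
    trials=Trials(),
)
```

On Databricks, you can distribute the trials across a cluster by passing a SparkTrials() object instead of Trials(); this is how Hyperopt parallelizes tuning for single-machine libraries such as scikit-learn.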

To get started quickly using Hyperopt with scikit-learn algorithms, see the quickstart notebooks in the Databricks documentation.

For more details about how Hyperopt works, and for additional examples, see the Hyperopt concepts and best practices articles in the Databricks documentation.

Automated MLflow tracking

Note

MLlib automated MLflow tracking is deprecated on clusters that run Databricks Runtime 10.1 ML and above, and it is disabled by default on clusters running Databricks Runtime 10.2 ML and above. Instead, use MLflow PySpark ML autologging by calling mlflow.pyspark.ml.autolog(), which is enabled by default with Databricks Autologging.

To use the old MLlib automated MLflow tracking in Databricks Runtime 10.2 ML and above, enable it by setting the Spark configurations spark.databricks.mlflow.trackMLlib.enabled to true and spark.databricks.mlflow.autologging.enabled to false.
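For example, the following is a minimal sketch run from a Databricks notebook, where the `spark` session is already defined; setting these configurations with spark.conf.set assumes they can be changed at the session level, otherwise place them in the cluster's Spark configuration.

```python
import mlflow.pyspark.ml

# Recommended approach: MLflow PySpark ML autologging
# (enabled by default with Databricks Autologging).
mlflow.pyspark.ml.autolog()

# Legacy MLlib automated MLflow tracking on Databricks Runtime 10.2 ML and above.
spark.conf.set("spark.databricks.mlflow.trackMLlib.enabled", "true")
spark.conf.set("spark.databricks.mlflow.autologging.enabled", "false")
```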