Hyperopt is a popular open-source hyperparameter tuning library. Hyperopt offers two tuning algorithms: Random Search and the Bayesian method Tree of Parzen Estimators (TPE), both of which offer improved compute efficiency compared to a brute-force approach such as grid search.
Databricks Runtime 5.4 ML and above include Hyperopt, augmented with an implementation powered by Apache Spark. By using the SparkTrials extension of hyperopt.Trials, you can easily distribute a Hyperopt run without making other changes to your Hyperopt usage. When calling the hyperopt.fmin() function, you pass in a SparkTrials instance. SparkTrials can accelerate single-machine tuning by distributing trials to Spark workers.
MLflow is an open source platform for managing the end-to-end machine learning lifecycle. Databricks Runtime 5.4 ML and above support automated MLflow tracking for hyperparameter tuning with Hyperopt and
SparkTrials in Python. When automated MLflow tracking is enabled and you run
SparkTrials, hyperparameters and evaluation metrics are automatically logged to MLflow. Without automated MLflow tracking, you must make explicit API calls to log to MLflow. Automated MLflow tracking is enabled by default. To disable it, set the Spark configuration spark.databricks.mlflow.trackHyperopt.enabled to false. You can still use SparkTrials to distribute tuning even without automated MLflow tracking.
Databricks does not support logging to MLflow from workers, so you cannot add custom logging code in the objective function you pass to Hyperopt.
This section describes how to configure the arguments you pass to Hyperopt, best practices in using Hyperopt, and troubleshooting issues that may arise when using Hyperopt.
The fmin() documentation has detailed explanations of all the arguments. The most important ones are described briefly below:
fn: The objective function, called with a value generated from the hyperparameter space (space). fn can return the loss as a scalar value or in a dictionary (refer to the Hyperopt docs for details). This is usually where most of your code lives, for example, loss calculation and model training.
space: An expression that generates the hyperparameter space Hyperopt searches. A simple example is
hp.uniform('x', -10, 10), which defines a single-dimension search space between -10 and 10. Hyperopt provides great flexibility in defining the hyperparameter space. After you are familiar with Hyperopt you can use this argument to make your tuning more efficient.
algo: The search algorithm Hyperopt uses to search the hyperparameter space (space). Typical values are hyperopt.random.suggest for Random Search and hyperopt.tpe.suggest for TPE.
max_evals: The number of hyperparameter settings to try, that is, the number of models to fit. This number should be large enough to amortize overhead.
max_queue_len: The number of hyperparameter settings Hyperopt should generate ahead of time. Since the Hyperopt TPE generation algorithm can take some time, it can be helpful to increase this beyond the default value of 1, but generally no larger than the parallelism setting.
parallelism: The maximum number of concurrent runs allowed. This value cannot exceed 128 or the combined number of CPUs across all worker nodes in the cluster. Higher concurrency usually shortens the wall-clock time needed to find the optimal configuration. However, the total amount of compute (or DBUs) is typically higher than for serial tuning: a serial tuning run can always condition on the entire set of prior results, whereas with parallel runs the optimizer cannot know the outcome of the other concurrent runs still in progress when selecting new hyperparameter values to test.
timeout: The maximum number of seconds an fmin() call can take. Once this number is exceeded, all runs are terminated and fmin() exits. All information about completed runs is preserved, and the best model is selected from among them. This argument can save you time and help you control your cluster cost.
A detailed description of the SparkTrials API is included in the example notebook; to find it, search for SparkTrials.
Here are a few tips to help you get the most out of Hyperopt:
- Bayesian approaches can be much more efficient than grid search and random search, so with the Hyperopt Tree of Parzen Estimators (TPE) algorithm it is often possible to explore more hyperparameters and larger ranges. That said, using domain knowledge to restrict the search domain can speed up tuning and produce better results.
- For models with long training times, start experimenting with small datasets and as many hyperparameters as possible. Use MLflow to introspect the best performing models, make informed decisions about how to fix as many hyperparameters as you can, and intelligently down-scope the parameter space as you prepare for tuning at scale.
- Take advantage of Hyperopt support for conditional dimensions and hyperparameters. For example, when you evaluate multiple flavors of gradient descent, instead of limiting the hyperparameter space to just the common hyperparameters, you can have Hyperopt include conditional hyperparameters – the ones that are only appropriate for a subset of the flavors.
- If the loss is NaN (not a number), it is usually because the objective function passed to fmin() returned NaN. A NaN loss does not affect other runs, and you can safely ignore it. To avoid NaN losses, either adjust the hyperparameter space or modify your objective function.
- With Hyperopt search methods the loss usually does not decrease monotonically with each run. However, you can often find the best hyperparameters more quickly than using other methods.
- Both Hyperopt and Spark incur certain overheads. For short trial runs (low tens of seconds), these overheads dominate and the speedup could be pretty small or even zero.
- When you use hp.choice, Hyperopt returns only the index into the choice list, so the parameter logged in MLflow is also the index. You can use hyperopt.space_eval to retrieve the original parameter values.
Here is a notebook that shows distributed Hyperopt + automated MLflow tracking in action. Before diving into the notebook, make sure you:
- Install mlflow from PyPI on your cluster
After you run the last cell in the notebook, your MLflow UI should display: