Hyperparameter tuning with Optuna

Optuna is an open-source Python library for hyperparameter tuning that can be scaled horizontally across multiple compute resources. Optuna also integrates with MLflow for model and trial tracking and monitoring.

Install Optuna

Use the following commands to install Optuna and its integration module.

%pip install optuna
%pip install optuna-integration # Integration with MLflow
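
If a freshly installed or upgraded package is not picked up by the notebook, restarting the notebook's Python process usually resolves it. The line below is one way to do this on Databricks, assuming the dbutils utility is available in the notebook.

dbutils.library.restartPython() # restart the Python process so the newly installed packages can be imported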

Define search space and run Optuna optimization

Here are the steps in an Optuna workflow:

  1. Define an objective function to optimize. Within the objective function, define the hyperparameter search space.

  2. Create an Optuna Study object, and run the tuning algorithm by calling the optimize function of the Study object.

Below is a minimal example from the Optuna documentation.

  • Define the objective function objective, and call the suggest_float function to define the search space for the parameter x.

  • Create a Study, and optimize the objective function with 100 trials, i.e., 100 calls of the objective function with different values of x.

  • Get the best parameters of the Study.

import optuna

def objective(trial):
    # Search space: a single float parameter x in [-10, 10]
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)

best_params = study.best_params
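
In addition to best_params, the Study records the best objective value found. The following quick check assumes the study above has finished running.

# create_study() minimizes by default, so the best x should be close to 2
print(study.best_params)  # for example, {'x': ...}
print(study.best_value)   # smallest value of (x - 2) ** 2 found across the 100 trials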

Parallelize Optuna trials to multiple machines

You can distribute Optuna trials to multiple machines in a Databricks cluster with the Joblib Apache Spark backend (the joblibspark package).

import joblib
from joblibspark import register_spark

register_spark() # register the Spark backend for Joblib
with joblib.parallel_backend("spark", n_jobs=-1):
    # Trials run across the Spark cluster instead of only on the driver
    study.optimize(objective, n_trials=100)
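
Note that register_spark comes from the joblibspark package, which is separate from Optuna. If the package is not already available on the cluster, install it the same way as the packages above.

%pip install joblibspark # Joblib Apache Spark backend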

Integrate with MLflow

To track the hyperparameters and metrics of all Optuna trials, use the MLflowCallback from the Optuna integration module (optuna-integration) when you call the optimize function.

import mlflow
from optuna.integration.mlflow import MLflowCallback

mlflow_callback = MLflowCallback(
    tracking_uri="databricks",
    metric_name="accuracy",
    create_experiment=False,
    mlflow_kwargs={
        "experiment_id": experiment_id
    }
)

study.optimize(objective, n_trials=100, callbacks=[mlflow_callback])
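
The experiment_id above refers to an existing MLflow experiment. The sketch below shows one way to look it up by name; the experiment path is a hypothetical placeholder to replace with your own.

import mlflow

# Hypothetical path to an existing MLflow experiment; replace with your own
experiment = mlflow.get_experiment_by_name("/Users/someone@example.com/optuna-tuning")
experiment_id = experiment.experiment_id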

Notebook example

This notebook provides an example of using Optuna to select a scikit-learn model and a set of hyperparameters for the Iris dataset.

In addition to the single-machine Optuna workflow, the notebook shows how to:

  • Parallelize Optuna trials to multiple machines via Joblib

  • Track trial runs with MLflow

Scaling up hyperparameter tuning with Optuna and MLflow
