This notebook provides a quick overview of machine learning model training on Databricks. To train models, you can use libraries such as scikit-learn, which are preinstalled in Databricks Runtime ML. You can also use MLflow to track the trained models, and Hyperopt with SparkTrials to scale hyperparameter tuning.
In this tutorial, you train a simple classification model, using MLflow to track model development and Hyperopt to improve the model’s performance. For more details on productionizing machine learning on Databricks, including model lifecycle management and model inference, see the ML end-to-end example.
For additional example notebooks to get started quickly on Databricks, see 10-minute tutorials: Get started with machine learning on Databricks.
Requirements: Databricks Runtime 7.5 ML or above.
If you do not have access to Databricks Runtime 7.5 ML or above, try Get started with scikit-learn in Databricks (Databricks Runtime 7.0 ML or above) or End-to-end example of building machine learning models on Databricks (Databricks Runtime 6.5 ML or above).