Model training examples

This section includes examples showing how to train machine learning models on Databricks using many popular open-source libraries.

You can also use AutoML, which automatically prepares a dataset for model training and performs a set of trials using open-source libraries such as scikit-learn and XGBoost. For each trial run, AutoML creates a Python notebook with the source code so you can review, reproduce, and modify it.

For an example notebook that shows how to train a machine learning model that uses data in Unity Catalog and write predictions back to Unity Catalog, see Train and register machine learning models with Unity Catalog.

Machine learning examples

| Package | Notebook(s) | Features |
| --- | --- | --- |
| scikit-learn | Machine learning tutorial | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| scikit-learn | End-to-end example | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving |
| MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
| xgboost | XGBoost examples | Python, PySpark, and Scala, single node workloads and distributed training |

Hyperparameter tuning examples

For general information about hyperparameter tuning in Databricks, see Hyperparameter tuning.

| Package | Notebook | Features |
| --- | --- | --- |
| Hyperopt | Distributed hyperopt | Distributed hyperopt, scikit-learn, MLflow |
| Hyperopt | Compare models | Use distributed hyperopt to search hyperparameter space for different model types simultaneously |
| Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib |
| Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes |
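
The idea behind searching a hyperparameter space across several model types at once, as in the Compare models notebook, can be illustrated with a minimal sketch. This sketch is not Hyperopt and does not use its API: it substitutes plain random search from the Python standard library for Hyperopt's search algorithms, and it runs on a single machine rather than distributing trials. The objective `validation_loss`, the model types `"linear"` and `"tree"`, and the parameters `alpha` and `max_depth` are all hypothetical stand-ins for training a real model and measuring its validation loss.

```python
import random


def validation_loss(model_type, params):
    """Hypothetical objective standing in for 'train a model, return its loss'."""
    if model_type == "linear":
        # Assumed toy behavior: loss is lowest when alpha is 0.1.
        return 0.30 + (params["alpha"] - 0.1) ** 2
    if model_type == "tree":
        # Assumed toy behavior: loss is lowest when max_depth is 6.
        return 0.20 + 0.01 * (params["max_depth"] - 6) ** 2
    raise ValueError(f"unknown model type: {model_type}")


def sample_candidate(rng):
    """Draw a model type first, then hyperparameters specific to that type."""
    model_type = rng.choice(["linear", "tree"])
    if model_type == "linear":
        params = {"alpha": rng.uniform(0.0, 1.0)}
    else:
        params = {"max_depth": rng.randint(2, 12)}
    return model_type, params


def random_search(max_evals, seed=0):
    """Evaluate max_evals random candidates and return the best one plus all trials."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same trials
    trials = []
    for _ in range(max_evals):
        model_type, params = sample_candidate(rng)
        loss = validation_loss(model_type, params)
        trials.append({"model_type": model_type, "params": params, "loss": loss})
    best = min(trials, key=lambda trial: trial["loss"])
    return best, trials


best, trials = random_search(max_evals=50)
print(best)
```

Sampling the model type as part of each candidate is what lets a single search compare model families: every trial records its type, its parameters, and its loss, so the best trial identifies both the winning model type and its settings. The notebooks listed above show how to do this with Hyperopt and how to track the trials with MLflow.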