Use XGBoost on Databricks


If you train with XGBoost 0.90 and the training job fails, the shared Spark context is killed, and the only way to recover is to restart the cluster. This is a bug in XGBoost.

Single node training in Python

The XGBoost Python package supports only single-node training.

XGBoost Python notebook

Distributed training in Scala

To perform distributed training, you must use XGBoost’s Scala/Java packages. The examples in this section show how to use XGBoost with MLlib: the first embeds an XGBoost model in an MLlib ML pipeline, and the second uses MLlib cross-validation to tune an XGBoost model.

XGBoost classification with ML pipeline notebook

XGBoost regression with cross-validation notebook
