Use XGBoost on Databricks
If a training job fails under XGBoost 0.90, the shared Spark context is killed and the only way to recover is to restart the cluster. This is a bug in XGBoost.
Single node training in Python
The XGBoost Python package supports only single-node training.
Distributed training in Scala
To perform distributed training, you must use XGBoost's Scala/Java packages. The examples in this section show how to use XGBoost with MLlib. The first example shows how to embed an XGBoost model in an MLlib pipeline.
The second example shows how to use MLlib cross-validation to tune an XGBoost model.