Third-Party Machine Learning Integrations

We recommend Spark MLlib as the first library customers should use because it integrates directly with other Spark components such as Spark SQL, Spark Streaming, and DataFrames. Although Databricks comes with Spark MLlib preinstalled, data scientists may also want to use third-party machine learning libraries and frameworks in their data pipelines.

This section provides instructions for installing, configuring, and running some of these third-party ML tools in Databricks.


Databricks provides these examples on a best-effort basis. Because they are external libraries, they may change in ways that are not easy to predict. If you need additional support on third-party tools, please refer to the documentation, mailing lists or other support options provided by the library vendor or maintainer directly.


XGBoost

See Install and Use XGBoost for more information.

H2O Sparkling Water

