Third-Party Machine Learning Integrations

We recommend Spark MLlib as the first machine learning library to use because it integrates with other Spark components such as Spark SQL, Spark Streaming, and DataFrames. Although Databricks comes with Spark MLlib pre-installed, data scientists may want to use third-party machine learning libraries and frameworks in their data pipelines.

In this section, we provide instructions for installing, configuring, and running some of these third-party ML tools in Databricks.

Note

Databricks provides these examples on a best-effort basis. Because these are external libraries, they may change in ways that Databricks cannot predict. For additional support with third-party tools, refer directly to the documentation, mailing lists, or other support options provided by the library vendor or maintainer.

XGBoost

See Install and Use XGBoost for more information.