Use Apache Spark MLlib on Databricks
Apache Spark MLlib is the Apache Spark machine learning library, consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and underlying optimization primitives. For an introduction to these features, Databricks recommends the Apache Spark MLlib guides in the Apache Spark documentation.
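For orientation, here is a minimal sketch of calling one of these algorithms (k-means clustering) through the DataFrame-based MLlib API in a Databricks notebook, where the `spark` session is predefined; the data and column names are illustrative placeholders, not part of any example notebook.

```python
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

# Hypothetical training data with two numeric feature columns.
df = spark.createDataFrame(
    [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)],
    ["x", "y"],
)

# Assemble the raw columns into the single vector column MLlib estimators expect.
assembler = VectorAssembler(inputCols=["x", "y"], outputCol="features")
features_df = assembler.transform(df)

# Fit a k-means model and assign each row to a cluster.
kmeans = KMeans(k=2, seed=1)
model = kmeans.fit(features_df)
model.transform(features_df).show()
```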
Example notebooks
The following notebooks demonstrate how to use various Apache Spark MLlib features on Databricks.
Binary classification example
This notebook shows you how to build a binary classification application using the Apache Spark MLlib Pipelines API.
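As a hedged sketch of the general pattern the notebook covers, the following assembles features, trains a logistic regression classifier inside a Pipeline, and evaluates the result; the toy dataset and column names are invented for illustration and are not the notebook's data.

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler

# Toy dataset: two numeric features and a binary label (placeholder values).
data = spark.createDataFrame(
    [(0.0, 1.1, 0.0), (2.0, 1.0, 1.0), (0.5, 1.2, 0.0), (2.1, 0.9, 1.0)],
    ["f1", "f2", "label"],
)

# Chain feature assembly and logistic regression into a single pipeline.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(data)
predictions = model.transform(data)

# Score the predictions with area under the ROC curve.
evaluator = BinaryClassificationEvaluator(labelCol="label")
print(evaluator.evaluate(predictions))
```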
Decision trees examples
These examples demonstrate various applications of decision trees using the Apache Spark MLlib Pipelines API.
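As a rough illustration of the pattern these examples follow, the sketch below indexes a string label, assembles features, and fits a DecisionTreeClassifier inside a Pipeline; the toy data is an assumption, not the notebooks' dataset.

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorAssembler

# Hypothetical dataset: two numeric features and a string label.
data = spark.createDataFrame(
    [
        (5.1, 3.5, "setosa"),
        (6.2, 2.9, "versicolor"),
        (4.9, 3.0, "setosa"),
        (6.4, 3.2, "versicolor"),
    ],
    ["sepal_length", "sepal_width", "species"],
)

pipeline = Pipeline(stages=[
    # Encode the string label as a numeric index.
    StringIndexer(inputCol="species", outputCol="label"),
    # Build the feature vector from the numeric columns.
    VectorAssembler(inputCols=["sepal_length", "sepal_width"], outputCol="features"),
    DecisionTreeClassifier(labelCol="label", featuresCol="features", maxDepth=3),
])

model = pipeline.fit(data)
# The trained tree is the last pipeline stage; print its structure.
print(model.stages[-1].toDebugString)
```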
Apache Spark MLlib pipelines and Structured Streaming example
This notebook shows how to train an Apache Spark MLlib pipeline on historical data and apply it to streaming data.
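The general pattern, sketched below with placeholder table names, paths, and columns, is to fit the pipeline on a static DataFrame, persist the fitted PipelineModel, and then call transform on a streaming DataFrame with the same schema.

```python
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# Train on historical (batch) data; the table name and schema are placeholders.
historical = spark.read.table("historical_events")
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LogisticRegression(labelCol="label"),
])
model = pipeline.fit(historical)
model.write().overwrite().save("/tmp/mllib_streaming_model")

# Later, load the fitted pipeline and apply it to a stream with the same feature columns.
loaded = PipelineModel.load("/tmp/mllib_streaming_model")
stream = (
    spark.readStream
    .schema(historical.drop("label").schema)
    .parquet("/tmp/incoming_events")
)
scored = loaded.transform(stream)

# Write the scored stream to an in-memory table for inspection.
query = (
    scored.writeStream
    .format("memory")
    .queryName("scored_events")
    .outputMode("append")
    .start()
)
```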
Advanced Apache Spark MLlib example
This notebook illustrates how to create a custom transformer.
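The notebook defines its own transformer; as a minimal, hypothetical illustration of the general pattern, the following subclasses Transformer and implements _transform so the custom stage can be used in a Pipeline like any built-in stage.

```python
from pyspark import keyword_only
from pyspark.ml import Transformer
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
from pyspark.sql import functions as F


class ColumnDoubler(Transformer, HasInputCol, HasOutputCol,
                    DefaultParamsReadable, DefaultParamsWritable):
    """Hypothetical transformer that multiplies a numeric column by 2."""

    @keyword_only
    def __init__(self, inputCol=None, outputCol=None):
        super().__init__()
        kwargs = self._input_kwargs
        self._set(**kwargs)

    def _transform(self, dataset):
        # Write the doubled input column to the output column.
        return dataset.withColumn(
            self.getOutputCol(), F.col(self.getInputCol()) * 2.0
        )


# Usage: the custom transformer behaves like any built-in pipeline stage.
df = spark.createDataFrame([(1.0,), (2.0,)], ["value"])
ColumnDoubler(inputCol="value", outputCol="doubled").transform(df).show()
```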
For reference information about MLlib features, Databricks recommends the Apache Spark MLlib API reference.
To use Apache Spark MLlib from R, see the R machine learning documentation.
For information about visualizing machine learning algorithms on Databricks, see Machine learning visualizations.