Use Apache Spark MLlib on Databricks

This page provides example notebooks showing how to use MLlib on Databricks.

Apache Spark MLlib is the Apache Spark machine learning library. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and underlying optimization primitives. For reference information about MLlib features, Databricks recommends the Apache Spark API references for MLlib.

The pyspark.ml package from Apache Spark MLlib is supported on serverless, standard, and dedicated compute.

For information about using Apache Spark MLlib from R, see the R machine learning documentation.

Binary classification example notebook

This notebook shows you how to build a binary classification application using the Apache Spark MLlib Pipelines API.

Binary classification notebook

Decision trees example notebooks

These examples demonstrate various applications of decision trees using the Apache Spark MLlib Pipelines API.

Decision trees

These notebooks show you how to perform classification with decision trees.

Decision trees for digit recognition notebook

Decision trees for SFO survey notebook

GBT regression using MLlib pipelines

This notebook shows you how to use MLlib pipelines to fit a gradient-boosted tree (GBT) regression model that predicts hourly bike rental counts from features such as day of the week, weather, and season.

Bike sharing regression notebook

Advanced Apache Spark MLlib notebook example

This notebook illustrates how to create a custom transformer.

Custom transformer notebook
