Machine learning with MLlib tutorial


Databricks Runtime ML is a comprehensive tool for developing and deploying machine learning models with Databricks. It includes the most popular machine learning and deep learning libraries, as well as MLflow, a platform and API for tracking and managing the end-to-end machine learning lifecycle. See the Databricks Machine Learning guide for details.

The Apache Spark machine learning library (MLlib) allows data scientists to focus on their data problems and models instead of on the complexities of working with distributed data (such as infrastructure and configuration). The tutorial notebook takes you through the steps of loading and preprocessing data, training a model using an MLlib algorithm, evaluating model performance, tuning the model, and making predictions. It also illustrates the use of MLlib pipelines and the MLflow machine learning platform.


Use the notebook that corresponds to the Databricks Runtime version on your cluster. For more machine learning examples, see the Databricks Machine Learning guide.

Get started with MLlib notebook (Databricks Runtime 7.0 and above)

Get started with MLlib notebook (Databricks Runtime 5.5 LTS or 6.x)