Introduction to Databricks Machine Learning

This article introduces Databricks Machine Learning. It describes the benefits of using Databricks for common ML tasks and provides links to notebooks, tutorials, and user guides to help you get started.

The diagram shows how the capabilities of Databricks map to the steps of the model development and deployment process.

Figure: Model development and deployment on Databricks

What is Databricks Machine Learning?

Databricks Machine Learning provides an integrated machine learning environment that helps you simplify and standardize your ML development processes. With Databricks Machine Learning, you can:

Use the Databricks workspace with Databricks Machine Learning

Databricks Machine Learning also includes all of the capabilities of the Databricks workspace, including:

For machine learning applications, Databricks recommends using a cluster running Databricks Runtime for Machine Learning.

Use Databricks for deep learning applications

Databricks Machine Learning provides pre-built deep learning infrastructure, including built-in, pre-configured GPU support with drivers and supporting libraries. It also includes the most common deep learning libraries, such as TensorFlow, PyTorch, and Keras, and supporting libraries such as Petastorm, Hyperopt, and Horovod.

To get started with deep learning on Databricks, see:

Use Databricks for LLMs and generative AI

Databricks Runtime for Machine Learning includes libraries like Hugging Face Transformers and LangChain that allow you to integrate existing pre-trained models or other open-source libraries into your workflow. The Databricks MLflow integration makes it easy to use the MLflow tracking service with transformer pipelines, models, and processing components. In addition, you can integrate OpenAI models or solutions from partners like John Snow Labs in your Databricks workflows.
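As a rough illustration of the MLflow integration with transformer pipelines, the following sketch logs a Hugging Face text-generation pipeline to an MLflow run. It assumes a cluster where the `mlflow` and `transformers` packages are installed, as on Databricks Runtime for Machine Learning. The model name `distilgpt2`, the artifact path `text_generator`, and the helper names `generation_params` and `log_text_pipeline` are placeholders chosen for this example, not names from the Databricks documentation.

```python
# Hypothetical defaults for illustration; replace with your own model and names.
MODEL_NAME = "distilgpt2"          # assumed small Hugging Face model
ARTIFACT_PATH = "text_generator"   # assumed artifact folder name


def generation_params(max_new_tokens: int = 20, do_sample: bool = False) -> dict:
    """Return the inference settings recorded alongside the model."""
    return {
        "model_name": MODEL_NAME,
        "max_new_tokens": max_new_tokens,
        "do_sample": do_sample,
    }


def log_text_pipeline() -> str:
    """Build a text-generation pipeline, log it to MLflow, and return the run ID."""
    # Imported here so this module can be loaded where these packages are absent.
    import mlflow
    from transformers import pipeline

    text_generator = pipeline("text-generation", model=MODEL_NAME)

    with mlflow.start_run() as run:
        # Record the settings as run parameters so they appear in the tracking UI.
        mlflow.log_params(generation_params())
        # Log the whole pipeline (model and tokenizer) with the transformers flavor.
        mlflow.transformers.log_model(
            transformers_model=text_generator,
            artifact_path=ARTIFACT_PATH,
        )
    return run.info.run_id
```

Calling `log_text_pipeline()` in a notebook returns a run ID. Under the same assumptions, the logged pipeline could later be reloaded with `mlflow.pyfunc.load_model(f"runs:/{run_id}/text_generator")`.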

With Databricks, you can customize an LLM on your data for your specific task. With the support of open source tooling, such as Hugging Face and DeepSpeed, you can efficiently take a foundation LLM and continue training it on your own data to improve its accuracy for your domain and workload.

In addition, Databricks provides AI functions that SQL data analysts can use to access LLMs, including models from OpenAI, directly within their data pipelines and workflows. See AI Functions on Databricks.

Next steps

To get started, see:

To learn about key Databricks Machine Learning features, see:

For a recommended MLOps workflow on Databricks Machine Learning, see: