AI and machine learning integrations

Databricks has validated integrations with various third-party solutions that enable common machine learning scenarios.

Ray integration

Ray is an open source framework for scaling Python applications. It includes libraries specific to AI workloads, making it especially suited for developing AI applications. Running Ray on Databricks allows you to leverage the breadth of the Databricks ecosystem, enhancing data processing and machine learning workflows with services and integrations unavailable in open source Ray.
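
As a rough illustration of how Ray scales ordinary Python functions, the sketch below runs a function as parallel Ray tasks. It uses only core Ray calls (`ray.init`, `@ray.remote`, `ray.get`); the function names `parallel_squares` and `square` are hypothetical. It assumes the `ray` package is installed and, as written, starts or attaches to a Ray instance from the calling process. Creating a Ray cluster on top of a Databricks Spark cluster involves additional setup that is not shown here.

```python
def parallel_squares(values):
    """Square each value in a separate Ray task and return the results in order."""
    # Imported lazily so the module can be loaded where Ray is not installed.
    import ray

    # Start a local Ray instance, or reuse one that is already running.
    ray.init(ignore_reinit_error=True)

    @ray.remote
    def square(x):
        # Hypothetical unit of work; any Python function can be used here.
        return x * x

    # .remote() schedules each task and immediately returns a future.
    futures = [square.remote(v) for v in values]

    # ray.get blocks until all tasks finish and preserves input order.
    return ray.get(futures)
```

Calling `parallel_squares([1, 2, 3])` would schedule three tasks and collect their results once all have completed.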

See What is Ray on Databricks? for more information.

GraphFrames integration

GraphFrames is a package for Apache Spark that provides DataFrame-based graphs, with high-level APIs in Java, Python, and Scala. It aims to provide the functionality of GraphX plus extended functionality that takes advantage of Spark DataFrames, including motif finding, DataFrame-based serialization, and highly expressive graph queries.
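
To make motif finding more concrete, here is a minimal sketch that builds a small graph from two DataFrames and searches for pairs of vertices connected in both directions. The sample data, the column values, and the names `find_mutual_follows` and `MUTUAL_FOLLOW_MOTIF` are illustrative assumptions. It also assumes an active `SparkSession` is passed in and that the GraphFrames package is available on the cluster.

```python
# Motif pattern: an edge from a to b and another edge from b back to a.
MUTUAL_FOLLOW_MOTIF = "(a)-[e1]->(b); (b)-[e2]->(a)"


def find_mutual_follows(spark):
    """Return a DataFrame of vertex pairs that follow each other."""
    # Imported lazily; requires the GraphFrames package on the cluster.
    from graphframes import GraphFrame

    # GraphFrames expects an "id" column on the vertex DataFrame.
    vertices = spark.createDataFrame(
        [("a", "Alice"), ("b", "Bob"), ("c", "Carol")],
        ["id", "name"],
    )

    # GraphFrames expects "src" and "dst" columns on the edge DataFrame.
    edges = spark.createDataFrame(
        [("a", "b", "follows"), ("b", "a", "follows"), ("b", "c", "follows")],
        ["src", "dst", "relationship"],
    )

    graph = GraphFrame(vertices, edges)

    # find() returns a DataFrame with one column per named motif element.
    return graph.find(MUTUAL_FOLLOW_MOTIF)
```

In a notebook, `find_mutual_follows(spark).show()` would display the matches. Because the result is an ordinary DataFrame, it can be filtered or joined like any other.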

Large language models (LLMs)

Databricks makes it easy to access and build on publicly available large language models. Databricks Runtime ML includes libraries like Hugging Face Transformers and LangChain that let you integrate existing pre-trained models or other open-source libraries into your workflow. Additionally, Databricks offers built-in AI functions that let SQL users access and experiment with LLMs from providers such as Azure OpenAI and OpenAI.

Data labeling

Labeling additional training data is an important step for many machine learning workflows, such as classification or computer vision applications. Databricks does not directly support data labeling; however, the Databricks partnership with Labelbox simplifies the process.

See Partner Connect documentation for Labelbox.