This feature is in Public Preview.
This article provides an example that demonstrates how to use the `pyspark.ml.connect` module to train Spark ML models in a distributed fashion and run model inference on Databricks Connect.
Spark 3.5 introduces `pyspark.ml.connect`, which is designed to support Spark Connect mode and Databricks Connect. Learn more about Databricks Connect.
The `pyspark.ml.connect` module consists of common learning algorithms and utilities, including classification, feature transformers, ML pipelines, and cross validation. It provides interfaces similar to those of the legacy `pyspark.ml` module, but currently contains only a subset of the algorithms in `pyspark.ml`. The supported algorithms are listed below:
Set up Databricks Connect on your clusters. See Cluster configuration for Databricks Connect.
Databricks Runtime 14.0 ML or higher installed.
Cluster access mode of
The following notebook demonstrates how to use Distributed ML on Databricks Connect:
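As a rough illustration of the kind of workflow such a notebook covers, the following minimal sketch fits a `pyspark.ml.connect` logistic regression model on a toy dataset and scores the same rows. It is modeled on the example in the Apache Spark API reference and is not the notebook's actual content. The names `TRAIN_ROWS` and `train_and_predict` are hypothetical, and the sketch assumes that an active Spark Connect or Databricks Connect session is passed in as `spark` and that the cluster meets the requirements above.

```python
# Toy training data: each row is (features, label). The feature column is a
# plain array of doubles, following the Apache Spark API reference example.
TRAIN_ROWS = [
    ([1.0, 2.0], 1),
    ([2.0, -1.0], 1),
    ([-3.0, -2.0], 0),
    ([-1.0, -2.0], 0),
]


def train_and_predict(spark):
    """Fit a logistic regression model and return the scored DataFrame.

    `spark` is assumed to be an existing Spark Connect or Databricks Connect
    session; this helper is illustrative only.
    """
    # Imported lazily so this module loads even where pyspark is unavailable.
    from pyspark.ml.connect.classification import LogisticRegression

    dataset = spark.createDataFrame(TRAIN_ROWS, schema=["features", "label"])

    # Hyperparameter values are arbitrary choices for the toy dataset.
    lor = LogisticRegression(maxIter=20, learningRate=0.01)

    # fit() runs training on the cluster; transform() adds prediction columns.
    model = lor.fit(dataset)
    return model.transform(dataset)
```

In a session where `spark` is already defined, calling `train_and_predict(spark).show()` should display the input rows alongside the model's prediction columns.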
For reference information about APIs in `pyspark.ml.connect`, Databricks recommends the Apache Spark API reference.