Export Apache Spark ML models and pipelines

This article discusses the export part of a Databricks ML Model Export workflow; see Import models into your application for the import and scoring part of the workflow.

With Databricks ML Model Export, you can easily export your trained Apache Spark ML models and pipelines.

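import com.databricks.ml.local.ModelExport
import org.apache.spark.ml.classification.LogisticRegression

// Train a logistic regression model (trainingData is a DataFrame of labeled examples)
val lr = new LogisticRegression()
val model = lr.fit(trainingData)
// Export the trained model into the provided directory
ModelExport.exportModel(model, "<storage-location>")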

For detailed code examples, see the example notebooks.

Model Export format

MLlib models are exported as JSON files, with a format matching the Spark ML persistence format. The key changes from MLlib’s format are:

  • Using JSON instead of Parquet
  • Adding extra metadata

The Model Export format has several benefits:

  • A simple, human-readable format that can be checked into version control systems
  • A format that matches MLlib's, so Model Export stays in sync with MLlib standards and APIs
  • Extra metadata from Databricks that allows scoring outside of Spark

For example, exporting a logistic regression model produces a directory containing the following JSON files:

  • metadata, which contains the type of the model and how it was configured for training. This file matches MLlib’s metadata file.

  • data, which contains the trained model parameters. This file matches MLlib’s data file. The type field gives the MLlib vector format: 0 for a sparse vector, 1 for a dense vector.

      Excerpt from the data file (other fields omitted):

      "interceptVector": {
        "values": [ -8.44645260967139 ]
      },
      ...
        "values": [ -0.01747691176982096, 1.542111173068903, 0.700895509427004, 0.025215711086829903 ],
  • dbmlMetadata, which contains extra information specific to Databricks ML Model Export.
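The type field in the data file selects between the two MLlib vector encodings. The following stdlib-only Scala is a minimal sketch of turning an already-parsed vector into a dense array. It assumes the field layout of Spark ML's JSON vector encoding, where a dense vector carries only values and a sparse vector also carries size and indices (the size and indices names are an assumption; this document only shows type and values). JsonVector and VectorDecoding are hypothetical names, and JSON parsing itself is left out.

```scala
// Hypothetical representation of one already-parsed MLlib JSON vector.
// vecType mirrors the JSON "type" field: 0 = sparse, 1 = dense.
// size and indices are assumed to be present only for sparse vectors.
final case class JsonVector(
    vecType: Int,
    size: Option[Int],
    indices: Option[Seq[Int]],
    values: Seq[Double]
)

object VectorDecoding {
  // Converts either encoding into a dense Array[Double].
  def toDense(v: JsonVector): Array[Double] = v.vecType match {
    case 1 =>
      // Dense: values already lists every entry in order.
      v.values.toArray
    case 0 =>
      // Sparse: start from zeros and fill in only the listed positions.
      val n = v.size.getOrElse(throw new IllegalArgumentException("sparse vector needs a size"))
      val idx = v.indices.getOrElse(throw new IllegalArgumentException("sparse vector needs indices"))
      require(idx.length == v.values.length, "indices and values must have the same length")
      val out = Array.fill(n)(0.0)
      idx.zip(v.values).foreach { case (i, x) => out(i) = x }
      out
    case other =>
      throw new IllegalArgumentException(s"unknown vector type: $other")
  }
}
```

For example, a sparse vector with size 4, indices [1, 3] and values [2.0, 5.0] decodes to Array(0.0, 2.0, 0.0, 5.0).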


Supported models

You can programmatically retrieve a list of supported models by calling ModelExport.supportedModels. The following models are supported:


Probabilistic classifiers (decision tree classifiers, logistic regression, random forest classifiers, etc.) can output an additional probability field containing a vector of class probabilities.
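To illustrate what the probability field holds, the sketch below scores a binary logistic regression by hand. It assumes a binary model with one intercept and one coefficient per feature. It computes the margin m = intercept + Σ coefficients(j) · features(j), then returns the probability vector [1 - σ(m), σ(m)], where σ is the logistic function. LogisticScoring is a hypothetical helper for illustration, not the API of the Databricks scoring library.

```scala
// Minimal sketch of binary logistic-regression scoring outside Spark.
// Assumption: a binary model with a single intercept and one coefficient per feature.
object LogisticScoring {
  // margin = intercept + sum over j of coefficients(j) * features(j)
  def margin(intercept: Double, coefficients: Array[Double], features: Array[Double]): Double = {
    require(coefficients.length == features.length, "feature count must match coefficient count")
    intercept + coefficients.indices.map(j => coefficients(j) * features(j)).sum
  }

  // Returns the class-probability vector [P(class 0), P(class 1)].
  def probability(intercept: Double, coefficients: Array[Double], features: Array[Double]): Array[Double] = {
    val p1 = 1.0 / (1.0 + math.exp(-margin(intercept, coefficients, features)))
    Array(1.0 - p1, p1)
  }
}
```

For example, if the four-element values array in the data excerpt is the coefficient vector (an assumption), LogisticScoring.probability(-8.44645260967139, Array(-0.01747691176982096, 1.542111173068903, 0.700895509427004, 0.025215711086829903), features) returns a two-element array whose entries sum to 1.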

Exporting models from Databricks

The following notebooks demonstrate how to export ML models from Databricks.

Model export Scala notebook


Model export Python notebook
