An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools, for example, batch inference on Apache Spark or real-time serving through a REST API. The format defines a convention that lets you save a model in different flavors (python-function, pytorch, sklearn, and so on) that different model serving and inference platforms can understand.
With Databricks Runtime 8.4 ML and above, when you log a model, MLflow automatically logs
requirements.txt and conda.yaml files. You can use these files to recreate the model development environment and reinstall dependencies using pip (with requirements.txt) or conda (with conda.yaml).
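To log a model to the MLflow tracking server, use mlflow.<model-type>.log_model(model, ...), where <model-type> is the flavor module for your model (for example, mlflow.sklearn).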
To load a previously logged model for inference or further development, use mlflow.<model-type>.load_model(modelpath), where
modelpath is one of the following:
- a run-relative path (such as runs:/<run_id>/<model-path>)
- a DBFS path
- a registered model path (such as models:/<name>/<version|stage>)
For a complete list of options for loading MLflow models, see Referencing Artifacts in the MLflow documentation.
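As a minimal sketch of the path forms listed above, the following helpers build run-relative and registered-model URIs as plain strings. The function names and the sample identifiers are hypothetical and exist only for illustration; the runs:/ and models:/ schemes themselves are the ones MLflow uses, and the resulting string is what you would pass as modelpath.

```python
def run_model_uri(run_id, artifact_path):
    """Build a run-relative model URI of the form runs:/<run_id>/<artifact_path>."""
    if not run_id or not artifact_path:
        raise ValueError("run_id and artifact_path must be non-empty")
    return "runs:/{}/{}".format(run_id, artifact_path.strip("/"))


def registered_model_uri(name, version_or_stage):
    """Build a registry URI of the form models:/<name>/<version|stage>."""
    if not name:
        raise ValueError("name must be non-empty")
    return "models:/{}/{}".format(name, version_or_stage)


# Hypothetical identifiers, for illustration only.
print(run_model_uri("abc123", "model"))
print(registered_model_uri("my-model", 3))
print(registered_model_uri("my-model", "Production"))
```

Keeping URI construction in one place makes it harder to mix up the two schemes when the same model is referenced both by run and by registry name.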
For Python MLflow models, an additional option is to use
mlflow.pyfunc.load_model() to load the model as a generic Python function.
You can use the following code snippet to load the model and score data points.
    import mlflow
    model = mlflow.pyfunc.load_model(model_path)
    model.predict(model_input)
Alternatively, you can export the model as an Apache Spark UDF to use for scoring on a Spark cluster.
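    import mlflow
    # Load the input data table as a Spark DataFrame.
    input_data = spark.table(input_table_name)
    # Wrap the model as a Spark UDF; spark is the active SparkSession.
    model_udf = mlflow.pyfunc.spark_udf(spark, model_path)
    # Calling the UDF with no arguments relies on the model signature to select input columns.
    df = input_data.withColumn("prediction", model_udf())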
When you log a model in a Databricks notebook, Databricks automatically generates code snippets that you can copy and use to load and run the model. To view these code snippets:
- Navigate to the Runs screen for the run that generated the model. (See View notebook experiment for how to display the Runs screen.)
- Scroll to the Artifacts section.
- Click the name of the logged model. A panel opens to the right showing code you can use to load the logged model and make predictions on Spark or pandas DataFrames.
You can register models in the MLflow Model Registry, a centralized model store that provides a UI and set of APIs to manage the full lifecycle of MLflow Models. For general information about the Model Registry, see MLflow Model Registry on Databricks. For instructions on how to use the Model Registry to manage models in Databricks, see Manage models.
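To register a model using the API, use mlflow.register_model(model_uri, name), where model_uri points to the logged model (for example, a runs:/ URI) and name is the registered model name.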
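The following is a minimal sketch of registering a model that was logged under a run, assuming the mlflow package is installed and a tracking server with a Model Registry is configured. The helper name, its register_fn parameter, and the sample identifiers are hypothetical; register_fn exists only so the URI handling can be exercised without a live registry, and by default the helper calls mlflow.register_model.

```python
def register_logged_model(run_id, artifact_path, registered_name, register_fn=None):
    """Register the model logged at runs:/<run_id>/<artifact_path>.

    register_fn is a hypothetical hook for testing; when omitted,
    mlflow.register_model is used.
    """
    if register_fn is None:
        import mlflow  # imported lazily; requires the mlflow package
        register_fn = mlflow.register_model
    model_uri = "runs:/{}/{}".format(run_id, artifact_path)
    # register_model takes the model URI and the registered model name.
    return register_fn(model_uri, registered_name)


# Example with a stand-in register function (no registry needed):
recorded = []
register_logged_model(
    "abc123", "model", "my-model",
    register_fn=lambda uri, name: recorded.append((uri, name)),
)
print(recorded)
```

In a notebook you would normally call the helper without register_fn so that the model is actually registered.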
To save a model locally, use mlflow.<model-type>.save_model(model, modelpath).
modelpath must be a DBFS path. For example, if you use a DBFS location
dbfs:/my_project_models to store your project work, you must use the model path /dbfs/my_project_models:
    modelpath = "/dbfs/my_project_models/model-%f-%f" % (alpha, l1_ratio)
    mlflow.sklearn.save_model(lr, modelpath)
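The mapping from a dbfs:/ location to the /dbfs/ path used when saving can be sketched as a small helper. The function name is hypothetical, and the sketch assumes the location always starts with dbfs:/; the alpha and l1_ratio values are sample numbers, not values from a real run.

```python
def dbfs_uri_to_local_path(dbfs_uri):
    """Convert a dbfs:/ location to the /dbfs/ path form used for saving models."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError("expected a location starting with 'dbfs:/'")
    return "/dbfs/" + dbfs_uri[len(prefix):].lstrip("/")


base = dbfs_uri_to_local_path("dbfs:/my_project_models")
# Sample hyperparameter values, for illustration only.
alpha, l1_ratio = 0.5, 0.5
modelpath = "%s/model-%f-%f" % (base, alpha, l1_ratio)
print(modelpath)
```

The resulting modelpath can then be passed to save_model in the same way as in the snippet above.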
You can download the logged model artifacts (such as model files, plots, and metrics) for a registered model with various APIs.
Python API example:
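    from mlflow.tracking import MlflowClient
    from mlflow.store.artifact.models_artifact_repo import ModelsArtifactRepository
    # get_model_version_download_uri is an instance method, so create a client first.
    client = MlflowClient()
    model_uri = client.get_model_version_download_uri(model_name, model_version)
    ModelsArtifactRepository(model_uri).download_artifacts(artifact_path="")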
Java API example:
    MlflowClient mlflowClient = new MlflowClient();
    // Get the model URI for a registered model version.
    String modelURI = mlflowClient.getModelVersionDownloadUri(modelName, modelVersion);
    // Or download the model artifacts directly.
    File modelFile = mlflowClient.downloadModelVersion(modelName, modelVersion);
CLI command example:
mlflow artifacts download --artifact-uri models:/<name>/<version|stage>
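If you want to drive the CLI command from a script, one possible approach is to build the argument list in Python and hand it to subprocess. This is a sketch only: the helper name and the sample model name are hypothetical, and actually running the command assumes the mlflow CLI is installed and configured for your workspace.

```python
import subprocess


def build_download_command(name, version_or_stage):
    """Return the argument list for the mlflow artifacts download command."""
    artifact_uri = "models:/{}/{}".format(name, version_or_stage)
    return ["mlflow", "artifacts", "download", "--artifact-uri", artifact_uri]


# Hypothetical registered model name, for illustration only.
cmd = build_download_command("my-model", "Production")
print(" ".join(cmd))

# To execute the command (requires the mlflow CLI on PATH), uncomment:
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues when a model name contains spaces or other special characters.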