mlflow-mleap-deployment(Python)


MLflow: Deploying PySpark models saved as MLeap to SageMaker

NOTE: Databricks Runtime does not support open source MLeap. To use MLeap, you must create a cluster running Databricks Runtime 13.3 LTS ML or below. These versions of Databricks Runtime ML have a custom version of MLeap preinstalled.

This notebook is part 2 of the MLflow MLeap example. The first part, MLflow Deployment: Train PySpark Model and Log in MLeap Format, trains a PySpark model and logs the training metrics, parameters, and model in MLeap format to the MLflow tracking server.

Note: We do not recommend using Run All because it takes several minutes to deploy and update models in SageMaker; models cannot be queried until they are active.

The notebook contains the following sections:

Setup

  • Launch a Python 3 cluster configured with an IAM role for SageMaker deployment
  • Install the MLeap Scala libraries
  • Install MLflow and boto3

Deploy the model to SageMaker

  • Specify a Docker image URI for deployment
  • Use MLflow to deploy the model to SageMaker
  • Check the status of the deployed model
    • Determine if the deployed model is active and ready to be queried

Query the deployed model

  • Construct a query using test data
  • Evaluate the query using the deployed model

Clean up the deployment

  • Delete the model deployment using the MLflow API
  • Confirm that the deployment was terminated

Setup

Create a cluster and install MLflow and MLeap on your cluster

  1. Create a cluster with the following:
    • Python Version: Python 3
    • An attached IAM role that supports SageMaker deployment. For information about setting up a cluster IAM role for SageMaker deployment, see the SageMaker deployment guide.
  2. Install required libraries.
    1. Create a library with Maven as the source, using the fully qualified Maven artifact coordinate:
      • ml.combust.mleap:mleap-spark_2.11:0.13.0
    2. Install the libraries into the cluster.
  3. If you are running Databricks Runtime, run Cmd 4 to install mlflow and boto3 (a sketch of that cell appears after this list). If you are using Databricks Runtime ML, you can skip this step because the required libraries are already installed.
  4. Attach this notebook to the cluster.
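
For step 3, a minimal sketch of the install cell (Cmd 4), assuming pip-based installation is acceptable on your cluster:

%pip install mlflow boto3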

Load pipeline training data

Load the data that was used to train the PySpark Pipeline model. The model uses the 20 Newsgroups dataset, which consists of articles from 20 Usenet newsgroups.
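
A minimal sketch of the load step; the dataset path is an assumption, so substitute the location of your copy of the 20 Newsgroups data:

# Hypothetical path; replace with the location of your 20 Newsgroups training data.
training_data = spark.read.parquet("/tmp/20news/training.parquet")
display(training_data.limit(5))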

Specify the run ID associated with a PySpark training run from part 1. You can find the run ID and model path on the run details page.


Set region, run ID, model URI

Note: You must create a new SageMaker endpoint for each new region.
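
A minimal sketch of these assignments; the region and run ID shown are placeholders rather than values from this example:

region = "us-west-2"  # assumption: use the region of your cluster's IAM role
run_id = "<run-id-from-part-1>"  # copy from the run details page
model_uri = "runs:/" + run_id + "/model"  # assumes the model was logged under the artifact path "model"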

Deploy the model to SageMaker

Specify a Docker image in Amazon's Elastic Container Registry (ECR) that will be used by SageMaker to serve the model. There are two ways to obtain the container URL:

  • [Option 1] You can build your own mlflow-pyfunc image and upload it to an ECR repository using the MLflow CLI: mlflow sagemaker build-and-push-container.
  • [Option 2] Contact your Databricks representative for an mlflow-pyfunc image URL in ECR.

Define the ECR URL for the mlflow-pyfunc image that will be passed as an argument to MLflow's deploy function.
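
For example, the assignment might look like the following; the account ID, region, and tag are placeholders:

# Hypothetical ECR URL; replace each component with your repository's values.
image_ecr_url = "123456789012.dkr.ecr.us-west-2.amazonaws.com/mlflow-pyfunc:latest"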

Use MLflow to deploy the model to SageMaker

Using MLflow's SageMaker deployment API, deploy the trained model to SageMaker.
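
A sketch of the deployment call, assuming the MLflow 1.x-style mlflow.sagemaker API that ships with these runtimes (newer MLflow versions expose the same functionality through mlflow.deployments.get_deploy_client); the application name is hypothetical:

import mlflow.sagemaker as mfs

app_name = "mleap-20news"  # hypothetical name; must be unique within the region
mfs.deploy(
    app_name=app_name,
    model_uri=model_uri,
    image_url=image_ecr_url,
    region_name=region,
    mode="create",
)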

Check the status of the deployed model

Check the status of the new SageMaker endpoint using a simple function.

Note: The application status should be Creating. Wait until the status is InService before continuing; until then, query requests will fail.
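
One way to write this status check, using boto3's standard describe_endpoint call:

import boto3

def check_status(app_name, region):
    sage_client = boto3.client("sagemaker", region_name=region)
    endpoint_description = sage_client.describe_endpoint(EndpointName=app_name)
    return endpoint_description["EndpointStatus"]

print("Application status is: {}".format(check_status(app_name, region)))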

Query the deployed model

Construct a query using test data

Load data from the 20 Newsgroups dataset and construct a query DataFrame for the deployed model to evaluate.
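
A sketch of the query construction; the test-data path is an assumption about where part 1 stored the data:

# Hypothetical path; replace with the location of your 20 Newsgroups test split.
test_data = spark.read.parquet("/tmp/20news/test.parquet")
query_df = test_data.limit(10)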

Evaluate the query using the deployed model

Transform the query DataFrame into JSON format and evaluate it by posting the JSON to the deployed model.

Note: Deployed MLeap models only process JSON-serialized pandas DataFrames in the split orientation. You can convert a Spark DataFrame to this format as follows:

model_input_json = spark_dataframe.toPandas().to_json(orient='split')
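
A sketch of the request itself, using boto3's SageMaker runtime client; the content type shown is the pandas-split format accepted by MLflow 1.x scoring servers:

import json
import boto3

def query_endpoint(app_name, region, input_json):
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=app_name,
        Body=input_json,
        ContentType="application/json; format=pandas-split",
    )
    return json.loads(response["Body"].read().decode("utf-8"))

model_input_json = query_df.toPandas().to_json(orient="split")
predictions = query_endpoint(app_name, region, model_input_json)
print(predictions)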

Clean up the deployment

Finally, terminate the deployment using MLflow and confirm that the deployment has been terminated.

Delete the deployment using MLflow
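
A sketch of the teardown call, again assuming the MLflow 1.x-style mlflow.sagemaker API:

import mlflow.sagemaker as mfs

# archive=False deletes the associated SageMaker resources instead of archiving them.
mfs.delete(app_name=app_name, region_name=region, archive=False)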

Confirm that the deployment was terminated

By executing the following function, you should see that the SageMaker endpoints associated with the application have been removed.
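
One way to write that check with boto3, listing any remaining endpoints whose names contain the application name:

import boto3

def get_active_endpoints(app_name, region):
    sage_client = boto3.client("sagemaker", region_name=region)
    endpoints = sage_client.list_endpoints(MaxResults=100, NameContains=app_name)["Endpoints"]
    return [endpoint["EndpointName"] for endpoint in endpoints]

print("Endpoints still active for '{}': {}".format(app_name, get_active_endpoints(app_name, region)))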