%md # MLflow logging API example (Python) (MLflow 3.0)

This notebook illustrates how to use the MLflow logging API to start an MLflow run and log the model, model parameters, evaluation metrics, and other artifacts to the logged model and run. The easiest way to get started using MLflow tracking with Python is to use the MLflow [`autolog()` API](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging). If you need more control over the metrics logged for each training run, or want to log additional artifacts such as tables or plots, you can use the `mlflow.log_metric()` and `mlflow.log_artifact()` APIs demonstrated in this notebook.

This tutorial leverages features from MLflow 3.0. For more details, see "Get started with MLflow 3.0" ([AWS](https://docs.databricks.com/aws/en/mlflow/mlflow-3-install)|[Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow/mlflow-3-install)|[GCP](https://docs.databricks.com/gcp/en/mlflow/mlflow-3-install)).

This notebook creates a Random Forest model on a simple dataset and uses the MLflow Tracking API to log the model and selected model parameters and metrics.
# Upgrade to the latest MLflow version to use MLflow 3.0 features
%pip install "mlflow>=3.0" --upgrade
dbutils.library.restartPython()
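%md As a point of comparison, the next cell is a minimal, self-contained sketch of the `autolog()` approach mentioned in the introduction: once autologging is enabled, calling `fit()` logs the model, its parameters, and training metrics automatically. Exactly what gets captured depends on your MLflow and scikit-learn versions; the rest of this notebook uses the explicit logging APIs instead.
# Illustrative autologging sketch; the explicit logging APIs are demonstrated below.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.autolog()

db = load_diabetes()
X_tr, X_te, y_tr, y_te = train_test_split(db.data, db.target)

with mlflow.start_run():
  # With autologging enabled, this fit() call records the estimator's parameters,
  # training metrics, and the fitted model without any explicit log_* calls.
  RandomForestRegressor(n_estimators=100, max_depth=6).fit(X_tr, y_tr)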
%md Import the required libraries.
import mlflow
import mlflow.sklearn
import pandas as pd
import matplotlib.pyplot as plt
from numpy import savetxt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
%md Import the dataset from scikit-learn and create the training and test datasets.
db = load_diabetes()
X = db.data
y = db.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
%md Create a random forest model and log the model, model parameters, evaluation metrics, and other artifacts using `mlflow.log_params()`, `mlflow.log_metric()`, `mlflow.sklearn.log_model()`, and `mlflow.log_artifact()`. These functions let you control exactly which parameters and metrics are logged, and also let you log other artifacts of the run such as tables and plots.
with mlflow.start_run():
  # Set the model parameters.
  n_estimators = 100
  max_depth = 6
  max_features = 3

  params = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
    "max_features": max_features
  }

  # Log the model parameters used for this run.
  mlflow.log_params(params)

  # Create and train the model.
  rf = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features)
  rf.fit(X_train, y_train)

  # Log the model created by this run, creating a Logged Model that inherits the parameters.
  logged_model = mlflow.sklearn.log_model(rf, name="random-forest-model", input_example=X_train)

  # Use the model to make predictions on the test dataset.
  predictions = rf.predict(X_test)

  # Define a metric to use to evaluate the model.
  mse = mean_squared_error(y_test, predictions)

  # Log the value of the metric from this run, linking it to the logged model.
  mlflow.log_metric("mse", mse)

  # Save the table of predicted values.
  savetxt('predictions.csv', predictions, delimiter=',')

  # Log the saved table as an artifact.
  mlflow.log_artifact("predictions.csv")

  # Convert the residuals to a pandas DataFrame to take advantage of graphics capabilities.
  df = pd.DataFrame(data=predictions - y_test)

  # Create a plot of residuals.
  plt.plot(df)
  plt.xlabel("Observation")
  plt.ylabel("Residual")
  plt.title("Residuals")

  # Save the plot and log it as an artifact.
  plt.savefig("residuals_plot.png")
  mlflow.log_artifact("residuals_plot.png")
%md To view the results, click the **Experiments** icon <img src="https://docs.databricks.com/_static/images/icons/experiment.png"/> in the right sidebar. This sidebar displays the parameters and metrics for each run of this notebook.

Click the name of the run to open the Runs page in a new tab. This page shows all of the information that was logged from the run. Select the **Artifacts** tab to find the logged model and plot.

From the experiments page, switch to the **Models** tab to view the logged model that was created, along with all relevant metadata such as parameters and metrics.

For more information, see "MLflow experiments" ([AWS](https://docs.databricks.com/applications/mlflow/experiments.html)|[Azure](https://docs.microsoft.com/azure/databricks/applications/mlflow/experiments)|[GCP](https://docs.gcp.databricks.com/applications/mlflow/experiments.html)).
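%md If you prefer to inspect results programmatically rather than through the UI, the short sketch below (an illustrative addition, assuming the training cell above has already run and returned `logged_model`) loads the logged model back from its URI and re-evaluates it on the test set.
# Illustrative sketch: load the logged model by URI and score it on the held-out test set.
# Assumes the training cell above has run, so `logged_model`, `X_test`, and `y_test` exist.
loaded_rf = mlflow.sklearn.load_model(logged_model.model_uri)
print("MSE of reloaded model:", mean_squared_error(y_test, loaded_rf.predict(X_test)))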