MLflow guide

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It has the following primary components:

  • Tracking: Allows you to track experiments to record and compare parameters and results.

  • Models: Allow you to manage and deploy models from a variety of ML libraries to a variety of model serving and inference platforms.

  • Projects: Allow you to package ML code in a reusable, reproducible form to share with other data scientists or transfer to production.

  • Model Registry: Allows you to centralize a model store for managing the full lifecycle of models, including stage transitions from staging to production, with capabilities for versioning and annotating. Databricks provides a managed version of the Model Registry in Unity Catalog.

  • Model Serving: Allows you to host MLflow Models as REST endpoints.

Databricks provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Databricks workspace features such as experiment and run management and notebook revision capture. MLflow on Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects.

First-time users should begin with the quickstart, which demonstrates the basic MLflow tracking APIs. The subsequent articles introduce each MLflow component with example notebooks and describe how these components are hosted within Databricks.
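As a rough illustration of what the basic tracking APIs look like, the following Python sketch fits a line to a toy dataset and records one parameter set and two metrics in a single run. The helper names (fit_line, mean_squared_error, log_toy_run) and the parameter and metric names are hypothetical and chosen only for this example; the MLflow calls used are mlflow.start_run, mlflow.log_param, and mlflow.log_metric. It assumes the mlflow package is installed and that a tracking destination is already configured (in a Databricks notebook this is typically the notebook experiment).

```python
def fit_line(xs, ys):
    """Return the least-squares slope and intercept for a toy dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Return the mean squared error of the fitted line on the data."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def log_toy_run(xs, ys):
    """Fit the toy model and record its parameters and metrics in one run."""
    # Imported here so the pure helpers above work without MLflow installed.
    import mlflow

    slope, intercept = fit_line(xs, ys)
    mse = mean_squared_error(xs, ys, slope, intercept)

    # Everything logged inside the "with" block is attached to the same run.
    with mlflow.start_run() as run:
        mlflow.log_param("n_points", len(xs))            # hypothetical name
        mlflow.log_param("model", "least_squares_line")  # hypothetical name
        mlflow.log_metric("slope", slope)
        mlflow.log_metric("mse", mse)
    return run.info.run_id
```

Calling log_toy_run([0, 1, 2, 3], [1, 3, 5, 7]) should create one run whose parameters and metrics can then be compared with other runs in the experiment UI.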

MLflow supports Java, Python, R, and REST APIs.
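For clients without a language SDK, the REST API can be called directly. The sketch below only builds the HTTP requests for creating a run and logging a metric, using the MLflow REST endpoints runs/create and runs/log-metric under /api/2.0/mlflow; it does not send them. The helper names, the workspace URL, the token, and the experiment ID are placeholders, and bearer-token authentication is an assumption about how your workspace is configured.

```python
import json
import time
import urllib.request


def build_request(host, token, endpoint, payload):
    """Build (but do not send) a POST request to an MLflow REST endpoint."""
    url = f"{host.rstrip('/')}/api/2.0/mlflow/{endpoint}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_run_request(host, token, experiment_id):
    """Request that starts a new run in the given experiment."""
    payload = {
        "experiment_id": experiment_id,
        "start_time": int(time.time() * 1000),  # milliseconds since epoch
    }
    return build_request(host, token, "runs/create", payload)


def log_metric_request(host, token, run_id, key, value, step=0):
    """Request that logs one metric value for an existing run."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),
        "step": step,
    }
    return build_request(host, token, "runs/log-metric", payload)
```

To actually send a request, pass it to urllib.request.urlopen and parse the JSON response; the response to runs/create contains the run's ID, which is what log_metric_request expects as run_id.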

Note

If you’re just getting started with Databricks, consider using MLflow on Databricks Community Edition, which provides a simple managed MLflow experience for lightweight experimentation. Remote execution of MLflow projects is not supported on Databricks Community Edition. No limits are imposed for the initial launch of MLflow on Databricks Community Edition, although we plan to impose moderate limits on the number of experiments and runs.

MLflow data stored in the control plane (experiment runs, metrics, tags, and parameters) is encrypted using a platform-managed key. Encryption using customer-managed keys for managed services is not supported for that data. By contrast, MLflow models and artifacts stored in your root (DBFS) storage can be encrypted using your own key by configuring customer-managed keys for workspace storage.