Open source vs. managed MLflow on Databricks
This page helps open source MLflow users get started with MLflow on Databricks. Databricks-managed MLflow uses the same APIs but provides additional capabilities through integrations with the broader Databricks platform.
Benefits of managed MLflow on Databricks
Open source MLflow provides the core data model, API, and SDK. This means your data and workloads are always portable.
Managed MLflow on Databricks adds:
- Enterprise-grade governance and security through integration with the Databricks platform, Lakehouse, and Unity Catalog. Your AI and ML data, tools, agents, models, and other assets can be governed and used in the same platform as the rest of your data and workloads.
- Fully managed hosting on production-ready, scalable servers
- Integrations for development and production with the broader Mosaic AI platform
See the Managed MLflow product page for more details on benefits, and see the rest of this page to learn about technical details.
Your data is always yours: the core data model and APIs are completely open source, and you can export and use your MLflow data anywhere.
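Because managed MLflow uses the same APIs, moving tracking code between a self-hosted server and Databricks can come down to the tracking URI. The following is a minimal sketch, not a definitive setup: the helper names `resolve_tracking_uri` and `log_example_run`, the local server address, and the logged values are hypothetical, and it assumes Databricks credentials are already configured in the environment.

```python
def resolve_tracking_uri(use_databricks: bool,
                         local_uri: str = "http://localhost:5000") -> str:
    """Return the tracking URI for the chosen backend (hypothetical helper)."""
    # "databricks" tells the MLflow client to use the configured workspace;
    # local_uri is an assumed address for a self-hosted open source server.
    return "databricks" if use_databricks else local_uri


def log_example_run(use_databricks: bool, experiment_name: str) -> None:
    """Log one parameter and one metric; the calls are the same on both backends."""
    import mlflow  # imported lazily so the helper above works without MLflow

    mlflow.set_tracking_uri(resolve_tracking_uri(use_databricks))
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)  # illustrative value
        mlflow.log_metric("accuracy", 0.93)      # illustrative value
```

On Databricks, experiment names are typically workspace paths, so a call might look like `log_example_run(True, "/Users/<your-user>/demo")`.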
Additional capabilities on Databricks
This section lists important capabilities enabled on managed MLflow through integrations with the broader Databricks platform. For overviews of all capabilities of MLflow for GenAI, see MLflow 3 for GenAI and the open source GenAI documentation.
Enterprise-grade governance and security
- Enterprise governance with Unity Catalog: Models, feature tables, vector indexes, tools, and more are governed centrally under Unity Catalog. When deploying agents, authentication for agent, data, and tool access can be precisely controlled using both authentication passthrough and on-behalf-of-user authentication.
- Lakehouse data integration: Leverage AI/BI Genie spaces and dashboards and Databricks SQL to analyze logs and traces from MLflow experiments.
- Security and management: MLflow permissions follow the same governance patterns as the broader Databricks platform:
  - Workspace objects such as experiments follow workspace permissions.
  - Unity Catalog objects such as registered models follow Unity Catalog privileges.
  - UI and API authentication and access match the Databricks platform and REST API.
- Auditing: System tables provide usage and audit logs for managed MLflow.
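As one illustration of the Unity Catalog point above, registered models in Unity Catalog are addressed by a three-level `catalog.schema.model` name, and the MLflow client is pointed at Unity Catalog with the `databricks-uc` registry URI. The sketch below assumes those conventions. The helper names and any catalog, schema, or model names you pass in are hypothetical, and the caller is assumed to hold the required Unity Catalog privileges.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Build a three-level Unity Catalog model name (hypothetical helper)."""
    parts = (catalog, schema, model)
    for part in parts:
        # Reject empty components and embedded dots so the result
        # always has exactly three levels.
        if not part or "." in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ".".join(parts)


def register_in_unity_catalog(model_uri: str, catalog: str, schema: str, model: str):
    """Register an already logged model under Unity Catalog governance."""
    import mlflow  # imported lazily so uc_model_name works without MLflow

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.register_model(model_uri, uc_model_name(catalog, schema, model))
```

Access to the resulting model is then controlled by Unity Catalog privileges rather than by workspace permissions.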
Fully managed hosting on production-ready servers
- Fully managed: Databricks provides MLflow servers with automatic updates, designed for scalability and production. For details, see Resource limits.
- Trusted platform: Managed MLflow is used by thousands of customers across the globe.
Integrations for development and production
Development of AI and ML is streamlined by integrations such as:
- Notebook integration: Databricks notebooks are automatically connected to the MLflow server and can use both notebook experiments and workspace experiments for tracking and sharing results. Databricks notebooks support autologging for MLflow tracking. For GenAI, Databricks notebooks can display an inline tracing UI for interactive analysis.
- GenAI human feedback tools: For GenAI evaluation, Databricks provides a Review App for human feedback that includes a chat UI for vibe checks and an expert feedback UI for labeling traces.
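For code that runs both inside and outside Databricks notebooks, autologging can also be switched on explicitly with `mlflow.autolog()`. The following is a minimal sketch: the helper names `mlflow_available` and `enable_autologging` are hypothetical, and the availability check is only there so the snippet degrades gracefully when MLflow is not installed.

```python
import importlib.util


def mlflow_available() -> bool:
    """Report whether the mlflow package can be imported in this environment."""
    return importlib.util.find_spec("mlflow") is not None


def enable_autologging() -> bool:
    """Turn on MLflow autologging if MLflow is installed; return whether it was enabled."""
    if not mlflow_available():
        return False
    import mlflow

    # Enables autologging for the supported ML libraries that are installed.
    mlflow.autolog()
    return True
```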
Production AI and ML are facilitated by integrations such as:
- Infrastructure-as-code for CI/CD: Manage MLflow experiments, models, and more with Databricks Asset Bundles and MLOps Stacks.
- Model deployment using CI/CD: MLflow 3 deployment jobs integrate Databricks Workflows with Unity Catalog to automate staged deployment of ML models.
- Feature Store integration: The integration between Databricks Feature Store and MLflow simplifies deployment for ML models that use feature tables.
- GenAI production monitoring: Databricks provides a production monitoring service that continuously evaluates a sample of your production traffic using LLM judges and scorers. This is powered by production-scale trace ingestion that includes storing traces to Unity Catalog tables.
Open source telemetry collection was introduced in MLflow 3.2.0 and is disabled by default on Databricks. For more details, refer to the MLflow usage tracking documentation.
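Outside Databricks, open source MLflow users who want the same behavior can opt out through environment variables. The sketch below assumes the variable names `MLFLOW_DISABLE_TELEMETRY` and `DO_NOT_TRACK`, which should be confirmed against the MLflow usage tracking documentation. The helper name `disable_mlflow_telemetry` is hypothetical.

```python
import os


def disable_mlflow_telemetry(env=None):
    """Set telemetry opt-out variables in the given mapping (default: os.environ).

    The variable names are assumptions; check the MLflow usage tracking docs.
    """
    target = os.environ if env is None else env
    target["MLFLOW_DISABLE_TELEMETRY"] = "true"
    target["DO_NOT_TRACK"] = "true"
    return target
```

Call the helper before importing `mlflow` so the settings are in place when the library initializes.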
Next steps
Get started with MLflow on Databricks:
- Create a free trial Databricks account to use Databricks-managed MLflow
- Tutorial: Connect your development environment to MLflow
- Get started: MLflow 3 for GenAI
- Get started with MLflow 3 for models
Related reference material:
- Open source MLflow for GenAI documentation
- Databricks REST API, which includes the MLflow API
- Databricks SDKs, which include MLflow operations