API Reference
This page provides a comprehensive index of MLflow APIs used in GenAI applications, with direct links to the official MLflow documentation.
MLflow features marked as "Databricks only" are only available on Databricks-managed MLflow.
Official documentation links
- All MLflow Python APIs
- MLflow Core APIs
- MLflow GenAI Module
- MLflow Tracing APIs
- MLflow Client APIs
- MLflow Entities
Some of the APIs referenced on this page are currently in the Beta or Experimental stage. These APIs are subject to change or removal in future releases. Experimental APIs are available to all customers; Beta APIs are enabled automatically for most customers. If you do not have access to a Beta API and need to request access, contact your Databricks support representative.
Experiment management
Manage MLflow experiments and runs for tracking GenAI application development:
SDKs
- mlflow.search_runs() - Search and filter runs by criteria
- mlflow.set_experiment() - Set the active MLflow experiment
- mlflow.start_run() - Start a new MLflow run for tracking
Entities
- mlflow.entities.Experiment - Experiment metadata and configuration
- mlflow.entities.Run - Run metadata, metrics, and parameters
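A minimal usage sketch tying these together (the experiment path, parameter, and metric are illustrative):

```python
import mlflow

# Point subsequent runs at a named experiment (created if it does not exist).
mlflow.set_experiment("/Shared/genai-app-dev")

# Track one development iteration as a run.
with mlflow.start_run(run_name="prompt-v2"):
    mlflow.log_param("model", "gpt-4o-mini")
    mlflow.log_metric("avg_latency_s", 1.3)

# Query finished runs, filtered by a logged parameter.
runs = mlflow.search_runs(filter_string="params.model = 'gpt-4o-mini'")
```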
Prompt management
Version control and lifecycle management for prompts used in GenAI applications:
SDKs
- mlflow.genai.load_prompt() - Load a versioned prompt from the registry
- mlflow.genai.optimize_prompt() - Automatically improve prompts using optimization algorithms
- mlflow.genai.register_prompt() - Register a new prompt to the registry
- mlflow.genai.search_prompts() - Search for prompts by name or tags
- mlflow.genai.delete_prompt_alias() - Remove an alias from a prompt version
- mlflow.genai.set_prompt_alias() - Assign an alias to a prompt version
Entities
- mlflow.entities.Prompt - Prompt metadata and version information
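A minimal sketch of the registry round trip (the prompt name, template, and alias are illustrative):

```python
import mlflow

# Register version 1 of a prompt; variables use double-brace syntax.
prompt = mlflow.genai.register_prompt(
    name="summarize",
    template="Summarize the following text in {{num_sentences}} sentences: {{text}}",
)

# Point the 'production' alias at this version, then load it by alias.
mlflow.genai.set_prompt_alias("summarize", alias="production", version=prompt.version)
loaded = mlflow.genai.load_prompt("prompts:/summarize@production")
print(loaded.format(num_sentences=2, text="MLflow is an open source MLOps platform."))
```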
Evaluation and monitoring
Scorer lifecycle management (Databricks only)
This feature is in Beta.
Scorer lifecycle management for continuous quality tracking in production:
Scorer instance methods
- Scorer.register() - Register custom scorer with server
- Scorer.start() - Begin online evaluation with sampling
- Scorer.update() - Modify sampling configuration
- Scorer.stop() - Stop online evaluation
- Scorer.delete() - Remove scorer entirely
Scorer registry functions
- mlflow.genai.scorers.get_scorer() - Retrieve a registered scorer by name
- mlflow.genai.scorers.list_scorers() - List all registered scorers
- mlflow.genai.scorers.delete_scorer() - Delete a registered scorer by name
Scorer properties
- Scorer.sample_rate - Current sampling rate (0.0-1.0)
- Scorer.filter_string - Current trace filter
Configuration classes
- mlflow.genai.ScorerSamplingConfig - Sampling configuration data class
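A sketch of the lifecycle, assuming the Beta register/start/update flow above (the scorer name and sample rates are illustrative):

```python
from mlflow.genai.scorers import Safety, ScorerSamplingConfig

# Register a built-in scorer with the server under a chosen name.
safety = Safety().register(name="prod_safety")

# Begin online evaluation, scoring 20% of production traces.
safety = safety.start(sampling_config=ScorerSamplingConfig(sample_rate=0.2))

# Later: raise the sampling rate, then stop or remove the scorer.
safety = safety.update(sampling_config=ScorerSamplingConfig(sample_rate=0.5))
safety = safety.stop()
safety.delete()
```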
Core evaluation APIs
Core APIs for offline evaluation and production monitoring:
- mlflow.genai.evaluate() - Evaluation harness to orchestrate offline evaluation with scorers and datasets
- mlflow.genai.to_predict_fn() - Convert model output to standardized prediction function format
- mlflow.genai.Scorer - Custom scorer class for object-oriented implementation with state management
- mlflow.genai.scorer() - Scorer decorator for scorer creation and evaluation logic
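A minimal sketch of the harness with a decorator-based custom scorer (the data and word threshold are illustrative):

```python
import mlflow
from mlflow.genai.scorers import scorer

# Any function decorated with @scorer can grade a record; it may accept
# inputs, outputs, expectations, and/or trace, and return a bool, number,
# string, or Feedback.
@scorer
def is_concise(outputs) -> bool:
    return len(str(outputs).split()) <= 50

eval_data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open source MLOps platform.",
    },
]

# Run offline evaluation over static predictions with the custom scorer.
results = mlflow.genai.evaluate(data=eval_data, scorers=[is_concise])
```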
Predefined scorers
Quality assessment scorers ready for immediate use:
- mlflow.genai.scorers.Safety - Content safety evaluation
- mlflow.genai.scorers.Correctness - Answer accuracy assessment
- mlflow.genai.scorers.RelevanceToQuery - Query relevance scoring
- mlflow.genai.scorers.Guidelines - Custom guideline compliance
- mlflow.genai.scorers.ExpectationsGuidelines - Guideline evaluation with expectations
- mlflow.genai.scorers.RetrievalGroundedness - RAG grounding assessment
- mlflow.genai.scorers.RetrievalRelevance - Retrieved context relevance
- mlflow.genai.scorers.RetrievalSufficiency - Context sufficiency evaluation
Scorer helpers
- mlflow.genai.scorers.get_all_scorers() - Retrieve all built-in scorers
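Predefined scorers plug into the same evaluation harness; a brief sketch (the data is illustrative, and on open source MLflow the LLM-based scorers may additionally need a configured judge model):

```python
import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety, get_all_scorers

data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open source MLOps platform.",
    },
]

# Pick specific built-in scorers ...
mlflow.genai.evaluate(data=data, scorers=[Safety(), RelevanceToQuery()])

# ... or run every built-in scorer at once.
mlflow.genai.evaluate(data=data, scorers=get_all_scorers())
```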
Judge functions
LLM-based assessment functions for direct use or scorer wrapping:
- mlflow.genai.judges.is_safe() - Safety assessment
- mlflow.genai.judges.is_correct() - Correctness evaluation
- mlflow.genai.judges.is_grounded() - Grounding verification
- mlflow.genai.judges.is_context_relevant() - Context relevance
- mlflow.genai.judges.is_context_sufficient() - Context sufficiency
- mlflow.genai.judges.meets_guidelines() - Custom guideline assessment
- mlflow.genai.make_judge() - Create custom judges (recommended for MLflow 3.4.0 and above)
- mlflow.genai.judges.custom_prompt_judge() - Custom prompt-based evaluation (deprecated in MLflow 3.4.0; use make_judge() instead)
Judge output entities
- mlflow.genai.judges.CategoricalRating - Enum for categorical judge responses
- mlflow.genai.judges.CategoricalRating.YES - Positive rating
- mlflow.genai.judges.CategoricalRating.NO - Negative rating
- mlflow.genai.judges.CategoricalRating.UNKNOWN - Uncertain rating
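A sketch of a custom judge built with make_judge (the judge name, instructions, and model URI are illustrative; the call returns a Feedback whose value is typically a CategoricalRating):

```python
import mlflow

# Build a judge from natural-language instructions; the {{ inputs }} and
# {{ outputs }} template variables are filled in at call time.
helpful = mlflow.genai.make_judge(
    name="helpfulness",
    instructions=(
        "Given the request {{ inputs }} and the response {{ outputs }}, "
        "answer 'yes' if the response is helpful, otherwise 'no'."
    ),
    model="openai:/gpt-4o-mini",  # illustrative judge model URI
)

feedback = helpful(
    inputs={"question": "What is MLflow?"},
    outputs="MLflow is an open source MLOps platform.",
)
print(feedback.value)
```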
Evaluation datasets
Create and manage versioned test datasets for systematic evaluation:
SDKs
- mlflow.genai.create_dataset() - Create a new evaluation dataset
- mlflow.genai.delete_dataset() - Delete an evaluation dataset
- mlflow.genai.get_dataset() - Retrieve an existing evaluation dataset
Entities
- mlflow.genai.datasets.EvaluationDataset - Versioned test data container
  - merge_records() - Combine records from multiple sources
  - set_profile() - Configure dataset profile settings
  - to_df() - Convert dataset to pandas DataFrame
  - to_evaluation_dataset() - Convert to evaluation dataset format
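A minimal sketch (on Databricks, datasets are backed by Unity Catalog tables; the three-level table name and records are illustrative):

```python
import mlflow.genai

# Create a versioned evaluation dataset.
dataset = mlflow.genai.create_dataset(uc_table_name="main.eval.qa_dataset")

# Add test records: inputs plus optional ground-truth expectations.
dataset.merge_records(
    [
        {
            "inputs": {"question": "What is MLflow?"},
            "expectations": {"expected_facts": ["open source", "MLOps platform"]},
        }
    ]
)

# Inspect as a DataFrame, or pass the dataset directly to mlflow.genai.evaluate().
df = dataset.to_df()
```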
Human labeling and review app (Databricks only)
Human feedback collection and review workflows for systematic quality assessment:
Entities
- mlflow.genai.Agent - Agent configuration for review app testing
- mlflow.genai.LabelingSession - Human labeling workflow manager
  - add_dataset() - Add evaluation dataset to labeling session
  - add_traces() - Add traces for human review
  - set_assigned_users() - Assign reviewers to session
  - sync() - Synchronize session state
- mlflow.genai.ReviewApp - Interactive review application
  - add_agent() - Add agent for testing
  - remove_agent() - Remove agent from review app
Labeling session SDKs
- mlflow.genai.create_labeling_session() - Create a new labeling session
- mlflow.genai.delete_labeling_session() - Delete a labeling session
- mlflow.genai.get_labeling_session() - Retrieve labeling session by ID
- mlflow.genai.get_labeling_sessions() - List all labeling sessions
- mlflow.genai.get_review_app() - Retrieve review app instance
Label schema types
- mlflow.genai.label_schemas.InputCategorical - Categorical input field type
- mlflow.genai.label_schemas.InputCategoricalList - Multi-select categorical input
- mlflow.genai.label_schemas.InputNumeric - Numeric input field type
- mlflow.genai.label_schemas.InputText - Text input field type
- mlflow.genai.label_schemas.InputTextList - Multi-text input field type
- mlflow.genai.label_schemas.LabelSchema - Label schema definition
- mlflow.genai.label_schemas.LabelSchemaType - Schema type enum
- mlflow.genai.label_schemas.LabelSchemaType.EXPECTATION - Expectation schema type
- mlflow.genai.label_schemas.LabelSchemaType.FEEDBACK - Feedback schema type
Label schema SDKs
- mlflow.genai.label_schemas.create_label_schema() - Create a new label schema
- mlflow.genai.label_schemas.delete_label_schema() - Delete an existing label schema
- mlflow.genai.label_schemas.get_label_schema() - Retrieve label schema by name
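A sketch of creating a schema and a session (the schema name, options, session name, and reviewer are illustrative):

```python
import mlflow
from mlflow.genai.label_schemas import (
    InputCategorical,
    LabelSchemaType,
    create_label_schema,
)

# Define the question reviewers will answer.
create_label_schema(
    name="response_quality",
    type=LabelSchemaType.FEEDBACK,
    title="Rate the response quality",
    input=InputCategorical(options=["good", "bad"]),
)

# Create a session, assign reviewers, and queue traces for review.
session = mlflow.genai.create_labeling_session(
    name="weekly_review",
    assigned_users=["reviewer@example.com"],
    label_schemas=["response_quality"],
)
session.add_traces(mlflow.search_traces())
```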
Prompt optimization
This feature is in Beta.
Automated prompt improvement using data-driven optimization algorithms:
Entities
- mlflow.genai.optimize.LLMParams - LLM configuration parameters
- mlflow.genai.optimize.OptimizerConfig - Optimization algorithm configuration
- mlflow.genai.optimize.PromptOptimizationResult - Optimization results and metrics
SDKs
- mlflow.genai.optimize.optimize_prompt() - Run prompt optimization process
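A sketch of an optimization run, assuming a registered prompt and labeled training data (the model name, prompt URI, training records, and exact-match scorer are illustrative):

```python
import mlflow
from mlflow.genai.optimize import LLMParams, OptimizerConfig
from mlflow.genai.scorers import scorer

# A simple scorer to grade candidate prompts during optimization.
@scorer
def exact_match(outputs, expectations) -> bool:
    return outputs == expectations["summary"]

result = mlflow.genai.optimize_prompt(
    target_llm_params=LLMParams(model_name="openai/gpt-4o-mini"),
    prompt="prompts:/summarize/1",  # URI of a registered prompt version
    train_data=[
        {"inputs": {"text": "MLflow is ..."}, "expectations": {"summary": "..."}},
    ],
    scorers=[exact_match],
    optimizer_config=OptimizerConfig(),
)
print(result.prompt.template)  # the optimized template
```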
Tracing
Instrument and capture execution traces from GenAI applications:
SDKs
- mlflow.delete_trace_tag() - Remove a tag from a trace
- mlflow.get_current_active_span() - Get the currently active span
- mlflow.get_last_active_trace() - Retrieve the most recently completed trace
- mlflow.get_last_active_trace_id() - Get ID of the last active trace
- mlflow.get_trace() - Retrieve a trace by ID
- mlflow.search_traces() - Search and filter traces
- mlflow.set_trace_tag() - Add a tag to a trace
- mlflow.start_span() - Manually start a new span
- mlflow.trace - Decorator to automatically trace function execution
- mlflow.traceName - Context manager to set trace name
- mlflow.traceOutputs - Context manager to set trace outputs
- mlflow.tracing - Tracing module with configuration functions
- mlflow.tracing.disable - Disable tracing globally
- mlflow.tracing.disable_notebook_display() - Disable trace display in notebooks
- mlflow.tracing.enable - Enable tracing globally
- mlflow.tracing.enable_notebook_display() - Enable trace display in notebooks
- mlflow.update_current_trace() - Update metadata for the current trace
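A minimal sketch combining automatic and manual instrumentation:

```python
import mlflow

# The decorator records inputs, outputs, latency, and exceptions.
@mlflow.trace
def answer(question: str) -> str:
    # Manually open a child span for a sub-step.
    with mlflow.start_span(name="retrieve") as span:
        docs = ["MLflow is an open source MLOps platform."]
        span.set_outputs({"docs": docs})
    return docs[0]

answer("What is MLflow?")

# Retrieve the trace that was just recorded.
trace = mlflow.get_trace(mlflow.get_last_active_trace_id())
```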
Entities
- mlflow.entities.Trace - Complete trace with all spans and metadata
- mlflow.entities.TraceData - Trace execution data
- mlflow.entities.TraceInfo - Trace metadata and summary information
- mlflow.entities.Span - Individual span within a trace
- mlflow.entities.SpanEvent - Event occurring within a span
- mlflow.entities.SpanType - Span type classification enum
- mlflow.entities.Document - Document entity for RAG applications
Assessment entities
Data structures for storing evaluation results and feedback:
- mlflow.entities.Assessment - Evaluation result container
- mlflow.entities.AssessmentError - Assessment error details
- mlflow.entities.AssessmentSource - Source of the assessment
- mlflow.entities.AssessmentSourceType - Assessment source type enum
- mlflow.entities.Expectation - Expected ground truth outcome
- mlflow.entities.Feedback - Scorer output with value and rationale
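A sketch constructing a Feedback the way a custom scorer would return it (the name, value, and rationale are illustrative):

```python
from mlflow.entities import AssessmentSource, AssessmentSourceType, Feedback

feedback = Feedback(
    name="conciseness",
    value=True,
    rationale="Response is under 50 words.",
    # Mark the assessment as produced by deterministic code, not a human or LLM judge.
    source=AssessmentSource(source_type=AssessmentSourceType.CODE),
)
```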
Tracing integrations
Auto-instrumentation for popular GenAI frameworks and libraries:
- mlflow.anthropic.autolog - Anthropic Claude integration
- mlflow.autogen.autolog - Microsoft AutoGen integration
- mlflow.bedrock.autolog - AWS Bedrock integration
- mlflow.crewai.autolog - CrewAI integration
- mlflow.dspy.autolog - DSPy integration
- mlflow.gemini.autolog - Google Gemini integration
- mlflow.groq.autolog - Groq integration
- mlflow.langchain.autolog - LangChain integration
- mlflow.litellm.autolog - LiteLLM integration
- mlflow.llama_index.autolog - LlamaIndex integration
- mlflow.mistral.autolog - Mistral AI integration
- mlflow.openai.autolog - OpenAI integration
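Each integration is enabled with a single call before the framework is used; an OpenAI sketch (assumes OPENAI_API_KEY is set in the environment):

```python
import mlflow
from openai import OpenAI

# Capture every OpenAI client call as a trace, with no application code changes.
mlflow.openai.autolog()

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is MLflow?"}],
)
```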
Version tracking
Track and manage GenAI application versions in production:
SDKs
- mlflow.set_active_model() - Set the active model for version tracking
- mlflow.clear_active_model() - Clear the active model context
- mlflow.get_active_model_id() - Get the current active model ID
- mlflow.create_external_model() - Register an external model deployment
- mlflow.delete_logged_model_tag() - Remove a tag from logged model
- mlflow.finalize_logged_model() - Finalize a logged model
- mlflow.get_logged_model() - Retrieve logged model by ID
- mlflow.initialize_logged_model() - Initialize a new logged model
- mlflow.last_logged_model() - Get the most recently logged model
- mlflow.search_logged_models() - Search for logged models
- mlflow.set_logged_model_tags() - Add tags to logged model
- mlflow.log_model_params() - Log parameters for a model
Entities
- mlflow.entities.LoggedModel - Logged model metadata and information
- mlflow.entities.LoggedModelStatus - Logged model status enum
- mlflow.ActiveModel - Active model context manager
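A sketch linking traces to an application version (the model name and parameters are illustrative):

```python
import mlflow

# Declare which application version produced the traces that follow.
mlflow.set_active_model(name="my-agent-v1")

@mlflow.trace
def run(question: str) -> str:
    return "MLflow is an open source MLOps platform."

run("What is MLflow?")  # this trace is linked to the active model version

# Attach metadata to the version and look it up later.
model_id = mlflow.get_active_model_id()
mlflow.log_model_params(model_id=model_id, params={"llm": "gpt-4o-mini"})
model = mlflow.get_logged_model(model_id)
```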