Monitor model quality and endpoint health

Mosaic AI Model Serving provides tooling for monitoring the quality and health of models and their deployments. The following table summarizes each available monitoring tool.

Tool	Description	Purpose	Access
Service logs	Captures `stdout` and `stderr` streams from the model serving endpoint.	Useful for debugging during model deployment. Use `print(..., flush=True)` for immediate display in the logs.	Accessible using the Logs tab in the Serving UI. Logs are streamed in real time and can be exported through the API.
Build logs	Displays output from the process that automatically creates a production-ready Python environment for the model serving endpoint.	Useful for diagnosing model deployment and dependency issues.	Available upon completion of the model serving build under Build logs in the Logs tab. Logs can be exported through the API.
Endpoint health metrics	Provides insights into infrastructure metrics like latency, request rate, error rate, CPU usage, and memory usage.	Important for understanding the performance and health of the serving infrastructure.	Available by default in the Serving UI for the last 14 days. Data can also be streamed to observability tools in real time.
AI Gateway-enabled inference tables	Automatically logs online prediction requests and responses into Delta tables managed by Unity Catalog for endpoints that serve custom models, external models, or provisioned throughput workloads.	Use this tool for monitoring and debugging model quality or responses, generating training data sets, or conducting compliance audits.	Can be enabled for existing and new model serving endpoints when enabling AI Gateway features using the Serving UI or REST API.
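
The service logs row recommends flushing `print` output so that messages appear in the logs immediately. The following is a minimal sketch of that pattern, not an official template: `log_event` and `predict` are hypothetical names used only for illustration, and the doubling logic stands in for a real model. It uses only the Python standard library.

```python
import sys
import time


def log_event(message, *, stream=None):
    """Write one timestamped line and flush it immediately.

    flush=True pushes the line out of Python's output buffer right away,
    so it is not held back until the buffer fills or the process exits.
    """
    if stream is None:
        stream = sys.stdout  # resolved at call time, not at definition time
    timestamp = time.strftime("%H:%M:%S")
    print(f"[{timestamp}] {message}", file=stream, flush=True)


def predict(inputs):
    """Hypothetical stand-in for a served model's prediction function."""
    log_event(f"received {len(inputs)} rows")
    try:
        # Placeholder "model": double every numeric input.
        outputs = [float(x) * 2.0 for x in inputs]
    except (TypeError, ValueError) as exc:
        # Errors go to stderr, which service logs also capture.
        log_event(f"bad input: {exc}", stream=sys.stderr)
        raise
    log_event("prediction finished")
    return outputs
```

Calling `predict([1, 2])` writes two timestamped lines to `stdout` and returns the doubled values. An input that cannot be converted to a number writes one line to `stderr` before the exception propagates.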

Additional resources
