Operational excellence for Databricks
The architectural principles of the operational excellence pillar cover all operational processes that keep Databricks running. The pillar describes how to operate, manage, and monitor Databricks efficiently so that it delivers business value.

Principles of operational excellence
-
Optimize build and release processes
Use software engineering best practices across your entire Databricks environment. Build and release using continuous integration and continuous delivery pipelines for both DevOps and MLOps.
-
Automate deployments and workloads
Automating deployments and workloads for Databricks helps standardize these processes, reduce human error, improve productivity, and provide greater repeatability. This includes using “configuration as code” to avoid configuration drift, and “infrastructure as code” to automate the provisioning of all required Databricks and cloud services.
For ML specifically, processes should drive automation, not the other way around. Not every step of a process can or should be automated: people still determine the business questions, and some models will always need human oversight before deployment. The development process therefore comes first, and each module in the process should be automated as needed. This allows automation and customization to be built out incrementally.
-
Set up monitoring, alerting, and logging
Workloads in Databricks typically integrate Databricks platform services and external cloud services, for example, as data sources or targets. A workload can only execute successfully if every service in the execution chain is functioning properly. When one is not, monitoring, alerting, and logging are what make it possible to detect and track problems and to understand system behavior.
-
Manage capacity and quotas
For any service that is launched in a cloud, take limits into account, for example, access rate limits, number of instances, number of users, and memory requirements. Make sure you understand these limits before you design a solution.
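One way to make limits visible before design decisions are locked in is a small pre-flight check that compares planned usage against known limits. The following is a minimal sketch, not a Databricks API: the `Limit` class, the `check_capacity` function, the limit names, and all numeric values are hypothetical placeholders. Take real limits from your cloud provider's and Databricks' published quotas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """A single service limit (hypothetical structure for illustration)."""
    name: str
    maximum: float
    unit: str


def check_capacity(limits, planned, headroom=0.8):
    """Compare planned usage against limits and return a list of findings.

    A planned value above `maximum` is reported as a VIOLATION, a value above
    `headroom * maximum` as a WARNING, and a missing value as UNKNOWN.
    """
    findings = []
    for limit in limits:
        value = planned.get(limit.name)
        if value is None:
            findings.append(f"UNKNOWN {limit.name}: no planned value supplied")
        elif value > limit.maximum:
            findings.append(
                f"VIOLATION {limit.name}: {value} > {limit.maximum} {limit.unit}"
            )
        elif value > headroom * limit.maximum:
            findings.append(
                f"WARNING {limit.name}: {value} exceeds {headroom:.0%} "
                f"of {limit.maximum} {limit.unit}"
            )
    return findings


# Hypothetical limits; the numbers are placeholders, not real quotas.
LIMITS = [
    Limit("api_requests_per_second", 30, "req/s"),
    Limit("vm_instances", 200, "instances"),
    Limit("workspace_users", 1000, "users"),
]

# Hypothetical planned usage for a new solution.
PLANNED = {
    "api_requests_per_second": 45,
    "vm_instances": 170,
    "workspace_users": 120,
}

for finding in check_capacity(LIMITS, PLANNED):
    print(finding)
```

Keeping the limits in a reviewed, version-controlled file fits the “configuration as code” approach described above, and a check like this could run as a step in a CI pipeline.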