Production job scheduling cheat sheet
This article provides clear, opinionated guidance for scheduling production jobs. Following these best practices can reduce cost, improve performance, and tighten security. A minimal example that applies several of these practices together follows the table.
| Best Practice | Impact |
|---|---|
| Use jobs clusters for automated workflows | Cost: Jobs compute is billed at lower rates than interactive (all-purpose) compute. |
| Restart long-running clusters | Security: Restarting a cluster picks up the latest Databricks Runtime patches and bug fixes. |
| Use service principals instead of user accounts to run production jobs | Security: Jobs owned by an individual user account can stop running when that user leaves the organization. |
| Use Lakeflow Jobs for orchestration whenever possible | Cost: If all of your workloads run on Databricks, there is no need to pay for or maintain an external orchestration tool. |
| Use the latest LTS version of Databricks Runtime | Performance and cost: Databricks continually improves the Databricks Runtime for usability, performance, and security. |
| Don't store production data in the DBFS root | Security: Data stored in the DBFS root is accessible to all users in the workspace. |
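To make several of these practices concrete, here is a minimal sketch that creates a job through the Jobs API 2.1 with a jobs cluster, an LTS Databricks Runtime, task dependencies for native orchestration, and a service principal as the run-as identity. The job name, notebook paths, node type, runtime version, and service principal application ID are placeholders; substitute values from your own workspace.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # token for an identity allowed to create jobs

# Job spec for the Jobs API 2.1. Names, paths, node type, and the service
# principal application ID below are placeholder values.
job_spec = {
    "name": "nightly-etl",
    # Run as a service principal so the job keeps running even if the user
    # who created it leaves the organization.
    "run_as": {"service_principal_name": "<service-principal-application-id>"},
    # A shared jobs cluster: created for the run, terminated afterwards,
    # billed at jobs-compute rates, and started from a freshly patched image.
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # an LTS runtime; use the latest LTS available in your workspace
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    # Task dependencies let Lakeflow Jobs orchestrate the pipeline natively,
    # with no external scheduler.
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "etl_cluster",
            "notebook_task": {"notebook_path": "/Jobs/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "etl_cluster",
            "notebook_task": {"notebook_path": "/Jobs/transform"},
        },
    ],
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

The same spec can also be managed declaratively, for example with Databricks Asset Bundles or Terraform, which keeps job definitions in version control alongside your code.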