Databricks provides a comprehensive suite of tools and integrations to support your data processing workflows.
You can use a Databricks job to run a data processing or data analysis task on a Databricks cluster with scalable resources. A job can consist of a single task or be a large, multi-task workflow with complex dependencies. Databricks manages task orchestration, cluster management, monitoring, and error reporting for all of your jobs. You can run jobs immediately or on a recurring schedule. You can implement job tasks using notebooks, JARs, Delta Live Tables pipelines, or Python, Scala, Java, and Spark submit applications.
You create jobs through the Jobs UI, the Jobs API, or the Databricks CLI. The Jobs UI allows you to monitor, test, and troubleshoot your running and completed jobs.
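As a rough illustration of a multi-task job with a dependency and a schedule, the following sketch builds a request body of the kind accepted by the Jobs API 2.1 `jobs/create` endpoint. The job name, task keys, notebook paths, and cluster ID are hypothetical placeholders, and the sketch only constructs and prints the payload; it does not call a workspace.

```python
import json


def build_job_payload(name, cluster_id, cron, timezone="UTC"):
    """Build a request body for a two-task job (Jobs API 2.1 field names)."""
    return {
        "name": name,
        "tasks": [
            {
                # First task: runs a notebook (hypothetical path).
                "task_key": "ingest",
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": "/Workspace/etl/ingest"},
            },
            {
                # Second task: starts only after "ingest" succeeds.
                "task_key": "transform",
                "depends_on": [{"task_key": "ingest"}],
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": "/Workspace/etl/transform"},
            },
        ],
        # Quartz cron syntax; this expression means 02:00 every day.
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
    }


# Placeholder values for illustration only.
payload = build_job_payload("nightly-etl", "1234-567890-abcde123", "0 0 2 * * ?")
print(json.dumps(payload, indent=2))
```

A payload like this could be sent as the JSON body of a `POST` to `/api/2.1/jobs/create`, or an equivalent job could be defined through the Jobs UI or the Databricks CLI.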
To get started:
Delta Live Tables is a framework for building reliable, maintainable, and testable data processing pipelines. You define the transformations to perform on your data, and Delta Live Tables manages task orchestration, cluster management, monitoring, data quality, and error handling. You can build your entire data processing workflow as a Delta Live Tables pipeline, or you can add the pipeline as a task in a Databricks job to orchestrate a more complex workflow.
To get started, see the Delta Live Tables introduction.
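To make the idea of running a pipeline inside a job more concrete, here is a minimal sketch of a job task list in which a Delta Live Tables pipeline runs first and a notebook runs after it. It uses the `pipeline_task` and `depends_on` fields of the Jobs API 2.1 task format; the pipeline ID, cluster ID, task keys, and notebook path are hypothetical, and `check_dependencies` is an illustrative helper, not part of any Databricks library.

```python
def pipeline_then_notebook(pipeline_id, cluster_id, notebook_path):
    """Return a task list: run a pipeline, then a downstream notebook."""
    return [
        {
            # Triggers an update of an existing Delta Live Tables pipeline.
            "task_key": "run_pipeline",
            "pipeline_task": {"pipeline_id": pipeline_id},
        },
        {
            # Runs only after the pipeline task completes successfully.
            "task_key": "publish_report",
            "depends_on": [{"task_key": "run_pipeline"}],
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path},
        },
    ]


def check_dependencies(tasks):
    """Illustrative check: every depends_on entry names a task in the list."""
    keys = {task["task_key"] for task in tasks}
    return all(
        dep["task_key"] in keys
        for task in tasks
        for dep in task.get("depends_on", [])
    )


# Placeholder identifiers for illustration only.
tasks = pipeline_then_notebook(
    "my-pipeline-id", "1234-567890-abcde123", "/Workspace/reports/publish"
)
print(check_dependencies(tasks))
```

The list would go under the `tasks` key of a job definition; the pipeline itself is defined and configured separately.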
Databricks provides integrations with popular orchestration tools such as Apache Airflow. See Managing dependencies in data pipelines.