Implement data processing and analysis workflows with Jobs
You can use a Databricks job to orchestrate your data processing, machine learning, or data analytics pipelines on the Databricks platform. Databricks Jobs support a number of workload types, including notebooks, scripts, Delta Live Tables pipelines, Databricks SQL queries, and dbt projects. The following articles guide you in using the features and options of Databricks Jobs to implement your data pipelines.
Tip
You can use Databricks Asset Bundles to define and programmatically manage your jobs. See What are Databricks Asset Bundles? and Develop a job on Databricks using Databricks Asset Bundles.
Transform, analyze, and visualize your data with a Databricks job
You can use a job to create a data pipeline that ingests, transforms, analyzes, and visualizes data. The example in Use Databricks SQL in a Databricks job builds a pipeline that:
Uses a Python script to fetch data using a REST API.
Uses Delta Live Tables to ingest and transform the fetched data and save the transformed data to Delta Lake.
Uses the Jobs integration with Databricks SQL to analyze the transformed data and create graphs to visualize the results.
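The three steps above could be expressed as one multi-task job definition. The following is a minimal sketch, assuming the `tasks` layout of the Jobs API 2.1. The job name, task keys, script path, and the pipeline, query, and warehouse IDs are hypothetical placeholders, and compute settings are omitted for brevity.

```python
import json


def build_pipeline_job(pipeline_id: str, query_id: str, warehouse_id: str) -> dict:
    """Return a job definition whose three tasks run in sequence."""
    return {
        "name": "example-data-pipeline",  # hypothetical job name
        "tasks": [
            {
                # Step 1: a Python script fetches data from a REST API.
                "task_key": "fetch_data",
                "spark_python_task": {
                    "python_file": "/Shared/fetch_data.py"  # hypothetical path
                },
            },
            {
                # Step 2: a Delta Live Tables pipeline ingests and transforms the data.
                "task_key": "transform",
                "depends_on": [{"task_key": "fetch_data"}],
                "pipeline_task": {"pipeline_id": pipeline_id},
            },
            {
                # Step 3: a Databricks SQL query analyzes the transformed data.
                "task_key": "analyze",
                "depends_on": [{"task_key": "transform"}],
                "sql_task": {
                    "query": {"query_id": query_id},
                    "warehouse_id": warehouse_id,
                },
            },
        ],
    }


def unknown_dependencies(job: dict) -> list:
    """List dependency keys that do not match any task key in the job."""
    known = {task["task_key"] for task in job["tasks"]}
    return [
        dep["task_key"]
        for task in job["tasks"]
        for dep in task.get("depends_on", [])
        if dep["task_key"] not in known
    ]


if __name__ == "__main__":
    job = build_pipeline_job("<pipeline-id>", "<query-id>", "<warehouse-id>")
    print(json.dumps(job, indent=2))
```

The `depends_on` entries are what order the tasks. The small `unknown_dependencies` helper is only an illustrative sanity check for catching mistyped task keys before the definition is submitted.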
Use dbt transformations in a job
Use the dbt task type if you transform data with a dbt Core project and want to integrate that project into a Databricks job, or if you want to create new dbt transformations and run them in a job. See Use dbt transformations in a Databricks job.
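As a rough illustration, a dbt task in a job definition might look like the sketch below. It assumes the `dbt_task` settings of the Jobs API 2.1 and that the dbt project source (for example, a Git repository) is configured elsewhere in the job. The task key and the list of dbt commands are hypothetical choices.

```python
def build_dbt_task(warehouse_id: str) -> dict:
    """Return a task entry that runs dbt commands against a SQL warehouse."""
    return {
        "task_key": "dbt_transform",  # hypothetical task key
        "dbt_task": {
            # dbt CLI commands, run in order.
            "commands": ["dbt deps", "dbt run"],
            "warehouse_id": warehouse_id,
        },
        # The dbt adapter for Databricks is installed as a task library.
        "libraries": [{"pypi": {"package": "dbt-databricks"}}],
    }
```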
Use a Python package in a job
Python wheel files are a standard way to package and distribute the files required to run a Python application. With the Python wheel task type, you can create a job that runs Python code packaged as a Python wheel file. See Use a Python wheel file in a Databricks job.
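A Python wheel task names the package and the entry point to run, and attaches the wheel file as a library. The following sketch assumes the `python_wheel_task` settings of the Jobs API 2.1. The task key, the function name, and all argument values in the usage line are hypothetical.

```python
from typing import List, Optional


def build_wheel_task(
    wheel_path: str,
    package_name: str,
    entry_point: str,
    parameters: Optional[List[str]] = None,
) -> dict:
    """Return a task entry that runs an entry point from a Python wheel."""
    return {
        "task_key": "run_wheel",  # hypothetical task key
        "python_wheel_task": {
            "package_name": package_name,
            "entry_point": entry_point,
            # Passed to the entry point as command-line arguments.
            "parameters": list(parameters or []),
        },
        # The wheel itself is installed on the task's compute as a library.
        "libraries": [{"whl": wheel_path}],
    }
```

For example, `build_wheel_task("dbfs:/wheels/my_pkg-0.1-py3-none-any.whl", "my_pkg", "main", ["--env", "dev"])` describes a task that installs the wheel and calls the `main` entry point of `my_pkg`.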
Use code packaged in a JAR
Libraries and applications implemented in a JVM language such as Java or Scala are commonly packaged in a Java archive (JAR) file. Databricks Jobs supports code packaged in a JAR with the JAR task type. See Use a JAR in a Databricks job.
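A JAR task follows the same pattern as other library-based tasks: the task names the main class and the JAR is attached as a library. This is a minimal sketch, assuming the `spark_jar_task` settings of the Jobs API 2.1. The task key and the example class name and path are hypothetical.

```python
from typing import List, Optional


def build_jar_task(
    jar_path: str,
    main_class_name: str,
    parameters: Optional[List[str]] = None,
) -> dict:
    """Return a task entry that runs the main class of a JAR."""
    return {
        "task_key": "run_jar",  # hypothetical task key
        "spark_jar_task": {
            # Fully qualified name of the class whose main method is run.
            "main_class_name": main_class_name,
            "parameters": list(parameters or []),
        },
        "libraries": [{"jar": jar_path}],
    }
```

For example, `build_jar_task("dbfs:/jars/etl.jar", "com.example.etl.Main")` describes a task that runs the `main` method of `com.example.etl.Main`.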
Orchestrate your jobs with Apache Airflow
Databricks recommends using Databricks Jobs to orchestrate your workflows. However, Apache Airflow is commonly used as a workflow orchestration system and provides native support for Databricks Jobs. While Databricks Jobs provides a visual UI to create your workflows, Airflow uses Python files to define and deploy your data pipelines. For an example of creating and running a job with Airflow, see Orchestrate Databricks jobs with Apache Airflow.