Pipeline developer reference

This section contains reference documentation and instructions for pipeline developers.

In pipelines, data loading and transformations are implemented as queries that define streaming tables and materialized views. Lakeflow Spark Declarative Pipelines supports both SQL and Python interfaces for writing these queries. Because the two interfaces provide equivalent functionality for most data processing use cases, pipeline developers can choose whichever they are more comfortable with.
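For orientation, the following is a minimal Python sketch of both dataset types using the `dlt` module. The source path, table names, and column names are hypothetical; the `@dlt.table` decorator creates a streaming table when the decorated function returns a streaming DataFrame and a materialized view when it returns a batch DataFrame. The SQL interface expresses the same definitions with `CREATE STREAMING TABLE` and `CREATE MATERIALIZED VIEW` statements.

```python
import dlt
from pyspark.sql import functions as F

# `spark` is provided automatically in a pipeline's Python environment.

# Streaming table: ingests new files incrementally as they arrive.
@dlt.table(comment="Raw orders ingested from a hypothetical landing path.")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")  # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/orders_landing")  # hypothetical path
    )

# Materialized view: a batch query over the streaming table above,
# kept up to date by pipeline updates.
@dlt.table(comment="Order totals aggregated by customer.")
def orders_by_customer():
    return (
        spark.read.table("orders_raw")
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("total_amount"))
    )
```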

Python development

Create pipelines using Python code.

| Topic | Description |
| --- | --- |
| Develop pipeline code with Python | An overview of developing pipelines in Python. |
| Lakeflow Spark Declarative Pipelines Python language reference | Python reference documentation for the pipelines module. |
| Manage Python dependencies for pipelines | Instructions for managing Python libraries in pipelines. |
| Import Python modules from Git folders or workspace files | Instructions for using Python modules that you have stored in Databricks. |

SQL development

Create pipelines using SQL code.

| Topic | Description |
| --- | --- |
| Develop pipeline code with SQL | An overview of developing pipelines in SQL. |
| Pipeline SQL language reference | Reference documentation for SQL syntax for Lakeflow Spark Declarative Pipelines. |
| Use pipelines in Databricks SQL | Use Databricks SQL to work with pipelines. |

Other development topics

The following topics describe other ways to develop pipelines.

| Topic | Description |
| --- | --- |
| Convert a pipeline into a Databricks Asset Bundle project | Convert an existing pipeline to a bundle, which allows you to manage your data processing configuration in a source-controlled YAML file for easier maintenance and automated deployments to target environments. |
| Create pipelines with dlt-meta | Use the open source dlt-meta library to automate the creation of pipelines with a metadata-driven framework. |
| Develop pipeline code in your local development environment | An overview of options for developing pipelines locally. |