Procedural vs. declarative data processing in Databricks
This article covers the differences between procedural and declarative programming and how each is used in Databricks.
Procedural and declarative programming are two fundamental programming paradigms in computer science. Each represents a different approach to structuring and executing instructions.
- With procedural programming, you specify how tasks should be accomplished by defining explicit sequences of operations.
- Declarative programming focuses on what needs to be achieved, leaving the underlying system to determine the best way to execute the task.
When designing data pipelines, engineers must choose between procedural and declarative data processing models. This decision impacts workflow complexity, maintainability, and efficiency. This page explains the key differences between these models, their respective advantages and challenges, and when to use each approach.
What is procedural data processing?
Procedural data processing follows a structured approach where explicit steps are defined to manipulate data. This model is closely aligned with imperative programming, emphasizing a command sequence that dictates how the data should be processed.
Characteristics of procedural processing
The following are characteristics of procedural processing, illustrated by the sketch after this list:
- Step-by-step execution: The developer explicitly defines the order of operations.
- Use of control structures: Loops, conditionals, and functions manage execution flow.
- Detailed resource control: Enables fine-grained optimizations and manual performance tuning.
- Related concepts: Procedural programming is a subset of imperative programming.
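The following is a minimal sketch in plain Python that illustrates these characteristics. The records and the validity rule are hypothetical; the point is the pattern, in which iteration, branching, and aggregation are all spelled out by the developer:

```python
# Minimal procedural sketch: every step and control structure is explicit.
records = [{"id": 1, "amount": 120.0}, {"id": 2, "amount": -15.0}]

cleaned = []
for record in records:                             # step 1: iterate explicitly
    if record["amount"] < 0:                       # step 2: branch on a validity rule
        continue                                   # drop invalid rows manually
    record["amount"] = round(record["amount"], 2)  # step 3: transform in place
    cleaned.append(record)

total = sum(r["amount"] for r in cleaned)          # step 4: aggregate manually
print(f"{len(cleaned)} valid records, total={total}")
```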
Common use cases for procedural processing
The following are common use cases for procedural processing:
- Custom ETL pipelines requiring procedural logic.
- Low-level performance optimizations in batch and streaming workflows.
- Legacy systems or existing imperative scripts.
Procedural processing with Apache Spark and Databricks Jobs
Apache Spark primarily follows a procedural model for data processing. Use Databricks Jobs to orchestrate explicit execution logic that defines step-by-step transformations and actions on distributed data.
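The following sketch shows this pattern with the PySpark DataFrame API, as it might appear in a Databricks Jobs task. The paths and column names (`amount`, `fx_rate`, `order_date`) are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Step 1: read the source data (the path is a hypothetical example).
orders = spark.read.format("delta").load("/tmp/example/orders")

# Step 2: explicitly filter, derive, and aggregate in a defined order.
valid = orders.filter(F.col("amount") > 0)
enriched = valid.withColumn("amount_usd", F.col("amount") * F.col("fx_rate"))
daily = enriched.groupBy("order_date").agg(F.sum("amount_usd").alias("revenue"))

# Step 3: trigger execution with an action by writing the result.
daily.write.format("delta").mode("overwrite").save("/tmp/example/daily_revenue")
```

Each transformation is declared in an explicit order chosen by the developer, and execution is triggered by the final write action.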
What is declarative data processing?
Declarative data processing abstracts away the how and focuses on defining the desired result. Instead of specifying step-by-step instructions, developers define transformation logic, and the system determines the most efficient execution plan.
Characteristics of declarative processing
The following are characteristics of declarative processing, illustrated by the sketch after this list:
- Abstraction of execution details: Users describe the desired outcome, not the steps to achieve it.
- Automatic optimization: The system applies query planning and execution tuning.
- Reduced complexity: Removes the need for explicit control structures, improving maintainability.
- Related concepts: Declarative programming encompasses paradigms such as functional programming and domain-specific languages like SQL.
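As a contrast to the procedural sketch above, the following example expresses the same result declaratively with Spark SQL. It is a sketch that assumes a table named `orders` is already registered; the column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Declare the desired result; the engine's optimizer decides predicate
# pushdown, aggregation strategy, and execution order.
daily_revenue = spark.sql("""
    SELECT order_date, SUM(amount * fx_rate) AS revenue
    FROM orders
    WHERE amount > 0
    GROUP BY order_date
""")
daily_revenue.show()
```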
Common use cases for declarative processing
The following are common use cases for declarative processing:
- SQL-based transformations in batch and streaming workflows.
- High-level data processing frameworks such as Delta Live Tables (DLT).
- Scalable, distributed data workloads requiring automated optimizations.
Declarative processing with DLT
DLT is a declarative framework designed to simplify the creation of reliable and maintainable stream processing pipelines. By specifying what data to ingest and how to transform it, DLT automates key aspects of pipeline management, including orchestration, compute management, monitoring, data quality enforcement, and error handling.
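The following is a minimal sketch of a DLT pipeline definition in Python. The source path, column names, and quality rule are hypothetical, and the code runs inside a DLT pipeline rather than as a standalone script:

```python
import dlt
from pyspark.sql import functions as F

# Declare a raw table; DLT handles orchestration and compute.
# (In a DLT pipeline, `spark` is available as a global.)
@dlt.table(comment="Raw orders ingested from storage (hypothetical path).")
def orders_raw():
    return spark.read.format("delta").load("/tmp/example/orders")

# Declare a cleaned table with a data quality expectation; rows that
# fail the rule are dropped and tracked in pipeline metrics.
@dlt.table(comment="Orders with valid amounts.")
@dlt.expect_or_drop("positive_amount", "amount > 0")
def orders_clean():
    return dlt.read("orders_raw").withColumn("amount", F.round("amount", 2))
```

Each function declares what its table should contain; DLT infers the dependency between `orders_clean` and `orders_raw` and manages orchestration, monitoring, and data quality enforcement.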
Key differences: procedural vs. declarative processing
| Aspect | Procedural processing | Declarative processing |
|---|---|---|
| Control | Full control over execution | Execution handled by the system |
| Complexity | Can be complex and verbose | Generally simpler and more concise |
| Optimization | Requires manual tuning | System handles optimization |
| Flexibility | High, but requires expertise | Lower, but easier to use |
| Use cases | Custom pipelines, performance tuning | SQL queries, managed pipelines |
When to choose procedural or declarative processing
The following table outlines some of the key decision points for procedural and declarative processing:
| Choose procedural processing when... | Choose declarative processing when... |
|---|---|
| Fine-grained control over execution logic is required. | Simplified development and maintenance are priorities. |
| Transformations involve complex business rules that are difficult to express declaratively. | SQL-based transformations or managed workflows eliminate the need for procedural control. |
| Performance optimizations require manual tuning. | Data processing frameworks such as DLT provide built-in optimizations. |