
Procedural vs. declarative data processing in Databricks

This article covers the differences between procedural and declarative programming and their usage in Databricks.

Procedural and declarative programming are two fundamental programming paradigms in computer science. Each represents a different approach to structuring and executing instructions.

  • Procedural programming specifies how tasks should be accomplished by defining explicit sequences of operations.
  • Declarative programming focuses on what needs to be achieved, leaving the underlying system to determine the best way to execute the task.

When designing data pipelines, engineers must choose between procedural and declarative data processing models. This decision impacts workflow complexity, maintainability, and efficiency. This page explains these models' key differences, advantages and challenges, and when to use each approach.
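
As a minimal illustration of the difference, the following Python sketch computes the same total two ways on a hypothetical `orders` table with an `amount` column: procedurally, by collecting rows and accumulating the sum in an explicit loop, and declaratively, by stating the desired aggregate and letting Spark plan the execution.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Procedural: collect the rows and accumulate the total in an explicit loop.
# (Illustrative only; collecting to the driver defeats distributed execution.)
total = 0
for row in spark.table("orders").collect():  # "orders" is a hypothetical table
    total += row["amount"]

# Declarative: describe the desired result and let Spark decide how to compute it.
total_df = spark.sql("SELECT SUM(amount) AS total FROM orders")
total_df.show()
```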

What is procedural data processing?

Procedural data processing follows a structured approach where explicit steps are defined to manipulate data. This model is closely aligned with imperative programming, emphasizing a command sequence that dictates how the data should be processed.

Characteristics of procedural processing

The following are characteristics of procedural processing:

  • Step-by-step execution: The developer explicitly defines the order of operations.
  • Use of control structures: Loops, conditionals, and functions manage execution flow.
  • Detailed resource control: Enables fine-grained optimizations and manual performance tuning.
  • Related concepts: Procedural programming is a subset of imperative programming.

Common use cases for procedural processing

The following are common use cases for procedural processing:

  • Custom ETL pipelines that require explicit, step-by-step transformation logic.
  • Low-level performance optimizations in batch and streaming workflows.
  • Legacy systems or existing imperative scripts.

Procedural processing with Apache Spark and Databricks Jobs

Apache Spark primarily follows a procedural model for data processing. Use Databricks Jobs to orchestrate explicit execution logic that defines step-by-step transformations and actions on distributed data.
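
For example, a notebook or script run as a Databricks Jobs task might spell out each step explicitly, as in the following sketch. The table names, columns, and branching rule are hypothetical; what matters is that the developer controls the order of operations, the control flow, and the write behavior directly.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical source and target table names used for illustration only.
SOURCE_TABLES = ["raw.orders_eu", "raw.orders_us"]
TARGET_TABLE = "analytics.orders_clean"

for source in SOURCE_TABLES:
    # Step 1: read the raw data.
    df = spark.table(source)

    # Step 2: apply transformations in an explicit order.
    df = df.filter(F.col("status") == "COMPLETE")
    df = df.withColumn("order_date", F.to_date("order_ts"))

    # Step 3: branch on a condition the developer controls directly.
    if "eu" in source:
        df = df.withColumn("region", F.lit("EU"))
    else:
        df = df.withColumn("region", F.lit("US"))

    # Step 4: write the result; the developer also chooses the write mode.
    df.write.mode("append").saveAsTable(TARGET_TABLE)
```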

What is declarative data processing?

Declarative data processing abstracts the how and focuses on defining the desired result. Instead of specifying step-by-step instructions, developers define transformation logic, and the system determines the most efficient execution plan.
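
For example, a cleanup step like the one in the procedural sketch above can be stated as a single SQL definition: the developer declares the result set, and Spark's optimizer chooses the physical plan. The table and column names are again hypothetical.

```python
# spark is the SparkSession that Databricks notebooks provide automatically.
spark.sql("""
    CREATE OR REPLACE TABLE analytics.orders_clean AS
    SELECT
        order_id,
        amount,
        to_date(order_ts) AS order_date
    FROM raw.orders
    WHERE status = 'COMPLETE'
""")
```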

Characteristics of declarative processing

The following are characteristics of declarative processing:

  • Abstraction of execution details: Users describe the desired outcome, not the steps to achieve it.
  • Automatic optimization: The system applies query planning and execution tuning.
  • Reduced complexity: Removes the need for explicit control structures, improving maintainability.
  • Related concepts: Declarative programming includes domain-specific and functional programming paradigms.

Common use cases for declarative processing

The following are common use cases for declarative processing:

  • SQL-based transformations in batch and streaming workflows.
  • High-level data processing frameworks such as Delta Live Tables (DLT).
  • Scalable, distributed data workloads requiring automated optimizations.

Declarative processing with DLT

DLT is a declarative framework designed to simplify the creation of reliable and maintainable stream processing pipelines. By specifying what data to ingest and how to transform it, DLT automates key aspects of pipeline management, including orchestration, compute management, monitoring, data quality enforcement, and error handling.
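
A minimal DLT pipeline definition might look like the following sketch. The source path, dataset names, and expectation are hypothetical; the pattern is that each function declares a dataset and its transformation, and DLT resolves dependencies, orchestration, and data quality checks.

```python
import dlt
from pyspark.sql import functions as F

# Declare a raw ingestion table; DLT manages orchestration and compute.
@dlt.table(comment="Raw orders ingested incrementally with Auto Loader.")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders")          # hypothetical source path
    )

# Declare a cleaned table; the expectation drops rows that fail the check.
@dlt.table(comment="Completed orders with a typed order_date column.")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .filter(F.col("status") == "COMPLETE")
        .withColumn("order_date", F.to_date("order_ts"))
    )
```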

Key differences: procedural vs. declarative processing

| Aspect | Procedural processing | Declarative processing |
| --- | --- | --- |
| Control | Full control over execution | Execution handled by system |
| Complexity | Can be complex and verbose | Generally simpler and more concise |
| Optimization | Requires manual tuning | System handles optimization |
| Flexibility | High, but requires expertise | Lower, but easier to use |
| Use cases | Custom pipelines, performance tuning | SQL queries, managed pipelines |

When to choose procedural or declarative processing

The following table outlines some of the key decision points for procedural and declarative processing:

| Procedural processing | Declarative processing |
| --- | --- |
| Fine-grained control over execution logic is required. | Simplified development and maintenance are priorities. |
| Transformations involve complex business rules that are difficult to express declaratively. | SQL-based transformations or managed workflows eliminate the need for procedural control. |
| Performance optimizations necessitate manual tuning. | Data processing frameworks such as DLT provide built-in optimizations. |