Data engineering with Databricks
Databricks provides an end-to-end data engineering solution that empowers data engineers, software developers, SQL developers, analysts, and data scientists to deliver high-quality data for downstream analytics, AI, and operational applications.
The following image shows the architecture of the Databricks data engineering system, including Jobs, Lakeflow Connect, DLT, and the Databricks Runtime.
- Lakeflow Connect simplifies data ingestion with connectors to popular enterprise applications, databases, cloud storage, message buses, and local files. A subset of these connectors is available as managed connectors. Managed connectors provide a simple UI and a configuration-based ingestion service with minimal operational overhead, without requiring you to use the underlying DLT APIs and infrastructure.
- DLT provides a declarative framework that lowers the complexity of building and managing efficient batch and streaming data pipelines. DLT runs on the performance-optimized Databricks Runtime, and the DLT flows API uses the same DataFrame API as Apache Spark and Structured Streaming. A flow can write into streaming tables and sinks, such as a Kafka topic, using streaming semantics, or it can write to a materialized view using batch semantics. In addition, DLT automatically orchestrates the execution of flows, sinks, streaming tables, and materialized views by encapsulating and running them as a pipeline.
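  A minimal sketch of what a declarative DLT pipeline definition can look like in Python, assuming it runs inside a DLT pipeline (where `import dlt` and the `spark` session are provided by the runtime; the storage path and column names here are hypothetical):

  ```python
  import dlt
  from pyspark.sql import functions as F

  # Streaming table: ingests new files incrementally with streaming semantics.
  @dlt.table(comment="Raw events ingested from cloud storage")
  def raw_events():
      return (
          spark.readStream.format("cloudFiles")      # Auto Loader
          .option("cloudFiles.format", "json")
          .load("/Volumes/main/default/raw_events/")  # hypothetical path
      )

  # Materialized view: batch semantics over the table above. DLT infers the
  # dependency from dlt.read() and orchestrates execution order automatically.
  @dlt.table(comment="Daily event counts")
  def daily_event_counts():
      return (
          dlt.read("raw_events")
          .groupBy(F.to_date("event_time").alias("event_date"))  # hypothetical column
          .count()
      )
  ```

  Note that this code defines datasets rather than running them imperatively: DLT reads the definitions, builds the dependency graph, and runs the whole thing as a pipeline.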
- Jobs provides reliable orchestration and production monitoring for any data and AI workload. A job can consist of one or more tasks that run notebooks, pipelines, managed connectors, SQL queries, machine learning training, and model deployment and inference. Jobs also supports custom control-flow logic, such as branching with if/else statements and looping with for each statements.
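  As an illustration of tasks with control flow, here is a hedged sketch of a job definition in the JSON shape used by the Jobs API (the job name, notebook paths, and the `row_count` task value are hypothetical; the condition assumes an upstream task sets that value):

  ```json
  {
    "name": "nightly-etl",
    "tasks": [
      {
        "task_key": "ingest",
        "notebook_task": { "notebook_path": "/Workspace/etl/ingest" }
      },
      {
        "task_key": "check_rows",
        "depends_on": [{ "task_key": "ingest" }],
        "condition_task": {
          "op": "GREATER_THAN",
          "left": "{{tasks.ingest.values.row_count}}",
          "right": "0"
        }
      },
      {
        "task_key": "transform_each",
        "depends_on": [{ "task_key": "check_rows", "outcome": "true" }],
        "for_each_task": {
          "inputs": "[\"bronze\", \"silver\", \"gold\"]",
          "task": {
            "task_key": "transform_iteration",
            "notebook_task": {
              "notebook_path": "/Workspace/etl/transform",
              "base_parameters": { "layer": "{{input}}" }
            }
          }
        }
      }
    ]
  }
  ```

  The `condition_task` branches on a value produced by the `ingest` task, and `for_each_task` runs the transform notebook once per input value.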
- Databricks Runtime for Apache Spark is a reliable and performance-optimized compute environment for running Spark workloads, including batch and streaming. Databricks Runtime provides Photon, a high-performance Databricks-native vectorized query engine, and various infrastructure optimizations like autoscaling. You can run your Spark and Structured Streaming workloads on the Databricks Runtime by building your Spark programs as notebooks, JARs, or Python wheels.
Additional resources
- Data engineering concepts introduces the core concepts behind data engineering in Databricks.
- What is Delta Lake? describes the optimized storage layer that provides the foundation for tables in a Databricks lakehouse.
- To learn about best practices for data engineering in Databricks, see Data engineering best practices.
- Databricks notebooks are a popular tool for collaboration and development.
- If you primarily work with SQL queries and BI tools, see Databricks SQL.
- See Databricks Mosaic AI if you are architecting machine learning solutions.