Databricks data engineering
Databricks data engineering features provide a robust environment for collaboration among data scientists, data engineers, and analysts. Data engineering tasks are also the backbone of Databricks machine learning solutions.
Note
If you are a data analyst who works primarily with SQL queries and BI tools, you might prefer Databricks SQL.
The data engineering documentation provides how-to guidance to help you get the most out of the Databricks collaborative analytics platform. For getting-started tutorials and introductory information, see "Get started: Account and workspace setup" and "What is Databricks?".
- Delta Live Tables
Learn how to build data pipelines for ingestion and transformation with Databricks Delta Live Tables.
- Structured Streaming
Learn about streaming, incremental, and real-time workloads powered by Structured Streaming on Databricks.
- Apache Spark
Learn how Apache Spark works on the Databricks platform.
- Notebooks
Learn what a Databricks notebook is and how to use and manage notebooks to process, analyze, and visualize your data.
- Work with files
Learn about options for working with files on Databricks.
- Git folders
Learn how to use Git to version control your notebooks and other files for development in Databricks.
- Libraries
Learn how to make third-party or custom code available in Databricks using libraries. Learn about the different modes for installing libraries on Databricks.
- Init scripts
Learn how to use initialization (init) scripts to install packages and libraries, set system properties and environment variables, modify Apache Spark config parameters, and set other configurations on Databricks clusters.
- Migration
Learn how to migrate data applications such as ETL jobs, enterprise data warehouses, ML, data science, and analytics to Databricks.
- Optimization & performance
Learn about optimizations and performance recommendations on Databricks.
- DBFS
Learn about the Databricks File System (DBFS), a distributed file system mounted into a Databricks workspace and available on Databricks clusters.