Introduction to Databricks

Databricks is a cloud-based collaborative data science, data engineering, and data analytics platform that combines the best of data warehouses and data lakes into a lakehouse architecture.

The following articles introduce the key concepts and features of the Databricks platform:

  • Lakehouse

    Use the Databricks Lakehouse for ACID transactions, data governance, ETL, BI, and machine learning.

  • Data objects

    Learn about catalogs, databases, tables, and views in Databricks.

  • Concepts

    Learn fundamental Databricks concepts such as workspaces, data objects, clusters, machine learning models, and access control.

  • Architecture

    Get a high-level overview of Databricks and its enterprise architecture.

  • Language-specific overviews

    Learn how to use Python, SQL, R, and Scala to perform collaborative data science, data engineering, and data analysis in Databricks.

  • Supported clouds and regions

    Learn about the cloud platforms and regions supported by Databricks.

  • Sample datasets (databricks-datasets)

    Databricks includes a variety of datasets mounted to the Databricks File System (DBFS) that you can use to test your queries and models. These datasets are used in examples throughout the documentation.

  • Apache Spark

    New to Apache Spark? Write your first Apache Spark application, create DataFrames and Datasets, try some basic machine learning, and learn how to handle streaming data.

  • Delta Lake

    Learn about the Delta Lake storage layer and optimizations available with Delta Lake on Databricks.

  • Integrations

    Learn how to connect Databricks to data sources, BI tools, and developer tools.

  • Support

    Learn about Databricks support options.

  • Free training

    Try out our self-paced training, free to Databricks customers.