Databricks Data Science & Engineering guide

Databricks Data Science & Engineering is the classic Databricks environment for collaboration among data scientists, data engineers, and data analysts. It also forms the backbone of the Databricks Machine Learning environment.

Note

If you are a data analyst who works primarily with SQL queries and BI tools, you may prefer the Databricks SQL persona-based environment.

The Databricks Data Science & Engineering guide provides how-to guidance to help you get the most out of the Databricks collaborative analytics platform. For getting-started tutorials and introductory information, see Get started with Databricks and Introduction to Databricks.

  • Navigate the workspace

    Learn how to navigate a Databricks workspace and access the assets available in the workspace.

  • DataFrames and Datasets

    Learn how to use Apache Spark DataFrames and Datasets in Databricks.

  • What is Apache Spark Structured Streaming?

    Learn how to use Apache Spark Structured Streaming to express computation on streaming data in Databricks.

  • Runtimes

    Learn about the types of Databricks runtimes and runtime contents.

  • Clusters

    Learn about Databricks clusters and how to create and manage them.

  • Notebooks

    Learn how to manage and use notebooks in Databricks.

  • Workflows

    Learn how to work with data processing tools and frameworks in Databricks.

  • Libraries

    Learn how to use and manage libraries in Databricks.

  • Git integration with Databricks Repos

    Learn how to use Git to manage Databricks notebooks and workspace folders as co-versioned Databricks Repos.

  • Databricks File System (DBFS)

    Learn about Databricks File System (DBFS), a distributed file system mounted into a Databricks workspace and available on Databricks clusters.

  • Migration

    Learn how to migrate workloads to Databricks.

  • Applications: Genomics

    Learn how to work with genomic data using Databricks and Glow.