Data guides
The Databricks Data Intelligence Platform enables data practitioners throughout your organization to collaborate and productionize data solutions using shared, securely governed data assets and tools.
This article helps you identify the correct starting point for your use case.
Many tasks on Databricks require elevated permissions, and many organizations restrict these permissions to a small number of users or teams. This article distinguishes actions that most workspace users can complete from actions that are restricted to privileged users.
Workspace administrators can help you determine if you should be requesting access to assets or requesting elevated permissions.
Find and access data
This section provides a brief overview of tasks to help you discover data assets available to you. Most of these tasks assume that an admin has configured permissions on data assets. See Configure data access.
Feature area | Resources |
---|---|
Data discovery | For a more detailed overview of data discovery tasks, see Discover data. |
Catalogs | Catalogs are the top-level objects in the Unity Catalog data governance model. Use Catalog Explorer to find tables, views, and other data assets. See Explore database objects. |
Connected storage | If you have access to compute resources, you can use built-in commands to explore files in connected storage. See Explore storage and find data files. A short exploration sketch follows this table. |
Upload local files | By default, users have permissions to upload small data files, such as CSVs, from their local machine. See Create or modify a table using file upload. |
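If you can attach to a running compute resource, a few built-in commands let you see which catalogs, tables, and files are available to you. The following is a minimal sketch, assuming a Databricks notebook attached to Unity Catalog-enabled compute; the `main` catalog and the volume path are hypothetical stand-ins.

```python
# `spark`, `display`, and `dbutils` are predefined in Databricks notebooks.

# Explore Unity Catalog objects with SQL.
display(spark.sql("SHOW CATALOGS"))
display(spark.sql("SHOW SCHEMAS IN main"))         # `main` is a hypothetical catalog
display(spark.sql("SHOW TABLES IN main.default"))  # tables and views in one schema

# Explore files in connected storage, such as a Unity Catalog volume.
for f in dbutils.fs.ls("/Volumes/main/default/landing"):  # hypothetical path
    print(f.path, f.size)
```

If a command fails with a permissions error, that usually means you need access granted on the securable rather than elevated permissions; see Configure data access.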
Work with data
This section provides an overview of common data tasks and the tools used to perform those tasks.
For all of the tasks described, users must have adequate permissions for tools, compute resources, data, and other workspace artifacts. See Configure data access and Configure workspaces and infrastructure.
Feature area | Resources |
---|---|
Database objects | In addition to tables and views, Databricks uses other securable database objects such as volumes to securely govern data. See Database objects in Databricks. |
Data permissions | Unity Catalog governs all read and write operations in enabled workspaces. You must have adequate permissions to complete these operations. See Securable objects in Unity Catalog. |
ETL | Extract, transform, and load (ETL) workloads are among the most common uses for Apache Spark and Databricks, and most of the platform has features built and optimized for ETL. See Run your first ETL workload on Databricks. |
Queries | |
Dashboards & insights | |
Ingest | |
Transformations | Databricks uses common syntax and tooling for transformations that range in complexity from SQL CTAS statements to near real-time streaming applications. For an overview of data transformations, see What is data transformation on Databricks?. A minimal read-transform-write sketch follows this table. |
AI and machine learning | The Databricks Data Intelligence Platform provides a suite of tools for data science, machine learning, and AI applications. See AI and machine learning on Databricks. |
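To make the ETL and transformation rows above concrete, here is a minimal read-transform-write sketch, assuming a notebook with read access to a Unity Catalog volume and write privileges on a schema. All catalog, schema, and path names are hypothetical.

```python
from pyspark.sql import functions as F

# `spark` is predefined in Databricks notebooks.

# Extract: read raw CSV files from a Unity Catalog volume (hypothetical path).
raw = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/default/landing/orders")
)

# Transform: basic cleanup and a derived column.
clean = (
    raw.where(F.col("order_id").isNotNull())
       .withColumn("ingested_at", F.current_timestamp())
)

# Load: write a managed Delta table governed by Unity Catalog.
# Requires USE SCHEMA and CREATE TABLE privileges on main.default (hypothetical).
clean.write.mode("overwrite").saveAsTable("main.default.orders_clean")
```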
Configure data access
Most Databricks workspaces rely on a workspace admin or other power users to configure connections to external data sources and enforce privileges on data assets based on team membership, region, or roles. This section provides an overview of common tasks for configuring and controlling data access that require elevated permissions.
Before requesting elevated permissions to configure a new connection to a data source, confirm whether you are just missing privileges on an existing connection, catalog, or table. If a data source is not available, consult your organization's policy for adding new data to your workspace. A hedged example of granting privileges on existing objects follows the table below.
Feature area | Resources |
---|---|
Unity Catalog | |
Connections and access | |
Sharing | |
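As a sketch of what enforcing privileges looks like in practice, the following grants a group read access to existing Unity Catalog objects. It assumes you hold sufficient privileges (such as ownership) on each securable; the object names and the `data-analysts` group are hypothetical.

```python
# `spark` and `display` are predefined in Databricks notebooks.
# GRANT statements require sufficient privileges on each securable,
# such as ownership; see Securable objects in Unity Catalog.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.default TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE main.default.orders_clean TO `data-analysts`")

# Verify what the group can now do.
display(spark.sql("SHOW GRANTS ON TABLE main.default.orders_clean"))
```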
Configure workspaces and infrastructure
This section provides an overview of common tasks associated with administering workspace assets and infrastructure. Broadly defined, workspace assets include the following:
- Compute resources: Compute resources include all-purpose interactive clusters, SQL warehouses, job clusters, and pipeline compute. A user or workload must have permissions to connect to running compute resources in order to process specified logic.

  Note: Users who cannot connect to any compute resources have very limited functionality on Databricks.

- Platform tools: The Databricks Data Intelligence Platform provides a suite of tools tailored to different use cases and personas, such as notebooks, Databricks SQL, and Mosaic AI. Admins can customize settings that include default behaviors, optional features, and user access for many of these tools.

- Artifacts: Artifacts include notebooks, queries, dashboards, files, libraries, pipelines, and jobs. Artifacts contain code and configurations that users author in order to perform desired actions on their data.
The user who creates a workspace asset is assigned the owner role by default. For most assets, owners can grant permissions to any other user or group in the workspace.
To ensure that data and code are secure, Databricks recommends configuring the owner role for all artifacts and compute resources deployed to a production workspace.
Feature area | Resources |
---|---|
Workspace entitlements | Workspace entitlements include basic workspace access, access to Databricks SQL, and unrestricted cluster creation. See Manage entitlements. |
Compute resource access & policies | See the cluster policy sketch following this table. |
Platform tools | Use the admin console to configure behaviors ranging from customizing workspace appearance to enabling or disabling products and features. See Manage your workspace. |
Workspace ACLs | Workspace access control lists (ACLs) govern how users and groups can interact with workspace assets including compute resources, code artifacts, and jobs. See Access control lists. |
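As one example of compute governance, admins can define cluster policies that pin or constrain cluster attributes. The following is a minimal sketch of a policy definition, written as a Python dict for readability; the attribute names follow the documented cluster policy format, but the specific values are hypothetical.

```python
import json

# A minimal cluster policy definition (hypothetical values). Policies constrain
# the cluster attributes users can set; this one pins the runtime version,
# caps autoscaling, and forces auto-termination.
policy_definition = {
    "spark_version": {"type": "fixed", "value": "15.4.x-scala2.12"},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "autotermination_minutes": {"type": "fixed", "value": 60, "hidden": True},
}

# The JSON-serialized definition is what you supply when creating the policy,
# for example in the admin console or via the Cluster Policies API.
print(json.dumps(policy_definition, indent=2))
```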
Productionize workloads
All Databricks products are built for scale and stability and to accelerate the path from development to production. This section provides a brief introduction to the suite of tools recommended for getting workloads into production.
Feature area | Resources |
---|---|
ETL pipelines | DLT provides a declarative syntax for building and productionizing ETL pipelines. See What is DLT?. A minimal pipeline sketch follows this table. |
Orchestration | Jobs allows you to define complex workflows with dependencies, triggers, and schedules. See Orchestration using Databricks Jobs. |
CI/CD | Databricks Asset Bundles make it easy to manage and deploy data, assets, and artifacts across workspaces. See What are Databricks Asset Bundles?. |
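To give a feel for the declarative syntax, here is a minimal DLT pipeline sketch that defines a streaming ingest table and a cleaned table. It assumes the code runs as DLT pipeline source code with access to a Unity Catalog volume; the source path and table names are hypothetical.

```python
import dlt
from pyspark.sql import functions as F

# `spark` is available in DLT pipeline source code.

@dlt.table(comment="Raw orders loaded incrementally with Auto Loader (hypothetical source path).")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load("/Volumes/main/default/landing/orders")
    )

@dlt.table(comment="Orders with invalid rows dropped.")
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .where(F.col("order_id").isNotNull())
        .withColumn("processed_at", F.current_timestamp())
    )
```

Because the definitions are declarative, DLT resolves the dependency between the two tables and manages orchestration, retries, and infrastructure for you.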