Data guides
The Databricks Data Intelligence Platform enables data practitioners throughout your organization to collaborate and productionize data solutions using shared, securely governed data assets and tools.
This article helps you identify the correct starting point for your use case.
Many tasks on Databricks require elevated permissions, and many organizations restrict these permissions to a small number of users or teams. This article distinguishes actions that most workspace users can complete from actions that are restricted to privileged users.
Workspace administrators can help you determine if you should be requesting access to assets or requesting elevated permissions.
Find and access data
This section provides a brief overview of tasks to help you discover data assets available to you. Most of these tasks assume that an admin has configured permissions on data assets. See Configure data access.
Feature area | Resources |
---|---|
Data discovery | For a more detailed overview of data discovery tasks, see Discover data. |
Catalogs | Catalogs are the top-level objects in the Unity Catalog data governance model. Use Catalog Explorer to find tables, views, and other data assets. See Explore database objects. |
Connected storage | If you have access to compute resources, you can use built-in commands to explore files in connected storage. See Explore storage and find data files. A short exploration sketch follows this table. |
Upload local files | By default, users have permissions to upload small data files, such as CSVs, from their local machine. See Create or modify a table using file upload. |
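If you can attach to a running compute resource, a few built-in commands let you see which catalogs, tables, and files are available to you. The following is a minimal sketch, assuming a Databricks notebook attached to Unity Catalog-enabled compute; the `main` catalog and the volume path are hypothetical stand-ins.

```python
# `spark`, `display`, and `dbutils` are predefined in Databricks notebooks.

# Explore Unity Catalog objects with SQL.
display(spark.sql("SHOW CATALOGS"))
display(spark.sql("SHOW SCHEMAS IN main"))         # `main` is a hypothetical catalog
display(spark.sql("SHOW TABLES IN main.default"))  # tables and views in one schema

# Explore files in connected storage, such as a Unity Catalog volume.
for f in dbutils.fs.ls("/Volumes/main/default/landing"):  # hypothetical path
    print(f.path, f.size)
```

If a command fails with a permissions error, that usually means you need access granted on the securable rather than elevated permissions; see Configure data access.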
Work with data
This section provides an overview of common data tasks and the tools used to perform those tasks.
For all of the tasks described, users must have adequate permissions for tools, compute resources, data, and other workspace artifacts. See Configure data access and Configure workspaces and infrastructure.
Feature area | Resources |
---|---|
Database objects | In addition to tables and views, Databricks uses other securable database objects such as volumes to securely govern data. See Database objects in Databricks. |
Data permissions | Unity Catalog governs all read and write operations in enabled workspaces. You must have adequate permissions to complete these operations. See Securable objects in Unity Catalog. |
ETL | Extract, transform, and load (ETL) workloads are among the most common uses for Apache Spark and Databricks, and most of the platform has features built and optimized for ETL. See Run your first ETL workload on Databricks. |
Queries | |
Dashboards & insights | |
Ingest | |
Transformations | Databricks uses common syntax and tooling for transformations that range in complexity from SQL CTAS statements to near real-time streaming applications. For an overview of data transformations, see What is data transformation on Databricks?. A minimal read-transform-write sketch follows this table. |
AI and machine learning | The Databricks Data Intelligence Platform provides a suite of tools for data science, machine learning, and AI applications. See AI and machine learning on Databricks. |
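To make the ETL and transformation rows above concrete, here is a minimal read-transform-write sketch, assuming a notebook with read access to a Unity Catalog volume and write privileges on a schema. All catalog, schema, and path names are hypothetical.

```python
from pyspark.sql import functions as F

# `spark` is predefined in Databricks notebooks.

# Extract: read raw CSV files from a Unity Catalog volume (hypothetical path).
raw = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/default/landing/orders")
)

# Transform: basic cleanup and a derived column.
clean = (
    raw.where(F.col("order_id").isNotNull())
       .withColumn("ingested_at", F.current_timestamp())
)

# Load: write a managed Delta table governed by Unity Catalog.
# Requires USE SCHEMA and CREATE TABLE privileges on main.default (hypothetical).
clean.write.mode("overwrite").saveAsTable("main.default.orders_clean")
```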
Configure data access
Most Databricks workspaces rely on a workspace admin or other power users to configure connections to external data sources and enforce privileges on data assets based on team membership, region, or roles. This section provides an overview of common tasks for configuring and controlling data access that require elevated permissions.
Before requesting elevated permissions to configure a new connection to a data source, confirm whether you are just missing privileges on an existing connection, catalog, or table. If a data source is not available, consult your organization's policy for adding new data to your workspace. A hedged example of granting privileges on existing objects follows the table below.
Feature area | Resources |
---|---|
Unity Catalog | |
Connections and access | |
Sharing | |
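As a sketch of what enforcing privileges looks like in practice, the following grants a group read access to existing Unity Catalog objects. It assumes you hold sufficient privileges (such as ownership) on each securable; the object names and the `data-analysts` group are hypothetical.

```python
# `spark` and `display` are predefined in Databricks notebooks.
# GRANT statements require sufficient privileges on each securable,
# such as ownership; see Securable objects in Unity Catalog.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.default TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE main.default.orders_clean TO `data-analysts`")

# Verify what the group can now do.
display(spark.sql("SHOW GRANTS ON TABLE main.default.orders_clean"))
```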
Configure workspaces and infrastructure
This section provides an overview of common tasks associated with administering workspace assets and infrastructure. Broadly defined, workspace assets include the following:
- Compute resources: Compute resources include all-purpose interactive clusters, SQL warehouses, job clusters, and pipeline compute. A user or workload must have permissions to connect to running compute resources in order to process specified logic.

  Note: Users who cannot connect to any compute resources have very limited functionality on Databricks.

- Platform tools: The Databricks Data Intelligence Platform provides a suite of tools tailored to different use cases and personas, such as notebooks, Databricks SQL, and Mosaic AI. Admins can customize settings that include default behaviors, optional features, and user access for many of these tools.

- Artifacts: Artifacts include notebooks, queries, dashboards, files, libraries, pipelines, and jobs. Artifacts contain code and configurations that users author in order to perform desired actions on their data.
The user who creates a workspace asset is assigned the owner role by default. For most assets, owners can grant permissions to any other user or group in the workspace.
To ensure that data and code are secure, Databricks recommends configuring the owner role for all artifacts and compute resources deployed to a production workspace.
Feature area | Resources |
---|---|
Workspace entitlements | Workspace entitlements include basic workspace access, access to Databricks SQL, and unrestricted cluster creation. See Manage entitlements. |
Compute resource access & policies | See the cluster policy sketch following this table. |
Platform tools | Use the admin console to configure behaviors ranging from customizing workspace appearance to enabling or disabling products and features. See Manage your workspace. |
Workspace ACLs | Workspace access control lists (ACLs) govern how users and groups can interact with workspace assets including compute resources, code artifacts, and jobs. See Access control lists. |
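As one example of compute governance, admins can define cluster policies that pin or constrain cluster attributes. The following is a minimal sketch of a policy definition, written as a Python dict for readability; the attribute names follow the documented cluster policy format, but the specific values are hypothetical.

```python
import json

# A minimal cluster policy definition (hypothetical values). Policies constrain
# the cluster attributes users can set; this one pins the runtime version,
# caps autoscaling, and forces auto-termination.
policy_definition = {
    "spark_version": {"type": "fixed", "value": "15.4.x-scala2.12"},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "autotermination_minutes": {"type": "fixed", "value": 60, "hidden": True},
}

# The JSON-serialized definition is what you supply when creating the policy,
# for example in the admin console or via the Cluster Policies API.
print(json.dumps(policy_definition, indent=2))
```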
Productionize workloads
All Databricks products are built for scale and stability and to accelerate the path from development to production. This section provides a brief introduction to the suite of tools recommended for getting workloads into production.
Feature area | Resources |
---|---|
ETL pipelines | DLT provides a declarative syntax for building and productionizing ETL pipelines. See What is DLT?. A minimal pipeline sketch follows this table. |
Orchestration | Jobs allows you to define complex workflows with dependencies, triggers, and schedules. See Orchestration using Databricks Jobs. |
CI/CD | Databricks Asset Bundles make it easy to manage and deploy data, assets, and artifacts across workspaces. See What are Databricks Asset Bundles?. |
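To give a feel for the declarative syntax, here is a minimal DLT pipeline sketch that defines a streaming ingest table and a cleaned table. It assumes the code runs as DLT pipeline source code with access to a Unity Catalog volume; the source path and table names are hypothetical.

```python
import dlt
from pyspark.sql import functions as F

# `spark` is available in DLT pipeline source code.

@dlt.table(comment="Raw orders loaded incrementally with Auto Loader (hypothetical source path).")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("header", "true")
        .load("/Volumes/main/default/landing/orders")
    )

@dlt.table(comment="Orders with invalid rows dropped.")
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .where(F.col("order_id").isNotNull())
        .withColumn("processed_at", F.current_timestamp())
    )
```

Because the definitions are declarative, DLT resolves the dependency between the two tables and manages orchestration, retries, and infrastructure for you.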