Get started: Databricks workspace onboarding
This article provides you with a 30-minute setup guide for your first Databricks workspace. The steps in this article will show you how to do the following:
Create your first Databricks workspace.
Create your first compute resource.
Load data into Databricks from your cloud storage.
Add users to the workspace.
Give users access to data so they can start working.
Requirements
To complete the instructions in this article, you need the following:
Permission in your AWS account to provision IAM roles and S3 buckets.
Available service quotas in your AWS region for a Databricks deployment. You need an available VPC and NAT gateway. You can view your available quotas and request increases using the AWS Service Quotas console.
Access to data stored in cloud object storage. This article provides instructions for S3 buckets.
Note
If you decide at any point to cancel your Databricks subscription, delete all associated resources from your AWS console to prevent continued costs. For instructions, see Cancel your Databricks subscription.
Step 1: Create your first workspace
After you sign up for the free trial and verify your email address, you’ll have access to your Databricks account.
When you first log in to your account, follow the instructions to set up your workspace. These instructions use a quickstart to create the workspace, which quickly provisions the cloud resources for you.
Enter a human-readable name for your workspace. This cannot be changed later.
Select the AWS region you want to deploy the workspace in. Remember to verify that you have an available VPC and NAT gateway in your cloud region.
Click Start Quickstart. This opens up your AWS Console where a prepopulated CloudFormation template will deploy your resources and workspace for you.
Select the I acknowledge that AWS CloudFormation might create IAM resources with custom names checkbox.
Warning
Editing additional fields in the template could lead to a failed deployment.
Click Create stack.
Return to the Databricks account console and wait for the workspace to finish deploying. It should take only a few minutes.
If you encounter any errors in the deployment process, email onboarding-help@databricks.com for troubleshooting help.
Note
If you are your organization’s cloud admin but will not be the day-to-day admin of your Databricks deployment, add a workspace admin to the account to take over the rest of the onboarding steps. See Manage users in your account.
Step 2: Create a compute resource
To interact with your data, users in your workspace need running compute resources. There are a few different types of compute resources available in Databricks. These instructions create a serverless SQL warehouse that all workspace users can run SQL queries on.
Note
While Databricks does not charge you during your free trial, AWS will charge you for the compute resources Databricks deploys to your linked AWS account.
Open your new workspace.
On the sidebar, click SQL Warehouses.
Click the Create SQL warehouse button.
Give the SQL warehouse a name.
Click Create.
In the permissions modal, enter and select All Users, then click Add.
Your serverless SQL warehouse should start immediately and be available for you to run SQL queries.
Step 3: Connect your workspace to data sources
To connect your Databricks workspace to your cloud storage, you need to create an external location. An external location is an object that combines a cloud storage path with the credential that authorizes access to the storage path.
In your Databricks workspace, click Catalog on the sidebar.
At the top of the page, click + Add.
Click Add an external location.
Databricks recommends using the AWS Quickstart, which ensures that your workspace is given the correct permissions on the bucket.
In Bucket Name, enter the name of the bucket you want to import data from.
Click Generate New Token and copy the token.
Click Launch in Quickstart.
In your AWS console, enter the copied token in the Databricks Personal Access Token field.
Select the I acknowledge that AWS CloudFormation might create IAM resources with custom names checkbox.
Click Create stack.
To see the external locations in your workspace, click Catalog in the sidebar, click External Data at the bottom of the left navigation pane, and then click External Locations. Your new external location will have a name using the following syntax: db_s3_external_databricks-S3-ingest-<id>.
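If your workspace already has several external locations, a short script can pick out the ones the Quickstart created. The following is a minimal sketch that assumes you have copied the location names into a Python list. The helper name and the example names (including the abc123 suffix) are hypothetical. Only the prefix comes from the naming syntax above.

```python
# Prefix taken from the naming syntax above; the <id> suffix is not specified here.
QUICKSTART_PREFIX = "db_s3_external_databricks-S3-ingest-"


def quickstart_locations(names):
    """Return the names that start with the Quickstart prefix and have a non-empty suffix."""
    return [
        name
        for name in names
        if name.startswith(QUICKSTART_PREFIX) and len(name) > len(QUICKSTART_PREFIX)
    ]


# Hypothetical names, for illustration only.
example_names = [
    "db_s3_external_databricks-S3-ingest-abc123",
    "manually_created_location",
]

print(quickstart_locations(example_names))
```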
Step 4: Add your data to Databricks
Now that your workspace has a connection to your S3 bucket, you can add your data.
Part of this step is choosing where to put your data. Databricks has a three-level namespace that organizes your data (catalog.schema.table). For this exercise, you’ll import the data into the default catalog named after your workspace.
In the sidebar of your Databricks workspace, click New > Add data.
Click Amazon S3.
Select your external location from the drop-down menu.
Select all the files you want to add to your Databricks catalog.
Click Preview table.
Select the default catalog (named after your workspace), the default schema, and then enter a name for your table.
Click Create Table.
You can now use Catalog Explorer in your workspace to see your data in Databricks.
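Once the table exists, queries address it by its three-level name. The following sketch shows one way to assemble that name and a preview query as plain strings. The function names and the catalog, schema, and table values are hypothetical placeholders. Backtick quoting is used on the assumption that your names may contain characters, such as hyphens, that need quoting in Databricks SQL.

```python
def quote_identifier(name):
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def full_table_name(catalog, schema, table):
    """Join the three namespace levels as catalog.schema.table."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


def preview_query(catalog, schema, table, limit=10):
    """Build a SELECT statement that previews the first rows of a table."""
    return f"SELECT * FROM {full_table_name(catalog, schema, table)} LIMIT {int(limit)}"


# Placeholder names; substitute your own catalog, schema, and table.
print(preview_query("my_workspace", "default", "my_table"))
```

You can paste the resulting statement into the SQL editor and run it on the SQL warehouse you created in Step 2.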
Step 5: Add users to your workspace
Now that you have a running compute resource, a connection to your data, and data in the platform, you can start adding more users to your account.
These instructions show you how to add individual users to your account and workspace.
In the top bar of the Databricks workspace, click your username and then click Settings.
In the sidebar, click Identity and access.
Next to Users, click Manage.
Click Add user, and then click Add new.
Enter the user’s email address, and then click Add.
Continue to add as many users to your account as you would like. New users receive an email prompting them to set up their account.
Step 6: Grant permissions to users
Now that you have users in your account, you must grant them access to the data and resources they will need. There are many ways you can do this, and your preferred method probably depends on your data governance strategy.
The following are common considerations when setting up permissions for your users:
Securable objects in Databricks are hierarchical and privileges are inherited downward. For example, granting the SELECT privilege on a catalog or schema automatically grants the privilege to all current and future objects within the catalog or schema.
If you grant a user the SELECT permission on a schema or table, the user also needs the USE permission on the objects above the schema or table.
If you want to grant other users permission to connect to external data sources, you can grant them the CREATE EXTERNAL LOCATION and CREATE STORAGE CREDENTIAL permissions.
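To illustrate the second consideration, the following sketch generates the statements needed to give one user read access to a single table: USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table. It only builds the SQL text and does not connect to a workspace. The function names, the object names, and the email address are hypothetical, and the statements reflect Unity Catalog GRANT syntax as commonly documented, so verify them against the reference before running them.

```python
def _quote(name):
    """Wrap an identifier or principal in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def grant_table_read(catalog, schema, table, principal):
    """Return GRANT statements giving a principal read access to one table.

    SELECT on the table is not enough by itself: the principal also needs
    the USE privileges on the parent catalog and schema.
    """
    cat = _quote(catalog)
    sch = f"{cat}.{_quote(schema)}"
    tbl = f"{sch}.{_quote(table)}"
    who = _quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {tbl} TO {who}",
    ]


# Placeholder values, for illustration only.
for statement in grant_table_read("my_workspace", "default", "my_table", "user@example.com"):
    print(statement + ";")
```

Because privileges are inherited downward, granting SELECT on the schema or catalog instead of the table would cover every current and future table beneath it, which may or may not fit your governance strategy.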
For instructions on managing permissions in Databricks, see Unity Catalog privileges and securable objects.