Sign up for Databricks using your existing AWS account

Note

Databricks serverless compute for notebooks, jobs, and Delta Live Tables is not supported with this free trial type. To use these features, use the express setup instead or upgrade your trial; serverless compute becomes available after the 14-day trial period ends. To learn more about Databricks serverless compute, see Connect to serverless compute.

This article explains how to sign up for a Databricks free trial and set up your first workspace. These steps apply to accounts created through the AWS Marketplace.

This article shows you how to do the following:

  • Sign up for the Databricks free trial.

  • Create your first Databricks workspace.

  • Create your first compute resource.

  • Load data into Databricks from your cloud storage.

  • Add users to the workspace.

  • Give users access to data so they can start working.

Requirements

To complete the instructions in this article, you need the following:

  • Permission in your AWS account to provision IAM roles and S3 buckets.

  • Available service quotas in your AWS region for a Databricks deployment. You need an available VPC and NAT gateway. You can view your available quotas and request increases using the AWS Service Quotas console.

  • Access to data stored in cloud object storage. This article provides instructions for S3 buckets.

Note

If you decide at any point to cancel your Databricks subscription, delete all associated resources from your AWS console to prevent continued costs. For instructions, see Cancel your Databricks subscription.

Step 1: Sign up for the free trial and create your first workspace

Note

While Databricks does not charge you during your free trial, AWS will charge you for AWS resources used during and after the free trial.

When you sign up for Databricks through the AWS Marketplace, AWS will create your account and deploy your first workspace for you.

  1. Go to the Databricks page in AWS Marketplace.

  2. Follow the steps on AWS to sign up for the Databricks free trial and deploy your first workspace.

  3. Log in to your new workspace through the account console.

If you encounter any errors in the deployment process, email onboarding-help@databricks.com for troubleshooting help.

Step 2: Create a compute resource

To interact with your data, users in your workspace need running compute resources. Databricks offers a few different types of compute resources. These instructions create a serverless SQL warehouse that all workspace users can use to run SQL queries.

  1. Open your new workspace.

  2. On the sidebar, click SQL Warehouses.

  3. Click the Create SQL warehouse button.

  4. Give the SQL warehouse a name.

  5. Click Create.

  6. In the permissions modal, enter and select All Users, then click Add.

Your serverless SQL warehouse should start immediately and be available for you to run SQL queries.
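To confirm that the warehouse is accepting queries, you can open the SQL editor, select the new warehouse, and run a quick sanity check like the following (a minimal sketch; any trivial query works):

```sql
-- Run in the SQL editor with the new serverless SQL warehouse selected.
-- A returned row confirms the warehouse is up and accepting queries.
SELECT current_user() AS user, current_catalog() AS default_catalog;
```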

Step 3: Connect your workspace to data sources

To connect your Databricks workspace to your cloud storage, you need to create an external location. An external location is an object that combines a cloud storage path with the credential that authorizes access to the storage path.
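The AWS Quickstart used in the steps below creates the credential and the external location for you. For reference, the object it creates is roughly equivalent to the following SQL, where the location, credential, and bucket names are hypothetical and the storage credential must already exist:

```sql
-- Hypothetical names: `my_storage_credential` wraps an IAM role that can
-- access the bucket; `my-ingest-bucket` is a placeholder bucket name.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_external_location
  URL 's3://my-ingest-bucket/'
  WITH (STORAGE CREDENTIAL my_storage_credential)
  COMMENT 'Read access to the ingest bucket';
```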

  1. In your Databricks workspace, click Catalog on the sidebar.

  2. At the top of the page, click + Add.

  3. Click Add an external location.

  4. Use the AWS Quickstart (recommended). The Quickstart ensures that your workspace is given the correct permissions on the bucket.

  5. In Bucket Name, enter the name of the bucket you want to import data from.

  6. Click Generate New Token and copy the token.

  7. Click Launch in Quickstart.

  8. In your AWS console, enter the copied token in the Databricks Personal Access Token field.

  9. Select the I acknowledge that AWS CloudFormation might create IAM resources with custom names checkbox.

  10. Click Create stack.

To see the external locations in your workspace, click Catalog in the sidebar. At the bottom of the left navigation pane, click External Data, and then click External Locations. Your new external location has a name with the following syntax: db_s3_external_databricks-S3-ingest-<id>.

Note

The other external location you see connects your workspace to the default S3 bucket provisioned alongside your workspace. This external location shares a name with your workspace.
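You can also inspect external locations from the SQL editor. For example (the location name below is the generated name from above):

```sql
-- List all external locations you have permission to see.
SHOW EXTERNAL LOCATIONS;

-- Show the storage path and credential behind one location.
-- Replace <id> with the generated suffix shown in the UI.
DESCRIBE EXTERNAL LOCATION `db_s3_external_databricks-S3-ingest-<id>`;
```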

Test your connection

To test that external locations have functioning connections, do the following:

  1. Click on the external location you want to test.

  2. Click Test connection.
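Alternatively, you can test the connection from the SQL editor by listing the path the external location governs; a minimal sketch, with a placeholder bucket name:

```sql
-- Listing succeeds only if the external location's credential
-- grants access to this path.
LIST 's3://my-ingest-bucket/' LIMIT 10;
```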

Step 4: Add your data to Databricks

Now that your workspace has a connection to your S3 bucket, you can add your data.

Part of this step is choosing where to put your data. Databricks has a three-level namespace that organizes your data (catalog.schema.table). For this exercise, you’ll import the data into the default catalog named after your workspace.
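For example, once a table exists, you query it with its fully qualified name (the catalog and table names here are illustrative):

```sql
-- Three-level name: catalog.schema.table.
SELECT * FROM my_workspace.default.sales_raw LIMIT 10;
```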

  1. In the sidebar of your Databricks workspace, click New > Add data.

  2. Click Amazon S3.

  3. Select your external location from the drop-down menu.

  4. Select all the files you want to add to your Databricks catalog.

  5. Click Preview table.

  6. Select the default catalog (named after your workspace), the default schema, and then enter a name for your table.

  7. Click Create Table.

You can now use Catalog Explorer in your workspace to see your data in Databricks.
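The UI steps above are equivalent to running a CREATE TABLE AS SELECT statement over the files behind your external location. A sketch, assuming CSV files with a header row and the same illustrative names used earlier:

```sql
-- Catalog, schema, table, and bucket names are illustrative; the path
-- must fall under an external location your workspace can access.
CREATE TABLE my_workspace.default.sales_raw AS
SELECT *
FROM read_files(
  's3://my-ingest-bucket/sales/',
  format => 'csv',
  header => true
);
```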

Step 5: Add users to your workspace

Now that you have a running compute resource, a connection to your data, and data in the platform, you can start adding more users to your account.

These instructions show you how to add individual users to your account and workspace.

  1. In the top bar of the Databricks workspace, click your username and then click Settings.

  2. In the sidebar, click Identity and access.

  3. Next to Users, click Manage.

  4. Click Add user, and then click Add new.

  5. Enter the user’s email address, and then click Add.

Continue to add as many users to your account as you would like. New users receive an email prompting them to set up their account.

Step 6: Grant permissions to users

Now that you have users in your account, you must grant them access to the data and resources they need. There are many ways to do this, and the method you choose depends on your data governance strategy.

The following are common considerations when setting up permissions for your users:

  • Securable objects in Databricks are hierarchical and privileges are inherited downward. For example, granting the SELECT privilege on a catalog or schema automatically grants the privilege to all current and future objects within the catalog or schema.

  • If you grant a user the SELECT privilege on a schema or table, the user also needs the USE CATALOG and USE SCHEMA privileges on the catalog and schema that contain it.

  • If you want to grant other users permission to connect to external data sources, you can grant them the CREATE EXTERNAL LOCATION and CREATE STORAGE CREDENTIAL privileges, as shown in the sketch after this list.
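The following sketch shows these grants in SQL, assuming a catalog named `my_workspace` and a hypothetical group `data-consumers`:

```sql
-- Let the group resolve the catalog and schema...
GRANT USE CATALOG ON CATALOG my_workspace TO `data-consumers`;
GRANT USE SCHEMA ON SCHEMA my_workspace.default TO `data-consumers`;

-- ...and read every current and future table in the schema.
GRANT SELECT ON SCHEMA my_workspace.default TO `data-consumers`;

-- Optionally allow a specific user to define new external connections.
GRANT CREATE EXTERNAL LOCATION ON METASTORE TO `user@example.com`;
GRANT CREATE STORAGE CREDENTIAL ON METASTORE TO `user@example.com`;
```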

For instructions on managing permissions in Databricks, see Unity Catalog privileges and securable objects.

Next steps

The users in your account should now be able to access and query data in your Databricks workspace.

From here, you can continue to explore Databricks and build out your data strategy. Popular topics include: