Connect to Amazon S3

This article explains how to connect to AWS S3 from Databricks.

Databricks recommends using Unity Catalog volumes or external locations to connect to S3. See Recommendations for using external locations.

Connect to S3 with Unity Catalog

External locations and storage credentials allow Unity Catalog to read and write data in S3 on behalf of users. Administrators primarily use external locations to configure Unity Catalog external tables.

A storage credential is a Unity Catalog object used for authentication to S3. It is an IAM role that authorizes reading from and writing to an S3 bucket path. An external location is an object that combines a cloud storage path with a storage credential.
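As a sketch (the names below are illustrative, and a storage credential is assumed to already exist for the IAM role), binding a cloud storage path to a credential in SQL might look like:

```sql
-- Bind an S3 path to an existing storage credential
CREATE EXTERNAL LOCATION IF NOT EXISTS my_location
URL 's3://my-bucket/external-location'
WITH (STORAGE CREDENTIAL my_storage_credential);
```

Once the external location exists, Unity Catalog mediates every read and write under that path on behalf of users.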

Who can create and manage volumes?

To create volumes, you must have the following privileges:

  • USE SCHEMA and CREATE VOLUME on the schema

  • USE CATALOG on the catalog

  • (External volumes only) CREATE EXTERNAL VOLUME on the external location

After you create a volume, the following principals can manage volume privileges:

  • The owner of the parent catalog.

  • The owner of the parent schema.

  • The owner of the volume.
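For example (the catalog, schema, location, and group names here are hypothetical), the grants above might look like:

```sql
-- Privileges required before a principal can create volumes
GRANT USE CATALOG ON CATALOG main TO `data_engineers`;
GRANT USE SCHEMA, CREATE VOLUME ON SCHEMA main.default TO `data_engineers`;

-- External volumes additionally require a privilege on the external location
GRANT CREATE EXTERNAL VOLUME ON EXTERNAL LOCATION my_location TO `data_engineers`;

-- With these in place, a member of the group can run:
CREATE EXTERNAL VOLUME main.default.landing_zone
LOCATION 's3://my-bucket/external-location/landing';
```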

Who can create and manage external locations and storage credentials?

The AWS user who creates the IAM role for the storage credential must:

  • Be an AWS account user with permission to create or update IAM roles, IAM policies, S3 buckets, and cross-account trust relationships.

The Databricks user who creates the storage credential in Unity Catalog must:

  • Be a Databricks account admin, a metastore admin, or a user with the CREATE STORAGE CREDENTIAL privilege.

The Databricks user who creates the external location in Unity Catalog must:

  • Be a metastore admin or a user with the CREATE EXTERNAL LOCATION privilege.

After you create an external location in Unity Catalog, you can grant the following permissions on it:

  • CREATE EXTERNAL TABLE

  • CREATE EXTERNAL VOLUME

  • READ FILES

  • WRITE FILES
These permissions enable Databricks users to access data in S3 without managing cloud storage credentials for authentication.
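For instance, granting direct path-based access on a (hypothetical) external location to a group might look like:

```sql
-- Allow a group to read and write files directly under the external location's path
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION my_location TO `analysts`;
```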

For more information, see Manage external locations and storage credentials.

Access S3 buckets with Unity Catalog volumes or external locations

Use the volume path or the fully qualified S3 URI to access data secured with Unity Catalog. Because permissions are managed by Unity Catalog, you do not need to pass any additional options or configurations for authentication.

Volume paths follow the pattern /Volumes/<catalog>/<schema>/<volume>/<path>/<file-name>.

S3 URIs follow the pattern s3://<bucket>/<external-location>/<path>/<file-name>.
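As a small illustration of the volume path pattern (the volume_path helper below is hypothetical, not a Databricks API):

```python
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Assemble a path of the form /Volumes/<catalog>/<schema>/<volume>/<path>/<file-name>."""
    return "/".join(["/Volumes", catalog, schema, volume, *parts])

print(volume_path("main", "default", "landing", "2024", "flowers.parquet"))
# → /Volumes/main/default/landing/2024/flowers.parquet
```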


Unity Catalog ignores Spark configuration settings when accessing data managed by external locations.

Examples of reading:

dbutils.fs.ls("s3://my-bucket/external-location/path/to/data")

spark.read.format("parquet").load("s3://my-bucket/external-location/path/to/data")

spark.sql("SELECT * FROM parquet.`s3://my-bucket/external-location/path/to/data`")

Examples of writing:

dbutils.fs.mv("s3://my-bucket/external-location/path/to/data", "s3://my-bucket/external-location/path/to/new-location")


Examples of creating external tables:

df.write.option("path", "s3://my-bucket/external-location/path/to/table").saveAsTable("my_table")

  CREATE TABLE my_table
  LOCATION "s3://my-bucket/external-location/path/to/table"
  AS (SELECT *
    FROM parquet.`s3://my-bucket/external-location/path/to/data`)

Access S3 buckets using instance profiles

You can load IAM roles as instance profiles in Databricks and attach instance profiles to clusters to control data access to S3. Databricks recommends using instance profiles when Unity Catalog is unavailable for your environment or workload. For a tutorial on using instance profiles with Databricks, see Configure S3 access with instance profiles.

The AWS user who creates the IAM role must:

  • Be an AWS account user with permission to create or update IAM roles, IAM policies, S3 buckets, and cross-account trust relationships.

The Databricks user who adds the IAM role as an instance profile in Databricks must:

  • Be a workspace admin

Once you add the instance profile to your workspace, you can grant users, groups, or service principals permission to launch clusters with the instance profile. See Manage instance profiles in Databricks.

Use both cluster access control and notebook access control together to protect access to the instance profile. See Cluster access control and Collaborate using Databricks notebooks.

Access S3 buckets with URIs and AWS keys

You can set Spark properties to configure AWS keys to access S3.

Databricks recommends using secret scopes for storing all credentials. You can grant users, service principals, and groups in your workspace access to read the secret scope. This protects the AWS key while allowing users to access S3. To create a secret scope, see Secret scopes.
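In a notebook, credentials stored in a secret scope can be read with dbutils.secrets.get and passed to the S3A filesystem for the current session. The scope and key names below are placeholders:

```python
# Retrieve AWS keys from a secret scope; values are redacted in notebook output
access_key = dbutils.secrets.get(scope="<scope-name>", key="<access-key-id-key>")
secret_key = dbutils.secrets.get(scope="<scope-name>", key="<secret-access-key-key>")

# Pass them to the S3A filesystem for this Spark session
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_key)
```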

The credentials can be scoped to either a cluster or a notebook. Use both cluster access control and notebook access control together to protect access to S3. See Cluster access control and Collaborate using Databricks notebooks.

To set Spark properties, use the following snippet in a cluster’s Spark configuration to set the AWS keys stored in secret scopes as environment variables:

AWS_SECRET_ACCESS_KEY={{secrets/<scope-name>/<secret-access-key-key>}}
AWS_ACCESS_KEY_ID={{secrets/<scope-name>/<access-key-id-key>}}
You can then read from S3 using the following commands:

aws_bucket_name = "my-s3-bucket"

df = spark.read.load(f"s3a://{aws_bucket_name}/flowers/delta/")

Access S3 with open-source Hadoop options

Databricks Runtime supports configuring the S3A filesystem using open-source Hadoop options. You can configure global properties and per-bucket properties.

Global configuration

# Global S3 configuration
spark.hadoop.fs.s3a.aws.credentials.provider <aws-credentials-provider-class>
spark.hadoop.fs.s3a.endpoint <aws-endpoint>
spark.hadoop.fs.s3a.server-side-encryption-algorithm SSE-KMS

Per-bucket configuration

You configure per-bucket properties using the syntax spark.hadoop.fs.s3a.bucket.<bucket-name>.<configuration-key>. This lets you set up buckets with different credentials, endpoints, and so on.

For example, in addition to global S3 settings you can configure each bucket individually using the following keys:

# Set up authentication and endpoint for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.aws.credentials.provider <aws-credentials-provider-class>
spark.hadoop.fs.s3a.bucket.<bucket-name>.endpoint <aws-endpoint>

# Configure a different KMS encryption key for a specific bucket
spark.hadoop.fs.s3a.bucket.<bucket-name>.server-side-encryption.key <aws-kms-encryption-key>

Access Requester Pays buckets

To enable access to Requester Pays buckets, add the following line to your cluster’s Spark configuration:

spark.hadoop.fs.s3a.requester-pays.enabled true


Databricks does not support Delta Lake writes to Requester Pays buckets.

Deprecated patterns for storing and accessing data from Databricks

The following are deprecated storage patterns:

  • Databricks no longer recommends mounting external data locations to the Databricks Filesystem (DBFS). See Mounting cloud object storage on Databricks.
  • The S3A filesystem enables caching by default and releases resources on FileSystem.close(). To avoid other threads using a reference to the cached file system incorrectly, do not explicitly call FileSystem.close().

  • The S3A filesystem does not remove directory markers when closing an output stream. Legacy applications based on Hadoop versions that do not include HADOOP-13230 can misinterpret them as empty directories even if there are files inside.
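If a workload genuinely must call FileSystem.close(), one option (a standard Hadoop client setting, not something this article prescribes) is to disable the filesystem cache so that closing one instance cannot invalidate instances held by other threads:

```
spark.hadoop.fs.s3a.impl.disable.cache true
```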