Mounting cloud object storage on Databricks

Databricks enables you to mount cloud object storage to the Databricks File System (DBFS) to simplify data access patterns for users who are unfamiliar with cloud concepts. Mounted data does not work with Unity Catalog, and Databricks recommends migrating away from mounts and instead managing data governance with Unity Catalog.

How does Databricks mount cloud object storage?

Databricks mounts create a link between a workspace and cloud object storage, which enables you to interact with cloud object storage using familiar file paths relative to the Databricks file system. Mounts work by creating a local alias under the /mnt directory that stores the following information:

  • Location of the cloud object storage.

  • Driver specifications to connect to the storage account or container.

  • Security credentials required to access the data.
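
For example, after a mount is created you can browse and read the mounted data with ordinary DBFS paths. The following is a minimal sketch in Python that assumes a mount and a CSV file already exist at the placeholder paths:

# List objects through the mount (placeholder mount name)
display(dbutils.fs.ls("/mnt/<mount-name>"))

# Read a file in the mounted storage with a regular DBFS path (hypothetical CSV file)
df = spark.read.csv("/mnt/<mount-name>/<file-name>.csv", header=True)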

What is the syntax for mounting storage?

The source specifies the URI of the object storage (and can optionally encode security credentials). The mount_point specifies the local path in the /mnt directory. Some object storage sources support an optional encryption_type argument. For some access patterns you can pass additional configuration specifications as a dictionary to extra_configs.

Note

Databricks recommends setting mount-specific Spark and Hadoop configuration as options using extra_configs. This ensures that configurations are tied to the mount rather than the cluster or session.

dbutils.fs.mount(
  source: str,
  mount_point: str,
  encryption_type: Optional[str] = "",
  extra_configs: Optional[dict[str, str]] = None
)
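
For example, a mount call with keyword arguments and mount-specific configuration passed through extra_configs might look like the following. This is a minimal sketch with placeholder values; the cloud-specific sections later in this article show complete, working configurations:

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<configuration-key>": "<configuration-value>"}
)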

Check with your workspace and cloud administrators before configuring or altering data mounts, as improper configuration can provide unsecured access to all users in your workspace.

Note

In addition to the approaches described in this article, you can automate mounting a bucket with the Databricks Terraform provider and databricks_mount.

Unmount a mount point

To unmount a mount point, use the following command:

dbutils.fs.unmount("/mnt/<mount-name>")

Warning

To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run dbutils.fs.refreshMounts() on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).
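
For example, after changing a mount on one cluster, run the following in a notebook attached to each of the other running clusters so they pick up the updated mount table:

# Refresh this cluster's view of all mount points
dbutils.fs.refreshMounts()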

Mount an S3 bucket

You can mount an S3 bucket through the Databricks File System (DBFS). The mount is a pointer to an S3 location, so the data is never synced locally.

After a mount point is created through a cluster, users of that cluster can immediately access the mount point. To use the mount point in another running cluster, you must run dbutils.fs.refreshMounts() on that running cluster to make the newly created mount point available.

You can use the following methods to mount an S3 bucket:

Mount a bucket using an AWS instance profile

You can manage authentication and authorization for an S3 bucket using an AWS instance profile. Access to the objects in the bucket is determined by the permissions granted to the instance profile. If the role has write access, users of the mount point can write objects in the bucket. If the role has read access, users of the mount point can read objects in the bucket.

  1. Configure your cluster with an instance profile.

  2. Mount the bucket.

    aws_bucket_name = "<aws-bucket-name>"
    mount_name = "<mount-name>"
    dbutils.fs.mount(f"s3a://{aws_bucket_name}", f"/mnt/{mount_name}")
    display(dbutils.fs.ls(f"/mnt/{mount_name}"))
    
    val AwsBucketName = "<aws-bucket-name>"
    val MountName = "<mount-name>"
    
    dbutils.fs.mount(s"s3a://$AwsBucketName", s"/mnt/$MountName")
    display(dbutils.fs.ls(s"/mnt/$MountName"))
    

Mount a bucket using AWS keys

You can mount a bucket using AWS keys.

Important

When you mount an S3 bucket using keys, all users have read and write access to all the objects in the S3 bucket.

The following examples use Databricks secrets to store the keys. You must URL escape the secret key.

access_key = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
secret_key = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
encoded_secret_key = secret_key.replace("/", "%2F")
aws_bucket_name = "<aws-bucket-name>"
mount_name = "<mount-name>"

dbutils.fs.mount(f"s3a://{access_key}:{encoded_secret_key}@{aws_bucket_name}", f"/mnt/{mount_name}")
display(dbutils.fs.ls(f"/mnt/{mount_name}"))

val AccessKey = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
// Encode the secret key, as it can contain "/"
val SecretKey = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
val EncodedSecretKey = SecretKey.replace("/", "%2F")
val AwsBucketName = "<aws-bucket-name>"
val MountName = "<mount-name>"

dbutils.fs.mount(s"s3a://$AccessKey:$EncodedSecretKey@$AwsBucketName", s"/mnt/$MountName")
display(dbutils.fs.ls(s"/mnt/$MountName"))

Mount a bucket using instance profiles with the AssumeRole policy

You must first complete the steps in Access cross-account S3 buckets with an AssumeRole policy.

Mount buckets while setting S3 options in the extraConfigs:

dbutils.fs.mount("s3a://<s3-bucket-name>", "/mnt/<s3-bucket-name>",
  extra_configs = {
    "fs.s3a.credentialsType": "AssumeRole",
    "fs.s3a.stsAssumeRole.arn": "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB",
    "fs.s3a.canned.acl": "BucketOwnerFullControl",
    "fs.s3a.acl.default": "BucketOwnerFullControl"
  }
)
dbutils.fs.mount("s3a://<s3-bucket-name>", "/mnt/<s3-bucket-name>",
  extraConfigs = Map(
    "fs.s3a.credentialsType" -> "AssumeRole",
    "fs.s3a.stsAssumeRole.arn" -> "arn:aws:iam::<bucket-owner-acct-id>:role/MyRoleB",
    "fs.s3a.canned.acl" -> "BucketOwnerFullControl",
    "fs.s3a.acl.default" -> "BucketOwnerFullControl"
  )
)

Encrypt data in S3 buckets

Databricks supports encrypting data using server-side encryption. This section covers how to use server-side encryption when writing files in S3 through DBFS. Databricks supports Amazon S3-managed encryption keys (SSE-S3) and AWS KMS–managed encryption keys (SSE-KMS).

Write files using SSE-S3

  1. To mount your S3 bucket with SSE-S3, run the following command:

    dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName", "sse-s3")
    
  2. To write files to the corresponding S3 bucket with SSE-S3, run:

    dbutils.fs.put(s"/mnt/$MountName", "<file content>")
    

Write files using SSE-KMS

  1. Mount a source directory passing in sse-kms or sse-kms:$KmsKey as the encryption type.

    • To mount your S3 bucket with SSE-KMS using the default KMS master key, run:

      dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName", "sse-kms")
      
    • To mount your S3 bucket with SSE-KMS using a specific KMS key, run:

      dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName", "sse-kms:$KmsKey")
      
  2. To write files to the S3 bucket with SSE-KMS, run:

    dbutils.fs.put(s"/mnt/$MountName", "<file content>")
    

Mounting S3 buckets with the Databricks commit service

If you plan to write to a given table stored in S3 from multiple clusters or workloads simultaneously, Databricks recommends that you configure Databricks S3 commit services. Your notebook code must mount the bucket and add the AssumeRole configuration. This step is necessary only for DBFS mounts, not for accessing root DBFS storage in your workspace’s root S3 bucket. The following example uses Python:


# If other code has already mounted the bucket without using the new role, unmount it first
dbutils.fs.unmount("/mnt/<mount-name>")

# mount the bucket and assume the new role
dbutils.fs.mount("s3a://<bucket-name>/", "/mnt/<mount-name>", extra_configs = {
    "fs.s3a.credentialsType": "AssumeRole",
    "fs.s3a.stsAssumeRole.arn": "<role-arn>"
})

Mount ADLS Gen2 or Blob Storage with ABFS

You can mount data in an Azure storage account using a Microsoft Entra ID (formerly Azure Active Directory) application service principal for authentication. For more information, see Access storage with Microsoft Entra ID (formerly Azure Active Directory) using a service principal.

Important

  • All users in the Databricks workspace have access to the mounted ADLS Gen2 account. The service principal you use to access the ADLS Gen2 account should be granted access only to that ADLS Gen2 account; it should not be granted access to other Azure resources.

  • When you create a mount point through a cluster, cluster users can immediately access the mount point. To use the mount point in another running cluster, you must run dbutils.fs.refreshMounts() on that running cluster to make the newly created mount point available for use.

  • Unmounting a mount point while jobs are running can lead to errors. Ensure that production jobs do not unmount storage as part of processing.

  • Mount points that use secrets are not automatically refreshed. If mounted storage relies on a secret that is rotated, expires, or is deleted, errors can occur, such as 401 Unauthorized. To resolve such an error, you must unmount and remount the storage.

  • Hierarchical namespace (HNS) must be enabled to successfully mount an Azure Data Lake Storage Gen2 storage account using the ABFS endpoint.

Run the following in your notebook to authenticate and create a mount point.

configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": "<application-id>",
          "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
          "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)

val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")
// Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = configs)

Replace

  • <application-id> with the Application (client) ID for the Azure Active Directory application.

  • <scope-name> with the Databricks secret scope name.

  • <service-credential-key-name> with the name of the key containing the client secret.

  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.

  • <container-name> with the name of a container in the ADLS Gen2 storage account.

  • <storage-account-name> with the ADLS Gen2 storage account name.

  • <mount-name> with the name of the intended mount point in DBFS.
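
After replacing the placeholders and running the mount command, you can confirm that the mount point works, for example by listing its contents:

# Verify the new mount point (placeholder mount name)
display(dbutils.fs.ls("/mnt/<mount-name>"))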