Accessing Azure Data Lake Storage Gen2 and Blob Storage with Databricks

Use the Azure Blob Filesystem driver (ABFS) to connect to Azure Blob Storage and Azure Data Lake Storage Gen2 from Databricks. Databricks recommends securing access to Azure storage containers by using Azure service principals set in cluster configurations.

This article details how to access Azure storage containers using:

  • Unity Catalog managed external locations

  • Azure service principals

  • SAS tokens

  • Account keys

You will set Spark properties to configure these credentials for a compute environment, either:

  • Scoped to a Databricks cluster

  • Scoped to a Databricks notebook
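
Cluster-scoped credentials are entered as key-value lines in the cluster's Spark config, where secrets can be referenced with the {{secrets/<scope>/<key>}} syntax rather than pasted in plain text; notebook-scoped credentials use spark.conf.set calls like the examples later in this article. A sketch of a cluster Spark config line (using the account-key property and placeholders described below):

```
fs.azure.account.key.<storage-account>.dfs.core.windows.net {{secrets/<scope>/<storage-account-access-key>}}
```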

Azure service principals can also be used to access Azure storage from Databricks SQL; see Configure access to cloud storage.

Databricks recommends using secret scopes for storing all credentials.

Deprecated patterns for storing and accessing data from Databricks

The following are deprecated storage patterns:

  • Mounting external data locations to the Databricks Filesystem (DBFS); see Mounting cloud object storage on Azure Databricks.

  • The legacy Windows Azure Storage Blob driver (WASB). ABFS has numerous benefits over WASB; see the Azure documentation on ABFS.

Direct access using ABFS URI for Blob Storage or Azure Data Lake Storage Gen2

If you have properly configured credentials to access your Azure storage container, you can interact with resources in the storage account using URIs. Databricks recommends using the abfss scheme, which connects through the ABFS driver over TLS, for greater security.

spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

CREATE TABLE <database-name>.<table-name>;

COPY INTO <database-name>.<table-name>
FROM 'abfss://container@storageAccount.dfs.core.windows.net/path/to/folder'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true');
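
The abfss URIs above all share one shape, so it can help to assemble them in one place and catch typos early. A minimal sketch; the helper name is hypothetical and not part of any Databricks API:

```python
def abfss_uri(container, storage_account, path=""):
    """Build an abfss:// URI for a path in an ADLS Gen2 container (illustrative helper)."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    # Strip a leading slash from the path so double slashes never appear in the URI.
    return f"{base}/{path.lstrip('/')}" if path else base

# Example use in a notebook (names are placeholders):
# df = spark.read.load(abfss_uri("raw", "mystorageacct", "events/2023"))
```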

Access Azure Data Lake Storage Gen2 or Blob Storage using OAuth 2.0 with an Azure service principal

You can securely access data in an Azure storage account using OAuth 2.0 with an Azure Active Directory (Azure AD) application service principal for authentication; see Configure access to Azure storage with an Azure Active Directory service principal.

service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

Replace

  • <scope> with the Databricks secret scope name.

  • <service-credential-key> with the name of the key containing the client secret.

  • <storage-account> with the name of the Azure storage account.

  • <application-id> with the Application (client) ID for the Azure Active Directory application.

  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.
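
The five spark.conf.set calls above follow a single naming pattern keyed by storage account, so they are easy to generate for one or more accounts. A minimal sketch, assuming a hypothetical helper (not part of any Databricks API) that returns the properties as a dict:

```python
def oauth_confs(storage_account, application_id, service_credential, directory_id):
    """Return the Spark config entries for OAuth 2.0 access to one ADLS Gen2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": application_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": service_credential,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }

# In a notebook you would then apply the properties (placeholder names):
# for key, value in oauth_confs("<storage-account>", "<application-id>",
#                               service_credential, "<directory-id>").items():
#     spark.conf.set(key, value)
```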

Access Azure Data Lake Storage Gen2 or Blob Storage using a SAS token

You can use storage shared access signatures (SAS) to access an Azure Data Lake Storage Gen2 storage account directly. With SAS, you can restrict access to a storage account using temporary tokens with fine-grained access control.

You can configure SAS tokens for multiple storage accounts in the same Spark session.

Note

SAS support is available in Databricks Runtime 7.5 and above.

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<sas-token-key>"))

Replace

  • <storage-account> with the Azure storage account name.

  • <scope> with the Databricks secret scope name.

  • <sas-token-key> with the name of the key containing the SAS token.
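
Because each property is keyed by storage account, repeating the three settings per account configures SAS tokens for several accounts in the same Spark session. A minimal sketch; the helper and the looping code are illustrative, not a Databricks API:

```python
def sas_confs(storage_account, sas_token):
    """Return the Spark config entries for fixed-SAS access to one ADLS Gen2 account."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "SAS",
        f"fs.azure.sas.token.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
        f"fs.azure.sas.fixed.token.{suffix}": sas_token,
    }

# Configure two accounts in one session; tokens would come from a secret scope:
# for account, key in {"acct1": "acct1-sas-key", "acct2": "acct2-sas-key"}.items():
#     token = dbutils.secrets.get(scope="<scope>", key=key)
#     for prop, value in sas_confs(account, token).items():
#         spark.conf.set(prop, value)
```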

Access Azure Data Lake Storage Gen2 or Blob Storage using the account key

You can use storage account access keys to manage access to Azure Storage. Note that an account key grants full access to all data in the storage account; where possible, prefer a service principal or a SAS token instead.

spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))

Replace

  • <storage-account> with the Azure Storage account name.

  • <scope> with the Databricks secret scope name.

  • <storage-account-access-key> with the name of the key containing the Azure storage account access key.

Example notebook

ADLS Gen2 OAuth 2.0 with Azure service principals notebook


Azure Data Lake Storage Gen2 FAQs and known issues

See Azure Data Lake Storage Gen2 frequently asked questions and known issues.