Connect to Azure Data Lake Storage Gen2 and Blob Storage

This article explains how to connect to Azure Data Lake Storage Gen2 and Blob Storage from Databricks. For Azure Data Lake Storage Gen2 FAQs and known issues, see Azure Data Lake Storage Gen2 FAQ.

Connect to Azure Data Lake Storage Gen2 or Blob Storage using Azure credentials

The following credentials can be used to access Azure Data Lake Storage Gen2 or Blob Storage:

  • OAuth 2.0 with an Azure service principal: Databricks recommends using Azure service principals to connect to Azure storage. To create an Azure service principal and provide it access to Azure storage accounts, see Access storage with Azure Active Directory.

    To create an Azure service principal, you must have the Application Administrator role or the Application.ReadWrite.All permission in Azure Active Directory. To assign roles on a storage account you must be an Owner or a user with the User Access Administrator Azure RBAC role on the storage account.

  • Shared access signatures (SAS): You can use storage SAS tokens to access Azure storage. With SAS, you can restrict access to a storage account using temporary tokens with fine-grained access control.

    You can grant a SAS token only the permissions that you yourself have on the storage account, container, or file.

  • Account keys: You can use storage account access keys to manage access to Azure Storage. Storage account access keys provide full access to the configuration of a storage account, as well as the data. Databricks recommends using an Azure service principal or a SAS token to connect to Azure storage instead of account keys.

    To view an account’s access keys, you must have the Owner, Contributor, or Storage Account Key Operator Service role on the storage account.

Databricks recommends using secret scopes for storing all credentials. You can grant users, service principals, and groups in your workspace access to read the secret scope. This protects the Azure credentials while allowing users to access Azure storage. To create a secret scope, see Secret scopes.
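
As a quick illustration (the scope and key names below are placeholders), you can list and read secrets from a notebook; secret values are redacted when displayed in notebook output:

# List the keys available in a secret scope (the scope name is a placeholder).
dbutils.secrets.list("<secret-scope>")

# Read a single secret value into a variable without exposing it in notebook output.
service_credential = dbutils.secrets.get(scope="<secret-scope>", key="<service-credential-key>")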

Set Spark properties to configure Azure credentials to access Azure storage

You can set Spark properties to configure Azure credentials to access Azure storage. The credentials can be scoped to either a cluster or a notebook. Use both cluster access control and notebook access control together to protect access to Azure storage. See Cluster access control and Workspace object access control.

To set Spark properties, use the following snippet in a cluster’s Spark configuration or a notebook:

Use the following format to set the cluster Spark configuration:

fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net OAuth
fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net <application-id>
fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net {{secrets/<secret-scope>/<service-credential-key>}}
fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net https://login.microsoftonline.com/<directory-id>/oauth2/token

You can use spark.conf.set in notebooks, as shown in the following example:

service_credential = dbutils.secrets.get(scope="<secret-scope>",key="<service-credential-key>")

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")

Replace

  • <secret-scope> with the Databricks secret scope name.

  • <service-credential-key> with the name of the key containing the client secret.

  • <storage-account> with the name of the Azure storage account.

  • <application-id> with the Application (client) ID for the Azure Active Directory application.

  • <directory-id> with the Directory (tenant) ID for the Azure Active Directory application.

You can configure SAS tokens for multiple storage accounts in the same Spark session.

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<sas-token-key>"))

Replace

  • <storage-account> with the Azure Storage account name.

  • <scope> with the Databricks secret scope name.

  • <sas-token-key> with the name of the key containing the Azure storage SAS token.
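
The example above configures a single account. As a minimal sketch of the multiple-account case (the account names, secret scope, and key names below are placeholders), you can repeat the same three properties for each account:

# Configure SAS access for two storage accounts in the same Spark session.
# The account names, secret scope, and key names are placeholders.
for account, sas_key in [("<storage-account-1>", "<sas-token-key-1>"),
                         ("<storage-account-2>", "<sas-token-key-2>")]:
    spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "SAS")
    spark.conf.set(f"fs.azure.sas.token.provider.type.{account}.dfs.core.windows.net",
                   "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
    spark.conf.set(f"fs.azure.sas.fixed.token.{account}.dfs.core.windows.net",
                   dbutils.secrets.get(scope="<scope>", key=sas_key))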

The following example configures access with a storage account access key stored as a secret:

spark.conf.set(
    "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope>", key="<storage-account-access-key>"))

Replace

  • <storage-account> with the Azure Storage account name.

  • <scope> with the Databricks secret scope name.

  • <storage-account-access-key> with the name of the key containing the Azure storage account access key.

Access Azure storage

Once you have properly configured credentials to access your Azure storage container, you can interact with resources in the storage account using URIs. Databricks recommends using the abfss driver for greater security.

spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

CREATE TABLE <database-name>.<table-name>;

COPY INTO <database-name>.<table-name>
FROM 'abfss://container@storageAccount.dfs.core.windows.net/path/to/folder'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true');
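
You can also write data back to the storage account with the same URI scheme. The following is a minimal sketch; df is a placeholder for an existing DataFrame, and the output path is illustrative:

# Write a DataFrame to ADLS Gen2 using the abfss URI scheme (df and the path are placeholders).
df.write.format("delta").mode("overwrite").save(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-output>")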

Example notebook

ADLS Gen2 OAuth 2.0 with Azure service principals notebook

Deprecated patterns for storing and accessing data from Databricks

The following are deprecated storage patterns: