Get started with Azure Data Lake Storage Gen2

You can authenticate to and access Azure Data Lake Storage Gen2 (ADLS Gen2) storage accounts using an Azure storage account access key.

Using an access key is less secure than using a service principal but can be convenient for non-production scenarios such as developing or testing notebooks.

Although you can use an access key directly from your Databricks workspace, storing the key in a secret scope provides an additional security layer. Secret scopes provide secure storage and management of secrets and allow you to use the access key for authentication without including it directly in your Databricks workspace.

This article explains how to obtain an Azure storage account access key, save that key in a Databricks-backed secret scope, and use it to access an ADLS Gen2 storage account from a Databricks notebook. At a high level, the tasks are:

  1. Obtain an access key from the Azure storage account.
  2. Create a secret scope in your Databricks workspace.
  3. Add the Azure access key to the secret scope.
  4. Use the access key from the secret scope to authenticate to the storage account.

Requirements

This article assumes that you have:

  • An ADLS Gen2 storage account in your Azure subscription.
  • An Azure Databricks workspace.
  • The Databricks CLI installed and configured to authenticate to your workspace.

Get an Azure ADLS access key

You obtain an access key for the ADLS Gen2 storage account using the Azure portal:

  1. Go to your ADLS Gen2 storage account in the Azure portal.

  2. Under Settings, select Access keys.

  3. Copy the value for one of the available access keys.


Add the storage account access key to a secret scope

Use the Databricks CLI to create a Databricks-backed secret scope and add a new secret for the storage account access key:

databricks secrets create-scope --scope <scope-name>
databricks secrets put --scope <scope-name> --key <key-name>

Replace

  • <scope-name> with a name for the new scope.
  • <key-name> with a name for the secret value.

Running the put command opens an editor. Enter the storage account access key value above the line, then save and exit the editor.
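
To confirm that the secret was stored, you can list the keys in the scope (this uses the same legacy Databricks CLI syntax as the commands above; the secret value itself is never displayed):

databricks secrets list --scope <scope-name>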

Authenticate with the access key

The way you set credentials for authentication depends on whether you plan to use the DataFrame or Dataset API, or the RDD API.

DataFrame or Dataset API

If you are using Spark DataFrame or Dataset APIs, Databricks recommends that you set your account credentials in your notebook’s session configs:

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<storage-account-access-key-name>"))

Replace

  • <storage-account-name> with the ADLS Gen2 storage account name.
  • <scope-name> with the Databricks secret scope name.
  • <storage-account-access-key-name> with the name of the key containing the Azure storage account access key.
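
With the session configuration set, you can verify access by listing a container root. This is a minimal check; it assumes a container named <container-name> already exists in the account (see Create a container below if it does not):

# List the root of an existing container to confirm the credential works.
dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/")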

RDD API

If you’re using the RDD API to access ADLS Gen2, you cannot access Hadoop configuration options set using spark.conf.set(). You must set the credentials using one of the following methods:

  • Specify the Hadoop configuration options as Spark configs when you create the cluster. You must add the spark.hadoop. prefix to the corresponding Hadoop configuration keys to propagate them to the Hadoop configurations for your RDD jobs:

    spark.hadoop.fs.azure.account.key.<storage-account-name>.dfs.core.windows.net <storage-account-access-key>
    

    Replace

    • <storage-account-name> with the ADLS Gen2 storage account name.
    • <storage-account-access-key> with the access key you retrieved in Get an Azure ADLS access key.

    Warning

    These credentials are available to all users who access the cluster.

  • Scala users can set the credentials in spark.sparkContext.hadoopConfiguration:

    spark.sparkContext.hadoopConfiguration.set(
        "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
        dbutils.secrets.get(scope = "<scope-name>", key = "<storage-account-access-key-name>")
    )
    

Replace

  • <storage-account-name> with the ADLS Gen2 storage account name.
  • <scope-name> with the Databricks secret scope name.
  • <storage-account-access-key-name> with the name of the key containing the Azure storage account access key.
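
Once the credentials are in the Hadoop configuration, RDD reads against the account work as usual. A minimal sketch in Python (<container-name>, <directory-name>, and <file-name> are placeholders for an existing object in the account):

# Read a text file through the RDD API. The access key is picked up from
# the Hadoop configuration, not from spark.conf.
rdd = spark.sparkContext.textFile("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>/<file-name>")
print(rdd.take(10))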

Create a container

Like directories in a filesystem, containers provide a way to organize objects in an Azure storage account. You must create one or more containers before you can access an ADLS Gen2 storage account.

You can create a container directly from a Databricks notebook by running the following commands. Remove the first statement if you’ve already followed the instructions in Authenticate with the access key. Setting fs.azure.createRemoteFileSystemDuringInitialization to true causes the dbutils.fs.ls call to create the container if it does not exist; the final statement resets the option so that later commands do not create containers unintentionally.

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<storage-account-access-key-name>"))
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

Replace

  • <storage-account-name> with the ADLS Gen2 storage account name.
  • <scope-name> with the Databricks secret scope name.
  • <storage-account-access-key-name> with the name of the key containing the Azure storage account access key.
  • <container-name> with the name for the new container.

You can also create a container through the Azure command-line interface (see the sketch after these steps), the Azure API, or the Azure portal. To create a container in the portal:

  1. In the Azure portal, go to Storage accounts.

  2. Select your ADLS Gen2 account and click Containers.

  3. Click + Container.

  4. Enter a name for your container and click Create.

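To take the Azure CLI route instead, a short sketch (it assumes you are signed in with az login and uses the access key from Get an Azure ADLS access key):

az storage container create \
    --name <container-name> \
    --account-name <storage-account-name> \
    --account-key <storage-account-access-key>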

Access ADLS Gen2 storage

After authenticating to the ADLS Gen2 storage account, you can use standard Spark and Databricks APIs to read from the account:

val df = spark.read.parquet("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")
dbutils.fs.ls("abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>")

Example notebook

This notebook demonstrates using a storage account access key to:

  1. Authenticate to an ADLS Gen2 storage account.
  2. Create a new container in the storage account.
  3. Write a JSON file containing internet of things (IoT) data to the new container.
  4. List files in the container.
  5. Read and display the IoT file from the container.

Getting started with ADLS Gen2 notebook
