Azure Data Lake Store Example (Scala)

This example notebook closely follows the Databricks documentation for how to set up Azure Data Lake Store as a data source in Databricks.

0 - Setup

To get set up, do these tasks first:

  • Get service credentials: Client ID <aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee> and Client Credential <NzQzY2QzYTAtM2I3Zi00NzFmLWI3MGMtMzc4MzRjZmk=>. Follow the instructions in Create service principal with portal.
  • Get directory ID <ffffffff-gggg-hhhh-iiii-jjjjjjjjjjjj>: This is also referred to as tenant ID. Follow the instructions in Get tenant ID.
  • If you haven't set up the service app, follow this tutorial. Grant access at the root directory, or at the desired folder level, to the service principal or to everyone.

There are two options to read and write Azure Data Lake data from Azure Databricks:

  1. DBFS mount points
  2. Spark configs

1 - DBFS mount points

DBFS mount points let you mount Azure Data Lake Store for all users in the workspace. Once mounted, the data can be accessed directly via a DBFS path from all clusters, without providing credentials every time. The example below shows how to set up a mount point for Azure Data Lake Store.

// OAuth 2.0 settings for the Azure Data Lake Store service principal (placeholders shown).
val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "<aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee>",
  "dfs.adls.oauth2.credential" -> "<NzQzY2QzYTAtM2I3Zi00NzFmLWI3MGMtMzc4MzRjZmk=>",
  "dfs.adls.oauth2.refresh.url" -> "https://login.microsoftonline.com/<ffffffff-gggg-hhhh-iiii-jjjjjjjjjjjj>/oauth2/token")

// Mount the Azure Data Lake Store account to DBFS.
dbutils.fs.mount(
  source = "adl://kpadls.azuredatalakestore.net/",
  mountPoint = "/mnt/kp-adls",
  extraConfigs = configs)
%fs ls /mnt/kp-adls
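
Once the mount exists, the data can be read like any other DBFS path, and the mount can be removed when it is no longer needed. The sketch below assumes a hypothetical Parquet dataset under the mount; the sub-path is a placeholder, not data from this example.

// Sketch: read Parquet data through the mount point (the sub-path is a placeholder).
val df = spark.read.parquet("/mnt/kp-adls/testing/some-dataset")
display(df)

// Remove the mount point when it is no longer needed.
dbutils.fs.unmount("/mnt/kp-adls")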

2 - Spark Configs

With Spark configs, the Azure Data Lake Store settings can be specified per notebook. To keep things simple, the example below includes the credentials in plaintext. However, we strongly discourage you from storing secrets in plaintext; instead, we recommend storing the credentials as Databricks Secrets (a sketch of that approach follows the plaintext example).

Note: spark.conf values are visible only to the Dataset and DataFrame APIs. If you need to access them from an RDD, refer to the documentation.
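
If RDD access is needed, one workaround is to mirror the same keys onto the SparkContext's Hadoop configuration, which RDD input formats read from. This is only a sketch, not necessarily the documented recipe referred to in the note above; the values are the same placeholders used in the spark.conf example below.

// Sketch: expose the OAuth settings to RDD jobs via the Hadoop configuration.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hadoopConf.set("dfs.adls.oauth2.client.id", "<aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee>")
hadoopConf.set("dfs.adls.oauth2.credential", "<NzQzY2QzYTAtM2I3Zi00NzFmLWI3MGMtMzc4MzRjZmk=>")
hadoopConf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<ffffffff-gggg-hhhh-iiii-jjjjjjjjjjjj>/oauth2/token")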

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "<aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee>")
spark.conf.set("dfs.adls.oauth2.credential", "<NzQzY2QzYTAtM2I3Zi00NzFmLWI3MGMtMzc4MzRjZmk=>")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<ffffffff-gggg-hhhh-iiii-jjjjjjjjjjjj>/oauth2/token")
%fs ls adl://kpadls.azuredatalakestore.net/testing/
// Copy a sample dataset from DBFS into the Azure Data Lake Store account.
spark.read.parquet("dbfs:/mnt/my-datasets/datasets/iot/events")
  .write
  .mode("overwrite")
  .parquet("adl://kpadls.azuredatalakestore.net/testing/tmp/kp/v1")
%fs ls adl://kpadls.azuredatalakestore.net/testing/tmp/kp/v1
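
As a quick sanity check, the copied data can be read back from Azure Data Lake Store (a sketch; the path matches the write above).

// Sketch: read back the data written above and count the rows.
val copied = spark.read.parquet("adl://kpadls.azuredatalakestore.net/testing/tmp/kp/v1")
copied.count()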