Azure SQL Data Warehouse

Azure SQL Data Warehouse is a cloud-based enterprise data warehouse that leverages massively parallel processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse as a key component of a big data solution. Import big data into SQL Data Warehouse with simple PolyBase T-SQL queries, and then use the power of MPP to run high-performance analytics. As you integrate and analyze, the data warehouse will become the single version of truth your business can count on for insights.

You can access Azure SQL Data Warehouse (SQL DW) from Databricks using the SQL Data Warehouse connector (referred to as the SQL DW connector), a data source implementation for Apache Spark that uses Azure Blob Storage and PolyBase in SQL DW to transfer large volumes of data efficiently between a Databricks cluster and a SQL DW instance.

Both the Databricks cluster and the SQL DW instance access a common Blob Storage container to exchange data between these two systems. In Databricks, Spark jobs are triggered by the SQL DW connector to read data from and write data to the Blob Storage container. On the SQL DW side, data loading and unloading operations performed by PolyBase are triggered by the SQL DW connector through JDBC.

The SQL DW connector is more suited to ETL than to interactive queries, because each query execution can extract large amounts of data to Blob Storage. If you plan to perform several queries against the same SQL DW table, we recommend that you save the extracted data in a format such as Parquet.
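
For example, here is a minimal sketch of that pattern in Python. It assumes the storage account access key is already set in the session configuration, reuses the same placeholder connection values as the examples later in this article, and the Parquet path is purely illustrative.

# Extract once from SQL DW through the connector.
df = spark.read \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "my_table_in_dw") \
  .load()

# Save the extracted data as Parquet; subsequent queries read this copy instead of
# re-extracting from SQL DW each time. The path is an example only.
df.write.mode("overwrite").parquet("/mnt/extracts/my_table_in_dw")
spark.read.parquet("/mnt/extracts/my_table_in_dw").createOrReplaceTempView("my_table_in_dw_cached")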

Requirements

  • The Azure SQL Data Warehouse connector requires Databricks Runtime 4.0 or above. To verify that the data source class for the connector is present in your cluster’s class path, run the following code:

    Class.forName("com.databricks.spark.sqldw.DefaultSource")
    

    If this command fails with a ClassNotFoundException, you are not using a Databricks Runtime that contains the SQL DW connector. A Python variant of this check is sketched after this list.

  • A database master key for the Azure SQL Data Warehouse.
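
For Python notebooks, a hedged variant of the class path check mentioned above goes through Spark's py4j gateway, which is an internal interface (like the hadoopConfiguration workaround later in this article). It should raise an error if the connector class is not present on the driver; treat it as a sketch rather than an official API.

# Assumption: the py4j gateway can load classes from the driver's class path.
sc._jvm.java.lang.Class.forName("com.databricks.spark.sqldw.DefaultSource")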

Authentication

The connector uses several network connections, as illustrated in the following diagram:

                        ┌──────────────┐
       ┌───────────────>│ Blob Storage │<──────────────────┐
       │                └──────────────┘                   │
       │  storage acc key       ^         storage acc key  │
       │                        │ storage acc key          │
       v                        v                          v
┌────────────┐          ┌──────────────┐          ┌─────────────────┐
│   SQL DW   │<────────>│ Spark Driver │<────────>│ Spark Executors │
└────────────┘          └──────────────┘          └─────────────────┘
               JDBC with                 Configured
               username /                 in Spark
               password

There are three kinds of connections:

  • Spark driver to SQL DW
  • Spark driver and executors to Azure Blob Storage
  • SQL DW to Azure Blob Storage

The following describes each connection’s authentication configuration options.

Spark driver to SQL DW

The Spark driver connects to SQL DW via JDBC using a username and password. We recommend that you use the connection string provided by the Azure portal, which enables Secure Sockets Layer (SSL) encryption for all data sent between the Spark driver and the SQL DW instance through the JDBC connection. To verify that SSL encryption is enabled, search for encrypt=true in the connection string. To allow the Spark driver to reach SQL DW, we recommend that you set Allow access to Azure services to ON on the firewall pane of the SQL DW server through the Azure portal. This setting allows communications from all Azure IP addresses and all Azure subnets, which allows Spark drivers to reach the SQL DW instance.
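
As a rough illustration only (every value below is a placeholder, not a real server), a portal-style JDBC URL with SSL enabled has the following shape, and you can check for the encrypt=true flag before handing the string to the connector:

# Illustrative Python sketch; the URL shape is an assumption based on the portal's
# typical connection string format, with all identifying values replaced by placeholders.
jdbc_url = (
  "jdbc:sqlserver://<your-server-name>.database.windows.net:1433;"
  "database=<your-database-name>;encrypt=true;loginTimeout=30"
)
assert "encrypt=true" in jdbc_url, "SSL encryption is not enabled in this JDBC connection string"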

Spark driver and executors to Azure Blob Storage

The Azure Blob Storage container acts as an intermediary to store bulk data when reading from or writing to SQL DW. Spark connects to the Blob Storage container using the Azure Blob Storage connector bundled in Databricks Runtime. The URI scheme for specifying this connection must be wasbs, which makes the connection use SSL encrypted HTTPS access.

Also, the credential used for setting up this connection must be a storage account access key. There are two ways of providing the storage account access key: notebook session configuration and global Hadoop configuration.

  • Notebook session configuration (preferred) - Using this approach, the account access key is set in the session configuration associated with the notebook that runs the command. This configuration does not affect other notebooks attached to the same cluster. spark is the SparkSession object provided in the notebook.

    spark.conf.set(
      "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
      "<your-storage-account-access-key>")
    
  • Global Hadoop configuration - This approach updates the global Hadoop configuration associated with the SparkContext object shared by all notebooks.

    Scala
    sc.hadoopConfiguration.set(
      "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
      "<your-storage-account-access-key>")
    
    Python

    Python users must use a slightly different method to modify the hadoopConfiguration, since this field is not exposed in all versions of PySpark. Although the following command relies on some Spark internals, it should work with all PySpark versions and is unlikely to break or change in the future:

    sc._jsc.hadoopConfiguration().set(
      "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
      "<your-storage-account-access-key>")
    

SQL DW to Azure Blob Storage

SQL DW also connects to the Blob Storage container during loading and unloading of temporary data. To set up the credential for the Blob Storage container in the connected SQL DW instance, you must set forwardSparkAzureStorageCredentials to true. The SQL DW connector automatically discovers the account access key set in the notebook session configuration or the global Hadoop configuration and forwards it to the connected SQL DW instance over JDBC.

The forwarded storage access key is represented by a temporary database scoped credential in the SQL DW instance. The SQL DW connector creates the database scoped credential before asking SQL DW to load or unload data, and deletes it once the loading or unloading operation is done.

Streaming Support

The SQL DW connector offers efficient and scalable Structured Streaming write support for SQL DW that provides a consistent user experience with batch writes and uses PolyBase for large data transfers between a Databricks cluster and a SQL DW instance. Like batch writes, streaming is designed largely for ETL and therefore has higher latency, which may not be suitable for real-time data processing in some cases.

The SQL DW connector supports the Append and Complete output modes for record appends and aggregations. See the Structured Streaming guide for more details on output modes and the compatibility matrix.

Important

This feature is available in Databricks Runtime 4.3 and above. This is a beta feature. All parameters and underlying semantics for streaming may change in the final version; we do not guarantee backwards compatibility.

Usage (Batch)

You can use this connector via the data source API in Scala, Python, SQL, and R notebooks.

Scala
// Set up the Blob Storage account access key in the notebook session conf.
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

// Get some data from a SQL DW table.
val df: DataFrame = spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", "my_table_in_dw")
  .load()

// Load data from a SQL DW query.
val df: DataFrame = spark.read
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("query", "select x, count(*) as cnt from my_table_in_dw group by x")
  .load()

// Apply some transformations to the data, then use the
// Data Source API to write the data back to another table in SQL DW.

df.write
  .format("com.databricks.spark.sqldw")
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", "my_table_in_dw_copy")
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
  .save()
Python
# Set up the Blob Storage account access key in the notebook session conf.
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

# Get some data from a SQL DW table.
df = spark.read \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "my_table_in_dw") \
  .load()

# Load data from a SQL DW query.
df = spark.read \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("query", "select x, count(*) as cnt from my_table_in_dw group by x") \
  .load()

# Apply some transformations to the data, then use the
# Data Source API to write the data back to another table in SQL DW.

df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "my_table_in_dw_copy") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .save()
SQL
-- Set up the Blob Storage account access key in the notebook session conf.
SET fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net=<your-storage-account-access-key>;

-- Read data using SQL.
CREATE TABLE my_table_in_spark_read
USING com.databricks.spark.sqldw
OPTIONS (
  url 'jdbc:sqlserver://<the-rest-of-the-connection-string>',
  forwardSparkAzureStorageCredentials 'true',
  dbTable 'my_table_in_dw',
  tempDir 'wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>'
);

-- Write data using SQL.
-- Create a new table, throwing an error if a table with the same name already exists:

CREATE TABLE my_table_in_spark_write
USING com.databricks.spark.sqldw
OPTIONS (
  url 'jdbc:sqlserver://<the-rest-of-the-connection-string>',
  forwardSparkAzureStorageCredentials 'true',
  dbTable 'my_table_in_dw_copy',
  tempDir 'wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>'
)
AS SELECT * FROM table_to_save_in_spark;
R
# Load SparkR
library(SparkR)

# Set up the Blob Storage account access key in the notebook session conf.
conf <- sparkR.callJMethod(sparkR.session(), "conf")
sparkR.callJMethod(conf, "set", "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net", "<your-storage-account-access-key>")

# Get some data from a SQL DW table.
df <- read.df(
   source = "com.databricks.spark.sqldw",
   url = "jdbc:sqlserver://<the-rest-of-the-connection-string>",
   forwardSparkAzureStorageCredentials = "true",
   dbTable = "my_table_in_dw",
   tempDir = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")

# Load data from a SQL DW query.
df <- read.df(
   source = "com.databricks.spark.sqldw",
   url = "jdbc:sqlserver://<the-rest-of-the-connection-string>",
   forwardSparkAzureStorageCredentials = "true",
   query = "select x, count(*) as cnt from my_table_in_dw group by x",
   tempDir = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")

# Apply some transformations to the data, then use the
# Data Source API to write the data back to another table in SQL DW.

write.df(
  df,
  source = "com.databricks.spark.sqldw",
  url = "jdbc:sqlserver://<the-rest-of-the-connection-string>",
  forwardSparkAzureStorageCredentials = "true",
  dbTable = "my_table_in_dw_copy",
  tempDir = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")

Usage (Streaming)

You can write data using Structured Streaming in Scala and Python notebooks.

Scala
// Set up the Blob Storage account access key in the notebook session conf.
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

// Prepare streaming source; this could be Kafka, Kinesis, or a simple rate stream.
val df: DataFrame = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "100000")
  .option("numPartitions", "16")
  .load()

// Apply some transformations to the data then use
// Structured Streaming API to continuously write the data to a table in SQL DW.

df.writeStream
  .format("com.databricks.spark.sqldw")
  .option("url", <azure-sqldw-jdbc-url>)
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>")
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", <table-name>)
  .option("checkpointLocation", "/tmp_checkpoint_location")
  .start()
Python
# Set up the Blob Storage account access key in the notebook session conf.
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")

# Prepare streaming source; this could be Kafka, Kinesis, or a simple rate stream.
df = spark.readStream \
  .format("rate") \
  .option("rowsPerSecond", "100000") \
  .option("numPartitions", "16") \
  .load()

# Apply some transformations to the data then use
# Structured Streaming API to continuously write the data to a table in SQL DW.

df.writeStream \
  .format("com.databricks.spark.sqldw") \
  .option("url", <azure-sqldw-jdbc-url>) \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", <table-name>) \
  .option("checkpointLocation", "/tmp_checkpoint_location") \
  .start()

Configuration

Required SQL DW permissions

The SQL DW connector requires that the JDBC connection user has permission to run the following commands in the connected SQL DW instance:

  • CREATE DATABASE SCOPED CREDENTIAL and DROP DATABASE SCOPED CREDENTIAL
  • CREATE EXTERNAL DATA SOURCE and DROP EXTERNAL DATA SOURCE
  • CREATE EXTERNAL FILE FORMAT and DROP EXTERNAL FILE FORMAT
  • CREATE EXTERNAL TABLE and DROP EXTERNAL TABLE

These correspond to the temporary objects that the connector creates and drops, as described in Temporary object management. As a prerequisite for the first command, the connector expects that a database master key already exists for the specified DW instance. If not, you can create a key by using the CREATE MASTER KEY command.

Additionally, to read the SQL DW table set through dbTable, or the tables referred to in query, the JDBC user must have permission to access those SQL DW tables. To write data back to the SQL DW table set through dbTable, the JDBC user must have permission to write to that table.

Parameters

The parameter map or OPTIONS provided in Spark SQL support the following settings:

dbTable (required unless query is specified; no default)

  The table to create or read from in SQL DW. This parameter is required when saving data back to SQL DW.

  You can also use {SCHEMA NAME}.{TABLE NAME} to access a table in a given schema. If the schema name is not provided, the default schema associated with the JDBC user is used.

  The previously supported dbtable variant is deprecated and will be ignored in future releases. Consider using the “camel case” name instead. The configuration name change is available in Databricks Runtime 4.3 and above.

query (required unless dbTable is specified; no default)

  The query to read from in SQL DW.

  For tables referred to in the query, you can also use {SCHEMA NAME}.{TABLE NAME} to access a table in a given schema. If the schema name is not provided, the default schema associated with the JDBC user is used.

user (optional; no default)

  The SQL DW username. Must be used in tandem with the password option. Can only be used if the user and password are not passed in the URL; passing both results in an error.

password (optional; no default)

  The SQL DW password. Must be used in tandem with the user option. Can only be used if the user and password are not passed in the URL; passing both results in an error.

url (required; no default)

  A JDBC URL with sqlserver set as the subprotocol. We recommend using the connection string provided by the Azure portal. Setting encrypt=true is strongly recommended, because it enables SSL encryption of the JDBC connection. If user and password are set separately, you do not need to include them in the URL.

jdbcDriver (optional; default determined by the JDBC URL’s subprotocol)

  The class name of the JDBC driver to use. This class must be on the classpath. In most cases, it should not be necessary to specify this option, because the appropriate driver class name should automatically be determined by the JDBC URL’s subprotocol.

  The previously supported jdbc_driver variant is deprecated and will be ignored in future releases. Consider using the “camel case” name instead. The configuration name change is available in Databricks Runtime 4.3 and above.

tempDir (required; no default)

  A wasbs URI. We recommend that you use a dedicated Blob Storage container for the SQL DW.

  The previously supported tempdir variant is deprecated and will be ignored in future releases. Consider using the “camel case” name instead. The configuration name change is available in Databricks Runtime 4.3 and above.

tempFormat (optional; default PARQUET)

  The format in which to save temporary files to the blob store when writing to SQL DW. Defaults to PARQUET; no other values are allowed right now.

tempCompression (optional; default SNAPPY)

  The compression algorithm used by both Spark and SQL DW to encode/decode temporary data. Currently supported values are UNCOMPRESSED, SNAPPY, and GZIP.

forwardSparkAzureStorageCredentials (optional; default false)

  If true, the library automatically discovers the credentials that Spark is using to connect to the Blob Storage container and forwards those credentials to SQL DW over JDBC. These credentials are sent as part of the JDBC query, so it is strongly recommended that you enable SSL encryption of the JDBC connection when you use this option.

  The current version of the SQL DW connector requires forwardSparkAzureStorageCredentials to be explicitly set to true.

  The previously supported forward_spark_azure_storage_credentials variant is deprecated and will be ignored in future releases. Consider using the “camel case” name instead. The configuration name change is available in Databricks Runtime 4.3 and above.

tableOptions (optional; default CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN)

  A string used to specify table options when creating the SQL DW table set through dbTable. This string is passed literally to the WITH clause of the CREATE TABLE SQL statement that is issued against SQL DW.

  The previously supported table_options variant is deprecated and will be ignored in future releases. Consider using the “camel case” name instead. The configuration name change is available in Databricks Runtime 4.3 and above.

preActions (optional; default is an empty string)

  A ;-separated list of SQL commands to be executed in SQL DW before writing data to the DW instance. These SQL commands must be valid commands accepted by SQL DW.

  If any of these commands fail, it is treated as an error and the write operation is not executed.

postActions (optional; default is an empty string)

  A ;-separated list of SQL commands to be executed in SQL DW after the connector successfully writes data to the SQL DW instance. These SQL commands must be valid commands accepted by SQL DW.

  If any of these commands fail, it is treated as an error and you get an exception after the data is successfully written to the SQL DW instance.

maxStrLength (optional; default 256)

  StringType in Spark is mapped to the NVARCHAR(maxStrLength) type in SQL DW. You can use maxStrLength to set the string length for all NVARCHAR(maxStrLength) columns in the table with name dbTable in SQL DW.

  The previously supported maxstrlength variant is deprecated and will be ignored in future releases. Consider using the “camel case” name instead. The configuration name change is available in Databricks Runtime 4.3 and above.

checkpointLocation (required for streaming writes; no default)

  Location on DBFS that is used by Structured Streaming to write metadata and checkpoint information. See Recovering from Failures with Checkpointing in the Structured Streaming programming guide.

numStreamingTempDirsToKeep (optional; default 0)

  Indicates how many of the latest temporary directories to keep for periodic cleanup of micro-batches in streaming. When set to 0, directory deletion is triggered immediately after a micro-batch is committed; otherwise, the provided number of latest micro-batch directories is kept and the rest are removed. Use -1 to disable periodic cleanup.

Note

  • tableOptions, preActions, postActions, and maxStrLength are relevant only when writing data from Databricks to a new table in SQL DW; a sketch that uses them follows this note.
  • checkpointLocation and numStreamingTempDirsToKeep are relevant only for streaming writes from Databricks to a new table in SQL DW.
  • Even though all data source option names are case-insensitive, we recommend that you specify them in “camel case” for clarity.
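
The following is a rough Python sketch of those write-only options, reusing a df like the one from the batch examples above. The distribution column id, the string length 1024, and the preActions/postActions statements (and the tables they touch) are illustrative assumptions only, not recommendations.

# Hypothetical write that exercises the write-only options; adjust every value to
# your own schema. The HASH(id) column and the auxiliary tables are made up.
df.write \
  .format("com.databricks.spark.sqldw") \
  .option("url", "jdbc:sqlserver://<the-rest-of-the-connection-string>") \
  .option("forwardSparkAzureStorageCredentials", "true") \
  .option("dbTable", "my_new_table_in_dw") \
  .option("tempDir", "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>") \
  .option("tableOptions", "CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH(id)") \
  .option("maxStrLength", "1024") \
  .option("preActions", "DELETE FROM staging_audit; DELETE FROM staging_errors") \
  .option("postActions", "UPDATE etl_log SET finished = 1") \
  .save()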

Additional configuration options

Query pushdown into SQL DW

The SQL DW connector implements a set of optimization rules to push the following operators down into SQL DW:

  • Filter
  • Project
  • Limit

The Project and Filter operators support the following expressions:

  • Most boolean logic operators
  • Comparisons
  • Basic arithmetic operations
  • Numeric and string casts

For the Limit operator, pushdown is supported only when no ordering is specified. For example, SELECT TOP(10) * FROM table is pushed down, but SELECT TOP(10) * FROM table ORDER BY col is not.

Note

The SQL DW connector does not push down expressions operating on strings, dates, or timestamps.

Query pushdown is enabled by default in the SQL DW connector. You can disable it by setting spark.databricks.sqldw.pushdown to false, as shown below.
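
For example, to disable pushdown for the current notebook session in Python:

# Pushdown is on by default; this turns it off for queries issued in this session.
spark.conf.set("spark.databricks.sqldw.pushdown", "false")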

Temporary data management

The SQL DW connector does not delete the temporary files that it creates in the Blob Storage container. Therefore we recommend that you periodically delete temporary files under the user-supplied tempDir location.

To facilitate data cleanup, the SQL DW connector does not store data files directly under tempDir, but instead creates a subdirectory of the form: <tempDir>/<yyyy-MM-dd>/<HH-mm-ss-SSS>/<randomUUID>/. You can set up periodic jobs (using the Databricks jobs feature or otherwise) to recursively delete any subdirectories that are older than a given threshold (for example, 2 days), with the assumption that there cannot be Spark jobs running longer than that threshold.
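
Here is a minimal Python sketch of such a cleanup job. It assumes a Databricks notebook (dbutils is the notebook file-system utility), a tempDir dedicated to the connector, the <yyyy-MM-dd> directory layout described above, and an illustrative two-day retention threshold.

import datetime

# Placeholder; must match the tempDir passed to the SQL DW connector.
temp_dir = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/<your-directory-name>/"
cutoff = datetime.date.today() - datetime.timedelta(days=2)

for entry in dbutils.fs.ls(temp_dir):
    # Top-level entries are the <yyyy-MM-dd> date directories created by the connector.
    try:
        day = datetime.datetime.strptime(entry.name.strip("/"), "%Y-%m-%d").date()
    except ValueError:
        continue  # skip anything that is not a date directory
    if day < cutoff:
        dbutils.fs.rm(entry.path, True)  # recursively delete the expired directory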

A simpler alternative is to periodically drop the whole container and create a new one with the same name. This requires that you use a dedicated container for the temporary data produced by the SQL DW connector and that you can find a time window in which you can guarantee that no queries involving the connector are running.

Temporary object management

The SQL DW connector automates data transfer between a Databricks cluster and a SQL DW instance. For reading data from a SQL DW table or query or writing data to a SQL DW table, the SQL DW connector creates temporary objects, including DATABASE SCOPED CREDENTIAL, EXTERNAL DATA SOURCE, EXTERNAL FILE FORMAT, and EXTERNAL TABLE behind the scenes. These objects live only throughout the duration of the corresponding Spark job and should automatically be dropped thereafter.

When a cluster is running a query using the SQL DW connector, if the Spark driver process crashes or is forcefully restarted, or if the cluster is forcefully terminated or restarted, temporary objects might not be dropped. To facilitate identification and manual deletion of these objects, the SQL DW connector prefixes the names of all intermediate temporary objects created in the SQL DW instance with a tag of the form: tmp_<yyyy_MM_dd_HH_mm_ss_SSS>_<randomUUID>_.

We recommend that you periodically look for leaked objects using queries such as the following:

  • SELECT * FROM sys.database_scoped_credentials WHERE name LIKE 'tmp_databricks_%'
  • SELECT * FROM sys.external_data_sources WHERE name LIKE 'tmp_databricks_%'
  • SELECT * FROM sys.external_file_formats WHERE name LIKE 'tmp_databricks_%'
  • SELECT * FROM sys.external_tables WHERE name LIKE 'tmp_databricks_%'

Frequently asked questions (FAQ)

I hit an error while using the SQL DW connector. How can I tell if this error is from SQL DW or Databricks?

To help users debug errors, any exception thrown by code that is specific to the SQL DW connector is wrapped in an exception extending the SqlDWException trait. Exceptions also make the following distinction:

  • SqlDWConnectorException represents an error thrown by the SQL DW connector
  • SqlDWSideException represents an error thrown by the connected SQL DW instance
What should I do if my query failed with the error “No access key found in the session conf or the global Hadoop conf”?

This error means that the SQL DW connector could not find the storage account access key in the notebook session configuration or the global Hadoop configuration for the storage account specified in tempDir. See Usage (Batch) for examples of how to set the storage access key properly. Note that if a Spark table is created using the SQL DW connector, you must still provide the storage account access key in order to read or write to that Spark table.

Can I use a Shared Access Signature (SAS) to access the Blob Storage container specified by tempDir?

SQL DW does not support using SAS to access Blob Storage. Therefore the SQL DW connector does not support SAS to access the Blob Storage container specified by tempDir.

I created a Spark table using the SQL DW connector with the dbTable option, wrote some data to this Spark table, and then dropped this Spark table. Will the table created on the SQL DW side be dropped?

No. SQL DW is considered an external data source. The SQL DW table with the name set through dbTable is not dropped when the Spark table is dropped.
When writing a DataFrame to SQL DW, why do I need to say .option("dbTable", tableName).save() instead of just .saveAsTable(tableName)?

That is because we want to make the following distinction clear: .option("dbTable", tableName) refers to the database (that is, SQL DW) table, whereas .saveAsTable(tableName) refers to the Spark table. In fact, you can even combine the two: df.write. ... .option("dbTable", tableNameDW).saveAsTable(tableNameSpark), which creates a table in DW called tableNameDW and an external table in Spark called tableNameSpark that is backed by the DW table.

Warning

Beware of the following difference between .save() and .saveAsTable():

  • For df.write. ... .option("dbTable", tableNameDW).mode(writeMode).save(), writeMode acts on the SQL DW table, as expected.
  • For df.write. ... .option("dbTable", tableNameDW).mode(writeMode).saveAsTable(tableNameSpark), writeMode acts on the Spark table, whereas tableNameDW is silently overwritten if it already exists in SQL DW.

This behavior is no different from writing to any other data source. It is just a caveat of Spark’s DataFrameWriter API.