Skip to main content

Serverless write options for bundled connectors

note

Writes to external data sources using bundled Spark connectors on serverless compute are in Public Preview.

When writing to an external data source using a bundled connector on serverless compute, only a subset of connector options are supported. The following tables list the supported options per connector.

For setup instructions and examples, see Spark data sources.

PostgreSQL

The following options are supported when writing to PostgreSQL on serverless compute.

Option

Description

host

Hostname of the PostgreSQL server.

port

Port number. Default: 5432.

database

Name of the database to connect to.

connectTimeout

Maximum time in seconds to wait for a connection. 0 disables the timeout.

user

Database username.

password

Database password.

dbtable

Target table name. Supports schema-qualified names (for example, myschema.mytable).

batchsize

Number of rows to insert per batch. Default: 1000.

numPartitions

Number of Spark partitions for parallel write operations.

queryTimeout

Maximum time in seconds to wait for a query to complete. 0 disables the timeout.

isolationLevel

Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED.

truncate

If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false.

cascadeTruncate

If true, cascades the truncation to tables with foreign key references to the target table. Default: false.

SQL Server

The following options are supported when writing to SQL Server on serverless compute.

Option

Description

host

Hostname of the SQL Server instance.

port

Port number. Default: 1433.

database

Name of the database to connect to.

connectionTimeout

Maximum time in seconds to wait for a connection. 0 disables the timeout.

encrypt

If true, encrypts all data sent between the client and server using TLS. Default: false.

trustServerCertificate

If true, trusts the server's TLS certificate without validation. For development environments only. Default: false.

debug

If true, enables verbose debug logging for the connector. Default: false.

user

Database username.

password

Database password.

authentication

Authentication type. Supported values: SqlPassword, ActiveDirectoryPassword, ActiveDirectoryMSI.

dbtable

Target table name. Supports schema-qualified names (for example, myschema.mytable).

batchsize

Number of rows to insert per batch. Default: 1000.

numPartitions

Number of Spark partitions for parallel write operations.

queryTimeout

Maximum time in seconds to wait for a query to complete. 0 disables the timeout.

isolationLevel

Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED.

truncate

If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false.

MySQL

The following options are supported when writing to MySQL on serverless compute.

Option

Description

host

Hostname of the MySQL server.

port

Port number. Default: 3306.

database

Name of the database to connect to.

connectionTimeout

Maximum time in seconds to wait for a connection. 0 disables the timeout.

requireSSL

If true, requires an SSL-encrypted connection to the server. Default: false.

useSSL

If true, enables SSL for the connection when supported by the server. Default: false.

user

Database username.

password

Database password.

dbtable

Target table name. Supports schema-qualified names (for example, myschema.mytable).

batchsize

Number of rows to insert per batch. Default: 1000.

numPartitions

Number of Spark partitions for parallel write operations.

queryTimeout

Maximum time in seconds to wait for a query to complete. 0 disables the timeout.

isolationLevel

Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED.

truncate

If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false.

cascadeTruncate

If true, cascades the truncation to tables with foreign key references to the target table. Default: false.

Snowflake

The following sections list the supported options for the Snowflake connector, organized by function.

Connection

The following options configure the connection to Snowflake and control session behavior.

Option

Description

host

Snowflake account hostname (for example, <account>.snowflakecomputing.com).

port

Port number. Default: 443.

sfaccount

Snowflake account identifier.

sfauthenticator

Authentication method: snowflake (password), oauth (token), or snowflake_jwt (key pair). Default: snowflake.

networktimeout

Timeout in seconds for network operations.

sftimezone

Timezone for timestamp operations (for example, America/New_York).

client_session_keep_alive

If true, sends keepalive signals to prevent session timeout during long-running operations. Default: false.

ocspfailopen

If true, allows connections to proceed when OCSP certificate validation is unavailable (fail-open mode). Default: true.

Authentication

The following options supply credentials for the authentication method configured in sfauthenticator. Staging credentials (temporary_aws_*, awsaccesskey, temporary_azure_sas_token) are required when Snowflake stages write data through cloud storage.

Option

Description

sfuser

Snowflake username.

sfpassword

Snowflake password. Used when sfauthenticator is snowflake.

sfToken

OAuth access token. Used when sfauthenticator is oauth.

pem_private_key

Private key in PEM format for key-pair authentication. Used when sfauthenticator is snowflake_jwt.

temporary_aws_access_key_id

Temporary AWS access key ID for S3 staging. Preferred over awsaccesskey when using short-lived credentials.

temporary_aws_secret_access_key

Temporary AWS secret access key for S3 staging.

temporary_aws_session_token

Temporary AWS session token for S3 staging.

temporary_azure_sas_token

Temporary Azure SAS token for Azure Blob Storage staging.

awsaccesskey

AWS access key for S3 staging.

awssecretkey

AWS secret key for S3 staging.

Target

The following options specify the Snowflake database, schema, warehouse, and table to write to.

Option

Description

sfdatabase

Snowflake database name.

sfschema

Snowflake schema name.

sfwarehouse

Snowflake virtual warehouse used for query execution.

sfrole

Snowflake role for the session.

dbtable

Target table name.

Write behavior

The following options control how data is written to the target Snowflake table.

Option

Description

column_mapping

How DataFrame columns are matched to Snowflake table columns: name (by column name) or position (by column order). Default: name.

column_mismatch_behavior

Behavior when DataFrame and table columns don't align: error or ignore. Default: error.

truncate_table

If true, truncates the target table before writing. Default: false.

usestagingtable

If true, stages data in a temporary table before swapping into the target, enabling atomic writes. Default: true.

internal_execute_query_in_sync_mode

If true, executes Snowflake queries synchronously. Default: false.

autopushdown

If true, pushes filter and aggregation operations down to Snowflake for execution. Default: true.

Redshift

The following sections list the supported options for the Redshift connector, organized by function.

Connection

The following options configure the connection to the Redshift cluster.

Option

Description

host

Redshift cluster endpoint hostname.

port

Port number. Default: 5439.

database

Redshift database name.

connectionTimeout

Maximum time in seconds to wait for a connection.

Authentication

The following options configure credentials for Redshift and for the S3 staging location that Redshift uses during write operations.

Option

Description

user

Redshift username.

password

Redshift password.

aws_iam_role

ARN of the IAM role that Redshift uses to access S3 for staging data.

temporary_aws_access_key_id

Temporary AWS access key ID for S3 staging. Preferred over long-lived credentials.

temporary_aws_secret_access_key

Temporary AWS secret access key for S3 staging.

temporary_aws_session_token

Temporary AWS session token for S3 staging.

forward_spark_s3_credentials

If true, forwards Spark's S3 credentials to Redshift for staging. Use only when Spark and Redshift share the same S3 credentials. Default: false.

Write behavior

The following options control how data is written to the target Redshift table, including distribution, sort keys, and staging format.

Option

Description

dbtable

Target table name. Supports schema-qualified names (for example, myschema.mytable).

batchsize

Number of rows per batch insert. Default: 1000.

numPartitions

Number of Spark partitions for parallel write operations.

queryTimeout

Maximum time in seconds to wait for a query to complete.

isolationLevel

Transaction isolation level: NONE, READ_COMMITTED, READ_UNCOMMITTED, REPEATABLE_READ, or SERIALIZABLE. Default: READ_UNCOMMITTED.

diststyle

Redshift distribution style: EVEN, KEY, or ALL.

distkey

Column to use as the distribution key. Required when diststyle is KEY.

sortkeyspec

Sort key specification for the Redshift table (for example, SORTKEY(col1, col2)).

csvnullstring

String written in staging CSV files to represent NULL values. Default: empty string.

tempformat

Staging file format: CSV or AVRO. Default: CSV.

truncate

If true, truncates the target table in overwrite mode instead of dropping and recreating it. Default: false.

Write to PostgreSQL on serverless compute

This example uses append mode and retrieves credentials from a Databricks secret scope.

Python
df.write \
.format("postgresql") \
.option("host", dbutils.secrets.get(scope="<scope>", key="<host>")) \
.option("port", "<port>") \
.option("database", "<database-name>") \
.option("dbtable", "<table-name>") \
.option("user", dbutils.secrets.get(scope="<scope>", key="<user>")) \
.option("password", dbutils.secrets.get(scope="<scope>", key="<password>")) \
.mode("append") \
.save()

Next steps

  • Spark data sources: Setup instructions, code examples, and a comparison of Spark integration strategies.
  • JDBC connection: Use a Unity Catalog connection with a JDBC driver for options not supported by bundled connectors on serverless, or for data sources without a bundled connector.
  • Spark API options reference: Reference for DataFrameReader, DataFrameWriter, and streaming options for file formats and streaming sources.