Driver capability settings for the Databricks ODBC Driver

This page describes how to configure the special and advanced driver capability settings that the Databricks ODBC Driver provides.

Set the initial schema in ODBC

The ODBC driver lets you specify the initial schema for a connection by setting Schema=<schema-name> in the connection configuration. This is equivalent to running USE <schema-name>.
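
For example, with pyodbc (a minimal sketch; the driver name, host, HTTP path, token, and schema below are placeholders to replace with your own values):

    import pyodbc

    # All connection values are placeholders; substitute your workspace's
    # host, HTTP path, and a valid personal access token.
    conn = pyodbc.connect(
        "Driver=Simba Spark ODBC Driver;"
        "Host=dbc-example.cloud.databricks.com;"
        "Port=443;"
        "SSL=1;"
        "ThriftTransport=2;"
        "HTTPPath=/sql/1.0/warehouses/abcd1234;"
        "AuthMech=3;"
        "UID=token;"
        "PWD=<personal-access-token>;"
        "Schema=my_schema",  # equivalent to running USE my_schema
        autocommit=True,
    )

    # Unqualified table names now resolve against my_schema.
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM my_table LIMIT 10")  # my_table is a placeholder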

Query tags for tracking

Preview

This feature is in Private Preview. To request access, contact your account team.

Attach key-value tags to your SQL queries for tracking and analytics. Query tags appear in the system.query.history table, where you can use them to identify and analyze queries.

To add query tags to your connection, include the ssp_query_tags parameter in your ODBC connection configuration.

Define query tags as comma-separated key-value pairs, where each key and value is separated by a colon. For example, ssp_query_tags=team:engineering,env:prod.
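
For example, with pyodbc (a sketch; base_conn_str is an assumed placeholder for a working connection string like the one in the initial-schema example above):

    import pyodbc

    base_conn_str = "<your-working-connection-string>"  # placeholder

    # The tags set here appear in the system.query.history table.
    conn = pyodbc.connect(base_conn_str + ";ssp_query_tags=team:engineering,env:prod")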

ANSI SQL-92 query support in ODBC

Legacy Spark ODBC drivers accept SQL queries in ANSI SQL-92 dialect and translate them to Databricks SQL before sending them to the server.

If your application generates Databricks SQL directly, or uses Databricks-specific syntax that is not valid ANSI SQL-92, set UseNativeQuery=1 in your connection configuration. This setting passes SQL queries to Databricks verbatim, without translation.
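
For example, with pyodbc (a sketch; base_conn_str is an assumed placeholder for a working Databricks ODBC connection string):

    import pyodbc

    base_conn_str = "<your-working-connection-string>"  # placeholder

    # Queries on this connection are sent to Databricks verbatim,
    # with no ANSI SQL-92 translation by the driver.
    conn = pyodbc.connect(base_conn_str + ";UseNativeQuery=1")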

Extract large query results in ODBC

To achieve the best performance when you extract large query results, use the latest version of the ODBC driver, which includes the following optimizations.

Arrow serialization in ODBC

ODBC driver version 2.6.15 and above supports an optimized query results serialization format that uses Apache Arrow.

Cloud Fetch in ODBC

ODBC driver version 2.6.17 and above supports Cloud Fetch, a capability that fetches query results through the cloud storage configured in your Databricks deployment.

When you run a query, Databricks stores the results in your workspace's cloud storage as Arrow-serialized files of up to 20 MB. After the query completes, the driver sends fetch requests, and Databricks returns presigned URLs to the result files. The driver then uses these URLs to download results directly from Amazon S3.

Cloud Fetch only applies to query results larger than 1 MB. The driver retrieves smaller results directly from Databricks.

Databricks automatically garbage collects accumulated files by marking them for deletion after 24 hours and permanently removing them 24 hours later.

Cloud Fetch requires an E2 workspace and an Amazon S3 bucket without versioning enabled. If you have versioning enabled, you can still use Cloud Fetch by configuring a lifecycle policy; see S3 bucket versioning considerations below.

Network prerequisites

Cloud Fetch downloads result files directly from Amazon S3. If the client machine can't reach the S3 endpoint, Cloud Fetch fails. Verify the following:

  • The client machine has network access to the workspace's root S3 bucket.
  • If you use VPC endpoints, the S3 endpoint is accessible from the client machine.
  • Proxy and firewall rules allow HTTPS traffic to *.s3.amazonaws.com or to the specific S3 bucket endpoint for your workspace.
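
A quick way to test reachability from the client machine is a plain TCP check against the bucket endpoint (a sketch; the bucket hostname is a placeholder, and this checks only that port 443 is reachable, not proxy or TLS behavior):

    import socket

    # Placeholder endpoint; substitute your workspace's root S3 bucket.
    host = "my-workspace-root-bucket.s3.amazonaws.com"

    try:
        # Cloud Fetch downloads result files over HTTPS, so port 443
        # must be reachable from the client machine.
        with socket.create_connection((host, 443), timeout=5):
            print(f"TCP connection to {host}:443 succeeded")
    except OSError as exc:
        print(f"Cannot reach {host}:443: {exc}")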

Diagnose slow downloads

Set LogLevel to 4 (INFO) and LogPath to the full path of a log folder to see Cloud Fetch download speed metrics. The driver logs download speed per chunk, so large result sets generate multiple log lines. The driver also logs a warning when speed falls below approximately 1 MB/s. This feature is available in ODBC driver versions released after November 2025.
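
For example, with pyodbc (a sketch; base_conn_str is an assumed placeholder for a working connection string):

    import pyodbc

    base_conn_str = "<your-working-connection-string>"  # placeholder

    # LogLevel=4 (INFO) enables the per-chunk download speed metrics;
    # LogPath must point to an existing, writable folder.
    conn = pyodbc.connect(base_conn_str + ";LogLevel=4;LogPath=/tmp/odbc-logs")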

If downloads are slow or stalled, presigned URLs can expire before the driver finishes downloading all result files. Check for bandwidth throttling or network congestion between the client and Amazon S3.

S3 bucket versioning considerations

Cloud Fetch writes temporary result sets to your workspace's cloud storage. If you enable S3 bucket versioning, Databricks cannot garbage collect older versions of these files after the 24-hour retention period. Noncurrent file versions then accumulate without bound, and storage usage can grow rapidly.

Databricks recommends configuring a one-day S3 lifecycle policy that automatically purges noncurrent versions.

To set a lifecycle policy:

  1. In the AWS console, go to the S3 service.
  2. Click on the S3 bucket that you use for your workspace's root storage.
  3. Open the Management tab and click Create lifecycle rule.
  4. Enter a name in the Lifecycle rule name field.
  5. Keep the prefix field empty.
  6. Under Lifecycle rule actions, select Permanently delete noncurrent versions of objects.
  7. Set a value under Days after objects become noncurrent. Databricks recommends using 1 day.
  8. Click Create rule.
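
If you prefer to script the rule, you can set the same lifecycle configuration with boto3 (a sketch; the bucket name is a placeholder, and it assumes your AWS credentials grant s3:PutLifecycleConfiguration on the bucket):

    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-workspace-root-bucket",  # placeholder: your workspace's root bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "purge-noncurrent-versions",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
                    # Permanently delete noncurrent versions one day after
                    # objects become noncurrent, matching the console steps above.
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                }
            ]
        },
    )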

Enable logging

To enable logging in the ODBC driver, set the LogLevel property to a value between 1 (severe events only) and 6 (all driver activity). Set the LogPath property to the full path of the folder where you want to save log files.
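
For example, with pyodbc (a sketch, reusing the assumed base_conn_str placeholder from the earlier examples):

    import pyodbc

    base_conn_str = "<your-working-connection-string>"  # placeholder

    # LogLevel=6 logs all driver activity; lower or disable logging
    # after troubleshooting, since verbose logs grow quickly.
    conn = pyodbc.connect(base_conn_str + ";LogLevel=6;LogPath=/tmp/odbc-logs")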

For more information, see Configuring Logging Options in a Non-Windows Machine in the Databricks ODBC Driver Guide.