Driver capability settings for the Databricks ODBC Driver

This article describes how to configure special and advanced driver capability settings for the Databricks ODBC Driver.

This article supplements the information in the following Databricks ODBC Driver articles:

To configure a Databricks connection for the Databricks ODBC Driver, you must combine your compute resource settings, authentication settings, and any of the following driver capability settings into an ODBC Data Source Name (DSN) or a DSN-less connection string.

Whether you use a DSN or a DSN-less connection string depends on the requirements of your target app, tool, client, SDK, or API. For examples of DSNs and DSN-less connection strings, see Authentication settings for the Databricks ODBC Driver.
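As an illustration, a DSN-less connection string can be assembled from individual settings as key-value pairs separated by semicolons. The following Python sketch shows the general shape; the driver path, host, HTTP path, and token values are hypothetical placeholders that you would replace with your own compute resource and authentication settings.

```python
# Sketch: assembling a DSN-less connection string from individual settings.
# All values below are hypothetical placeholders.
conn_params = {
    "Driver": "/Library/simba/spark/lib/libsparkodbc_sbu.dylib",  # install path varies by OS
    "Host": "dbc-a1b2345c-d6e7.cloud.databricks.com",
    "Port": 443,
    "HTTPPath": "/sql/1.0/warehouses/a1b2345c6d7e8f9a",
    "SSL": 1,
    "ThriftTransport": 2,
    "AuthMech": 3,  # token-based authentication for this driver
    "UID": "token",
    "PWD": "<personal-access-token>",
}

# Join the settings into a single semicolon-delimited connection string.
connection_string = ";".join(f"{key}={value}" for key, value in conn_params.items())
print(connection_string)
```

You would then pass this string to your ODBC client (for example, a `pyodbc.connect` call), or store the same key-value pairs in a DSN instead.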

The Databricks ODBC Driver provides the following special and advanced driver capability settings.

Set the initial schema in ODBC

The ODBC driver allows you to specify the schema by setting Schema=<schema-name> as a connection configuration. This is equivalent to running USE <schema-name>.
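For example, a DSN entry that sets the initial schema to a hypothetical schema named my_schema might look like the following fragment (other required settings omitted):

```
[Databricks]
Schema=my_schema
```

With this setting, new sessions start in my_schema, just as if you had run `USE my_schema` after connecting.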

ANSI SQL-92 query support in ODBC

The ODBC driver accepts SQL queries in the ANSI SQL-92 dialect and translates them to the Databricks SQL dialect. However, if your application generates Databricks SQL directly, or if it uses Databricks-specific SQL syntax that is not part of the ANSI SQL-92 standard, Databricks recommends that you set UseNativeQuery=1 as a connection configuration. With that setting, the driver passes SQL queries verbatim to Databricks.
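For example, to turn off the SQL-92 translation layer, you could add the setting to a DSN entry (or append `UseNativeQuery=1` to a DSN-less connection string):

```
[Databricks]
UseNativeQuery=1
```

With this setting in place, queries written in the Databricks SQL dialect reach Databricks unmodified.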

Extract large query results in ODBC

To achieve the best performance when you extract large query results, use the latest version of the ODBC driver that includes the following optimizations.

Arrow serialization in ODBC

ODBC driver version 2.6.15 and above supports an optimized query results serialization format that uses Apache Arrow.

Cloud Fetch in ODBC

ODBC driver version 2.6.17 and above supports Cloud Fetch, a capability that fetches query results through the cloud storage that is set up in your Databricks deployment. To extract query results using Cloud Fetch, use Databricks Runtime 8.3 or above.

Query results are uploaded to an internal DBFS storage location as Arrow-serialized files of up to 20 MB. When the driver sends fetch requests after query completion, Databricks generates and returns presigned URLs to the uploaded files. The ODBC driver then uses the URLs to download the results directly from DBFS.

Cloud Fetch is only used for query results larger than 1 MB. Smaller results are retrieved directly from Databricks.

Databricks automatically garbage collects the accumulated files, which are marked for deletion after 24 hours. These marked files are completely deleted after an additional 24 hours.

Cloud Fetch is only available for E2 workspaces. Also, your corresponding Amazon S3 buckets must not have versioning enabled. If you have versioning enabled, you can still enable Cloud Fetch by following the instructions in Advanced configurations.

To learn more about the Cloud Fetch architecture, see How We Achieved High-bandwidth Connectivity With BI Tools.

Advanced configurations

If you have enabled S3 bucket versioning on your DBFS root, Databricks cannot garbage collect older versions of uploaded query results. We recommend that you first set an S3 lifecycle policy that purges older versions of uploaded query results.

To set a lifecycle policy, follow the steps below:

  1. In the AWS console, go to the S3 service.

  2. Click on the S3 bucket that you use for your workspace’s root storage.

  3. Open the Management tab and choose Create lifecycle rule.

  4. Enter a name of your choice for the Lifecycle rule name.

  5. Keep the prefix field empty.

  6. Under Lifecycle rule actions select Permanently delete noncurrent versions of objects.

  7. Set a value under Days after objects become noncurrent. We recommend using the value 1 here.

  8. Click Create rule.
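The console steps above correspond to a lifecycle configuration like the following sketch, which you could instead apply with the `aws s3api put-bucket-lifecycle-configuration` CLI command. The rule ID is an arbitrary example name:

```json
{
  "Rules": [
    {
      "ID": "purge-noncurrent-query-results",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {
        "NoncurrentDays": 1
      }
    }
  ]
}
```

The empty `Filter` applies the rule to the whole bucket, matching the empty prefix field in step 5.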
