Databricks SQL Connector for Python

The Databricks SQL Connector for Python is a Python library that allows you to use Python code to run SQL commands on Databricks all-purpose compute and Databricks SQL warehouses. The Databricks SQL Connector for Python is easier to set up and use than similar Python libraries such as pyodbc. This library follows PEP 249 – Python Database API Specification v2.0.

important

Databricks SQL Connector for Python version 3.0.0 and above supports native parameterized query execution, which prevents SQL injection and can improve query performance. Previous versions used inline parameterized execution, which is not safe from SQL injection and has other drawbacks. For more information, see Using Native Parameters.

The Databricks SQL Connector for Python also supports the SQLAlchemy dialect for Databricks, but it must be installed to use these features. See Use SQLAlchemy with Databricks.

Requirements

A development machine running Python 3.8 and above.
Databricks recommends that you use Python virtual environments, such as those provided by venv that are included with Python. Virtual environments help to ensure that you are using the correct versions of Python and the Databricks SQL Connector for Python together. Setting up and using virtual environments is outside of the scope of this article. For more information, see Creating virtual environments.
An existing all-purpose compute or SQL warehouse.

Get started

Install the Databricks SQL Connector for Python. PyArrow is an optional dependency of the Databricks SQL Connector for Python and is not installed by default in version 4.0.0 and above of the connector. If PyArrow is not installed, features such as CloudFetch and other Apache Arrow functionality is not available, which may impact the performance for large volumes of data.
- To install the lean connector, use:
```
pip install databricks-sql-connector
```
- To install the complete connector, including PyArrow, use:
```
pip install databricks-sql-connector[pyarrow]
```
Gather the following information for the all-purpose compute or SQL warehouse that you want to use:
- All-purpose compute
- SQL warehouse
- The server hostname of the all-purpose compute. You can get this from the Server Hostname value in the Advanced Options > JDBC/ODBC tab for your all-purpose compute.
- The HTTP path of the all-purpose compute. You can get this from the HTTP Path value in the Advanced Options > JDBC/ODBC tab for your all-purpose compute.
note
The SQL connector does not support connecting to jobs compute.
- The server hostname of the SQL warehouse. You can get this from the Server Hostname value in the Connection Details tab for your SQL warehouse.
- The HTTP path of the SQL warehouse. You can get this from the HTTP Path value in the Connection Details tab for your SQL warehouse.

Authentication

The Databricks SQL Connector for Python supports the following Databricks authentication types:

Databricks personal access token authentication
OAuth machine-to-machine (M2M) authentication
OAuth user-to-machine (U2M) authentication

note

Basic authentication using a Databricks username and password reached end of life on July 10, 2024. See End of life for Databricks-managed passwords.

Databricks personal access token authentication

To use the Databricks SQL Connector for Python with Databricks personal access token authentication, you must first create a Databricks personal access token. To do so, follow the steps in Create personal access tokens for workspace users.

To authenticate the Databricks SQL Connector for Python, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your all-purpose compute or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your all-purpose compute or SQL warehouse.
DATABRICKS_TOKEN, set to the Databricks personal access token.

To set environment variables, see your operating system's documentation.

Python
from databricks import sql
import os

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:
# ...

OAuth machine-to-machine (M2M) authentication

Databricks SQL Connector for Python versions 2.5.0 and above support OAuth machine-to-machine (M2M) authentication. You must also install the Databricks SDK for Python (for example by running pip install databricks-sdk or python -m pip install databricks-sdk).

To use the Databricks SQL Connector for Python with OAuth M2M authentication, you must do the following:

Create a Databricks service principal in your Databricks workspace, and create an OAuth secret for that service principal.

To create the service principal and its OAuth secret, see Authorize service principal access to Databricks with OAuth. Make a note of the service principal's UUID or Application ID value, and the Secret value for the service principal's OAuth secret.
Give that service principal access to your all-purpose compute or warehouse.

To give the service principal access to your all-purpose compute or warehouse, see Compute permissions or Manage a SQL warehouse.

To authenticate the Databricks SQL Connector for Python, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_SERVER_HOSTNAME set to the Server Hostname value for your all-purpose compute or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your all-purpose compute or SQL warehouse.
DATABRICKS_CLIENT_ID, set to the service principal's UUID or Application ID value.
DATABRICKS_CLIENT_SECRET, set to the Secret value for the service principal's OAuth secret.

To set environment variables, see your operating system's documentation.

Python
from databricks.sdk.core import Config, oauth_service_principal
from databricks import sql
import os

server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")

def credential_provider():
  config = Config(
    host          = f"https://{server_hostname}",
    client_id     = os.getenv("DATABRICKS_CLIENT_ID"),
    client_secret = os.getenv("DATABRICKS_CLIENT_SECRET"))
  return oauth_service_principal(config)

with sql.connect(server_hostname      = server_hostname,
                 http_path            = os.getenv("DATABRICKS_HTTP_PATH"),
                 credentials_provider = credential_provider) as connection:
# ...

OAuth user-to-machine (U2M) authentication

Databricks SQL Connector for Python versions 2.1.0 and above support OAuth user-to-machine (U2M) authentication.

To authenticate the Databricks SQL Connector for Python with OAuth U2M authentication, use the following code snippet. OAuth U2M authentication uses real-time human sign-in and consent to authenticate the target Databricks user account. This snippet assumes that you have set the following environment variables:

Set DATABRICKS_SERVER_HOSTNAME to the Server Hostname value for your all-purpose compute or SQL warehouse.
Set DATABRICKS_HTTP_PATH to HTTP Path value for your all-purpose compute or SQL warehouse.

To set environment variables, see your operating system's documentation.

Python
from databricks import sql
import os

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                 auth_type       = "databricks-oauth") as connection:
# ...

Examples

The following code examples demonstrate how to use the Databricks SQL Connector for Python to query and insert data, query metadata, manage cursors and connections, manage files in Unity Catalog, and configure logging.

note

The following code examples demonstrate how to use a Databricks personal access token for authentication. To use a different authentication type, see Authentication.

These code example retrieve their server_hostname, http_path, and access_token connection variable values from these environment variables:

DATABRICKS_SERVER_HOSTNAME, which represents the Server Hostname value from the requirements.
DATABRICKS_HTTP_PATH, which represents the HTTP Path value from the requirements.
DATABRICKS_TOKEN, which represents your access token from the requirements.

Set User-Agent

The following code example demonstrates how to set the User-Agent application product_name for usage tracking.

Python
from databricks import sql
import os

with sql.connect(server_hostname   = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path         = os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token      = os.getenv("DATABRICKS_TOKEN"),
                 user_agent_entry = "product_name") as connection:
  with connection.cursor() as cursor:
    cursor.execute("SELECT 1 + 1")
    result = cursor.fetchall()

    for row in result:
      print(row)

Query data

The following code example demonstrates how to call the Databricks SQL Connector for Python to run a basic SQL command on all-purpose compute or SQL warehouse. This command returns the first two rows from the trips table in the samples catalog's nyctaxi schema.

Python
from databricks import sql
import os

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:

  with connection.cursor() as cursor:
    cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT ?", [2])
    result = cursor.fetchall()

    for row in result:
      print(row)

Query tags

Preview

This feature is in Private Preview. To request access, contact your account team.

The following example shows how to attach key-value tags to your SQL queries for tracking and analytics purposes. Query tags appear in the system.query.history table.

Python
from databricks import sql
import os

with sql.connect(
    server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
    access_token    = os.getenv("DATABRICKS_TOKEN"),
    session_configuration = {
        'query_tags': 'team:engineering,dashboard:abc123,env:prod'
    }
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT ?", [2])
        result = cursor.fetchall()
        # Query is now tagged and trackable in system.query.history
        for row in result:
            print(row)

Insert data

The following example demonstrates how to insert small amounts of data (thousands of rows):

Python
from databricks import sql
import os

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:

  with connection.cursor() as cursor:
    cursor.execute("CREATE TABLE IF NOT EXISTS squares (x int, x_squared int)")

    squares = [(i, i * i) for i in range(100)]

    cursor.executemany("INSERT INTO squares VALUES (?, ?)", squares)

    cursor.execute("SELECT * FROM squares LIMIT ?", [10])

    result = cursor.fetchall()

    for row in result:
      print(row)

For large amounts of data, you should first upload the data to cloud storage and then execute the COPY INTO command.

Query metadata

There are dedicated methods for retrieving metadata. The following example retrieves metadata about columns in a sample table:

Python
from databricks import sql
import os

with sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token    = os.getenv("DATABRICKS_TOKEN")) as connection:

  with connection.cursor() as cursor:
    cursor.columns(schema_name="default", table_name="squares")
    print(cursor.fetchall())

Manage cursors and connections

It is a best practice to close any connections and cursors that are no longer in use. This frees resources on Databricks all-purpose compute and Databricks SQL warehouses.

You can use a context manager (the with syntax used in previous examples) to manage the resources, or explicitly call close:

Python
from databricks import sql
import os

connection = sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                         http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                         access_token    = os.getenv("DATABRICKS_TOKEN"))

cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())

cursor.close()
connection.close()

Manage files in Unity Catalog volumes

The Databricks SQL Connector enables you to write local files to Unity Catalog volumes, download files from volumes, and delete files from volumes, as shown in the following example:

Python
from databricks import sql
import os

# For writing local files to volumes and downloading files from volumes,
# you must set the staging_allowed_local_path argument to the path to the
# local folder that contains the files to be written or downloaded.
# For deleting files in volumes, you must also specify the
# staging_allowed_local_path argument, but its value is ignored,
# so in that case its value can be set for example to an empty string.
with sql.connect(server_hostname            = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                 http_path                  = os.getenv("DATABRICKS_HTTP_PATH"),
                 access_token               = os.getenv("DATABRICKS_TOKEN"),
                 staging_allowed_local_path = "/tmp/") as connection:

  with connection.cursor() as cursor:

    # Write a local file to the specified path in a volume.
    # Specify OVERWRITE to overwrite any existing file in that path.
    cursor.execute(
      "PUT '/tmp/my-data.csv' INTO '/Volumes/main/default/my-volume/my-data.csv' OVERWRITE"
    )

    # Download a file from the specified path in a volume.
    cursor.execute(
      "GET '/Volumes/main/default/my-volume/my-data.csv' TO '/tmp/my-downloaded-data.csv'"
    )

    # Delete a file from the specified path in a volume.
    cursor.execute(
      "REMOVE '/Volumes/main/default/my-volume/my-data.csv'"
    )

Configure logging

The Databricks SQL Connector uses Python's standard logging module. The following example configures the logging level and generates a debug log:

Python
from databricks import sql
import os, logging

logging.getLogger("databricks.sql").setLevel(logging.DEBUG)
logging.basicConfig(filename = "results.log",
                    level    = logging.DEBUG)

connection = sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
                         http_path       = os.getenv("DATABRICKS_HTTP_PATH"),
                         access_token    = os.getenv("DATABRICKS_TOKEN"))

cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")

result = cursor.fetchall()

for row in result:
   logging.debug(row)

cursor.close()
connection.close()

Testing

To test your code, use Python test frameworks such as pytest. To test your code under simulated conditions without calling Databricks REST API endpoints or changing the state of your Databricks accounts or workspaces, you can use Python mocking libraries such as unittest.mock.

For example, given the following file named helpers.py containing a get_connection_personal_access_token function that uses a Databricks personal access token to return a connection to a Databricks workspace, and a select_nyctaxi_trips function that uses the connection to get the specified number of data rows from the trips table in the samples catalog's nyctaxi schema:

Python
# helpers.py

from databricks import sql
from databricks.sql.client import Connection, List, Row, Cursor

def get_connection_personal_access_token(
  server_hostname: str,
  http_path: str,
  access_token: str
) -> Connection:
  return sql.connect(
    server_hostname = server_hostname,
    http_path = http_path,
    access_token = access_token
  )

def select_nyctaxi_trips(
  connection: Connection,
  num_rows: int
) -> List[Row]:
  cursor: Cursor = connection.cursor()
  cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT ?", [num_rows])
  result: List[Row] = cursor.fetchall()
  return result

And given the following file named main.py that calls the get_connection_personal_access_token and select_nyctaxi_trips functions:

Python
# main.py

from databricks.sql.client import Connection, List, Row
import os
from helpers import get_connection_personal_access_token, select_nyctaxi_trips

connection: Connection = get_connection_personal_access_token(
  server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
  http_path = os.getenv("DATABRICKS_HTTP_PATH"),
  access_token = os.getenv("DATABRICKS_TOKEN")
)

rows: List[Row] = select_nyctaxi_trips(
  connection = connection,
  num_rows = 2
)

for row in rows:
  print(row)

The following file named test_helpers.py tests whether the select_nyctaxi_trips function returns the expected response. Rather than creating a real connection to the target workspace, this test mocks a Connection object. The test also mocks some data that conforms to the schema and values that are in the real data. The test returns the mocked data through the mocked connection and then checks whether one of the mocked data rows' values matches the expected value.

Python
# test_helpers.py

import pytest
from databricks.sql.client import Connection, List, Row
from datetime import datetime
from helpers import select_nyctaxi_trips
from unittest.mock import create_autospec

@pytest.fixture
def mock_data() -> List[Row]:
  return [
    Row(
      tpep_pickup_datetime = datetime(2016, 2, 14, 16, 52, 13),
      tpep_dropoff_datetime = datetime(2016, 2, 14, 17, 16, 4),
      trip_distance = 4.94,
      fare_amount = 19.0,
      pickup_zip = 10282,
      dropoff_zip = 10171
    ),
    Row(
      tpep_pickup_datetime = datetime(2016, 2, 4, 18, 44, 19),
      tpep_dropoff_datetime = datetime(2016, 2, 4, 18, 46),
      trip_distance = 0.28,
      fare_amount = 3.5,
      pickup_zip = 10110,
      dropoff_zip = 10110
    )
  ]

def test_select_nyctaxi_trips(mock_data: List[Row]):
  # Create a mock Connection.
  mock_connection = create_autospec(Connection)

  # Set the mock Connection's cursor().fetchall() to the mock data.
  mock_connection.cursor().fetchall.return_value = mock_data

  # Call the real function with the mock Connection.
  response: List[Row] = select_nyctaxi_trips(
    connection = mock_connection,
    num_rows = 2)

  # Check the value of one of the mocked data row's columns.
  assert response[1].fare_amount == 3.5

Because the select_nyctaxi_trips function contains a SELECT statement and therefore does not change the state of the trips table, mocking is not absolutely required in this example. However, mocking enables you to quickly run your tests without waiting for an actual connection to be made with the workspace. Also, mocking enables you to run simulated tests multiple times for functions that might change a table's state, such as INSERT INTO, UPDATE, and DELETE FROM.

API reference

This section contains API reference for the databricks-sql-connector package. See databricks-sql-connector in the Python Package Index (PyPI).

Module

The databricks.sql module of the databricks-sql-connector package contains the method to initialize a connection to a SQL warehouse.

connect method

Initializes a connection to a SQL warehouse. Returns a Connection object.

Parameter	Type	Description
`server_hostname`	`str`	Required. The server hostname for the all-purpose compute or SQL warehouse, for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. To get the server hostname, see the instructions in Get started.
`http_path`	`str`	Required. The HTTP path of the all-purpose compute or SQL warehouse, for example `sql/protocolv1/o/1234567890123456/1234-567890-test123` for all-purpose compute or `/sql/1.0/warehouses/a1b234c567d8e9fa` for a SQL warehouse. To get the HTTP path, see the instructions in Get started.
`access_token`, `auth_type`, `credentials_provider`, `password`, `username`	`str`	Information about Databricks authentication settings. For details see Authentication.
`session_configuration`	`dict[str, Any]`	A dictionary of Spark session configuration parameters. Setting a configuration is equivalent to using the `SET key=val` SQL command. Run the SQL command `SET -v` to get a full list of available configurations. Defaults to `None`. Example: `{"spark.sql.variable.substitute": True}`
`http_headers`	`List[Tuple[str, str]]]`	Optional. Additional (key, value) pairs to set in HTTP headers on every RPC request the client makes. Typical usage will not set any extra HTTP headers. Defaults to `None`.
`catalog`	`str`	Optional. Initial catalog to use for the connection. Defaults to `None` (in which case the default catalog, typically `hive_metastore` will be used).
`schema`	`str`	Optional. Initial schema to use for the connection. Defaults to `None` (in which case the default schema `default` will be used). Since version 2.0
`use_cloud_fetch`	`bool`	Optional. Whether to send fetch requests directly to the cloud object store to download chunks of data. Defaults to `True`. Set to `False` to send fetch requests directly to Databricks. If `use_cloud_fetch` is set to `True` but network access is blocked, then the fetch requests will fail. Since version 2.8
`user_agent_entry`	`str`	Optional. The User-Agent entry to include in the HTTP request header for usage tracking. Defaults to `PyDatabricksSqlConnector`.

`Connection` class

Represents a connection to compute or a SQL warehouse.

Methods

The Connection class provides the following methods.

Method	Description
`close`	Closes the connection to the database and releases all associated resources on the server. Any additional calls to this connection will throw an `Error`. No parameters. No return value.
`cursor`	Returns a new Cursor object that enables traversal over the records in a database. No parameters.

Method

Description

close

Closes the connection to the database and releases all associated resources on the server. Any additional calls to this connection will throw an Error.

No parameters.

No return value.

cursor

Returns a new Cursor object that enables traversal over the records in a database.

No parameters.

`Cursor` class

Represents a mechanism for traversing over data records.

To create a Cursor object, call the cursor method of the Connection` class.

Attributes

Selected Cursor attributes include the following:

Attribute	Description
`arraysize`	Used with the `fetchmany` method, specifies the internal buffer size, which is also how many rows are actually fetched from the server at a time. The default value is `10000`. For narrow results (results in which each row does not contain a lot of data), you should increase this value for better performance. Read-write access.
`description`	Contains a Python `list` of `tuple` objects. Each of these `tuple` objects contains 7 values, with the first 2 items of each `tuple` object containing information describing a single result column as follows: `name`: The name of the column. `type_code`: A string representing the type of the column. For example, an integer column will have a type code of `int`. The remaining 5 items of each 7-item `tuple` object are not implemented, and their values are not defined. They will typically be returned as 4 `None` values followed by a single `True` value. Read-only access.

Attribute

Description

arraysize

Used with the fetchmany method, specifies the internal buffer size, which is also how many rows are actually fetched from the server at a time. The default value is 10000. For narrow results (results in which each row does not contain a lot of data), you should increase this value for better performance. Read-write access.

description

Contains a Python list of tuple objects. Each of these tuple objects contains 7 values, with the first 2 items of each tuple object containing information describing a single result column as follows:

name: The name of the column.
type_code: A string representing the type of the column. For example, an integer column will have a type code of int. The remaining 5 items of each 7-item tuple object are not implemented, and their values are not defined. They will typically be returned as 4 None values followed by a single True value. Read-only access.

Methods

Selected Cursor methods include the following:

Method	Description
`cancel`	Interrupts the running of any database query or command that the cursor has started. To release the associated resources on the server, call the `close` method after calling the `cancel` method. No parameters. No return value.
`close`	Closes the cursor and releases the associated resources on the server. Closing an already closed cursor might throw an error. No parameters. No return value.
`execute`	Prepares and then runs a database query or command. Parameters: `operation`: Required. The query or command to prepare and then run. Type: `str` Example without the `parameters` parameter: `cursor.execute('SELECT * FROM samples.nyctaxi.trips LIMIT 2')` Example with the `parameters` parameter (using native positional parameters): `cursor.execute('SELECT * FROM samples.nyctaxi.trips WHERE pickup_zip = ? LIMIT ?', ['10019', 2])` `parameters`: Optional. A sequence of parameters to use with the `operation` parameter. The default is `None`. Type: `dictionary` No return value.
`executemany`	Prepares and then runs a database query or command using all parameter sequences in the `seq_of_parameters` argument. Only the final result set is retained. Parameters: `operation`: Required. The query or command to prepare and then run. Type: `str` `seq_of_parameters`: Required. A sequence of many sets of parameter values to use with the `operation` parameter. Type: `list` of `dict` No return value.
`catalogs`	Execute a metadata query about the catalogs. Actual results should then be fetched using `fetchmany` or `fetchall`. Important fields in the result set include: Field name: `TABLE_CAT`. The name of the catalog. Type: `str` No parameters. No return value. Since version 1.0
`schemas`	Execute a metadata query about the schemas. Actual results should then be fetched using `fetchmany` or `fetchall`. Important fields in the result set include: Field name: `TABLE_SCHEM`. The name of the schema. Type: `str` Field name: `TABLE_CATALOG`. The catalog to which the schema belongs. Type: `str` Parameters: `catalog_name`: Optional. A catalog name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `schema_name`: Optional. A schema name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` No return value. Since version 1.0
`tables`	Execute a metadata query about tables and views. Actual results should then be fetched using `fetchmany` or `fetchall`. Important fields in the result set include: Field name: `TABLE_CAT`. The catalog to which the table belongs. Type: `str` Field name: `TABLE_SCHEM`. The schema to which the table belongs. Type: `str` Field name: `TABLE_NAME`. The name of the table. Type: `str` Field name: `TABLE_TYPE`. The kind of relation, for example `VIEW` or `TABLE` (applies to Databricks Runtime 10.4 LTS and above as well as to Databricks SQL; prior versions of the Databricks Runtime return an empty string). Type: `str` Parameters: `catalog_name`: Optional. A catalog name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `schema_name`: Optional. A schema name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `table_name`: Optional. A table name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `table_types`: Optional. A list of table types to match, for example `TABLE` or `VIEW`. Type: `List[str]` No return value. Since version 1.0
`columns`	Execute a metadata query about the columns. Actual results should then be fetched using `fetchmany` or `fetchall`. Important fields in the result set include: Field name: `TABLE_CAT`. The catalog to which the column belongs. Type: `str` Field name: `TABLE_SCHEM`. The schema to which the column belongs. Type: `str` Field name: `TABLE_NAME`. The name of the table to which the column belongs. Type: `str` Field name: `COLUMN_NAME`. The name of the column. Type: `str` Parameters: `catalog_name`: Optional. A catalog name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `schema_name`: Optional. A schema name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `table_name`: Optional. A table name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` `column_name`: Optional. A column name to retrieve information about. The `%` character is interpreted as a wildcard. Type: `str` No return value. Since version 1.0
`fetchall`	Gets all (or all remaining) rows of a query. No parameters. Returns all (or all remaining) rows of the query as a Python `list` of `Row` objects. Throws an `Error` if the previous call to the `execute` method did not return any data or no `execute` call has yet been made.
`fetchmany`	Gets the next rows of a query. Parameters: `size`: Optional. The number of next rows to get. If not specified, the value of the `arraysize` attribute is used. Type: `int`. Example: `cursor.fetchmany(10)` Returns up to `size` (or the `arraysize` attribute if `size` is not specified) of the next rows of a query as a Python `list` of `Row` objects. If there are fewer than `size` rows left to be fetched, all remaining rows will be returned. Throws an `Error` if the previous call to the `execute` method did not return any data or no `execute` call has yet been made.
`fetchone`	Gets the next row of the dataset. No parameters. Returns the next row of the dataset as a single sequence as a Python `tuple` object, or returns `None` if there is no more available data. Throws an `Error` if the previous call to the `execute` method did not return any data or no `execute` call has yet been made.
`fetchall_arrow`	Gets all (or all remaining) rows of a query, as a PyArrow `Table` object. Queries returning very large amounts of data should use `fetchmany_arrow` instead to reduce memory consumption. No parameters. Returns all (or all remaining) rows of the query as a PyArrow table. Throws an `Error` if the previous call to the `execute` method did not return any data or no `execute` call has yet been made. Since version 2.0
`fetchmany_arrow`	Gets the next rows of a query as a PyArrow `Table` object. Parameters: `size`: Optional. The number of next rows to get. If not specified, the value of the `arraysize` attribute is used. Type: `int`. Example: `cursor.fetchmany_arrow(10)` Returns up to the `size` argument (or the `arraysize` attribute if `size` is not specified) of the next rows of a query as a Python PyArrow `Table` object. Throws an `Error` if the previous call to the `execute` method did not return any data or no `execute` call has yet been made. Since version 2.0

`Row` class

The row class is a tuple-like data structure that represents an individual result row in a SQL query result. If the row contains a column with the name "my_column", you can access the "my_column" field of row via row.my_column. You can also use numeric indicies to access fields, for example row[0]. If the column name is not allowed as an attribute method name (for example, it begins with a digit), then you can access the field as row["1_my_column"].

Since version 1.0

Selected Row methods include:

Methods

Method	Description
`asDict`	Returns a dictionary representation of the row, which is indexed by field names. If there are duplicate field names, one of the duplicate fields (but only one) will be returned in the dictionary. Which duplicate field is returned is not defined.

Type conversions

The following table maps Apache Spark SQL data types to their Python data type equivalents.

Apache Spark SQL data type	Python data type
`array`	`numpy.ndarray`
`bigint`	`int`
`binary`	`bytearray`
`boolean`	`bool`
`date`	`datetime.date`
`decimal`	`decimal.Decimal`
`double`	`float`
`int`	`int`
`map`	`str`
`null`	`NoneType`
`smallint`	`int`
`string`	`str`
`struct`	`str`
`timestamp`	`datetime.datetime`
`tinyint`	`int`

Telemetry collection

The Databricks SQL Connector for Python collects telemetry data to help Databricks improve reliability and troubleshoot issues. Telemetry is enabled by default and collects the following operational data:

Client environment details, such as driver version, Python runtime, and operating system
Driver connection configurations (excluding any personally identifiable information)
Operation latency measurements
Execution result formats, such as inline JSON or Apache Arrow
Operation types, such as query execution, metadata queries, or volume operations
Error classification data
Retry counts

important

Databricks doesn't collect query content, query results, or any personally identifiable information (PII) through telemetry.

To disable telemetry collection, set the enable_telemetry parameter to 0 when creating a connection.

Troubleshooting

`tokenAuthWrapperInvalidAccessToken: Invalid access token` message

Issue: When you run your code, you see a message similar to Error during request to server: tokenAuthWrapperInvalidAccessToken: Invalid access token.

Possible cause: The value passed to access_token is not a valid Databricks personal access token.

Recommended fix: Check that the value passed to access_token is correct and try again.

`gaierror(8, 'nodename nor servname provided, or not known')` message

Issue: When you run your code, you see a message similar to Error during request to server: gaierror(8, 'nodename nor servname provided, or not known').

Possible cause: The value passed to server_hostname is not the correct host name.

Recommended fix: Check that the value passed to server_hostname is correct and try again.

For more information on finding the server hostname, see Get connection details for a Databricks compute resource.

`IpAclError` message

Issue: When you run your code, you see the message Error during request to server: IpAclValidation when you try to use the connector on a Databricks notebook.

Possible cause: You may have IP allow listing enabled for the Databricks workspace. With IP allow listing, connections from Spark clusters back to the control plane are not allowed by default.

Recommended fix: Ask your administrator to add the compute plane subnet to the IP allow list.

Additional resources

For more information, see:

The Databricks SQL Connector for Python repository on GitHub
Data types
Built-in Types (for bool, bytearray, float, int, and str) on the Python website
datetime (for datetime.date and datatime.datetime) on the Python website
decimal (for decimal.Decimal) on the Python website
Built-in Constants (for NoneType) on the Python website

Requirements​

Get started​

Authentication​

Databricks personal access token authentication​

OAuth machine-to-machine (M2M) authentication​

OAuth user-to-machine (U2M) authentication​

Examples​

Set User-Agent​

Query data​

Query tags​

Insert data​

Query metadata​

Manage cursors and connections​

Manage files in Unity Catalog volumes​

Configure logging​

Testing​

API reference​

Module​

connect method​

Connection class​

Methods​

Cursor class​

Attributes​

Methods​

Row class​

Methods​

Type conversions​

Telemetry collection​

Troubleshooting​

tokenAuthWrapperInvalidAccessToken: Invalid access token message​

gaierror(8, 'nodename nor servname provided, or not known') message​

IpAclError message​

Additional resources​

Requirements

Get started

Authentication

Databricks personal access token authentication

OAuth machine-to-machine (M2M) authentication

OAuth user-to-machine (U2M) authentication

Examples

Set User-Agent

Query data

Query tags

Insert data

Query metadata

Manage cursors and connections

Manage files in Unity Catalog volumes

Configure logging

Testing

API reference

Module

connect method

`Connection` class

Methods

`Cursor` class

Attributes

Methods

`Row` class

Methods

Type conversions

Telemetry collection

Troubleshooting

`tokenAuthWrapperInvalidAccessToken: Invalid access token` message

`gaierror(8, 'nodename nor servname provided, or not known')` message

`IpAclError` message

Additional resources