Databricks SQL Driver for Node.js
The Databricks SQL Driver for Node.js is a Node.js library that allows you to use JavaScript code to run SQL commands on Databricks compute resources.
Requirements
A development machine running Node.js, version 14 or higher. To print the installed version of Node.js, run the command
node -v
. To install and use different versions of Node.js, you can use tools such as Node Version Manager (nvm).Node Package Manager (
npm
). Later versions of Node.js already includenpm
. To check whethernpm
is installed, run the commandnpm -v
. To installnpm
if needed, you can follow instructions such as the ones at Download and install npm.The @databricks/sql package from npm. To install the
@databricks/sql
package in your Node.js project as a dependency, usenpm
to run the following command from within the same directory as your project:npm i @databricks/sql
If you want to install and use TypeScript in your Node.js project as
devDependencies
, usenpm
to run the following commands from within the same directory as your project:npm i -D typescript npm i -D @types/node
An existing cluster or SQL warehouse.
The Server Hostname and HTTP Path value for the existing cluster or SQL warehouse.
Authentication
The Databricks SQL Driver for Node.js supports the following Databricks authentication types:
Note
As a security best practice, you should not hard code connection variable values into your code. Instead, you should retrieve these connection variable values from a secure location. For example, the code snippets and examples in this article use environment variables.
Databricks personal access token authentication
To use the Databricks SQL Driver for Node.js with Databricks personal access token authentication, you must first create a Databricks personal access token, as follows:
In your Databricks workspace, click your Databricks username in the top bar, and then select Settings from the drop down.
Click Developer.
Next to Access tokens, click Manage.
Click Generate new token.
(Optional) Enter a comment that helps you to identify this token in the future, and change the token’s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
Click Generate.
Copy the displayed token to a secure location, and then click Done.
Note
Be sure to save the copied token in a secure location. Do not share your copied token with others. If you lose the copied token, you cannot regenerate that exact same token. Instead, you must repeat this procedure to create a new token. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the trash can (Revoke) icon next to the token on the Access tokens page.
If you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator or the following:
To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:
DATABRICKS_SERVER_HOSTNAME
set to the Server Hostname value for your cluster or SQL warehouse.DATABRICKS_HTTP_PATH
, set to HTTP Path value for your cluster or SQL warehouse.DATABRICKS_TOKEN
, set to the Databricks personal access token.
To set environment variables, see your operating system’s documentation.
const { DBSQLClient } = require('@databricks/sql');
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const token = process.env.DATABRICKS_TOKEN;
if (!token || !serverHostname || !httpPath) {
throw new Error("Cannot find Server Hostname, HTTP Path, or " +
"personal access token. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}
const client = new DBSQLClient();
const connectOptions = {
token: token,
host: serverHostname,
path: httpPath
};
client.connect(connectOptions)
// ...
import { DBSQLClient } from "@databricks/sql";
const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const token: string = process.env.DATABRICKS_TOKEN || '';
if (token == '' || serverHostname == '' || httpPath == '') {
throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}
const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
token: token,
host: serverHostname,
path: httpPath
};
client.connect(connectOptions)
// ...
Username and password authentication
To authenticate the Databricks SQL Driver for Node.js with username and password authentication, also known as basic authentication, use the following code snippet. This snippet assumes that you have set the following environment variables:
DATABRICKS_SERVER_HOSTNAME
set to the Server Hostname value for your cluster or SQL warehouse.DATABRICKS_HTTP_PATH
, set to HTTP Path value for your cluster or SQL warehouse.DATABRICKS_USERNAME
, set to the Databricks user account’s username.DATABRICKS_PASSWORD
, set to the Databricks user account’s password.
To set environment variables, see your operating system’s documentation.
Username and password authentication is possible only if single sign-on is disabled.
const { DBSQLClient } = require('@databricks/sql');
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const username = process.env.DATABRICKS_USERNAME;
const password = process.env.DATABRICKS_PASSWORD;
if (!token || !serverHostname || !httpPath) {
throw new Error("Cannot find Server Hostname, HTTP Path, username, or " +
"password. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, DATABRICKS_USERNAME, and DATABRICKS_PASSWORD.");
}
const client = new DBSQLClient();
const connectOptions = {
host: serverHostname,
path: httpPath,
username: username,
password: password
};
client.connect(connectOptions)
// ...
import { DBSQLClient } from "@databricks/sql";
const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const username: string = process.env.DATABRICKS_USERNAME || '';
const password: string = process.env.DATABRICKS_PASSWORD || '';
if (serverHostname == '' || httpPath == '' || username == '' || password == '') {
throw new Error("Cannot find Server Hostname, HTTP Path, username, or " +
"password. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, DATABRICKS_USERNAME, and DATABRICKS_PASSWORD.");
}
const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
host: serverHostname,
path: httpPath,
username: username,
password: password
};
client.connect(connectOptions)
// ...
OAuth user-to-machine (U2M) authentication
Databricks SQL Driver for Node.js versions 1.3.0 and above support OAuth user-to-machine (U2M) authentication.
To authenticate the Databricks SQL Driver for Node.js with OAuth U2M authentication, use the following code snippet. This snippet assumes that you have set the following environment variables:
DATABRICKS_SERVER_HOSTNAME
set to the Server Hostname value for your cluster or SQL warehouse.DATABRICKS_HTTP_PATH
, set to HTTP Path value for your cluster or SQL warehouse.
To set environment variables, see your operating system’s documentation.
const { DBSQLClient } = require('@databricks/sql');
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
if (!serverHostname || !httpPath) {
throw new Error("Cannot find Server Hostname or HTTP Path. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME " +
"and DATABRICKS_HTTP_PATH.");
}
const client = new DBSQLClient();
const connectOptions = {
authType: "databricks-oauth",
host: serverHostname,
path: httpPath
};
client.connect(connectOptions)
// ...
import { DBSQLClient } from "@databricks/sql";
const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
if (serverHostname == '' || httpPath == '') {
throw new Error("Cannot find Server Hostname or HTTP Path. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME " +
"and DATABRICKS_HTTP_PATH.");
}
const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
authType: "databricks-oauth",
host: serverHostname,
path: httpPath
};
client.connect(connectOptions)
// ...
OAuth machine-to-machine (M2M) authentication
Databricks SQL Driver for Node.js versions 1.5.0 and above support OAuth machine-to-machine (M2M) authentication.
To use the Databricks SQL Driver for Node.js with OAuth M2M authentication, you must do the following:
Create a Databricks service principal in your Databricks workspace, and create an OAuth secret for that service principal.
To create the service principal and its OAuth secret, see OAuth machine-to-machine (M2M) authentication. Make a note of the service principal’s UUID or Application ID value, and the Secret value for the service principal’s OAuth secret.
Give the service principal access to your cluster or warehouse. See Compute permissions or Manage a SQL warehouse.
To authenticate the Databricks SQL Driver for Node.js, use the following code snippet. This snippet assumes that you have set the following environment variables:
DATABRICKS_SERVER_HOSTNAME
set to the Server Hostname value for your cluster or SQL warehouse.DATABRICKS_HTTP_PATH
, set to HTTP Path value for your cluster or SQL warehouse.DATABRICKS_CLIENT_ID
, set to the service principal’s UUID or Application ID value.DATABRICKS_CLIENT_SECRET
, set to the Secret value for the service principal’s OAuth secret.
To set environment variables, see your operating system’s documentation.
const { DBSQLClient } = require('@databricks/sql');
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
const clientId = process.env.DATABRICKS_CLIENT_ID;
const clientSecret = process.env.DATABRICKS_CLIENT_SECRET;
if (!serverHostname || !httpPath || !clientId || !clientSecret) {
throw new Error("Cannot find Server Hostname, HTTP Path, or " +
"service principal ID or secret. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and " +
"DATABRICKS_CLIENT_SECRET.");
}
const client = new DBSQLClient();
const connectOptions = {
authType: "databricks-oauth",
host: serverHostname,
path: httpPath,
oauthClientId: clientId,
oauthClientSecret: clientSecret
};
client.connect(connectOptions)
// ...
import { DBSQLClient } from "@databricks/sql";
const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const clientId: string = process.env.DATABRICKS_CLIENT_ID || '';
const clientSecret: string = process.env.DATABRICKS_CLIENT_SECRET || '';
if (serverHostname == '' || httpPath == '' || clientId == '' || clientSecret == '') {
throw new Error("Cannot find Server Hostname, HTTP Path, or " +
"service principal ID or secret. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, DATABRICKS_CLIENT_ID, and " +
"DATABRICKS_CLIENT_SECRET.");
}
const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
authType: "databricks-oauth",
host: serverHostname,
path: httpPath,
oauthClientId: clientId,
oauthClientSecret: clientSecret
};
client.connect(connectOptions)
// ...
Query data
The following code example demonstrates how to call the Databricks SQL Driver for Node.js to run a basic SQL query on a Databricks compute resource. This command returns the first two rows from the trips
table in the samples
catalog’s nyctaxi
schema.
Note
The following code example demonstrates how to use a Databricks personal access token for authentication. To use other available Databricks authentication types instead, see Authentication.
This code example retrieves the token
, server_hostname
and http_path
connection variable values from a set of Databricks environment variables. These environment variables have the following environment variable names:
DATABRICKS_TOKEN
, which represents your Databricks personal access token from the requirements.DATABRICKS_SERVER_HOSTNAME
, which represents the Server Hostname value from the requirements.DATABRICKS_HTTP_PATH
, which represents the HTTP Path value from the requirements.
You can use other approaches to retrieving these connection variable values. Using environment variables is just one approach among many.
The following code example demonstrates how to call the Databricks SQL Connector for Node.js to run a basic SQL command on a cluster or SQL warehouse. This command returns the first two rows from the trips
table.
const { DBSQLClient } = require('@databricks/sql');
const token = process.env.DATABRICKS_TOKEN;
const serverHostname = process.env.DATABRICKS_SERVER_HOSTNAME;
const httpPath = process.env.DATABRICKS_HTTP_PATH;
if (!token || !serverHostname || !httpPath) {
throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
"Check the environment variables DATABRICKS_TOKEN, " +
"DATABRICKS_SERVER_HOSTNAME, and DATABRICKS_HTTP_PATH.");
}
const client = new DBSQLClient();
const connectOptions = {
token: token,
host: serverHostname,
path: httpPath
};
client.connect(connectOptions)
.then(async client => {
const session = await client.openSession();
const queryOperation = await session.executeStatement(
'SELECT * FROM samples.nyctaxi.trips LIMIT 2',
{
runAsync: true,
maxRows: 10000 // This option enables the direct results feature.
}
);
const result = await queryOperation.fetchAll();
await queryOperation.close();
console.table(result);
await session.close();
await client.close();
})
.catch((error) => {
console.log(error);
});
import { DBSQLClient } from '@databricks/sql';
import IDBSQLSession from '@databricks/sql/dist/contracts/IDBSQLSession';
import IOperation from '@databricks/sql/dist/contracts/IOperation';
const serverHostname: string = process.env.DATABRICKS_SERVER_HOSTNAME || '';
const httpPath: string = process.env.DATABRICKS_HTTP_PATH || '';
const token: string = process.env.DATABRICKS_TOKEN || '';
if (serverHostname == '' || httpPath == '' || token == '') {
throw new Error("Cannot find Server Hostname, HTTP Path, or personal access token. " +
"Check the environment variables DATABRICKS_SERVER_HOSTNAME, " +
"DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN.");
}
const client: DBSQLClient = new DBSQLClient();
const connectOptions = {
host: serverHostname,
path: httpPath,
token: token
};
client.connect(connectOptions)
.then(async client => {
const session: IDBSQLSession = await client.openSession();
const queryOperation: IOperation = await session.executeStatement(
'SELECT * FROM samples.nyctaxi.trips LIMIT 2',
{
runAsync: true,
maxRows: 10000 // This option enables the direct results feature.
}
);
const result = await queryOperation.fetchAll();
await queryOperation.close();
console.table(result);
await session.close();
client.close();
})
.catch((error) => {
console.log(error);
});
Output:
┌─────────┬─────┬────────┬───────────┬───────┬─────────┬────────┬───────┬───────┬────────┬────────┬────────┐
│ (index) │ _c0 │ carat │ cut │ color │ clarity │ depth │ table │ price │ x │ y │ z │
├─────────┼─────┼────────┼───────────┼───────┼─────────┼────────┼───────┼───────┼────────┼────────┼────────┤
│ 0 │ '1' │ '0.23' │ 'Ideal' │ 'E' │ 'SI2' │ '61.5' │ '55' │ '326' │ '3.95' │ '3.98' │ '2.43' │
│ 1 │ '2' │ '0.21' │ 'Premium' │ 'E' │ 'SI1' │ '59.8' │ '61' │ '326' │ '3.89' │ '3.84' │ '2.31' │
└─────────┴─────┴────────┴───────────┴───────┴─────────┴────────┴───────┴───────┴────────┴────────┴────────┘
Sessions
All IDBSQLSession
methods that return IOperation
objects in the API Reference have the following common parameters that affect their behavior:
Setting
runAsync
totrue
starts asynchronous mode.IDBSQLSession
methods put operations into the queue and return as quickly as possible. The current state of the returnedIOperation
object might vary, and the client is responsible for checking its status before using the returnedIOperation
. See Operations. SettingrunAsync
tofalse
means thatIDBSQLSession
methods wait for operations to complete. Databricks recommends always settingrunAsync
totrue
.Setting
maxRows
to a non-null value enables direct results. With direct results, the server tries to wait for operations to complete and then fetches a portion of the data. Depending on how much work the server was able to complete within the defined time,IOperation
objects return in some intermediate state instead of in some pending state. Very often all the metadata and query results are returned within a single request to the server. The server usesmaxRows
to determine how many records it can return immediately. However, the actual chunk may be of a different size; seeIDBSQLSession.fetchChunk
. Direct results are enabled by default. Databricks recommends against disabling direct results.
Operations
As described in Sessions, IOperation
objects that are returned by IDBSQLSession
session methods in the API Reference are not fully populated. The related server operation might still be in progress, such as waiting for the Databricks SQL warehouse to start, running the query, or fetching the data. The IOperation
class hides these details from users. For example, methods such fetchAll
, fetchChunk
, and getSchema
wait internally for operations to complete and then return results. You can use the IOperation.finished()
method to explicitly wait for operations to complete. These methods take a callback that is periodically called while waiting for operations to complete. Setting the progress
option to true
attempts to request extra progress data from the server and pass it to that callback.
The close
and cancel
methods can be called at any time. When called, they immediately invalidate the IOperation
object; all pending calls such as fetchAll
, fetchChunk
, and getSchema
are immediately canceled and an error is returned. In some cases, the server operation might have already completed and the cancel
method affects only the client.
The fetchAll
method calls fetchChunk
internally and collects all of the data into an array. While this is convenient, it might cause out of memory errors when used on large datasets. fetchAll
options are typically passed to fetchChunk
.
Fetch chunks of data
Fetching data chunks uses the following code pattern:
do {
const chunk = await operation.fetchChunk();
// Process the data chunk.
} while (await operation.hasMoreRows());
The fetchChunk
method in the API Reference processes data in small portions to reduce memory consumption. fetchChunk
first waits for operations to complete if they have not already completed, then calls a callback during the wait cycle, and then fetches the next data chunk.
You can use the maxRows
option to specify the desired chunk size. However, the returned chunk might have a different size, smaller or even sometimes larger. fetchChunk
does not attempt to prefetch data internally, in order to slice it into the requested portions. It sends the maxRows
option to then server and returns whatever the server returns. Do not confuse this maxRows
option with the one in IDBSQLSession
. maxRows
passed to fetchChunk
defines the size of each chunk and does not do anything else.
Configure logging
The logger provides information for debugging problems with the connector. All DBSQLClient
objects are instantiated with a logger that prints to the console, but by passing in a custom logger, you can send this information to a file. The following example shows how to configure a logger and change its level.
const { DBSQLLogger, LogLevel } = require('@databricks/sql');
const logger = new DBSQLLogger({
filepath: 'log.txt',
level: LogLevel.info
});
// Set logger to different level.
logger.setLevel(LogLevel.debug);
import { DBSQLLogger, LogLevel } from '@databricks/sql';
const logger = new DBSQLLogger({
filepath: 'log.txt',
level: LogLevel.info,
});
// Set logger to different level.
logger.setLevel(LogLevel.debug);
For additional examples, see the examples folder in the databricks/databricks-sql-nodejs repository on GitHub.
Additional resources
The Databricks SQL Driver for Node.js repository on GitHub
Getting started with the Databricks SQL Driver for Node.js
Troubleshooting the Databricks SQL Driver for Node.js
API reference
Classes
DBSQLClient
class
Main entry point for interacting with a database.
Methods
connect
method
Opens a connection to the database.
Parameters |
---|
options Type: The set of options used to connect to the database. The Example: const client: DBSQLClient = new DBSQLClient();
client.connect(
{
host: serverHostname,
path: httpPath,
// ...
}
)
|
Returns:
Promise<IDBSQLClient>
openSession
method
Opens session between DBSQLClient and database.
Parameters |
---|
request Type: A set of optional parameters for specifying initial schema and initial catalog Example: const session = await client.openSession(
{initialCatalog: 'catalog'}
);
|
Returns:
Promise<IDBSQLSession>
getClient
method
Returns internal thrift TCLIService.Client object. Must be called after DBSQLClient has connected.
No parameters
Returns TCLIService.Client
close
method
Closes the connection to the database and releases all associated resources on the server. Any additional calls to this client will throw error.
No parameters.
No return value.
DBSQLSession
class
DBSQLSessions are primarily used for the execution of statements against the databbase as well as various metadata fetching operations.
Methods
executeStatement
method
Executes a statement with the options provided.
Parameters |
---|
statement Type: The statement to be executed. |
options Type: A set of optional parameters for determining query timeout, max
rows for direct results, and whether to run the query
asynchronously. By default Example: const session = await client.openSession(
{initialCatalog: 'catalog'}
);
queryOperation = await session.executeStatement(
'SELECT "Hello, World!"', { runAsync: true }
);
|
Returns:
Promise<IOperation>
getTypeInfo
method
Returns information about supported data types.
Parameters |
---|
request Type: Request parameters. |
Returns:
Promise<IOperation>
getCatalogs
method
Gets list of catalogs.
Parameters |
---|
request Type: Request parameters. |
Returns:
Promise<IOperation>
getSchemas
method
Gets list of schemas.
Parameters |
---|
request Type: Request parameters. Fields |
Returns:
Promise<IOperation>
getTables
method
Gets list of tables.
Parameters |
---|
request Type: Request parameters. Fields |
Returns:
Promise<IOperation>
getFunctions
method
Gets list of tables.
Parameters |
---|
request Type: Request parameters. Field |
Returns:
Promise<IOperation>
getPrimaryKeys
method
Gets list of primary keys.
Parameters |
---|
request Type: Request parameters. Fields |
Returns:
Promise<IOperation>
getCrossReference
method
Gets information about foreign keys between two tables.
Parameters |
---|
request Type: Request parameters. Schema, Parent, and Catalog name must be specified for both tables. |
Returns:
Promise<IOperation>
DBSQLOperation
class
DBSQLOperations are created by DBSQLSessions and can be used to fetch the results of statements and check up on their execution. Data is fetched through functions fetchChunk and fetchAll.
Methods
fetchAll
method
Waits for operation completion, then fetches all rows from operation.
Parameters: None
Returns:
Promise<Array<object>>
fetchChunk
method
Waits for operation completion, then fetches up to a specified number of rows from an operation.
Parameters |
---|
options Type: Options used to fetch. Currently, the only option is maxRows, which corresponds to the max number of data objects to be returned in any given array. |
Returns:
Promise<Array<object>>
close
method
Closes the operation and releases all associated resources. Must be done after no longer using operation.
No parameters.
No return value.