SQL endpoints

Preview

This feature is in Public Preview.

A SQL endpoint is a computation resource that lets you run SQL commands on data objects within the Databricks environment.

By default, a SQL endpoint uses compute resources in your AWS account. This is called a Classic SQL endpoint. You can also create Databricks-managed SQL endpoints, called Serverless SQL endpoints (Public Preview), that use compute resources in the Databricks cloud account. Serverless SQL endpoints simplify SQL endpoint management and accelerate launch times. See Enable Serverless SQL endpoints.

SQL endpoints appear in query history and record the user that ran the query.

SQL endpoints support the SQL commands in SQL reference for Databricks SQL.

This section describes how to work with SQL endpoints using the UI. To work with SQL endpoints using the API, see SQL Endpoints APIs.

Queries through SQL endpoints are managed by the Query Watchdog with the defaults as described in that article.

Requirements

  • To create a SQL endpoint you must have cluster creation permission in Databricks Data Science & Engineering.
  • To manage a SQL endpoint you must have Can Manage permission in Databricks SQL.

View SQL endpoints

Click Endpoints Icon SQL Endpoints in the sidebar.

By default, endpoints are sorted in alphabetical order. You can reorder the list by clicking the column headings.

To help you get started quickly, such as when you follow Quickstart: Learn about Databricks SQL by importing dashboards from the Sample Dashboard Gallery, Databricks creates a small SQL endpoint called Starter Endpoint automatically. You can edit or delete this SQL endpoint.

To filter the list of endpoints, enter text in the search box:

Filter endpoints

Create a SQL endpoint

You can create a SQL endpoint using the web application or the SQL Endpoint API; a scripted sketch of the API call follows the steps below.

Perform the following steps to create a SQL endpoint using the web application.

  1. Click Endpoints Icon SQL Endpoints in the sidebar.

  2. Click + New SQL Endpoint.

    Create endpoint

    The page looks slightly different if Serverless SQL endpoints are enabled. There is no Spot instance policy field, and the label New Classic SQL Endpoint or Back to New Serverless SQL Endpoint appears at the bottom of the page, depending on which type of SQL endpoint you are creating.

    Serverless blue toggle text
  3. (Optional) Switch the SQL endpoint type (Classic or Serverless).

    Classic SQL endpoints use compute resources in your AWS account. Serverless SQL endpoints use compute resources in the Databricks cloud account. Serverless SQL endpoints simplify SQL endpoint configuration and usage and accelerate launch times. The Serverless option is available only if it has been enabled for the workspace. For more information, see Enable Serverless SQL endpoints.

    • If Serverless SQL endpoints are enabled, the default is to create a Serverless SQL endpoint. If you prefer a Classic SQL endpoint, click New Classic SQL Endpoint at the bottom of the page. To switch back to Serverless, click Back to New Serverless SQL Endpoint.
    • If Serverless SQL endpoints are not enabled, the SQL endpoint is always a Classic SQL endpoint. You cannot change the endpoint type.
  4. Enter a name for the endpoint.

  5. Accept or edit the endpoint properties.

    For Serverless SQL endpoints, keep the Auto Stop default of 10 minutes.

  6. Click Create.

    The endpoint is created and started.
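
If you script endpoint creation through the SQL Endpoint API instead, the request carries the same properties you set in the UI. The following Python sketch (using the requests library) shows the general shape; the host and token are placeholders, and the property names are assumptions based on the SQL Endpoints API reference, so verify them against SQL Endpoints APIs before relying on them.

    import requests

    # Placeholders (assumptions): supply your workspace URL and a personal access token.
    HOST = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    # A minimal create request; property names follow the SQL Endpoints API
    # reference and should be verified there before use.
    payload = {
        "name": "analytics-endpoint",   # illustrative name
        "cluster_size": "Small",        # a size from the Cluster size table
        "auto_stop_mins": 120,          # the Classic Auto Stop default
        "min_num_clusters": 1,
        "max_num_clusters": 1,
        "enable_photon": True,
    }

    resp = requests.post(
        f"{HOST}/api/2.0/sql/endpoints/",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["id"])  # ID of the new endpoint

The response includes the new endpoint's ID, which the lifecycle and edit sketches later in this article reuse.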

Start, stop, or delete a SQL endpoint

  1. Click Endpoints Icon SQL Endpoints in the sidebar.
  2. In the Actions column, click the vertical ellipsis Vertical Ellipsis and select Start, Stop, or Delete.

Also see Configure an instance profile.
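
These lifecycle actions can also be scripted. A minimal sketch, reusing the placeholders from the create example above; the route names are assumptions based on the SQL Endpoints API reference:

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder
    ENDPOINT_ID = "<endpoint-id>"  # e.g. the id returned by the create call
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Start and stop are POSTs against the endpoint's lifecycle routes.
    requests.post(f"{HOST}/api/2.0/sql/endpoints/{ENDPOINT_ID}/start", headers=headers)
    requests.post(f"{HOST}/api/2.0/sql/endpoints/{ENDPOINT_ID}/stop", headers=headers)

    # Delete removes the endpoint entirely.
    requests.delete(f"{HOST}/api/2.0/sql/endpoints/{ENDPOINT_ID}", headers=headers)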

Edit a SQL endpoint

You can edit a SQL endpoint using the web application or the SQL Endpoint API; a scripted sketch of the API call follows the steps below.

Perform the following steps to edit a SQL endpoint using the web application.

  1. Click Endpoints Icon SQL Endpoints in the sidebar.

  2. Click an endpoint.

  3. Click the Edit button.

  4. Edit endpoint properties.

    • Serverless: If Serverless SQL endpoints are enabled for your workspace, you have the option to convert a Classic SQL endpoint to a Serverless SQL endpoint or convert a Serverless SQL endpoint to a Classic SQL endpoint. Click the Serverless (preview) toggle to change its type.

      For a comparison of SQL endpoint types, see Enable Serverless SQL endpoints and Serverless compute.

    • Cluster Size: the number of cluster workers and the size of the driver. The default is X-Large. To reduce query latency, increase the size: each size up doubles the number of cluster workers and may use a larger driver. For details, see Cluster size.

    • Auto Stop: whether the endpoint stops if it’s idle for the specified number of minutes.

      For Classic SQL endpoints, the default is On and set to 120 minutes.

      For Serverless SQL endpoints, the default is On and set to 10 minutes. If you are converting a Classic SQL endpoint to Serverless, set this value to 10 minutes.

      Note

      Idle SQL endpoints continue to accumulate DBU and cloud instance charges until they are stopped.

    • Multi-cluster Load Balancing: the minimum and maximum number of clusters over which queries sent to the endpoint are distributed.

      The default is Off with a maximum of 1 cluster. When set to On, the default is minimum 1 cluster and maximum 2 clusters.

      To handle more concurrent users for a given query, turn on load balancing and increase the cluster count. For details of how Databricks adds clusters to and removes clusters from an endpoint, see Queueing and autoscaling.

    • Photon: whether queries are executed on the Photon native vectorized engine that speeds up query execution. The default is On.

      To view how much of a query ran using Photon:

      1. Click History Icon Query History in the sidebar.
      2. Click a query.
      3. In the Query Details pane, click Execution Details.
      4. Look at the Task Time in Photon metric at the bottom.

      In general, the larger the percentage of Task Time in Photon, the larger the performance benefit from Photon.

    • Spot Instance Policy: whether workers use all on-demand instances or mostly spot instances; the driver always uses an on-demand instance. The default is Cost Optimized, which uses mostly spot instances and one on-demand instance. Reliability Optimized uses all on-demand instances.

    • Permissions: users and groups that can manage and use the endpoint.

    • Tags: key-value pair annotations that can identify who uses the endpoint.

  5. Click Save or Save and restart.
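
The same property changes can be scripted through the API's edit route. A hedged sketch, reusing the placeholders from the earlier examples; confirm the route and property names against SQL Endpoints APIs:

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder
    ENDPOINT_ID = "<endpoint-id>"

    # Resize the endpoint and enable multi-cluster load balancing
    # (min 1, max 2), mirroring the UI properties described above.
    changes = {
        "name": "analytics-endpoint",
        "cluster_size": "Medium",
        "min_num_clusters": 1,
        "max_num_clusters": 2,
        "auto_stop_mins": 120,
    }

    resp = requests.post(
        f"{HOST}/api/2.0/sql/endpoints/{ENDPOINT_ID}/edit",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=changes,
    )
    resp.raise_for_status()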

Convert a Classic SQL endpoint to a Serverless SQL endpoint

To learn about Classic and Serverless SQL endpoints, see Enable Serverless SQL endpoints.

Preview

This feature is in Public Preview. To sign up for access, see Request Access to Serverless SQL.

  1. Click Endpoints Icon SQL Endpoints in the sidebar.
  2. Click a Classic SQL endpoint.
  3. Click the Edit button.
  4. Click the Serverless toggle to change its type.
  5. Set the Auto Stop value. Databricks recommends 10 minutes.
  6. Click Save.

Edit endpoint permissions

To edit endpoint permissions:

  1. Click the Permissions Button button.

    The SQL endpoint permissions are displayed. The endpoint creator and Databricks admins have Can Manage permission.

    Add permission
  2. Select a user or group and a permission.

  3. Click Add.

  4. Click Save.

Add an endpoint tag

Tags allow you to easily monitor the cost of cloud resources used by various groups in your organization. You can specify tags as key-value pairs when you create an endpoint, and Databricks applies these tags to cloud resources.

To add an endpoint tag:

  1. Click Endpoints Icon SQL Endpoints in the sidebar.

  2. Click an endpoint.

  3. Click the Edit button.

  4. In the Tags row, enter a tag key and value.

    Add tag
  5. Click Save and restart.
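
Tags can also be supplied when you create or edit an endpoint through the API. The fragment below sketches the request-body shape; the custom_tags structure is an assumption based on the SQL Endpoints API reference, so confirm it before use:

    # Fragment of a create or edit request body (see the earlier API
    # sketches). The custom_tags shape is an assumption to verify against
    # the API reference; keys and values here are illustrative.
    payload_fragment = {
        "tags": {
            "custom_tags": [
                {"key": "team", "value": "analytics"},
                {"key": "cost-center", "value": "42"},
            ]
        }
    }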

Monitor a SQL endpoint

You can examine the number of queries handled by the endpoint and the number of clusters allocated to the endpoint.

  1. Click Endpoints Icon SQL Endpoints in the sidebar.

  2. Click an endpoint.

  3. Click Monitoring.

    A chart appears showing the number of queries handled by the endpoint and the number of clusters allocated to it over the last 6 hours.

    Click a timescale button at the top right of the chart to change the displayed period. For example, the following screenshot shows these statistics over 7 days:

    Monitor endpoint

    Note

    The Cluster Count can be greater than one only if multi-cluster load balancing is enabled and configured.

Cluster size

The table in this section maps SQL endpoint cluster sizes to Databricks cluster driver size and worker counts.

Cluster size   Driver size   Worker count
2X-Small       i3.2xlarge    1
X-Small        i3.2xlarge    2
Small          i3.4xlarge    4
Medium         i3.8xlarge    8
Large          i3.8xlarge    16
X-Large        i3.16xlarge   32
2X-Large       i3.16xlarge   64
3X-Large       i3.16xlarge   128
4X-Large       i3.16xlarge   256

The instance size of all workers is i3.2xlarge.

Queueing and autoscaling

Databricks limits the number of queries on a cluster assigned to a SQL endpoint to 10. Upscaling of clusters per endpoint is based on query throughput, the rate of incoming queries, and the queue size. Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes as follows:

  • Less than 2 minutes, don’t upscale.
  • 2 to 6 minutes, add 1 cluster.
  • 6 to 12 minutes, add 2 clusters.
  • 12 to 22 minutes, add 3 clusters.

Otherwise, Databricks adds 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.

In addition, an endpoint is always upscaled if a query waits for 5 minutes in the queue.

If the load is low for 15 minutes, Databricks downscales the SQL endpoint. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Databricks keeps 3 clusters.
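
The rules above reduce to a few lines of arithmetic. The following Python sketch is an illustrative model of the documented thresholds, not Databricks code:

    import math

    def clusters_to_add(expected_minutes: float) -> int:
        """Clusters added for a given expected processing time,
        per the documented upscaling thresholds (illustrative model)."""
        if expected_minutes < 2:
            return 0
        if expected_minutes <= 6:
            return 1
        if expected_minutes <= 12:
            return 2
        if expected_minutes <= 22:
            return 3
        # Beyond 22 minutes: 3 clusters plus 1 per additional 15 minutes.
        return 3 + math.ceil((expected_minutes - 22) / 15)

    def clusters_to_keep(peak_concurrent_queries: int) -> int:
        """Clusters kept after downscaling: enough for the 15-minute peak
        at 10 queries per cluster (e.g. 25 queries -> 3 clusters)."""
        return max(1, math.ceil(peak_concurrent_queries / 10))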

Query queuing

Databricks queues queries when all clusters assigned to the endpoint are currently executing 10 queries or when the endpoint is in the STARTING state.

Metadata queries (for example, DESCRIBE <table>) and state modifying queries (for example SET) are never queued, unless the endpoint is in the STARTING state.

Query routing

Databricks routes queries as follows:

  • New session: to the cluster with the least load.
  • Existing session: to the cluster that ran the previous query for that session. If that cluster does not have available capacity, the query is routed to the cluster with the least load.
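
This amounts to session affinity with a least-loaded fallback. A minimal illustrative sketch, where a cluster's load is approximated by its running-query count:

    def route_query(session_id, clusters, last_cluster_for_session, max_queries=10):
        """Choose a cluster per the documented routing rules (illustrative).
        `clusters` maps cluster id -> number of currently running queries."""
        previous = last_cluster_for_session.get(session_id)
        # Existing session: reuse the previous cluster if it has capacity.
        if previous in clusters and clusters[previous] < max_queries:
            return previous
        # New session, or the previous cluster is full: least-loaded cluster.
        return min(clusters, key=clusters.get)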

Enable Serverless SQL endpoints

Preview

This feature is in Public Preview. To sign up for access, see Request Access to Serverless SQL.

By default, a SQL endpoint uses compute resources in your AWS account. This is called a Classic SQL endpoint. You can also create Databricks-managed SQL endpoints, called Serverless SQL endpoints, that use compute resources in the Databricks cloud account. Serverless SQL endpoints simplify SQL endpoint management and accelerate launch times. You can use them to run Databricks SQL queries just like you do with a Classic SQL endpoint.

As of this release, Databricks supports Serverless compute in AWS regions us-east-1, us-west-2, and ap-southeast-2. See Supported Databricks regions.

You can create new Serverless SQL endpoints using the SQL endpoints UI or the SQL Endpoints API. You can convert a Classic SQL endpoint to a Serverless SQL endpoint. You can have Serverless SQL endpoints and Classic SQL endpoints running at the same time.

For a more thorough comparison of Serverless and Classic SQL endpoints, see Serverless compute.

Before you can create Serverless SQL endpoints, a workspace administrator must enable Serverless SQL Endpoints for your workspace. This section describes how.

Enable Serverless SQL endpoints for a workspace

  1. Ask your Databricks representative to enable your account for the Serverless compute preview.

    You will be asked for your workspace ID. To learn how to get your workspace ID, see Workspace instance names, URLs, and IDs.

    Important

    Do not proceed with SQL endpoint enablement until you get your confirmation email. It may take 1-3 workdays for confirmation.

  2. Enable Serverless compute for your workspace.

    1. As a Databricks workspace administrator, go to the SQL admin console in Databricks SQL.

      If you are in the Data Science & Engineering or Databricks Machine Learning workspace environment, you may need to select SQL from the sidebar. Click the icon below the Databricks logo.

      Once you are in Databricks SQL, click User Settings Icon Settings at the bottom of the sidebar and select SQL Admin Console.

      SQL Admin Console

      If you do not see the SQL Admin Console menu item, your user account is not an admin for this workspace.

    2. In the SQL admin console, click the SQL Endpoint Settings tab.

    3. Select Serverless SQL Endpoints.

      Enable Serverless Compute
    4. If this is the first time this feature is being enabled on this workspace, read and accept the Service Specific Terms that appear.

  3. If you use an AWS instance profile to connect to AWS data sources other than your workspace’s root S3 bucket, confirm that your AWS instance profile and its associated IAM role are set up correctly. See (Optional) Set up an AWS instance profile to use with your Serverless SQL endpoints.

  4. If you need to specify an AWS Glue metastore or add additional data source configurations, update the Data Access Configuration field. See Data access configuration.

    Serverless SQL endpoints support the default Databricks metastore and AWS Glue as a metastore, but do not support external Hive metastores.

(Optional) Set up an AWS instance profile to use with your Serverless SQL endpoints

If you use an AWS instance profile to connect to AWS data sources other than your workspace’s root S3 bucket, you may need to update your AWS instance profile and its associated IAM role. For more information about using instance profiles, see Secure access to S3 buckets using instance profiles. See also Configure an instance profile.

  1. As a Databricks workspace administrator, go to the SQL admin console in Databricks SQL.

    If you are in the Data Science & Engineering or Databricks Machine Learning workspace environment, you may need to select SQL from the sidebar. Click the icon below the Databricks logo.

    Once you are in Databricks SQL, click User Settings Icon Settings at the bottom of the sidebar and select SQL Admin Console.

    SQL Admin Console
  2. In the SQL admin console, click the SQL Endpoint Settings tab.

  3. Confirm that you are using an instance profile.

    If you are using an instance profile, it will be selected in the Instance Profile field. Make a note of it for the next step.

    If the field value is None, you are not using instance profiles, and you can skip this entire procedure.

  4. In your AWS account, go to the AWS console, select the IAM service, and click the Roles tab.

  5. Select the role for the instance profile that is configured in your SQL endpoint settings.

  6. In the summary area, review the Role ARN and Instance Profile ARNs fields. Confirm that the last parts of those two fields (the names after the final slash) match. For example:

    enable preview

    If these do not match, contact your Databricks representative.

  7. In the role editor for the role that you want to modify, go to the details page, click the Trust relationships tab, and click Edit trust relationship.

  8. Under Policy Document, add the following statement to your instance profile’s trust policy.

    Replace the sts:ExternalId array with one element per workspace ID, each prefixed with databricks-serverless- (as shown in the example), for every workspace in which you will use Serverless SQL endpoints.

    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::790110701330:role/serverless-customer-resource-role"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "databricks-serverless-<YOUR_WORKSPACE_ID1>",
            "databricks-serverless-<YOUR_WORKSPACE_ID2>"
          ]
        }
      }
    }

    For example:

    Trust relationship policy
  9. Click Update Trust Policy.

Important

Any time your instance profile changes, you may need to repeat this procedure.
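
If you prefer to script the trust-policy change in step 8 rather than edit it in the AWS console, boto3's update_assume_role_policy call can apply the same statement. A sketch, assuming the role name placeholder is replaced with the role from step 5 and the workspace ID placeholder with your own:

    import json
    import boto3

    iam = boto3.client("iam")

    # The statement from step 8; replace the workspace ID placeholder.
    statement = {
        "Effect": "Allow",
        "Principal": {
            "AWS": ["arn:aws:iam::790110701330:role/serverless-customer-resource-role"]
        },
        "Action": "sts:AssumeRole",
        "Condition": {
            "StringEquals": {
                "sts:ExternalId": ["databricks-serverless-<YOUR_WORKSPACE_ID>"]
            }
        },
    }

    role_name = "<your-instance-profile-role>"  # the role from step 5 (placeholder)

    # Fetch the existing trust policy, append the statement, and write it back.
    policy = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    policy["Statement"].append(statement)
    iam.update_assume_role_policy(RoleName=role_name, PolicyDocument=json.dumps(policy))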