Use the Genie API to integrate Genie into your applications

Preview

This feature is in Public Preview.

This page explains how to use the Genie API to enable Genie capabilities in your own chatbot, agent, or application.

Overview

The Genie API provides two types of capabilities:

  • Conversation APIs: Enable natural language data querying in applications, chatbots, and AI agent frameworks. These APIs support stateful conversations where users can ask follow-up questions and explore data naturally over time.
  • Management APIs: Enable programmatic creation, configuration, and deployment of Genie spaces across workspaces. Use these APIs for CI/CD pipelines, version control, and automated space management.

This page describes both conversation and management APIs. Before calling the conversation APIs, prepare a well-curated Genie space. The space provides the context that Genie uses to interpret questions and generate answers. If the space is incomplete or untested, users might still receive incorrect results even with a correct API integration. This guide explains the minimum setup needed to create a space that works effectively with the Genie API.

Prerequisites

To use the Genie API, you must have:

  • Access to a Databricks workspace with the Databricks SQL entitlement.
  • At least CAN USE privileges on a SQL pro or serverless SQL warehouse.

Step 1: Configure Databricks authentication

For production use cases where a user with access to a browser is present, use OAuth for users (OAuth U2M). In situations where browser-based authentication is not possible, use a service principal to authenticate with the API. See OAuth for service principals (OAuth M2M). Service principals must have permissions to access the required data and SQL warehouses.

Step 2: Gather details

  • Workspace instance name: Find and copy your workspace instance name from your Databricks workspace URL. For details about the workspace identifiers in your URL, see Get identifiers for workspace objects.

    Example: https://cust-success.cloud.databricks.com/

  • Warehouse ID: You need the ID of a SQL warehouse that you have at least CAN USE privileges on. To find your warehouse ID:

    1. Go to SQL Warehouses in your workspace.
    2. Select the warehouse you want to use.
    3. Copy the warehouse ID from the URL or the warehouse details page.

    Alternatively, use the List warehouses endpoint GET /api/2.0/sql/warehouses to programmatically retrieve a list of all SQL warehouses that you have permissions to access. The response includes the warehouse ID.
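
The programmatic lookup might be sketched in Python as follows. This is a minimal illustration, not a definitive client: the helper names are hypothetical, `instance` is assumed to be the bare host name without the `https://` scheme, and the response is assumed to carry a top-level `warehouses` array whose entries have `id` and `name` fields. Check the List warehouses API reference for the exact response shape.

```python
import json
import urllib.request


def build_list_warehouses_request(instance: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the List warehouses endpoint."""
    url = f"https://{instance}/api/2.0/sql/warehouses"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def warehouse_ids(response_body: str) -> dict[str, str]:
    """Map warehouse name to warehouse ID.

    Assumption: the body has a top-level "warehouses" array with "id" and "name".
    """
    payload = json.loads(response_body)
    return {w["name"]: w["id"] for w in payload.get("warehouses", [])}


# Sending the request needs network access and a valid token:
# request = build_list_warehouses_request(instance, token)
# with urllib.request.urlopen(request) as resp:
#     print(warehouse_ids(resp.read().decode("utf-8")))
```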

Step 3: Create a Genie space

A well-structured Genie space has the following characteristics:

  • Uses well-annotated data: Genie relies on table metadata and column comments. Verify that your Unity Catalog data sources have clear, descriptive comments.
  • Is user tested: Test your space by asking questions you expect from end users. Use testing to create and refine example SQL queries.
  • Includes company-specific context: Add instructions, example SQL, and functions. See Add SQL examples and instructions. Aim for at least five tested example SQL queries.
  • Uses benchmarks to test accuracy: Add at least five benchmark questions based on anticipated user questions. See Use benchmarks in a Genie space.

For more information on creating a space, see Set up and manage an AI/BI Genie space and Curate an effective Genie space.

Create a Genie space programmatically using the Create Genie space API. The following example demonstrates a well-structured space that follows best practices. Replace the placeholders with your values:

POST /api/2.0/genie/spaces
Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>
{
"description": "Space for analyzing sales performance and trends",
"parent_path": "/Workspace/Users/<username>",
"serialized_space": "{\"version\":1,\"config\":{\"sample_questions\":[{\"id\":\"a1b2c3d4e5f6\",\"question\":[\"What were total sales last month?\"]},{\"id\":\"b2c3d4e5f6g7\",\"question\":[\"Show top 10 customers by revenue\"]},{\"id\":\"c3d4e5f6g7h8\",\"question\":[\"Compare sales by region for Q1 vs Q2\"]},{\"id\":\"d4e5f6g7h8i9\",\"question\":[\"Which products have the highest return rate?\"]},{\"id\":\"e5f6g7h8i9j0\",\"question\":[\"Show monthly revenue trend for the past year\"]}],\"instructions\":\"This space analyzes sales data from our e-commerce platform. All monetary values are in USD. Use the orders and customers tables for transactional data.\"},\"data_sources\":{\"tables\":[{\"identifier\":\"sales.analytics.orders\"},{\"identifier\":\"sales.analytics.customers\"},{\"identifier\":\"sales.analytics.products\"}]}}",
"title": "Sales Analytics Space",
"warehouse_id": "<warehouse-id>"
}

Response:
{
"space_id": "3c409c00b54a44c79f79da06b82460e2",
"title": "Sales Analytics Space",
"description": "Space for analyzing sales performance and trends",
"warehouse_id": "<warehouse-id>",
"serialized_space": "{\n \"version\": 1,\n \"config\": {\n \"sample_questions\": [\n {\n \"id\": \"a1b2c3d4e5f600000000000000000000\",\n \"question\": [\n \"Show orders by date\"\n ]\n }\n ]\n },\n \"data_sources\": {\n \"tables\": [\n {\n \"identifier\": \"samples.tpch.orders\"\n }\n ]\n }\n}\n"
}

You can also use the Genie API to list Genie spaces or update an existing one.

Understanding the serialized_space field

The serialized_space field is a JSON string that defines the configuration and data sources for your Genie space. In the API request, this JSON must be escaped as a string. The field contains:

  • version: Schema version (currently 1)

  • config: Space configuration including:

    • sample_questions: Example questions to guide users. Each question requires:

      • id: A unique identifier for the question. You can generate any unique string (such as short alphanumeric strings or UUIDs). The system normalizes these to 32-character identifiers.
      • question: An array containing the question text.

      Include at least five diverse questions that represent common use cases.

    • instructions: Context about the data, business rules, and how to interpret results. This helps Genie provide more accurate responses.

  • data_sources: Data sources available to the space including:

    • tables: Array of table identifiers in three-level namespace format (catalog.schema.table). Include all relevant tables users will query.

The unescaped version of the example above looks like:

JSON
{
"version": 1,
"config": {
"sample_questions": [
{
"id": "a1b2c3d4e5f6",
"question": ["What were total sales last month?"]
},
{
"id": "b2c3d4e5f6g7",
"question": ["Show top 10 customers by revenue"]
},
{
"id": "c3d4e5f6g7h8",
"question": ["Compare sales by region for Q1 vs Q2"]
},
{
"id": "d4e5f6g7h8i9",
"question": ["Which products have the highest return rate?"]
},
{
"id": "e5f6g7h8i9j0",
"question": ["Show monthly revenue trend for the past year"]
}
],
"instructions": "This space analyzes sales data from our e-commerce platform. All monetary values are in USD. Use the orders and customers tables for transactional data."
},
"data_sources": {
"tables": [
{
"identifier": "sales.analytics.orders"
},
{
"identifier": "sales.analytics.customers"
},
{
"identifier": "sales.analytics.products"
}
]
}
}

When constructing your space, create this JSON structure and then escape it as a string for the API request. Start with a minimal configuration and expand as needed. For complete schema details, see the Create Genie space API reference.
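
One way to avoid escaping the string by hand is to build the space as a native data structure and serialize it twice: once for serialized_space, and once for the whole request body. The following Python sketch shows the idea. The function name `build_create_space_body` is hypothetical, and the space shown is a trimmed version of the example above.

```python
import json

# Trimmed version of the example space; add more questions and tables as needed.
space = {
    "version": 1,
    "config": {
        "sample_questions": [
            {"id": "a1b2c3d4e5f6", "question": ["What were total sales last month?"]},
        ],
        "instructions": "All monetary values are in USD.",
    },
    "data_sources": {"tables": [{"identifier": "sales.analytics.orders"}]},
}


def build_create_space_body(
    space: dict, title: str, description: str, parent_path: str, warehouse_id: str
) -> str:
    """Return the JSON request body with serialized_space embedded as a string."""
    payload = {
        "title": title,
        "description": description,
        "parent_path": parent_path,
        "warehouse_id": warehouse_id,
        # First dumps: the space becomes a JSON string value.
        "serialized_space": json.dumps(space),
    }
    # Second dumps: the outer serialization escapes the inner quotes.
    return json.dumps(payload)


body = build_create_space_body(
    space,
    title="Sales Analytics Space",
    description="Space for analyzing sales performance and trends",
    parent_path="/Workspace/Users/<username>",
    warehouse_id="<warehouse-id>",
)
print(body)
```

Because the inner value is produced by `json.dumps`, the quotes inside serialized_space are escaped automatically when the outer body is serialized.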

Step 4: Start a conversation

The Start conversation endpoint POST /api/2.0/genie/spaces/{space_id}/start-conversation starts a new conversation in your Genie space.

Replace the placeholders with your Databricks instance, Genie space ID, and authentication token. An example of a successful response follows the request. It includes details that you can use to access this conversation again for follow-up questions.

POST /api/2.0/genie/spaces/{space_id}/start-conversation

Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>
{
"content": "<your question>"
}


Response:

{
"conversation": {
"created_timestamp": 1719769718,
"id": "6a64adad2e664ee58de08488f986af3e",
"last_updated_timestamp": 1719769718,
"space_id": "3c409c00b54a44c79f79da06b82460e2",
"title": "Give me top sales for last month",
"user_id": 12345
},
"message": {
"attachments": null,
"content": "Give me top sales for last month",
"conversation_id": "6a64adad2e664ee58de08488f986af3e",
"created_timestamp": 1719769718,
"error": null,
"id": "e1ef34712a29169db030324fd0e1df5f",
"last_updated_timestamp": 1719769718,
"query_result": null,
"space_id": "3c409c00b54a44c79f79da06b82460e2",
"status": "IN_PROGRESS",
"user_id": 12345
}
}
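
As a rough illustration, the request and the two identifiers needed for later steps could be handled in Python like this. The helper names are hypothetical, `instance` is assumed to be the bare host name, and the ID extraction relies only on the `conversation.id` and `message.id` fields shown in the example response.

```python
import json
import urllib.request


def build_start_conversation_request(
    instance: str, token: str, space_id: str, question: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a conversation."""
    url = f"https://{instance}/api/2.0/genie/spaces/{space_id}/start-conversation"
    body = json.dumps({"content": question}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_ids(response_body: str) -> tuple[str, str]:
    """Return (conversation_id, message_id) from a start-conversation response."""
    payload = json.loads(response_body)
    return payload["conversation"]["id"], payload["message"]["id"]


# Sending the request needs network access and a valid token:
# with urllib.request.urlopen(build_start_conversation_request(...)) as resp:
#     conversation_id, message_id = extract_ids(resp.read().decode("utf-8"))
```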

Step 5: Retrieve generated SQL

Use the conversation_id and message_id from the response (the conversation.id and message.id values in the Step 4 example) to poll the message's generation status and retrieve the generated SQL from Genie. See GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id} for complete request and response details.

Note

Only POST requests count toward the queries-per-minute throughput limit. GET requests used to poll results are not subject to this limit.

Substitute your values into the following request:

GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}
Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>

The following example response reports the message details:

Response:

{
"attachments": null,
"content": "Give me top sales for last month",
"conversation_id": "6a64adad2e664ee58de08488f986af3e",
"created_timestamp": 1719769718,
"error": null,
"id": "e1ef34712a29169db030324fd0e1df5f",
"last_updated_timestamp": 1719769718,
"query_result": null,
"space_id": "3c409c00b54a44c79f79da06b82460e2",
"status": "IN_PROGRESS",
"user_id": 12345
}

When the status field is COMPLETED, the attachments array contains Genie's response.
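
A polling loop for this step might look like the following Python sketch. It is written under stated assumptions: `fetch_message` is a hypothetical caller-supplied function that performs the GET request above and returns the parsed JSON message, and the terminal statuses and 10-minute limit are taken from the best practices on this page. The `sleep` and `clock` parameters exist only so that the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# Conclusive statuses named in the best practices on this page.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def poll_message(
    fetch_message: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_message until the status is conclusive or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        message = fetch_message()
        if message.get("status") in TERMINAL_STATUSES:
            return message
        if clock() >= deadline:
            raise TimeoutError("No conclusive message status before the timeout")
        sleep(interval_s)
```

The caller still has to inspect the returned status, because a FAILED or CANCELLED message ends the loop just as a COMPLETED one does.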

Step 6: Retrieve query results

The attachments array contains Genie's response. It includes the generated text response (text), the query statement if it exists (query), and an identifier that you can use to get the associated query results (attachment_id). Replace the placeholders in the following example to retrieve the generated query results:

GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}/attachments/{attachment_id}/query-result
Authorization: Bearer <your_authentication_token>

See GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}/attachments/{attachment_id}/query-result.
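
The step from a completed message to a query-result request could be sketched in Python as below. The helper names are hypothetical, and the attachment shape is an assumption based on the description in this step: each attachment is treated as an object with an `attachment_id` and, when Genie generated SQL, a `query` entry. The path follows the attachments/{attachment_id}/query-result form referenced above.

```python
def query_attachment_ids(message: dict) -> list[str]:
    """Return the IDs of attachments that carry a generated query.

    Assumption: attachments are objects with "attachment_id" and an optional "query".
    """
    attachments = message.get("attachments") or []  # the field is null while in progress
    return [
        a["attachment_id"]
        for a in attachments
        if a.get("query") and a.get("attachment_id")
    ]


def query_result_path(
    space_id: str, conversation_id: str, message_id: str, attachment_id: str
) -> str:
    """Build the request path for one attachment's query result."""
    return (
        f"/api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}"
        f"/messages/{message_id}/attachments/{attachment_id}/query-result"
    )
```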

Step 7: Ask follow-up questions

After you receive a response, use the conversation_id to continue the conversation. Context from previous messages is retained and used in follow-up responses. For complete request and response details, see POST /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages.

POST /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages
Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>
{
"content": "Which of these customers opened and forwarded the email?"
}
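
An application usually needs to decide, per question, whether to start a conversation or continue one. A small Python sketch of that bookkeeping follows. `GenieSession` is a hypothetical helper, not part of any Databricks SDK, and it reads the conversation ID only from the start-conversation response shape shown in Step 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenieSession:
    """Tracks one conversation so follow-up questions reuse its context."""

    space_id: str
    conversation_id: Optional[str] = None

    def next_request_path(self) -> str:
        """Path to POST the next question to."""
        base = f"/api/2.0/genie/spaces/{self.space_id}"
        if self.conversation_id is None:
            # First question of the session: start a new conversation.
            return f"{base}/start-conversation"
        # Later questions: add a message to the existing conversation.
        return f"{base}/conversations/{self.conversation_id}/messages"

    def record_start_response(self, response: dict) -> None:
        """Remember the conversation ID returned by start-conversation."""
        self.conversation_id = response["conversation"]["id"]
```

Creating a fresh `GenieSession` per user session keeps context from leaking between sessions.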

Reference and retrieve historical data

The Genie API provides additional endpoints for managing existing conversations and retrieving historical data for analysis.

Reference old conversation threads

To allow users to refer to old conversation threads, use the List conversation messages endpoint GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages to retrieve all messages from a specific conversation thread.

Retrieve conversation data for analysis

Space managers can programmatically retrieve all previous messages sent by every user of a space for analysis. To retrieve this data:

  1. Use the GET /api/2.0/genie/spaces/{space_id}/conversations endpoint to get all existing conversation threads in a space.
  2. For each conversation ID returned, use the GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages endpoint to retrieve the list of messages for that conversation.
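
The two steps above can be sketched in Python as a loop over conversation IDs. The HTTP calls are left to two hypothetical caller-supplied functions, because the exact response shapes (field names, pagination) are not shown on this page and should be taken from the API reference. The only message field used is `content`, which appears in the example responses.

```python
from typing import Callable, Dict, Iterable, List


def conversations_path(space_id: str) -> str:
    """Path for step 1: list conversation threads in a space."""
    return f"/api/2.0/genie/spaces/{space_id}/conversations"


def messages_path(space_id: str, conversation_id: str) -> str:
    """Path for step 2: list messages in one conversation."""
    return f"{conversations_path(space_id)}/{conversation_id}/messages"


def collect_space_messages(
    list_conversation_ids: Callable[[], Iterable[str]],
    list_messages: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Fetch the messages of every conversation, keyed by conversation ID."""
    return {cid: list_messages(cid) for cid in list_conversation_ids()}


def all_questions(history: Dict[str, List[dict]]) -> List[str]:
    """Flatten the history into the question text of each message."""
    return [
        m["content"]
        for messages in history.values()
        for m in messages
        if m.get("content")
    ]
```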

Best practices for using the Genie API

To maintain performance and reliability when using the Genie API:

  • Implement retry logic with exponential backoff: The API doesn't retry failed requests for you, so add your own queuing and exponential backoff. This helps your application handle transient failures, avoid unnecessary repeat requests, and stay within throughput limits as it grows.
  • Log API responses: Implement comprehensive logging of API requests and responses to help with debugging, monitoring usage patterns, and tracking costs.
  • Poll for status updates every 1 to 5 seconds: Continue polling until a conclusive message status, such as COMPLETED, FAILED, or CANCELLED, is received. Limit polling to 10 minutes for most queries. If there is no conclusive response after 10 minutes, stop polling and return a timeout error or prompt the user to manually check the query status later.
  • Use exponential backoff for polling: Increase the delay between polls up to a maximum of one minute. This reduces unnecessary requests for long-running queries while still allowing low latency for fast ones.
  • Start a new conversation for each session: Avoid reusing conversation threads across sessions, as this can reduce accuracy due to unintended context reuse.
  • Maintain conversation limits: To manage old conversations and stay under the 10,000 conversation limit:
    1. Use the GET /api/2.0/genie/spaces/{space_id}/conversations endpoint to see all existing conversation threads in a space.
    2. Identify conversations that are no longer needed, such as older conversations or test conversations.
    3. Use the DELETE /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id} endpoint to remove conversations programmatically.
  • Be aware of the query result limit: The Genie API returns a maximum of 5,000 rows per query result. If your analysis requires more rows, refine your question to focus on a specific subset of the data, or use filters to narrow the results.
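
The retry recommendation above might be implemented along these lines. This Python sketch makes several assumptions: `send` is a hypothetical function that performs one API request and raises on failure, `should_retry` is a caller-supplied predicate that decides which errors are transient, and the one-second base delay is illustrative. The 60-second cap mirrors the maximum polling delay mentioned in the list.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    send: Callable[[], T],
    should_retry: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_s: float = 1.0,
    cap_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == max_attempts - 1 or not should_retry(exc):
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ... up to cap_s.
            sleep(min(cap_s, base_s * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Adding random jitter to each delay is a common refinement when many clients retry at the same time.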

Monitor the space

After your application is set up, you can monitor questions and responses in the Databricks UI.

Encourage users to test the space so that you learn about the types of questions they are likely to ask and the responses they receive. Provide users with guidance to help them start testing the space. Use the Monitoring tab to view questions and responses. See Monitor the space.

You can also use audit logs to monitor activity in a Genie space. See AI/BI Genie events.

Throughput limit

During the Public Preview period, the throughput rates for the Genie API free tier are best-effort and depend on system capacity. Under normal or low-traffic conditions, the API limits requests to 5 queries per minute per workspace. During peak usage periods, the system processes requests based on available capacity, which can result in lower throughput.
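
An application can reduce the chance of hitting this limit by throttling its own POST requests. The following Python sketch uses a sliding-window counter. The class name is hypothetical, and because the limit applies per workspace, a limiter inside one process only covers that process's share of the traffic. Per the note in Step 5, only POST requests need to be counted.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls recorded calls in any window of window_s seconds."""

    def __init__(
        self,
        max_calls: int = 5,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_calls = max_calls
        self._window_s = window_s
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self._clock()
        # Drop calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window_s:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            return 0.0
        return self._window_s - (now - self._stamps[0])

    def record(self) -> None:
        """Record that a POST request was just sent."""
        self._stamps.append(self._clock())
```

A caller would sleep for `wait_time()` seconds, send the POST request, and then call `record()`.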