Use the Genie Conversation API to integrate Genie into your applications
This feature is in Public Preview.
This article explains how to use the Genie Conversation API to enable Genie capabilities in your own chatbot, agent, or application.
Overview
The Genie API lets developers integrate natural language data querying into applications, chatbots, and AI agent frameworks. It supports stateful conversations, so users can ask follow-up questions and explore data more naturally over time, which makes data more accessible across the organization.
Before using the API, you must prepare a well-curated Genie space. The space defines the context that Genie uses to interpret questions and return results. If the space is incomplete or untested, users may receive a high rate of incorrect answers, even if the API integration itself is successful. This guide outlines the minimum setup required to create a space that works effectively with the Genie API.
Prerequisites
To use the Genie Conversation API, you must have:
- Access to a Databricks workspace with the SQL entitlement.
- At least CAN USE privileges on a SQL pro or serverless SQL warehouse.
- Familiarity with the Databricks REST API reference.
Step 1: Create and test a Genie space
Prepare a Genie space that reliably answers user questions with a high degree of accuracy.
Even if you plan to query the Genie space using the API, you must set up and refine the space using the Databricks UI.
Use the following resources to help you configure and curate your Genie space:
- Set up a Genie space: Learn how to create a Genie space using the Databricks UI. This article includes step-by-step guidance for using UI tools to create a working Genie space.
- Review best practices: Learn best practices for curating a new Genie space. This article offers recommendations for how to approach new Genie space creation and how to refine and iterate on a space through testing and feedback.
- Set benchmarks: Prepare benchmark test questions that you can run to measure Genie's response accuracy.
A well-structured Genie space has the following characteristics:
- Uses well-annotated data: Genie relies on table metadata and column comments to generate responses. The Unity Catalog data sources for your Genie space should include clear and descriptive comments.
- Is user tested: You should be your space's first user. Test your space by asking questions you expect from end users. Use your testing to create and refine example SQL queries.
- Includes company-specific context: You need to include context to teach Genie about your company's data and jargon. See Add context to learn how to add instructions, example SQL, and functions for processing common questions. Aim to include at least 5 example SQL queries that have undergone testing and refinement.
- Uses benchmarks to test accuracy: Add at least 5 benchmark questions based on questions you anticipate from end users. Run benchmark tests to verify that Genie is answering those questions accurately.
When you are satisfied with the responses in your Genie space and have tested it with representative questions, you can begin integrating it with your application.
Step 2: Configure Databricks authentication
For production use cases where a user with access to a browser is present, use OAuth for users (OAuth U2M). In situations where browser-based authentication is not possible, you can use a service principal to authenticate with the API. See OAuth for service principals (OAuth M2M). Service principals must have permissions to access the required data and SQL warehouses.
Step 3: Gather details
Use the Genie space URL to find your workspace instance name and the Genie space ID. The following example demonstrates how to locate these components in the URL. For details about the workspace identifiers in your URL, see Get identifiers for workspace objects.
https://<databricks-instance>/genie/rooms/<space-id>
Copy the <databricks-instance> and the <space-id> for your Genie space.
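As a minimal sketch, the two values can also be pulled out of the URL programmatically. The helper name `parse_genie_space_url` is hypothetical, and the code assumes the URL follows the `/genie/rooms/<space-id>` pattern shown above.

```python
from urllib.parse import urlparse


def parse_genie_space_url(url: str) -> tuple[str, str]:
    """Return (workspace instance name, Genie space ID) from a Genie space URL."""
    parsed = urlparse(url)
    # Split the path into non-empty segments: ["genie", "rooms", "<space-id>", ...]
    parts = [segment for segment in parsed.path.split("/") if segment]
    if not parsed.netloc or len(parts) < 3 or parts[:2] != ["genie", "rooms"]:
        raise ValueError(f"Not a Genie space URL: {url!r}")
    return parsed.netloc, parts[2]
```

Any query string on the URL is ignored, because only the host and path are inspected.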
Step 4: Start a conversation
The Start conversation endpoint POST /api/2.0/genie/spaces/{space_id}/start-conversation starts a new conversation in your Genie space.
Replace the placeholders with your Databricks instance, Genie space ID, and authentication token. An example of a successful response follows the request. It includes details that you can use to access this conversation again for follow-up questions.
POST /api/2.0/genie/spaces/{space_id}/start-conversation
Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>
{
  "content": "<your question>"
}
Response:
{
  "conversation": {
    "created_timestamp": 1719769718,
    "id": "6a64adad2e664ee58de08488f986af3e",
    "last_updated_timestamp": 1719769718,
    "space_id": "3c409c00b54a44c79f79da06b82460e2",
    "title": "Give me top sales for last month",
    "user_id": 12345
  },
  "message": {
    "attachments": null,
    "content": "Give me top sales for last month",
    "conversation_id": "6a64adad2e664ee58de08488f986af3e",
    "created_timestamp": 1719769718,
    "error": null,
    "id": "e1ef34712a29169db030324fd0e1df5f",
    "last_updated_timestamp": 1719769718,
    "query_result": null,
    "space_id": "3c409c00b54a44c79f79da06b82460e2",
    "status": "IN_PROGRESS",
    "user_id": 12345
  }
}
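The request and response above could be handled from Python roughly as follows. This is a sketch using only the standard library. The function names are hypothetical, and the `Bearer` scheme and `Content-Type` header are assumptions based on common REST conventions and the polling example in this article. The request object is built but not sent, so you can inspect it first.

```python
import json
import urllib.request


def build_start_conversation_request(
    instance: str, space_id: str, token: str, question: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a conversation."""
    url = f"https://{instance}/api/2.0/genie/spaces/{space_id}/start-conversation"
    body = json.dumps({"content": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer-token scheme
            "Content-Type": "application/json",
        },
    )


def extract_ids(response: dict) -> tuple[str, str]:
    """Return (conversation_id, message_id) from a start-conversation response."""
    message = response["message"]
    # The message's own "id" field is the message_id used by later endpoints.
    return message["conversation_id"], message["id"]
```

To send the request, pass it to `urllib.request.urlopen` and decode the body with `json.load`. Keep the returned IDs, because the later steps need both.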
Step 5: Retrieve generated SQL
Use the conversation_id and the message's id (the message_id) from the response to poll the message's generation status and retrieve the generated SQL from Genie. See GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id} for complete request and response details.
Only POST requests count toward the queries-per-minute throughput limit. GET requests used to poll results are not subject to this limit.
Substitute your values into the following request:
GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}
Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>
The following example response reports the message details:
Response:
{
  "attachments": null,
  "content": "Give me top sales for last month",
  "conversation_id": "6a64adad2e664ee58de08488f986af3e",
  "created_timestamp": 1719769718,
  "error": null,
  "id": "e1ef34712a29169db030324fd0e1df5f",
  "last_updated_timestamp": 1719769718,
  "query_result": null,
  "space_id": "3c409c00b54a44c79f79da06b82460e2",
  "status": "IN_PROGRESS",
  "user_id": 12345
}
When the status field is COMPLETED, the response is populated in the attachments array.
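One way to structure the polling loop is sketched below. The function `poll_until_done` is hypothetical. It takes a caller-supplied `fetch_message` callable that performs the GET request and returns the decoded JSON message, so the loop itself stays independent of any HTTP library. The conclusive statuses, the 5-second interval, and the 10-minute limit are taken from the best practices later in this article.

```python
import time
from typing import Callable

# Statuses this article lists as conclusive.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def poll_until_done(
    fetch_message: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_message repeatedly until the message reaches a conclusive status."""
    deadline = clock() + timeout_s
    while True:
        message = fetch_message()
        if message.get("status") in TERMINAL_STATUSES:
            return message
        if clock() >= deadline:
            raise TimeoutError("Genie message did not finish before the timeout")
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting. In normal use, leave them at their defaults.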
Step 6: Retrieve query results
The attachments array contains Genie's response. It includes the generated text response (text), the query statement if it exists (query), and an identifier that you can use to get the associated query results (attachment_id). Replace the placeholders in the following example to retrieve the generated query results:
GET /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages/{message_id}/query-result/{attachment_id}
Authorization: Bearer <your_authentication_token>
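The following sketch shows how the query-result paths might be derived from a completed message. It assumes each attachment is a JSON object with top-level `query` and `attachment_id` keys, as the field names above suggest. The exact attachment schema is not shown in this article, so check the API reference before relying on it. The helper name `query_result_paths` is hypothetical.

```python
def query_result_paths(space_id: str, message: dict) -> list[str]:
    """Return one query-result path per attachment that carries a query."""
    paths = []
    # "attachments" is null until the message completes, so treat None as empty.
    for attachment in message.get("attachments") or []:
        # Assumed shape: text-only attachments have no "query" entry.
        if attachment.get("query") is None or "attachment_id" not in attachment:
            continue
        paths.append(
            f"/api/2.0/genie/spaces/{space_id}"
            f"/conversations/{message['conversation_id']}"
            f"/messages/{message['id']}"
            f"/query-result/{attachment['attachment_id']}"
        )
    return paths
```

Attachments without a query are skipped because they have no result set to fetch.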
Step 7: Ask follow-up questions
After you receive a response, use the conversation_id to continue the conversation. Context from previous messages is retained and used in follow-up responses. For complete request and response details, see POST /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages.
POST /api/2.0/genie/spaces/{space_id}/conversations/{conversation_id}/messages
Host: <DATABRICKS_INSTANCE>
Authorization: Bearer <your_authentication_token>
{
  "content": "Which of these customers opened and forwarded the email?"
}
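Because the first question and every follow-up differ only in the endpoint path, an application can route both through one helper. The sketch below uses a hypothetical function, `genie_post_target`, which picks the path based on whether a conversation_id is already known and returns it together with the JSON body.

```python
import json
from typing import Optional


def genie_post_target(
    space_id: str, question: str, conversation_id: Optional[str] = None
) -> tuple[str, bytes]:
    """Return (path, JSON body) for a first question or a follow-up question."""
    base = f"/api/2.0/genie/spaces/{space_id}"
    if conversation_id is None:
        # No conversation yet: start a new one.
        path = f"{base}/start-conversation"
    else:
        # Existing conversation: add a message so earlier context is reused.
        path = f"{base}/conversations/{conversation_id}/messages"
    return path, json.dumps({"content": question}).encode("utf-8")
```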
Best practices for using the Conversation API
To maintain performance and reliability when using the Genie Conversation API:
- Implement request queuing and backoff: The API does not manage request retries. Use your own queuing system and implement incremental backoff to avoid exceeding throughput limits.
- Poll for status updates every 5 to 10 seconds: Continue polling until a conclusive message status, such as COMPLETED, FAILED, or CANCELLED, is received. Limit polling to 10 minutes for most queries. If there is no conclusive response after 10 minutes, stop polling and return a timeout error or prompt the user to manually check the query status later.
- Use exponential backoff after 2 minutes: If no response is received within 2 minutes, apply exponential backoff to improve reliability.
- Start a new conversation for each session: Avoid reusing conversation threads across sessions, as this can reduce accuracy due to unintended context reuse.
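The polling guidance above can be expressed as a delay schedule. This is one possible interpretation, not a prescribed algorithm. It waits a fixed 5 seconds between polls for the first 2 minutes, then doubles the delay on each poll, and stops once 10 minutes of waiting have accumulated. The doubling factor and the 60-second cap are assumptions of this sketch, and elapsed time counts only the waits, not the time spent in requests.

```python
from typing import Iterator


def polling_delays(
    base_s: float = 5.0,
    backoff_after_s: float = 120.0,
    max_total_s: float = 600.0,
    factor: float = 2.0,       # assumed growth factor
    max_delay_s: float = 60.0,  # assumed cap on a single wait
) -> Iterator[float]:
    """Yield successive wait times between polls until the total budget is spent."""
    elapsed = 0.0
    delay = base_s
    while elapsed < max_total_s:
        if elapsed >= backoff_after_s:
            # Past the 2-minute mark: grow the delay exponentially, up to the cap.
            delay = min(delay * factor, max_delay_s)
        # Never wait past the overall polling budget.
        wait = min(delay, max_total_s - elapsed)
        yield wait
        elapsed += wait
```

A caller would sleep for each yielded value between polls and treat exhaustion of the generator as a timeout.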
Monitor the space
After your application is set up, you can monitor questions and responses in the Databricks UI.
Encourage users to test the space so that you learn about the types of questions they are likely to ask and the responses they receive. Provide users with guidance to help them start testing the space. Use the Monitoring tab to view questions and responses. See Monitor the space.
You can also use audit logs to monitor activity in a Genie space. See AI/BI Genie events.
Throughput limit
During the Public Preview period, the throughput rates for the Conversation API are best-effort and depend on system capacity. Under normal or low-traffic conditions, requests are limited to 5 queries per minute per workspace. During peak usage periods, actual throughput can be lower as requests are processed based on available capacity.
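A client-side limiter can help an application stay under this limit. The sketch below is a simple sliding-window limiter that allows at most 5 POST calls in any 60-second window. The class name `PostRateLimiter` is hypothetical. Because the limit applies per workspace and throughput is best-effort, a limiter in a single process cannot account for other clients, so the application should still handle rejected requests with backoff.

```python
import time
from collections import deque
from typing import Callable


class PostRateLimiter:
    """Block until a POST can be sent without exceeding max_calls per period."""

    def __init__(
        self,
        max_calls: int = 5,
        period_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period_s = period_s
        self._clock = clock
        self._sleep = sleep
        self._sent: deque = deque()  # timestamps of recent POST calls

    def wait_time(self) -> float:
        """Return how long to wait before the next POST is allowed."""
        now = self._clock()
        # Drop timestamps that have left the sliding window.
        while self._sent and now - self._sent[0] >= self._period_s:
            self._sent.popleft()
        if len(self._sent) < self._max_calls:
            return 0.0
        return self._period_s - (now - self._sent[0])

    def acquire(self) -> None:
        """Wait if needed, then record one POST call."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._sent.append(self._clock())
```

Call `acquire()` immediately before each POST request. GET requests used for polling do not need to pass through the limiter.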