Troubleshoot Genie spaces
This page outlines how to resolve common problems when creating and maintaining Genie spaces.
Misunderstood business jargon
Most companies or domains have specific shorthand they use to communicate about business-specific events. For example, when referring to a year, it might always mean the fiscal year, and this fiscal year might start in February or March instead of January. To enable Genie to answer these questions naturally and accurately, include instructions that explicitly map your business jargon to words and concepts Genie can understand. See Provide instructions.
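As an illustration, a jargon mapping like the fiscal-year case can also be shown to Genie as an example SQL query. The sketch below assumes a hypothetical table main.sales.orders with an order_date column, and assumes a fiscal year that starts on February 1 and is labeled by the calendar year in which it starts. Adjust the offset to match your own convention.

```sql
-- Hypothetical table and columns; replace with your own.
-- Assumption: the fiscal year starts on February 1 and is named after the
-- calendar year in which it starts, so January belongs to the prior fiscal year.
SELECT
  order_id,
  order_date,
  year(add_months(order_date, -1)) AS fiscal_year
FROM main.sales.orders;
```

Shifting the date back by one month before extracting the year places January 2025 in fiscal year 2024 and February 2025 in fiscal year 2025.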
Incorrect table or column usage
If Genie pulls data from the wrong table or runs analysis on the wrong columns, try one or more of the following adjustments:
- Provide clear and precise descriptions: Check your tables and associated metadata to verify that the terminology used there matches the users' terminology in submitted questions. If it does not, refine the description or add an instruction that maps the terminology used in the table to the terminology used in the question.
- Add example queries: Provide sample SQL queries that Genie can use to learn how to respond to certain questions. See Provide instructions.
- Remove tables or columns from the space: Some tables might include overlapping columns or concepts that make it difficult for Genie to know which data to use in a response. If possible, remove unnecessary or overlapping tables or columns. To hide columns from the Genie space UI without changing the underlying data objects, see Hide or show relevant columns.
Filtering errors
Generated queries often include a WHERE clause to filter results according to a specific value. When Genie doesn't have visibility into the data values, it might set the WHERE clause to filter for the wrong value. For example, it might try to match the name "California" when the table uses abbreviations like "CA."
For situations like this, verify that relevant columns have Example values and Value dictionaries enabled. If new data has been added to relevant tables, refresh the values. See Build a knowledge store for more reliable Genie spaces.
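To confirm which values a column actually stores before you adjust instructions or refresh values, you can inspect it directly. This is a minimal sketch that assumes a hypothetical table main.sales.customers with a state column.

```sql
-- Hypothetical table and column; replace with your own.
-- Lists the stored values so you can see whether the column holds
-- full names ("California") or abbreviations ("CA").
SELECT
  state,
  COUNT(*) AS row_count
FROM main.sales.customers
GROUP BY state
ORDER BY row_count DESC
LIMIT 50;
```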
Incorrect joins
If foreign key references are not defined in Unity Catalog, your space might not know how to join different tables together.
Try implementing one or more of the following solutions:
- Define foreign key references in your Unity Catalog when possible. See CONSTRAINT clause.
- If your tables' foreign key relationships are not specified in Unity Catalog, define join relationships in your Genie space's knowledge store. This strategy is helpful for more complex join scenarios like self-joins, or if you don't have sufficient permission to modify the underlying tables. See Define join relationships.
- Provide example queries where you join tables together in standard ways.
If none of these resolve the problem, pre-join the tables into a view and use that view as input for the space instead.
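The following sketch shows two of the options above: declaring a foreign key in Unity Catalog, and falling back to a pre-joined view. All table, column, and constraint names are hypothetical. It also assumes that the parent table does not yet have a primary key and that you have permission to alter both tables.

```sql
-- Hypothetical tables: main.sales.customers (parent) and main.sales.orders (child).

-- A foreign key references a primary key, and primary key columns must be NOT NULL.
ALTER TABLE main.sales.customers ALTER COLUMN customer_id SET NOT NULL;

ALTER TABLE main.sales.customers
  ADD CONSTRAINT customers_pk PRIMARY KEY (customer_id);

ALTER TABLE main.sales.orders
  ADD CONSTRAINT orders_customers_fk
  FOREIGN KEY (customer_id) REFERENCES main.sales.customers (customer_id);

-- Fallback: pre-join the tables into a view and add the view to the space.
CREATE OR REPLACE VIEW main.sales.orders_with_customers AS
SELECT
  o.order_id,
  o.order_date,
  o.amount,
  c.customer_id,
  c.customer_name,
  c.state
FROM main.sales.orders AS o
JOIN main.sales.customers AS c
  ON o.customer_id = c.customer_id;
```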
Column comments not syncing from foreign tables
Databricks does not manage the metadata, data, or semantics for writes to foreign tables. Depending on the source table, comments might not be accessible from Databricks. To make comments available, Databricks recommends doing one of the following:
- Edit column metadata in the Genie space UI. Edited metadata applies only to the Genie space where it is written. See Edit column metadata.
- Create a materialized view on top of the federated table. You can add and edit comments on a materialized view as you would on a managed table, and you can reuse the view across multiple Genie spaces. For details about loading data from foreign tables to a materialized view, see Load data from foreign tables with materialized views. To learn more about working with materialized views, see Materialized views.
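As a rough sketch of the materialized view option, the statement below attaches table and column comments while loading from a federated table. The catalog, schema, table, and column names are all hypothetical, and the comments are placeholders for your own descriptions.

```sql
-- Hypothetical names: foreign_catalog.crm.accounts stands in for a federated table,
-- and main.genie.accounts_mv for the materialized view added to the Genie space.
CREATE MATERIALIZED VIEW main.genie.accounts_mv (
  account_id   COMMENT 'Unique identifier for the customer account',
  account_name COMMENT 'Display name of the account',
  region       COMMENT 'Sales region code, for example EMEA or AMER'
)
COMMENT 'Customer accounts loaded from the CRM system'
AS
SELECT account_id, account_name, region
FROM foreign_catalog.crm.accounts;
```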
Metric calculation issues
The way that metrics are computed and rolled up can be arbitrarily complicated and encompass many business details that your space doesn't understand. This can lead to incorrect reporting.
Try implementing one or more of the following solutions:
- If your metrics are aggregated from base tables, provide example SQL queries computing each roll-up value.
- If your metrics have been pre-computed and are sitting in aggregated tables, explain this in table comments. Specify valid aggregations for each metric if the metrics in that table can be further rolled up.
- If the SQL you're trying to generate is very complicated, try creating views that have already aggregated your metrics for your space.
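For instance, a pre-aggregated view along the following lines keeps the roll-up logic out of the generated SQL. The table main.sales.orders and its columns are hypothetical.

```sql
-- Hypothetical source table and columns; replace with your own.
CREATE OR REPLACE VIEW main.sales.monthly_revenue AS
SELECT
  date_trunc('MONTH', order_date) AS order_month,
  region,
  SUM(amount)                 AS total_revenue,     -- can be summed again across months or regions
  COUNT(DISTINCT customer_id) AS active_customers   -- distinct count: not valid to sum across rows
FROM main.sales.orders
GROUP BY date_trunc('MONTH', order_date), region;
```

In this sketch, total_revenue can safely be rolled up further, but active_customers cannot, because the same customer can appear in several months or regions. This is the kind of detail worth stating in the view's comments.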
Incorrect time-based calculations
Genie might not always be able to infer the timezone represented in the data or the timezone in which your analysis needs to be performed unless you explicitly provide additional guidance.
Include more explicit instructions detailing the original source timezone, the conversion function, and the target timezone. The following examples show how to alter the general instructions for more reliable timezone conversions:
- Always convert times to a specific timezone: In this example, assume that the source timestamp is UTC and you want results in the America/Los_Angeles timezone. Add the following to the instructions, replacing <timezone-column> with the appropriate column name:
  - Time zones in the tables are in UTC.
  - Convert all timezones using the following function: convert_timezone('UTC', 'America/Los_Angeles', <timezone-column>).
- Convert non-UTC datetime formats to UTC: If the workspace default timezone is UTC but users in Los Angeles must reference today for a specific set of records, add the following to the space's general instructions:
  - To reference today, use date(convert_timezone('UTC', 'America/Los_Angeles', current_timestamp())).
See convert_timezone function for more details and syntax.
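To check what these expressions return before adding them to your instructions, you can run them in a SQL editor. The sketch below uses a literal timestamp (an arbitrary value chosen for illustration) so the first result is reproducible.

```sql
-- 03:30 UTC on January 15 is still the evening of January 14 in Los Angeles
-- (Pacific Standard Time is UTC-8 in January).
SELECT convert_timezone(
  'UTC',
  'America/Los_Angeles',
  TIMESTAMP_NTZ'2024-01-15 03:30:00'
) AS la_time;
-- la_time: 2024-01-14 19:30:00

-- The current date as seen by users in Los Angeles.
SELECT date(
  convert_timezone('UTC', 'America/Los_Angeles', current_timestamp())
) AS today_in_la;
```

The first query shows why the conversion matters: without it, a question about "today" asked in the evening in Los Angeles would be answered using the next day's UTC date.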
Ignoring instructions
Even if you have explained your tables and columns in comments and provided general instructions, your space might still not be using them correctly.
Try one or more of the following strategies:
- Provide example queries that use your tables correctly. Example queries are especially effective for teaching your space how to use your data.
- Hide irrelevant columns in the Genie space. See Hide or show relevant columns.
- Create views from your tables that provide a simpler view of your data.
- Review your instructions and try to focus the space by removing irrelevant tables or instructions.
- Try starting a new chat. Previous interactions might influence Genie's responses in any given chat, but starting a new chat gives you a blank starting point for testing new instructions.
Performance issues
When Genie needs to generate exceptionally long queries or text responses, it can take a long time to respond or even time out during the thinking phase.
Try one or more of the following actions to improve performance:
- Use trusted assets or views to encapsulate complex queries. See Use trusted assets in AI/BI Genie spaces.
- Reduce the length of your example SQL queries whenever possible.
- Start a new chat if Genie starts to generate slow or failing responses.
Unreliable responses to mission-critical questions
Use trusted assets to provide verified answers to specific questions that you expect users to ask. See Use trusted assets in AI/BI Genie spaces.
Token limit warning
Tokens are the basic units of text that Genie uses to process and understand language. Text instructions and metadata in a Genie space are converted into tokens. If your space approaches the token limit, a warning appears. Genie uses context filtering to prioritize the tokens it considers most relevant to a question. While responses should still be generated when a warning appears, quality may be reduced if important context is filtered out. When the token limit is exceeded, you can no longer send or receive messages in the Genie space.
Consider the following practices to reduce the token count:
- Remove unnecessary columns: Unnecessary columns in your tables can significantly contribute to token usage. When possible, create views to exclude redundant or non-essential fields from your raw tables. You can also hide unneeded columns in a Genie space. See Hide or show relevant columns.
- Streamline column descriptions: While column descriptions are important, avoid duplicating information already conveyed by column names. For example, if a column is named account_name, a description like "the name of your account" might be redundant and can be omitted.
- Edit column metadata in the Genie space: See Edit column metadata to learn how to edit descriptions and provide synonyms in column metadata.
- Prune example SQL queries: Include a diverse range of example SQL queries to cover various types of questions, but remove overlapping or redundant examples.
- Simplify instructions: Verify that your instructions are clear and concise. Avoid unnecessary words.
Your account is not enabled for cross-Geo processing
Genie is a Designated Service managed by Databricks. Designated Services use Databricks Geos to manage data residency. For some regions, data cannot be processed in the same Geo as the workspace. If your workspace is in one of those regions, your account administrator must enable cross-Geo processing.
Reaching throughput limits
When accessing Genie spaces through the Databricks UI, throughput is limited to 20 questions per minute per workspace, across all Genie spaces.
When accessing Genie spaces using the Conversation API's free tier (Public Preview), throughput is limited, on a best-effort basis, to five questions per minute per workspace, across all Genie spaces. See Use the Genie API to integrate Genie into your applications.