Build a knowledge store for more reliable Genie spaces
The Genie knowledge store allows you to curate and enhance your space through localized metadata, value sampling, and structured SQL instructions. These features help Genie understand your data and generate more accurate, relevant responses.
What is a knowledge store?
A knowledge store is a collection of curated semantic definitions that enhances Genie's understanding of your data and improves response accuracy.
The knowledge store consists of:
- Space-level metadata customization: Space-specific descriptions for tables, columns, and business terms and synonyms
- Space-level data customization: Simplified, focused datasets without changing the underlying Unity Catalog tables
- Value sampling: Real data examples that help Genie understand data types and match user prompts to actual values
- Join relationships: Defined table relationships for accurate JOIN statements
All knowledge store configurations are scoped to your Genie space and do not affect Unity Catalog metadata or other Databricks assets.
Manage knowledge store metadata
Teach Genie about the data in your space by providing local table and column descriptions and adding column synonyms that align with common business terms. Simplify datasets by hiding unnecessary or duplicate columns to keep Genie focused.
These practices improve usability for users who do not have direct permissions on the underlying tables, and they also support quicker iterations when updating instruction versions.
To access space-level metadata, click Configure > Data in your Genie space. Then click a table name to view its metadata and columns.
View columns
Click a table name to see an overview of the column names and details. The following example shows a sample from a table named accounts.
- Description: Genie uses metadata to understand your data and generate accurate responses. The default table description shows the Unity Catalog metadata associated with your data asset. You can edit this description to add specific directions that help Genie author SQL for your space. Click Reset to restore the Unity Catalog description.
- Columns: Column names and descriptions are included in the column list. Each column is labeled with tags that show whether it includes Example values or a Value dictionary. See Value sampling overview.
Hide or show relevant columns
Columns can be managed individually or in bulk. Use the following instructions to hide or show columns.
- Hide a single column: Click the hide icon next to the column name.
- Hide multiple columns:
  - Select the checkboxes for the columns you want to hide.
  - From the Actions menu, select Hide selected columns.
- Show hidden columns: Repeat the same steps on a hidden column to show it again.
Edit column metadata
You can customize the following for each column:
- Description: Space-specific column descriptions that enhance Genie's understanding.
- Synonyms: Business terms and keywords that help match user language to column names.
- Advanced settings: Value sampling controls.
  - Example values: Turn automatic sampling of representative values on or off.
  - Build value dictionary: Enable or disable value dictionaries for categorical columns.
To edit column metadata:
- Click the pencil icon next to a column name.
- Edit the description and synonyms for the column.
- If necessary, click Advanced settings to open value sampling controls.
- Click Save to keep your changes and close the dialog.
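Synonyms help Genie connect the words people actually use to your column names. The following Python sketch is a hypothetical illustration of that idea. It matches words in a prompt against column names and their space-level synonyms. The accounts columns, the COLUMN_SYNONYMS mapping, and match_columns are all invented for this example. Genie's actual matching is not described on this page and is certainly more sophisticated than exact word overlap.

```python
import re

# Hypothetical space-level synonyms for columns of an "accounts" table.
COLUMN_SYNONYMS = {
    "annual_revenue": ["revenue", "sales", "turnover"],
    "industry": ["sector", "vertical"],
    "region": ["territory", "geo"],
}

def match_columns(prompt: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return columns whose name or any single-word synonym appears in the prompt."""
    # Lowercase and split the prompt into word tokens (underscores kept for column names).
    words = set(re.findall(r"[a-z_]+", prompt.lower()))
    matched = []
    for column, terms in synonyms.items():
        # A column matches if its own name or one of its synonyms is a prompt word.
        if {column, *terms} & words:
            matched.append(column)
    return matched

print(match_columns("Show turnover by sector", COLUMN_SYNONYMS))
# ['annual_revenue', 'industry']
```

Without the synonyms "turnover" and "sector", neither column would match this prompt. That is the gap space-level synonyms are meant to close.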
Value sampling overview
Value sampling enhances Genie's ability to understand and work with your actual data by collecting representative examples.
Value sampling improves Genie's SQL generation by providing access to real data values. When users ask conversational questions with misspellings or different terminology, value sampling helps Genie match prompts to actual data values in your tables.
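As a rough illustration of matching a misspelled prompt term to real values, the following sketch uses Python's standard difflib module to pick the closest sampled value. The status column, its values, and resolve_value are hypothetical. Fuzzy string matching is a stand-in technique chosen for this example and does not describe how Genie performs matching internally.

```python
import difflib
from typing import Optional

# Hypothetical sampled values for a "status" column.
SAMPLED_STATUS_VALUES = ["Closed Won", "Closed Lost", "Negotiation", "Prospecting"]

def resolve_value(user_term: str, sampled_values: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a possibly misspelled term to the closest sampled value, or None if nothing is close."""
    # Compare case-insensitively, but return the value exactly as stored in the table.
    lowered = {v.lower(): v for v in sampled_values}
    matches = difflib.get_close_matches(user_term.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

print(resolve_value("closed wonn", SAMPLED_STATUS_VALUES))  # Closed Won
print(resolve_value("negotiaton", SAMPLED_STATUS_VALUES))   # Negotiation
```

Returning the stored spelling matters because that exact string is what a SQL filter such as WHERE status = 'Closed Won' has to compare against.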
Value sampling components
- Example values: Small samples from each column that help Genie understand data types and formatting. These are collected automatically for all eligible columns.
- Value dictionaries: Curated lists of up to 1,024 distinct values per column, each fewer than 127 characters long. Value dictionaries are created for up to 60 columns that contain categorical or consistently formatted string values, such as states, product categories, or status codes.
Tables with row filters or column masks are excluded from value sampling.
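The limits above can be expressed as a small Python sketch that builds value dictionaries from sampled column values. The three constants come from this page. Everything else is an assumption made for illustration:

- ColumnSample, build_value_dictionaries, and the table_has_row_filter_or_mask flag are hypothetical.
- The sketch drops values that are too long and skips columns with more than 1,024 distinct values, rather than truncating them.
- The page doesn't say whether the 60-column limit applies per table or per space, so the sketch applies it to whatever list of columns is passed in.

Genie's actual behavior may differ.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISTINCT_VALUES = 1024   # distinct values per column (from this page)
MAX_VALUE_LENGTH = 127       # each value must be shorter than this (from this page)
MAX_DICTIONARY_COLUMNS = 60  # columns that get a dictionary (from this page)

@dataclass
class ColumnSample:
    name: str
    values: list[Optional[str]]
    table_has_row_filter_or_mask: bool = False  # hypothetical flag

def build_value_dictionaries(columns: list[ColumnSample]) -> dict[str, list[str]]:
    """Sketch: build value dictionaries while respecting the documented limits."""
    dictionaries: dict[str, list[str]] = {}
    for col in columns:
        if len(dictionaries) >= MAX_DICTIONARY_COLUMNS:
            break  # column limit reached
        if col.table_has_row_filter_or_mask:
            continue  # tables with row filters or column masks are excluded
        # Assumption: nulls and overlong values are dropped rather than truncated.
        distinct = {v for v in col.values if v is not None and len(v) < MAX_VALUE_LENGTH}
        # Assumption: empty or high-cardinality columns are skipped entirely.
        if not distinct or len(distinct) > MAX_DISTINCT_VALUES:
            continue
        dictionaries[col.name] = sorted(distinct)
    return dictionaries

columns = [
    ColumnSample("state", ["CA", "NY", "CA", None]),
    ColumnSample("user_id", [f"u{i}" for i in range(5000)]),
    ColumnSample("region", ["West"], table_has_row_filter_or_mask=True),
]
print(build_value_dictionaries(columns))  # {'state': ['CA', 'NY']}
```

In this example only the state column receives a dictionary. The user_id column has too many distinct values, and the region column comes from a table with a row filter or column mask.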
Manage value sampling
Control which columns provide example values and value dictionaries to optimize Genie's understanding of your data. Value sampling is enabled by default for all Genie spaces.
Manage example values
Example values are automatically collected when you add tables to a Genie space.
To turn off example values for a column:
- Click Configure > Data in your Genie space.
- Click a table name to view its columns.
- Click the edit icon next to the column name.
- Click Advanced.
- Turn Example values off.
This action automatically disables building a value dictionary for that column. If necessary, use this setting to turn Example values back on.
Configure value dictionaries
Genie automatically selects columns for value sampling when you add data to a space. You can also manually manage which columns have value dictionaries enabled. For the best results, choose string columns with categorical or consistently structured values. Avoid high-cardinality or free-text columns such as user IDs, names, or user reviews.
The following list includes examples of the types of data that work well with value dictionaries:
- State or country codes
- Product categories
- Status codes
- Department names
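Genie chooses value dictionary columns automatically, and this page doesn't describe its selection criteria. If you're deciding manually, one common heuristic is to compare the number of distinct values with the number of non-null rows. Categorical columns repeat a small set of values, while IDs and free text are mostly unique.

The following sketch applies that heuristic. looks_categorical is a hypothetical helper, and its 5% distinct-ratio threshold is an arbitrary choice you would tune for your data. It also works best on a reasonably large sample, because a small sample of a categorical column can look mostly unique.

```python
from typing import Optional, Sequence

def looks_categorical(
    values: Sequence[Optional[str]],
    max_distinct: int = 1024,          # matches the per-column dictionary limit on this page
    max_distinct_ratio: float = 0.05,  # hypothetical threshold; tune for your data
) -> bool:
    """Heuristic: a column is a good value dictionary candidate if its values repeat often."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return False  # nothing to judge
    distinct_count = len(set(non_null))
    distinct_ratio = distinct_count / len(non_null)
    return distinct_count <= max_distinct and distinct_ratio <= max_distinct_ratio

statuses = ["Open", "Closed", "Pending"] * 100
user_ids = [f"user-{i}" for i in range(300)]
print(looks_categorical(statuses))  # True
print(looks_categorical(user_ids))  # False
```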
To enable a value dictionary:
- Click the edit icon next to the column name.
- Click Advanced.
- Turn Build value dictionary on.
Refresh sample values
Refreshing sample values queries your data again and collects new example values and value dictionary entries.
You should refresh sample values in the following cases:
- New values have been added to the column
- The format of existing values has changed
To update stored values:
- Click the kebab menu in the column view.
- Select Refresh sample values.
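You can roughly check for the two refresh cases listed above before refreshing. The following hypothetical sketch compares previously sampled values with a fresh sample. It reports whether new values appeared and whether the format changed, using a coarse shape signature in which ASCII letters become A and digits become 9. The functions value_shape and refresh_reasons are invented for illustration and are not part of Genie. You still refresh from the kebab menu as described above.

```python
import re

def value_shape(value: str) -> str:
    """Reduce a value to a coarse format signature, for example 'CA-001' -> 'AA-999'."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", shape)

def refresh_reasons(stored: set[str], current: set[str]) -> list[str]:
    """List the reasons, if any, that stored sample values look out of date."""
    reasons = []
    if current - stored:
        reasons.append("new values")  # values present now that were never sampled
    if {value_shape(v) for v in current} - {value_shape(v) for v in stored}:
        reasons.append("format changed")  # a value shape that wasn't seen before
    return reasons

print(refresh_reasons({"CA", "NY"}, {"CA", "NY", "TX"}))       # ['new values']
print(refresh_reasons({"CA", "NY"}, {"California", "New York"}))  # ['new values', 'format changed']
```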
Edit knowledge store instructions
In the table details view, click Configure > Instructions to add and edit knowledge store instructions.
Define join relationships
Help Genie create accurate JOIN statements by defining table relationships:
- Click Joins.
- Click Add.
- Select left and right tables from the drop-down menus.
- Enter a Join condition (for example, accounts.id = opportunity.accountid).
- (Optional) For more complicated join conditions, use a SQL expression. Click Use SQL expression, and then enter the join condition as a SQL expression.
- Select a Relationship Type:
  - Many to one: Multiple left rows map to one right row.
  - One to many: One left row maps to multiple right rows.
  - One to one: One left row maps to at most one right row.
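If you aren't sure which relationship type to choose, look at how the join keys repeat on each side of the condition. The following Python sketch is a hypothetical helper, not a Genie feature. It counts key occurrences in the left and right tables and classifies the relationship using the definitions above.

The helper considers only keys that appear on both sides. When keys repeat on both sides, it returns "many to many", which doesn't correspond to any of the three listed types.

```python
from collections import Counter

def infer_relationship(left_keys: list, right_keys: list) -> str:
    """Classify a join on left_key = right_key by how often matching keys repeat on each side."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    shared = left_counts.keys() & right_counts.keys()  # keys that actually join
    left_repeats = any(left_counts[k] > 1 for k in shared)
    right_repeats = any(right_counts[k] > 1 for k in shared)
    if left_repeats and right_repeats:
        return "many to many"
    if left_repeats:
        return "many to one"  # several left rows share one right row
    if right_repeats:
        return "one to many"  # one left row matches several right rows
    return "one to one"

# Hypothetical data for accounts.id = opportunity.accountid, with accounts as the left table.
account_ids = [1, 2, 3]
opportunity_account_ids = [1, 1, 2, 3, 3]
print(infer_relationship(account_ids, opportunity_account_ids))  # one to many
```

If you swap the tables so that opportunity is on the left, the same data yields "many to one".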
When multiple joins exist between the same tables or self-joins are used, Genie automatically generates aliases for the right-hand table to avoid ambiguity.
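To see why aliases matter, consider a hypothetical opportunity table with two columns that both reference a users table. The following sketch runs the equivalent SQL in an in-memory SQLite database through Python's standard sqlite3 module. The table names, the columns, and the aliases owner_user and creator_user are invented for illustration. The aliases Genie generates may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE opportunity (id INTEGER PRIMARY KEY, owner_id INTEGER, creator_id INTEGER);
INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO opportunity VALUES (10, 1, 2);
""")

# Two joins to the same right-hand table. Each join gets its own alias,
# so owner_user.name and creator_user.name refer to different rows.
query = """
SELECT o.id, owner_user.name AS owner_name, creator_user.name AS creator_name
FROM opportunity AS o
JOIN users AS owner_user ON o.owner_id = owner_user.id
JOIN users AS creator_user ON o.creator_id = creator_user.id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(10, 'Ana', 'Ben')]
conn.close()
```

Without distinct aliases, a reference such as users.name would be ambiguous, because the query would not say which of the two joins it means.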
Next steps
Use the following links to help you continue to build your Genie space.
- Add context to your Genie space to help generate accurate responses. See Add SQL examples and instructions
- Learn best practices for optimizing your Genie space. See Curate an effective Genie space
- Evaluate and improve your space's performance. See Use benchmarks in a Genie space