Build a knowledge store for more reliable Genie spaces

The Genie knowledge store allows you to curate and enhance your space through localized metadata, value sampling, and structured SQL instructions. These features help Genie understand your data and generate more accurate, relevant responses.

What is a knowledge store?

A knowledge store is a collection of curated semantic definitions that enhances Genie's understanding of your data and improves response accuracy.

The knowledge store consists of:

  • Space-level metadata customization: Space-specific descriptions for tables and columns, along with business terms and synonyms
  • Space-level data customization: Simplified, focused datasets without changing the underlying Unity Catalog tables
  • Value sampling: Real data examples that help Genie understand data types and match user prompts to actual values
  • Join relationships: Defined table relationships for accurate JOIN statements
  • SQL expressions: Structured definitions of measures, filters, and dimensions that capture business logic

All knowledge store configurations are scoped to your Genie space and do not affect Unity Catalog metadata or other Databricks assets.

Manage knowledge store metadata

Teach Genie about the data in your space by providing local table and column descriptions and adding column synonyms that align with common business terms. Simplify datasets by hiding unnecessary or duplicate columns to keep Genie focused.

These practices improve usability for users who do not have direct permissions on the underlying tables, and they also support quicker iterations when updating instruction versions.

To access space-level metadata, click Configure > Data in your Genie space. Then click a table name to view its metadata and columns.

View columns

Click a table name to see an overview of the column names and details. The following example shows a sample from a table named accounts.

Table overview showing the metadata description and column details as described below.

  • Description: Genie uses metadata to understand your data and generate accurate responses. The default table description shows the Unity Catalog metadata associated with your data asset. You can edit this description to add specific directions that help Genie author SQL for your space. Click Reset to restore the Unity Catalog description.

  • Columns: Column names and descriptions are included in the column list. Each column is labeled with tags that show whether it includes Example values or a Value dictionary. See Value sampling overview.

Hide or show relevant columns

Columns can be managed individually or in bulk. Use the following instructions to hide or show columns.

  • Hide a single column: Click the Eye icon next to the column name.
  • Hide multiple columns:
    • Select the checkboxes for the columns you want to hide.
    • From the Actions menu, select Hide selected columns.
  • Undo changes: Repeat the same steps to show a column that was hidden.

Edit column metadata

You can customize the following for each column:

  • Description: Space-specific column descriptions that enhance Genie's understanding.
  • Synonyms: Business terms and keywords that help match user language to column names.
  • Advanced settings: Value sampling controls.
    • Example values: Turn automatic sampling of representative values on or off.
    • Build value dictionary: Enable or disable value dictionaries for categorical columns.

To edit column metadata:

  1. Click the pencil icon next to a column name.
  2. Edit the description and synonyms for the column.
  3. If necessary, click Advanced settings to open value sampling controls.
  4. Click Save to keep your changes and close the dialog.

Value sampling overview

Value sampling enhances Genie's ability to understand and work with your actual data by collecting representative examples.

Value sampling improves Genie's SQL generation by providing access to real data values. When users ask conversational questions with misspellings or different terminology, value sampling helps Genie match prompts to actual data values in your tables.
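The matching idea can be illustrated with a short Python sketch. This is not how Genie is implemented; it only shows, using the standard difflib module, how a list of sampled values makes it possible to map a misspelled term to a real value. The match_prompt_value function and the sample list of states are hypothetical.

```python
import difflib


def match_prompt_value(prompt_term, sampled_values, cutoff=0.6):
    """Return the sampled value closest to prompt_term, or None if nothing is close."""
    # Compare case-insensitively, but return the value as it is stored.
    lowered = {value.lower(): value for value in sampled_values}
    hits = difflib.get_close_matches(
        prompt_term.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


# Hypothetical sampled values for a "state" column.
states = ["California", "Colorado", "Texas"]
print(match_prompt_value("Califronia", states))  # prints "California"
```

Without sampled values there is nothing to compare the user's term against, which is why columns that users filter on by name benefit most from sampling.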

Value sampling components

  • Example values: Small samples from each column that help Genie understand data types and formatting. These are collected automatically for all eligible columns.
  • Value dictionaries: Curated lists of up to 1,024 distinct values per column, each fewer than 127 characters long. Genie creates them for up to 120 columns that contain categorical or consistently formatted string values, such as states, product categories, or status codes.

Tables with row filters or column masks are excluded from value sampling.
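Before enabling a value dictionary, it can help to check a column against the documented limits. The following Python sketch assumes you have already pulled the column's values into a list (for example, from a query result). The fits_value_dictionary_limits function is hypothetical and only checks the limits stated above; this page does not describe what Genie does with a column that exceeds them.

```python
MAX_DISTINCT_VALUES = 1024  # documented limit: up to 1,024 distinct values per column
MAX_VALUE_LENGTH = 127      # documented limit: each value is fewer than 127 characters


def fits_value_dictionary_limits(values):
    """Summarize whether a column's values fit the documented dictionary limits."""
    # Ignore NULLs (None) and count each distinct value once.
    distinct = {value for value in values if value is not None}
    too_long = sorted(value for value in distinct if len(value) >= MAX_VALUE_LENGTH)
    within_count = len(distinct) <= MAX_DISTINCT_VALUES
    return {
        "distinct_count": len(distinct),
        "within_distinct_limit": within_count,
        "too_long_values": too_long,
        "fits": within_count and not too_long,
    }


# Hypothetical status column: a few short, repeated values.
report = fits_value_dictionary_limits(["open", "closed", "open", None, "pending"])
print(report["distinct_count"], report["fits"])  # prints "3 True"
```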

Manage value sampling

Control which columns provide example values and value dictionaries to optimize Genie's understanding of your data. Value sampling is enabled by default for all Genie spaces.

Manage example values

Example values are automatically added when you add tables to a Genie space.

To turn off example values for a column:

  1. Click Configure > Data in your Genie space.
  2. Click a table name to view its columns.
  3. Click the pencil (edit) icon next to the column name.
  4. Click Advanced settings.
  5. Turn Example values off.

This action automatically disables building a value dictionary for that column. If necessary, use this setting to turn Example values back on.
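The dependency between the two settings can be pictured with a small model. The following Python sketch is a mental model only: the ColumnSampling class and its field names are hypothetical and do not correspond to any Genie API. It encodes the one rule stated above, that turning Example values off also turns off the value dictionary. Whether the dictionary is restored when Example values are turned back on is not stated here, so the sketch leaves it off.

```python
from dataclasses import dataclass


@dataclass
class ColumnSampling:
    """Hypothetical model of one column's value sampling settings."""

    example_values: bool = True
    build_value_dictionary: bool = False

    def set_example_values(self, enabled: bool) -> None:
        self.example_values = enabled
        if not enabled:
            # Turning Example values off also disables the value dictionary.
            self.build_value_dictionary = False


column = ColumnSampling(example_values=True, build_value_dictionary=True)
column.set_example_values(False)
print(column.build_value_dictionary)  # prints "False"
```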

Configure value dictionaries

Genie automatically selects columns for value sampling when you add data to a space. You can manually manage which columns have value dictionaries enabled. Choose string columns with categorical or structured values for the best results. Avoid high-cardinality or free-text columns, such as user IDs, names, or user reviews.

The following list includes examples of the types of data that work well with value dictionaries:

  • State or country codes
  • Product categories
  • Status codes
  • Department names

To enable a value dictionary:

  1. Click the pencil (edit) icon next to the column name.
  2. Click Advanced settings.
  3. Turn Build value dictionary on.

A string column with the value dictionary button on the right.

Refresh sample values

Refreshing sample values polls your data again and collects new values for example values and value dictionaries.

You should refresh sample values in the following cases:

  • New values have been added to the column
  • The format of existing values has changed

To update stored values:

  1. Click the kebab menu in the column view.
  2. Select Refresh sample values.

Refresh values or remove values options in the UI

Define join relationships

Help Genie create accurate JOIN statements by defining table relationships:

  1. Click Joins.
  2. Click Add.
  3. Select left and right tables from the drop-down menus.
  4. Enter a join condition (for example, accounts.id = opportunity.accountid).
    • (Optional) For more complicated join conditions, use a SQL expression. Click Use SQL expression, and then record the join condition as a SQL expression.
  5. Select a Relationship Type:
    • Many to one: Multiple left rows map to one right row
    • One to many: One left row maps to multiple right rows
    • One to one: One left row maps to at most one right row
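A relationship type can be sanity-checked against the join-key values before you record it. The following Python sketch is an illustration and not part of Genie: the classify_relationship function is hypothetical, and it takes the join-key values from the left and right tables as plain lists. It also reports many to many, which is not one of the options listed above, so that every case is covered.

```python
from collections import Counter


def classify_relationship(left_keys, right_keys):
    """Classify a join by whether key values repeat on each side."""
    # NULL keys never match in a join, so drop them before counting.
    left_counts = Counter(key for key in left_keys if key is not None)
    right_counts = Counter(key for key in right_keys if key is not None)
    left_unique = all(count == 1 for count in left_counts.values())
    right_unique = all(count == 1 for count in right_counts.values())

    if left_unique and right_unique:
        return "One to one"
    if right_unique:
        return "Many to one"   # several left rows share one right row
    if left_unique:
        return "One to many"   # one left row matches several right rows
    return "Many to many"      # not an option in the list above


# Hypothetical data: accounts.id on the left, opportunity.accountid on the right.
account_ids = [1, 2, 3]
opportunity_account_ids = [1, 1, 2]
print(classify_relationship(account_ids, opportunity_account_ids))  # prints "One to many"
```

Swapping the left and right tables turns the same data into a many to one relationship, so the type you select depends on which table you choose as the left table.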

Join instructions showing one identified join relationship

Note

When multiple joins exist between the same tables or self-joins are used, Genie automatically generates aliases for the right-hand table to avoid ambiguity.

Learn from feedback

When users click the thumbs up on a message, Genie attempts to learn whether the response used a new join relationship that it should remember. This feedback helps Genie improve its understanding of your data relationships over time and generate more accurate queries in future conversations.

Define SQL expressions

SQL expressions interface showing measures, filters, and dimensions

SQL expressions provide a structured, guided way to teach Genie about common business terms such as KPIs, attributes, and conditions. Genie can then use each of these granular definitions when a user asks about them.

SQL expressions complement the example SQL queries specified in instructions. SQL expressions define reusable business concepts, while example SQL queries are more helpful for teaching Genie how to approach common user prompt formats. For example, if users commonly ask for "a breakdown of performance", an example SQL query can show that this means closed sales by region, sales rep, and manager.

SQL expressions work best when you need to:

  • Provide structured definitions for KPIs and metrics, such as profit margin or conversion rate
  • Give Genie explicit context about how to calculate important values
  • Define additional dimensions for the dataset, such as month or customer segment
  • Teach Genie filters for business conditions, such as large orders or orders before a specific time

SQL expression types

You can define the following types of SQL expressions:

  • Measures: Key performance indicators (KPIs) and metrics. Define the name, SQL calculation, and synonyms.
  • Filters: Common filtering conditions. Define the name, SQL filter logic, and synonyms.
  • Dimensions: Attributes for grouping and analyzing data. Define the name, SQL expression, and synonyms.
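To make the three types concrete, the following Python sketch combines one expression of each type into a single query. It is an illustration under stated assumptions: the orders table, its columns, and the three expressions are hypothetical, and the query runs on an in-memory SQLite database rather than Databricks SQL, so it uses only syntax common to both. It does not show how Genie itself assembles queries.

```python
import sqlite3

# Hypothetical expressions, one per type.
MEASURE_PROFIT_MARGIN = "(SUM(revenue) - SUM(cost)) * 1.0 / SUM(revenue)"  # aggregates rows
FILTER_LARGE_ORDER = "revenue > 1000"                                      # boolean condition
DIMENSION_SEGMENT = (                                                      # per-row attribute
    "CASE WHEN customer_size >= 500 THEN 'Enterprise' ELSE 'SMB' END"
)


def profit_margin_by_segment():
    """Run a query that uses the measure, filter, and dimension together."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (revenue REAL, cost REAL, customer_size INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (2000.0, 1000.0, 800),  # Enterprise, large order
            (4000.0, 3000.0, 50),   # SMB, large order
            (500.0, 100.0, 900),    # excluded by the large-order filter
        ],
    )
    query = f"""
        SELECT {DIMENSION_SEGMENT} AS segment,
               {MEASURE_PROFIT_MARGIN} AS profit_margin
        FROM orders
        WHERE {FILTER_LARGE_ORDER}
        GROUP BY segment
        ORDER BY segment
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(profit_margin_by_segment())  # prints [('Enterprise', 0.5), ('SMB', 0.25)]
```

The dimension appears in SELECT and GROUP BY, the filter in WHERE, and the measure as an aggregate, which mirrors the role each expression type plays.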

Use the following instructions to define SQL expressions:

  1. Click Configure > Instructions > SQL Expressions.
  2. Click Add. Choose Filter, Measure, or Dimension.
  3. In the Name field, enter a name for the expression.
  4. In the Code field, enter the SQL expression.
    • Filter expressions should evaluate to a boolean condition.
    • Measure expressions should calculate an aggregation over multiple rows in the table.
    • Dimension expressions should derive a value for each row from the existing data.
  5. In the Synonyms field, enter common ways that users might refer to the expression colloquially.
  6. In the Instructions field, enter specific instructions that tell Genie what the expression is for and how to work with it.
