Create custom AI agent tools with Unity Catalog functions

Use Unity Catalog functions to create AI agent tools that execute custom logic and perform specific tasks that extend the capabilities of LLMs beyond language generation.

Requirements

  • A serverless compute connection is required to create Unity Catalog functions defined with SQL body CREATE FUNCTION statements. Functions created from Python callables do not require serverless compute.
  • Use Databricks Runtime 15.0 and above.

Create an agent tool

In this example, you create a Unity Catalog tool, test its functionality, and add it to an agent. Run the following code in a Databricks notebook.

Install dependencies

Install Unity Catalog AI packages with the [databricks] extra, and install the Databricks-LangChain integration package.

This example uses LangChain, but a similar approach can be applied to other libraries. See Integrate Unity Catalog tools with third party generative AI frameworks.

Python
# Install Unity Catalog AI integration packages with the Databricks extra
%pip install unitycatalog-ai[databricks]
%pip install unitycatalog-langchain[databricks]

# Install the Databricks LangChain integration package
%pip install databricks-langchain

dbutils.library.restartPython()

Initialize the Databricks Function Client

Initialize the Databricks Function Client, which is a specialized interface for creating, managing, and running Unity Catalog functions in Databricks.

Python
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

Define the tool's logic

Create a Unity Catalog function containing the tool’s logic.

You can create Python functions using one of two APIs:

  • create_python_function accepts a Python callable.
  • create_function accepts a SQL body create function statement.

Use the create_python_function API to create the function. Note the following requirements for successfully using this API:

  • Type hints: The function signature must define valid Python type hints. Both the named arguments and the return value must have their types defined.
  • Do not use variable arguments: Variable arguments such as *args and **kwargs are not supported. All arguments must be explicitly defined.
  • Type compatibility: Not all Python types are supported in SQL.
  • Descriptive docstrings: The Unity Catalog functions toolkit reads, parses, and extracts important information from your docstring.
    • Docstrings must be formatted according to the Google docstring syntax.
    • Write clear descriptions for your function and its arguments to help the LLM understand how and when to use the function.

For more information, see Unity Catalog documentation - Creating functions from Python callables.

Python

CATALOG = "my_catalog"
SCHEMA = "my_schema"

def add_numbers(number_1: float, number_2: float) -> float:
    """
    A function that accepts two floating point numbers, adds them,
    and returns the resulting sum as a float.

    Args:
        number_1 (float): The first of the two numbers to add.
        number_2 (float): The second of the two numbers to add.

    Returns:
        float: The sum of the two input numbers.
    """
    return number_1 + number_2

function_info = client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=True,
)
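
If you prefer to define the tool in SQL, the create_function API accepts a SQL body CREATE FUNCTION statement instead. The following is a minimal sketch of that alternative, assuming the same CATALOG and SCHEMA values defined above and the function name add_numbers_sql; serverless compute is required to create SQL body functions.

Python
# A minimal sketch of the SQL-body alternative. Assumes the CATALOG and
# SCHEMA values defined above; requires a serverless compute connection.
sql_body = f"""
CREATE OR REPLACE FUNCTION {CATALOG}.{SCHEMA}.add_numbers_sql(
  number_1 DOUBLE COMMENT 'The first of the two numbers to add.',
  number_2 DOUBLE COMMENT 'The second of the two numbers to add.'
)
RETURNS DOUBLE
COMMENT 'Adds two floating point numbers and returns the sum.'
RETURN number_1 + number_2
"""

function_info = client.create_function(sql_function_body=sql_body)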

Test the function

Test your function to ensure it works as expected:

Python
result = client.execute_function(
    function_name=f"{CATALOG}.{SCHEMA}.add_numbers",
    parameters={"number_1": 36939.0, "number_2": 8922.4}
)

result.value # OUTPUT: '45861.4'

Wrap the function using the UCFunctionToolkit

Wrap the function using the UCFunctionToolkit to make it accessible to agent authoring libraries. The toolkit ensures consistency across different libraries and adds helpful features like auto-tracing for retrievers.

Python
from databricks_langchain import UCFunctionToolkit

# Create a toolkit with the Unity Catalog function
func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
toolkit = UCFunctionToolkit(function_names=[func_name])

tools = toolkit.tools
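
To verify what the toolkit generated, you can inspect each tool's name and description, which are standard LangChain tool attributes. This is an optional sanity check, not a required step:

Python
# Optional sanity check: each wrapped function is exposed as a LangChain tool
for tool in tools:
    print(tool.name, "-", tool.description)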

Use the tool in an agent

Add the tool to a LangChain agent using the tools property from UCFunctionToolkit.

This example authors a simple agent using the LangChain AgentExecutor API for simplicity. For production workloads, use the agent authoring workflow seen in ChatAgent examples.

Python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts import ChatPromptTemplate
from databricks_langchain import (
    ChatDatabricks,
    UCFunctionToolkit,
)
import mlflow

# Initialize the LLM (optional: replace with your LLM of choice)
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, temperature=0.1)

# Define the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Make sure to use tools for additional functionality.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

# Enable automatic tracing
mlflow.langchain.autolog()

# Define the agent, specifying the tools from the toolkit above
agent = create_tool_calling_agent(llm, tools, prompt)

# Create the agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What is 36939.0 + 8922.4?"})

Manage Unity Catalog functions

Use the Databricks Function Client to manage Unity Catalog functions. The Databricks Function Client is based on the open source Unity Catalog Function Client but offers several improvements unique to Databricks.

This page documents functionality specific to the Databricks Function Client. For general information about managing Unity Catalog functions, see Unity Catalog documentation - Function client.

Execute functions using serverless or local mode

You can run functions with the DatabricksFunctionClient in two modes: serverless mode and local mode.

To run a function, pass its fully qualified name to the execute_function API. When a gen AI service determines that a tool call is needed to fulfill a request, integration packages (toolkit instances) call this API automatically.

Serverless mode

Serverless mode is the default and recommended option for production use cases. It runs functions remotely using a SQL Serverless endpoint, ensuring that your agent's process remains secure and free from the risks of running arbitrary code locally.

Python
# Defaults to serverless if `execution_mode` is not specified
client = DatabricksFunctionClient(execution_mode="serverless")

When your agent requests a tool execution in serverless mode, the following happens:

  1. The DatabricksFunctionClient sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
  2. The DatabricksFunctionClient extracts the function definition and validates the parameter names and types.
  3. The DatabricksFunctionClient submits the execution as a UDF to a serverless instance.

Local mode

Local mode is designed for development and debugging. It executes functions in a local subprocess instead of making requests to a SQL Serverless endpoint. This allows you to troubleshoot tool calls more effectively by providing local stack traces.

When your agent requests running a tool in local mode, the DatabricksFunctionClient does the following:

  1. Sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
  2. Extracts the Python callable definition, caches it locally, and validates the parameter names and types.
  3. Invokes the callable with the specified parameters in a restricted subprocess with timeout protection.

Python
# Run the function in a local subprocess for development and debugging
client = DatabricksFunctionClient(execution_mode="local")

Running in"local" mode provides the following features:

  • CPU time limit: Restricts the total CPU runtime for callable execution to prevent excessive computational loads.

    The CPU time limit is based on actual CPU usage, not wall-clock time. Due to system scheduling and concurrent processes, CPU time can exceed wall-clock time in real-world scenarios.

  • Memory limit: Restricts the virtual memory allocated to the process.

  • Timeout protection: Enforces a total wall-clock timeout for running functions.

Customize these limits using the environment variables described in the next section.

Environment variables

Configure how functions run in the DatabricksFunctionClient using the following environment variables:

| Environment variable | Default value | Description |
| --- | --- | --- |
| EXECUTOR_MAX_CPU_TIME_LIMIT | 10 seconds | Maximum allowable CPU execution time (local mode only). |
| EXECUTOR_MAX_MEMORY_LIMIT | 100 MB | Maximum allowable virtual memory allocation for the process (local mode only). |
| EXECUTOR_TIMEOUT | 20 seconds | Maximum total wall clock time (local mode only). |
| UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS | 5 | Maximum number of attempts to retry refreshing the session client in case of token expiry. |
| UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT | 100 | Maximum number of rows returned when executing functions using serverless compute with databricks-connect. |
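
For example, to loosen the local-mode limits while debugging, you could set these variables before constructing the client. The following is a minimal sketch; the specific values are illustrative, and the assumption that each variable takes a bare number in the units listed in the table above should be verified against the client's documentation.

Python
import os

# Illustrative values; assumes each variable accepts a bare number in the
# units shown in the table above (seconds, MB, seconds).
os.environ["EXECUTOR_MAX_CPU_TIME_LIMIT"] = "30"
os.environ["EXECUTOR_MAX_MEMORY_LIMIT"] = "500"
os.environ["EXECUTOR_TIMEOUT"] = "60"

# The limits apply only when running functions in local mode
client = DatabricksFunctionClient(execution_mode="local")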