Create custom AI agent tools with Unity Catalog functions
Use Unity Catalog functions to create AI agent tools that execute custom logic and perform specific tasks that extend the capabilities of LLMs beyond language generation.
Requirements
- A serverless compute connection is required to create Unity Catalog functions written as SQL body CREATE FUNCTION statements. Python functions do not require serverless compute.
- Databricks Runtime 15.0 and above.
Create an agent tool
In this example, you create a Unity Catalog tool, test its functionality, and add it to an agent. Run the following code in a Databricks notebook.
Install dependencies
Install the Unity Catalog AI packages with the [databricks] extra, and install the Databricks-LangChain integration package.
This example uses LangChain, but a similar approach can be applied to other libraries. See Integrate Unity Catalog tools with third-party generative AI frameworks.
# Install Unity Catalog AI integration packages with the Databricks extra
%pip install unitycatalog-ai[databricks]
%pip install unitycatalog-langchain[databricks]
# Install the Databricks LangChain integration package
%pip install databricks-langchain
dbutils.library.restartPython()
Initialize the Databricks Function Client
Initialize the Databricks Function Client, which is a specialized interface for creating, managing, and running Unity Catalog functions in Databricks.
from unitycatalog.ai.core.databricks import DatabricksFunctionClient
client = DatabricksFunctionClient()
Define the tool's logic
Unity Catalog tools are Unity Catalog user-defined functions (UDFs) under the hood. When you define a Unity Catalog tool, you register a function in Unity Catalog. To learn more about Unity Catalog UDFs, see User-defined functions (UDFs) in Unity Catalog.
You can create Unity Catalog functions using one of two APIs:
- create_python_function accepts a Python callable.
- create_function accepts a SQL body CREATE FUNCTION statement. See Create Python functions.
Use the create_python_function API to create the function.
To make a Python callable recognizable to the Unity Catalog functions data model, your function must meet the following requirements:
- Type hints: The function signature must define valid Python type hints. Both the named arguments and the return value must have their types defined.
- Do not use variable arguments: Variable arguments such as *args and **kwargs are not supported. All arguments must be explicitly defined.
- Type compatibility: Not all Python types are supported in SQL. See Spark Supported Data Types.
- Descriptive docstrings: The Unity Catalog functions toolkit reads, parses, and extracts important information from your docstring.
  - Docstrings must be formatted according to the Google docstring syntax.
  - Write clear descriptions for your function and its arguments to help the LLM understand how and when to use the function.
- Dependency imports: Libraries must be imported within the function's body. Imports outside the function will not be resolved when running the tool.
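For example, the following callable satisfies all of these requirements, including importing its dependency inside the function body. The function name and logic are illustrative only; it can be registered with create_python_function in the same way as the add_numbers example that follows.

```python
def days_between(start_date: str, end_date: str) -> int:
    """
    Computes the number of days between two ISO-format dates.

    Args:
        start_date (str): The start date in YYYY-MM-DD format.
        end_date (str): The end date in YYYY-MM-DD format.

    Returns:
        int: The number of days from start_date to end_date.
    """
    # Dependencies are imported inside the function body so they
    # resolve when the tool runs in Unity Catalog.
    from datetime import date

    return (date.fromisoformat(end_date) - date.fromisoformat(start_date)).days

print(days_between("2024-01-01", "2024-01-31"))  # 30
```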
The following code snippet uses the create_python_function API to register the Python callable add_numbers:
CATALOG = "my_catalog"
SCHEMA = "my_schema"
def add_numbers(number_1: float, number_2: float) -> float:
    """
    A function that accepts two floating point numbers, adds them,
    and returns the resulting sum as a float.

    Args:
        number_1 (float): The first of the two numbers to add.
        number_2 (float): The second of the two numbers to add.

    Returns:
        float: The sum of the two input numbers.
    """
    return number_1 + number_2
function_info = client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=True
)
Test the function
Test your function to check that it works as expected. Specify the fully qualified function name in the execute_function API to run the function:
result = client.execute_function(
    function_name=f"{CATALOG}.{SCHEMA}.add_numbers",
    parameters={"number_1": 36939.0, "number_2": 8922.4}
)
result.value # OUTPUT: '45861.4'
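Note that execute_function returns the value serialized as a string, hence the quoted '45861.4' in the output. The following is a minimal sketch of casting the result back to a number; FakeResult is a stand-in used here only so the snippet runs without a workspace connection, not the actual unitycatalog-ai result class.

```python
from dataclasses import dataclass

# Stand-in for the object returned by execute_function; illustrative only.
@dataclass
class FakeResult:
    value: str

result = FakeResult(value="45861.4")

# The value comes back as a string, so cast it before doing math with it.
total = float(result.value)
print(total)  # 45861.4
```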
Wrap the function using the UCFunctionToolkit
Wrap the function using the UCFunctionToolkit to make it accessible to agent authoring libraries. The toolkit ensures consistency across different gen AI libraries and adds helpful features like auto-tracing for retrievers.
from databricks_langchain import UCFunctionToolkit
# Create a toolkit with the Unity Catalog function
func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
toolkit = UCFunctionToolkit(function_names=[func_name])
tools = toolkit.tools
Use the tool in an agent
Add the tool to a LangChain agent using the tools property from UCFunctionToolkit.
This example authors a simple agent using the LangChain AgentExecutor API for simplicity. For production workloads, use the agent authoring workflow seen in ChatAgent examples.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts import ChatPromptTemplate
from databricks_langchain import (
    ChatDatabricks,
    UCFunctionToolkit,
)
import mlflow
# Initialize the LLM (optional: replace with your LLM of choice)
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, temperature=0.1)
# Define the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Make sure to use tools for additional functionality.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
# Enable automatic tracing
mlflow.langchain.autolog()
# Define the agent, specifying the tools from the toolkit above
agent = create_tool_calling_agent(llm, tools, prompt)
# Create the agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What is 36939.0 + 8922.4?"})
Improve tool-calling with clear documentation
Good documentation helps your agents know when and how to use each tool. Follow these best practices for documenting your tools:
- For Unity Catalog functions, use the COMMENT clause to describe tool functionality and parameters.
- Clearly define expected inputs and outputs.
- Write meaningful descriptions to make tools easier for agents, and humans, to use.
Example: Effective tool documentation
The following example shows clear COMMENT strings for a tool that queries a structured table.
CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer whose info to look up.'
)
RETURNS STRING
COMMENT 'Returns metadata about a specific customer including their email and ID.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;
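A SQL-bodied function like this can also be registered programmatically with the client's create_function API, which accepts a SQL body string. The sketch below assumes the DatabricksFunctionClient created earlier on this page; because registration requires a workspace connection, the create_function call is shown commented out.

```python
# Build the SQL body for the function as a string.
sql_body = """
CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer whose info to look up.'
)
RETURNS STRING
COMMENT 'Returns metadata about a specific customer including their email and ID.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;
"""

# Requires a workspace connection, so not run here:
# client.create_function(sql_function_body=sql_body)
```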
Example: Ineffective tool documentation
The following example lacks important details, making it harder for agents to use the tool effectively:
CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer.'
)
RETURNS STRING
COMMENT 'Returns info about a customer.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;
Running functions using serverless or local mode
When a gen AI service determines that a tool call is needed, integration packages (UCFunctionToolkit instances) run the DatabricksFunctionClient.execute_function API.
The execute_function call can run functions in two execution modes: serverless or local. This mode determines which resource runs the function.
Serverless mode for production
Serverless mode is the default and recommended option for production use cases. It runs functions remotely using a SQL Serverless endpoint, ensuring that your agent's process remains secure and free from the risks of running arbitrary code locally.
# Defaults to serverless if `execution_mode` is not specified
client = DatabricksFunctionClient(execution_mode="serverless")
When your agent requests a tool execution in serverless mode, the following happens:
- The DatabricksFunctionClient sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
- The DatabricksFunctionClient extracts the function definition and validates the parameter names and types.
- The DatabricksFunctionClient submits the execution as a UDF to a serverless instance.
Local mode for development
Local mode is designed for development and debugging. It executes functions in a local subprocess instead of making requests to a SQL Serverless endpoint. This allows you to troubleshoot tool calls more effectively by providing local stack traces.
When your agent requests running a tool in local mode, the DatabricksFunctionClient does the following:
- Sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
- Extracts the Python callable definition, caches the callable locally, and validates the parameter names and types.
- Invokes the callable with the specified parameters in a restricted subprocess with timeout protection.
# Run functions in a local subprocess for development and debugging
client = DatabricksFunctionClient(execution_mode="local")
Running in "local" mode provides the following features:
- CPU time limit: Restricts the total CPU runtime for callable execution to prevent excessive computational loads. The CPU time limit is based on actual CPU usage, not wall-clock time. Due to system scheduling and concurrent processes, CPU time can exceed wall-clock time in real-world scenarios.
- Memory limit: Restricts the virtual memory allocated to the process.
- Timeout protection: Enforces a total wall-clock timeout for running functions.
Customize these limits using the environment variables described below.
Environment variables
Configure how functions run in the DatabricksFunctionClient using the following environment variables:

| Environment variable | Default value | Description |
|---|---|---|
| | | Maximum allowable CPU execution time (local mode only). |
| | | Maximum allowable virtual memory allocation for the process (local mode only). |
| | | Maximum total wall-clock time (local mode only). |
| | | Maximum number of attempts to retry refreshing the session client in case of token expiry. |
| | | Maximum number of rows to return when running functions using serverless compute. |
Next steps
- Add Unity Catalog tools to agents programmatically. See ChatAgent examples.
- Add Unity Catalog tools to agents using the AI Playground UI. See Prototype tool-calling agents in AI Playground.
- Manage Unity Catalog functions using the Function Client. See Unity Catalog documentation - Function client.