Create custom AI agent tools with Unity Catalog functions
Use Unity Catalog functions to create AI agent tools that execute custom logic and perform specific tasks, extending the capabilities of LLMs beyond language generation.
Requirements
- A serverless compute connection is required to create Unity Catalog functions using SQL body create function statements. Python functions do not require serverless compute.
- Use Databricks Runtime 15.0 and above.
Create an agent tool
In this example, you create a Unity Catalog tool, test its functionality, and add it to an agent. Run the following code in a Databricks notebook.
Install dependencies
Install the Unity Catalog AI packages with the [databricks] extra, and install the Databricks-LangChain integration package.
This example uses LangChain, but a similar approach can be applied to other libraries. See Integrate Unity Catalog tools with third party generative AI frameworks.
# Install Unity Catalog AI integration packages with the Databricks extra
%pip install unitycatalog-ai[databricks]
%pip install unitycatalog-langchain[databricks]
# Install the Databricks LangChain integration package
%pip install databricks-langchain
dbutils.library.restartPython()
Initialize the Databricks Function Client
Initialize the Databricks Function Client, which is a specialized interface for creating, managing, and running Unity Catalog functions in Databricks.
from unitycatalog.ai.core.databricks import DatabricksFunctionClient
client = DatabricksFunctionClient()
Define the tool's logic
Create a Unity Catalog function containing the tool’s logic.
You can create Python functions using one of two APIs:
- create_python_function accepts a Python callable.
- create_function accepts a SQL body create function statement (see the sketch below).
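If you already have a SQL function definition, create_function covers that path. The following is a minimal sketch, assuming the unitycatalog-ai client's sql_function_body parameter; the catalog, schema, and function logic are placeholders:

# Hedged sketch: register a tool from a SQL body statement instead of a callable.
sql_body = """
CREATE OR REPLACE FUNCTION my_catalog.my_schema.add_numbers(
  number_1 DOUBLE COMMENT 'The first of the two numbers to add.',
  number_2 DOUBLE COMMENT 'The second of the two numbers to add.'
)
RETURNS DOUBLE
COMMENT 'Adds two numbers and returns the sum.'
RETURN number_1 + number_2
"""

client.create_function(sql_function_body=sql_body)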
This example uses the create_python_function API to create the function. Keep in mind that there are some requirements for the successful use of this API:
- Type hints: The function signature must define valid Python type hints. Both the named arguments and the return value must have their types defined.
- Do not use variable arguments: Variable arguments such as *args and **kwargs are not supported. All arguments must be explicitly defined.
- Type compatibility: Not all Python types are supported in SQL.
- Descriptive docstrings: The Unity Catalog functions toolkit reads, parses, and extracts important information from your docstring.
- Docstrings must be formatted according to the Google docstring syntax.
- Write clear descriptions for your function and its arguments to help the LLM understand how and when to use the function.
For more information, see Unity Catalog documentation - Creating functions from Python callables.
CATALOG = "my_catalog"
SCHEMA = "my_schema"
def add_numbers(number_1: float, number_2: float) -> float:
    """
    A function that accepts two floating point numbers, adds them,
    and returns the resulting sum as a float.

    Args:
        number_1 (float): The first of the two numbers to add.
        number_2 (float): The second of the two numbers to add.

    Returns:
        float: The sum of the two input numbers.
    """
    return number_1 + number_2
function_info = client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=True
)
Test the function
Test your function to ensure it works as expected:
result = client.execute_function(
    function_name=f"{CATALOG}.{SCHEMA}.add_numbers",
    parameters={"number_1": 36939.0, "number_2": 8922.4}
)
result.value # OUTPUT: '45861.4'
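execute_function returns a result object rather than raising on failure. If you want your test to fail fast, a minimal check looks like the following; the error attribute is assumed from unitycatalog-ai's result object:

# Fail fast if the function execution reported an error.
# The `error` attribute is assumed from unitycatalog-ai's result object.
if result.error:
    raise RuntimeError(f"Function execution failed: {result.error}")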
Wrap the function using the UCFunctionToolkit
Wrap the function using the UCFunctionToolkit to make it accessible to agent authoring libraries. The toolkit ensures consistency across different libraries and adds helpful features like auto-tracing for retrievers.
from databricks_langchain import UCFunctionToolkit
# Create a toolkit with the Unity Catalog function
func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
toolkit = UCFunctionToolkit(function_names=[func_name])
tools = toolkit.tools
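To verify what the agent will see for each tool, you can inspect the wrapped tools using the standard LangChain tool attributes:

# Inspect each wrapped tool's metadata as exposed to the LLM.
for tool in tools:
    print(tool.name)         # Tool name derived from the UC function name
    print(tool.description)  # Description parsed from the function's docstring
    print(tool.args)         # Parameter schema for the tool call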
Use the tool in an agent
Add the tool to a LangChain agent using the tools property from UCFunctionToolkit.
This example authors a simple agent using the LangChain AgentExecutor API for simplicity. For production workloads, use the agent authoring workflow seen in ChatAgent examples.
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts import ChatPromptTemplate
from databricks_langchain import (
    ChatDatabricks,
    UCFunctionToolkit,
)
import mlflow
# Initialize the LLM (optional: replace with your LLM of choice)
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, temperature=0.1)
# Define the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Make sure to use tools for additional functionality.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
# Enable automatic tracing
mlflow.langchain.autolog()
# Define the agent, specifying the tools from the toolkit above
agent = create_tool_calling_agent(llm, tools, prompt)
# Create the agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What is 36939.0 + 8922.4?"})
Manage Unity Catalog functions
Use the Databricks Function Client to manage Unity Catalog functions. The Databricks Function Client is based on the open source Unity Catalog Function Client but offers several improvements unique to Databricks.
This page documents functionality specific to the Databricks Function Client. For general information about managing Unity Catalog functions, see Unity Catalog documentation - Function client.
Execute functions using serverless or local mode
You can run functions with the DatabricksFunctionClient in two modes: serverless mode and local mode.
Specify a fully qualified function name in the execute_function API to run the function. When a gen AI service determines a tool call is needed to fulfill a request, integration packages (toolkit instances) automatically call this API to run the function.
Serverless mode
Serverless mode is the default and recommended option for production use cases. It runs functions remotely using a SQL Serverless endpoint, ensuring that your agent's process remains secure and free from the risks of running arbitrary code locally.
# Defaults to serverless if `execution_mode` is not specified
client = DatabricksFunctionClient(execution_mode="serverless")
When your agent requests a tool execution in serverless mode, the following happens:
- The DatabricksFunctionClient sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
- The DatabricksFunctionClient extracts the function definition and validates the parameter names and types.
- The DatabricksFunctionClient submits the execution as a UDF to a serverless instance.
Local mode
Local mode is designed for development and debugging. It executes functions in a local subprocess instead of making requests to a SQL Serverless endpoint. This allows you to troubleshoot tool calls more effectively by providing local stack traces.
When your agent requests running a tool in local mode, the DatabricksFunctionClient does the following:
- Sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
- Extracts the Python callable definition, caches the callable locally, and validates the parameter names and types.
- Invokes the callable with the specified parameters in a restricted subprocess with timeout protection.
# Run functions in a local subprocess for development and debugging
client = DatabricksFunctionClient(execution_mode="local")
Running in"local"
mode provides the following features:
-
CPU time limit: Restricts the total CPU runtime for callable execution to prevent excessive computational loads.
The CPU time limit is based on actual CPU usage, not wall-clock time. Due to system scheduling and concurrent processes, CPU time can exceed wall-clock time in real-world scenarios.
-
Memory limit: Restricts the virtual memory allocated to the process.
-
Timeout protection: Enforces a total wall-clock timeout for running functions.
Customize these limits using environment variables (read further).
Environment variables
Configure how functions run in the DatabricksFunctionClient using the following environment variables (names and defaults as defined in the unitycatalog-ai package):

| Environment variable | Default value | Description |
|---|---|---|
| EXECUTOR_MAX_CPU_TIME_LIMIT | 10 (seconds) | Maximum allowable CPU execution time (local mode only). |
| EXECUTOR_MAX_MEMORY_LIMIT | 100 (MB) | Maximum allowable virtual memory allocation for the process (local mode only). |
| EXECUTOR_TIMEOUT | 20 (seconds) | Maximum total wall clock time (local mode only). |
| UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS | 5 | Maximum number of attempts to retry refreshing the session client in case of token expiry. |
| UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT | 100 | Maximum number of rows to return when executing functions using serverless compute with execute_function. |
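For example, to give a long-running tool more headroom during local debugging, set the variables before constructing the client. A minimal sketch, assuming the variable names above:

import os

# Raise the local-mode limits before creating the client.
# Variable names assume the unitycatalog-ai defaults documented above.
os.environ["EXECUTOR_MAX_CPU_TIME_LIMIT"] = "30"  # seconds of CPU time
os.environ["EXECUTOR_TIMEOUT"] = "60"             # seconds of wall-clock time

client = DatabricksFunctionClient(execution_mode="local")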