Create custom AI agent tools with Unity Catalog functions

Use Unity Catalog functions to create AI agent tools that execute custom logic and perform specific tasks that extend the capabilities of LLMs beyond language generation.

Requirements

  • A serverless compute connection is required to create Unity Catalog functions defined with SQL body CREATE FUNCTION statements. Functions created from Python callables do not require serverless compute.
  • Use Databricks Runtime 15.0 and above.

Create an agent tool

In this example, you create a Unity Catalog tool, test its functionality, and add it to an agent. Run the following code in a Databricks notebook.

Install dependencies

Install Unity Catalog AI packages with the [databricks] extra, and install the Databricks-LangChain integration package.

This example uses LangChain, but a similar approach can be applied to other libraries. See Integrate Unity Catalog tools with third party generative AI frameworks.

Python
# Install Unity Catalog AI integration packages with the Databricks extra
%pip install unitycatalog-ai[databricks]
%pip install unitycatalog-langchain[databricks]

# Install the Databricks LangChain integration package
%pip install databricks-langchain

dbutils.library.restartPython()

Initialize the Databricks Function Client

Initialize the Databricks Function Client, which is a specialized interface for creating, managing, and running Unity Catalog functions in Databricks.

Python
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

Define the tool's logic

Create a Unity Catalog function containing the tool’s logic.

You can create Python functions using one of two APIs:

  • create_python_function accepts a Python callable.
  • create_function accepts a SQL body create function statement.

Use the create_python_function API to create the function. Note the following requirements for successfully using this API:

  • Type hints: The function signature must define valid Python type hints. Both the named arguments and the return value must have their types defined.
  • Do not use variable arguments: Variable arguments such as *args and **kwargs are not supported. All arguments must be explicitly defined.
  • Type compatibility: Not all Python types are supported in SQL.
  • Descriptive docstrings: The Unity Catalog functions toolkit reads, parses, and extracts important information from your docstring.
    • Docstrings must be formatted according to the Google docstring syntax.
    • Write clear descriptions for your function and its arguments to help the LLM understand how and when to use the function.

For more information, see Unity Catalog documentation - Creating functions from Python callables.

Python

CATALOG = "my_catalog"
SCHEMA = "my_schema"

def add_numbers(number_1: float, number_2: float) -> float:
    """
    A function that accepts two floating point numbers, adds them,
    and returns the resulting sum as a float.

    Args:
        number_1 (float): The first of the two numbers to add.
        number_2 (float): The second of the two numbers to add.

    Returns:
        float: The sum of the two input numbers.
    """
    return number_1 + number_2

function_info = client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=True,
)
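
If you prefer to define the tool in SQL, the create_function API accepts a SQL body CREATE FUNCTION statement instead. The following is a minimal sketch of that alternative, assuming the same CATALOG and SCHEMA values defined above and the function name add_numbers_sql; serverless compute is required to create SQL body functions.

Python
# A minimal sketch of the SQL-body alternative. Assumes the CATALOG and
# SCHEMA values defined above; requires a serverless compute connection.
sql_body = f"""
CREATE OR REPLACE FUNCTION {CATALOG}.{SCHEMA}.add_numbers_sql(
  number_1 DOUBLE COMMENT 'The first of the two numbers to add.',
  number_2 DOUBLE COMMENT 'The second of the two numbers to add.'
)
RETURNS DOUBLE
COMMENT 'Adds two floating point numbers and returns the sum.'
RETURN number_1 + number_2
"""

function_info = client.create_function(sql_function_body=sql_body)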

Test the function

Test your function to ensure it works as expected:

Python
result = client.execute_function(
    function_name=f"{CATALOG}.{SCHEMA}.add_numbers",
    parameters={"number_1": 36939.0, "number_2": 8922.4}
)

result.value # OUTPUT: '45861.4'

Wrap the function using the UCFunctionToolkit

Wrap the function using the UCFunctionToolkit to make it accessible to agent authoring libraries. The toolkit ensures consistency across different libraries and adds helpful features like auto-tracing for retrievers.

Python
from databricks_langchain import UCFunctionToolkit

# Create a toolkit with the Unity Catalog function
func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
toolkit = UCFunctionToolkit(function_names=[func_name])

tools = toolkit.tools
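
To verify what the toolkit generated, you can inspect each tool's name and description, which are standard LangChain tool attributes. This is an optional sanity check, not a required step:

Python
# Optional sanity check: each wrapped function is exposed as a LangChain tool
for tool in tools:
    print(tool.name, "-", tool.description)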

Use the tool in an agent

Add the tool to a LangChain agent using the tools property from UCFunctionToolkit.

This example authors a simple agent using the LangChain AgentExecutor API for simplicity. For production workloads, use the agent authoring workflow seen in ChatAgent examples.

Python
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts import ChatPromptTemplate
from databricks_langchain import (
    ChatDatabricks,
    UCFunctionToolkit,
)
import mlflow

# Initialize the LLM (optional: replace with your LLM of choice)
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, temperature=0.1)

# Define the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Make sure to use tools for additional functionality.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

# Enable automatic tracing
mlflow.langchain.autolog()

# Define the agent, specifying the tools from the toolkit above
agent = create_tool_calling_agent(llm, tools, prompt)

# Create the agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What is 36939.0 + 8922.4?"})

Manage Unity Catalog functions

Use the Databricks Function Client to manage Unity Catalog functions. The Databricks Function Client is based on the open source Unity Catalog Function Client but offers several improvements unique to Databricks.

This page documents functionality specific to the Databricks Function Client. For general information about managing Unity Catalog functions, see Unity Catalog documentation - Function client.

Execute functions using serverless or local mode

You can run functions with the DatabricksFunctionClient in two modes: serverless mode and local mode.

To run a function, pass its fully qualified name to the execute_function API. When a gen AI service determines that a tool call is needed to fulfill a request, integration packages (toolkit instances) call this API automatically.

Serverless mode

Serverless mode is the default and recommended option for production use cases. It runs functions remotely using a SQL Serverless endpoint, ensuring that your agent's process remains secure and free from the risks of running arbitrary code locally.

Python
# Defaults to serverless if `execution_mode` is not specified
client = DatabricksFunctionClient(execution_mode="serverless")

When your agent requests a tool execution in serverless mode, the following happens:

  1. The DatabricksFunctionClient sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
  2. The DatabricksFunctionClient extracts the function definition and validates the parameter names and types.
  3. The DatabricksFunctionClient submits the execution as a UDF to a serverless instance.

Local mode

Local mode is designed for development and debugging. It executes functions in a local subprocess instead of making requests to a SQL Serverless endpoint. This allows you to troubleshoot tool calls more effectively by providing local stack traces.

When your agent requests running a tool in local mode, the DatabricksFunctionClient does the following:

  1. Sends a request to Unity Catalog to retrieve the function definition if the definition has not been locally cached.
  2. Extracts the Python callable definition, caches it locally, and validates the parameter names and types.
  3. Invokes the callable with the specified parameters in a restricted subprocess with timeout protection.

Python
# Run the function in a local subprocess for development and debugging
client = DatabricksFunctionClient(execution_mode="local")

Running in"local" mode provides the following features:

  • CPU time limit: Restricts the total CPU runtime for callable execution to prevent excessive computational loads.

    The CPU time limit is based on actual CPU usage, not wall-clock time. Due to system scheduling and concurrent processes, CPU time can exceed wall-clock time in real-world scenarios.

  • Memory limit: Restricts the virtual memory allocated to the process.

  • Timeout protection: Enforces a total wall-clock timeout for running functions.

Customize these limits using the environment variables described in the next section.

Environment variables

Configure how functions run in the DatabricksFunctionClient using the following environment variables:

| Environment variable | Default value | Description |
| --- | --- | --- |
| EXECUTOR_MAX_CPU_TIME_LIMIT | 10 seconds | Maximum allowable CPU execution time (local mode only). |
| EXECUTOR_MAX_MEMORY_LIMIT | 100 MB | Maximum allowable virtual memory allocation for the process (local mode only). |
| EXECUTOR_TIMEOUT | 20 seconds | Maximum total wall clock time (local mode only). |
| UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS | 5 | Maximum number of attempts to retry refreshing the session client in case of token expiry. |
| UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT | 100 | Maximum number of rows returned when executing functions using serverless compute with databricks-connect. |
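
For example, to loosen the local-mode limits while debugging, you could set these variables before constructing the client. The following is a minimal sketch; the specific values are illustrative, and the assumption that each variable takes a bare number in the units listed in the table above should be verified against the client's documentation.

Python
import os

# Illustrative values; assumes each variable accepts a bare number in the
# units shown in the table above (seconds, MB, seconds).
os.environ["EXECUTOR_MAX_CPU_TIME_LIMIT"] = "30"
os.environ["EXECUTOR_MAX_MEMORY_LIMIT"] = "500"
os.environ["EXECUTOR_TIMEOUT"] = "60"

# The limits apply only when running functions in local mode
client = DatabricksFunctionClient(execution_mode="local")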