Create custom model services

Beta

This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.

This page describes how to create, share, and manage model services in Unity Catalog.

Requirements

- Unity AI Gateway preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in a Unity AI Gateway supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
- To create a model service, you must have:
  - USE CATALOG, USE SCHEMA, and CREATE SERVICE on the catalog and schema where you create the model service.
  - EXECUTE on each model that the model service references as a destination.
  - USE CATALOG, USE SCHEMA, and CREATE TABLE on the catalog and schema where the inference table is created, if you enable inference logging.

Create a model service

You can create a model service in the Unity AI Gateway UI, in Catalog Explorer, or with the Unity Catalog REST API.

Use the UI

Do one of the following:
- In the workspace sidebar, click AI Gateway, then click Create.
- In Catalog Explorer, go to the schema where you want to create the model service, then click Create > Model service.
Enter a name for the model service, and select the catalog and schema to create it in. If you start from Catalog Explorer, the catalog and schema are prefilled.
Select the primary model to serve. You can choose from the Databricks-hosted models that you have EXECUTE on and that Unity AI Gateway can serve.
Click Create.

After you create the model service, Databricks opens its overview page, where you can get started or configure additional features such as inference logging.

Use the REST API

Send a POST request to the model-services endpoint of the Unity Catalog REST API. The following example creates a model service that routes to a primary model and falls back to a second model, with inference logging and rate limits enabled:

Bash
curl https://<workspace-url>/api/2.2/unity-catalog/model-services \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "x-databricks-workspace-id: <workspace-id>" \
  -d '{
    "catalog_name": "main",
    "schema_name": "default",
    "name": "team_chat",
    "comment": "Shared chat endpoint with fallback.",
    "destinations": [
      { "name": "primary", "model": "system.ai.databricks-claude-opus-4-6" },
      { "name": "fallback", "model": "system.ai.databricks-gpt-5-2" }
    ],
    "routes": {
      "strategy": "fallback",
      "destinations": ["primary", "fallback"]
    },
    "inference_table": "main.logging.team_chat_payload",
    "rate_limits": {
      "tpm": 10000,
      "qpm": 1000
    }
  }'

Replace the following:

<workspace-url>: Your Databricks workspace URL.
<workspace-id>: The ID of the workspace to associate the request with. This workspace is charged for pay-per-token usage.
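If you script this call, it can help to build the payload separately from sending it, so you can inspect the JSON before any request is made. The following is a minimal sketch under stated assumptions: the function names and the WORKSPACE_URL and WORKSPACE_ID variables are hypothetical helpers, not part of any Databricks tool, and the values are inserted without JSON escaping, so they must not contain quotes or backslashes.

```shell
# Print a create-request payload with a primary and a fallback destination.
# Hypothetical helper. Args: catalog schema name primary_model fallback_model
build_model_service_payload() {
  cat <<EOF
{
  "catalog_name": "$1",
  "schema_name": "$2",
  "name": "$3",
  "destinations": [
    { "name": "primary", "model": "$4" },
    { "name": "fallback", "model": "$5" }
  ],
  "routes": {
    "strategy": "fallback",
    "destinations": ["primary", "fallback"]
  }
}
EOF
}

# Send a payload read from stdin to the model-services endpoint.
# Assumes WORKSPACE_URL, WORKSPACE_ID, and DATABRICKS_TOKEN are set.
create_model_service() {
  curl --silent --show-error --fail \
    "https://${WORKSPACE_URL}/api/2.2/unity-catalog/model-services" \
    -X POST \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "x-databricks-workspace-id: ${WORKSPACE_ID}" \
    -d @-
}

# Build and print the payload only; nothing is sent at this point.
payload="$(build_model_service_payload main default team_chat \
  system.ai.databricks-claude-opus-4-6 system.ai.databricks-gpt-5-2)"
printf '%s\n' "$payload"
```

To send the request after reviewing the output, pipe the payload into the second function: `printf '%s\n' "$payload" | create_model_service`.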

Grant access to a model service

To let others query a model service, grant them EXECUTE on the model service and USE CATALOG and USE SCHEMA on its catalog and schema. If the model service logs to an inference table, grant SELECT on the table to let them read the logged requests and responses.

SQL
GRANT USE CATALOG ON CATALOG main TO ai_team;
GRANT USE SCHEMA ON SCHEMA main.default TO ai_team;
GRANT EXECUTE ON MODEL SERVICE main.default.team_chat TO ai_team;

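-- Optional: grant access to the inference table.
-- The table is in a different schema, so USE SCHEMA is needed there too.
GRANT USE SCHEMA ON SCHEMA main.logging TO ai_team;
GRANT SELECT ON TABLE main.logging.team_chat_payload TO ai_team;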

For more about granting and discovering access, see Discover and govern access to model services.
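When you grant the same access to several principals, you might generate the statements instead of writing them by hand. The sketch below prints the grants described above for one principal. The function name is hypothetical. It assumes that the inference table is in the same catalog as the model service, and it applies the general Unity Catalog rule that reading a table also requires USE SCHEMA on the table's parent schema.

```shell
# Print the GRANT statements that let a principal query a model service.
# Hypothetical helper. Args: catalog schema service principal [inference_table]
grant_query_access() {
  printf 'GRANT USE CATALOG ON CATALOG %s TO %s;\n' "$1" "$4"
  printf 'GRANT USE SCHEMA ON SCHEMA %s.%s TO %s;\n' "$1" "$2" "$4"
  printf 'GRANT EXECUTE ON MODEL SERVICE %s.%s.%s TO %s;\n' "$1" "$2" "$3" "$4"
  if [ -n "${5:-}" ]; then
    # Strip the table name to get the parent schema (catalog.schema).
    printf 'GRANT USE SCHEMA ON SCHEMA %s TO %s;\n' "${5%.*}" "$4"
    printf 'GRANT SELECT ON TABLE %s TO %s;\n' "$5" "$4"
  fi
}

grant_query_access main default team_chat ai_team main.logging.team_chat_payload
```

The function only prints SQL. Review the output, then run it in a SQL editor or notebook.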

Configure features on a model service

You configure features such as rate limits, inference logging, and guardrails on the model service from the Unity AI Gateway UI, the same way you configure them on a Unity AI Gateway endpoint.

Inference logging

When you enable inference logging, Databricks creates a new, empty Unity Catalog table with a predefined schema at the location you specify. Note the following:

- You must have USE CATALOG, USE SCHEMA, and CREATE TABLE on the target catalog and schema.
- The creator of the model service is the owner of the inference table. No other users have access unless you grant it.
- If a table already exists at the specified location, creating the model service fails.
- The inference table has an independent lifecycle from the model service. If you drop the table, the model service keeps working but stops logging.

For more about inference tables, see Monitor model services using inference tables.
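Because creation fails if the table location is unusable, a quick check of the table name before you send the request can save a round trip. The following sketch uses a hypothetical helper name. It verifies that the name has the three-level catalog.schema.table form and lists the privileges from the notes above next to the objects they apply to. It does not check whether the table already exists or whether you hold the privileges.

```shell
# List the privileges needed to create an inference table at a given location.
# Hypothetical helper. Arg: fully qualified table name (catalog.schema.table)
inference_table_requirements() {
  # Accept exactly three non-empty, dot-separated parts.
  case "$1" in
    *.*.*.* | .* | *. | *..*) valid=0 ;;
    *.*.*) valid=1 ;;
    *) valid=0 ;;
  esac
  if [ "$valid" -ne 1 ]; then
    echo "expected catalog.schema.table, got: $1" >&2
    return 1
  fi
  # ${1%%.*} keeps the catalog; ${1%.*} keeps catalog.schema.
  printf 'USE CATALOG on %s\n' "${1%%.*}"
  printf 'USE SCHEMA on %s\n' "${1%.*}"
  printf 'CREATE TABLE on %s\n' "${1%.*}"
}

inference_table_requirements main.logging.team_chat_payload
```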

Delete a model service

To delete a model service, you must have at least the MANAGE privilege on it. The owner's privileges are a superset of MANAGE, so the owner can also delete it.

SQL
DROP MODEL SERVICE main.default.team_chat;

System-provided model services in system.ai cannot be deleted.
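In cleanup scripts, you might guard against that restriction before you issue the statement. The sketch below uses a hypothetical helper that prints a DROP statement for a fully qualified name and refuses any name in system.ai. It only produces SQL text and does not check your privileges.

```shell
# Print a DROP statement for a model service, refusing system.ai names.
# Hypothetical helper. Arg: fully qualified name (catalog.schema.service)
drop_model_service_sql() {
  case "$1" in
    system.ai.*)
      echo "refusing to drop system-provided model service: $1" >&2
      return 1
      ;;
  esac
  printf 'DROP MODEL SERVICE %s;\n' "$1"
}

drop_model_service_sql main.default.team_chat
```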
