Create custom model services
This feature is in Beta. Account admins can control access to this feature from the account console Previews page. See Manage Databricks previews.
This page describes how to create, share, and manage model services in Unity Catalog.
Requirements
- Unity AI Gateway preview enabled for your account. See Manage Databricks previews.
- A Databricks workspace in a Unity AI Gateway supported region.
- Unity Catalog enabled for your workspace. See Enable a workspace for Unity Catalog.
- To create a model service, you must have:
  - USE CATALOG, USE SCHEMA, and CREATE SERVICE on the catalog and schema where you create the model service.
  - EXECUTE on each model that the model service references as a destination.
  - USE CATALOG, USE SCHEMA, and CREATE TABLE on the catalog and schema where the inference table is created, if you enable inference logging.
Create a model service
You can create a model service in the Unity AI Gateway UI, in Catalog Explorer, or with the Unity Catalog REST API.
Use the UI
- Do one of the following:
  - In the workspace sidebar, click AI Gateway, then click Create.
  - In Catalog Explorer, go to the schema where you want to create the model service, then click Create > Model service.
- Enter a name for the model service, and select the catalog and schema to create it in. If you start from Catalog Explorer, the catalog and schema are prefilled.
- Select the primary model to serve from the Databricks-hosted models that you have EXECUTE on and that Unity AI Gateway can serve.
- Click Create.
After you create the model service, Databricks opens its overview page, where you can get started or configure additional features such as inference logging.
Use the REST API
Send a POST request to the model-services endpoint of the Unity Catalog REST API. The following example creates a model service that routes to a primary model and falls back to a second model, with inference logging and rate limits enabled:
curl https://<workspace-url>/api/2.2/unity-catalog/model-services \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "x-databricks-workspace-id: <workspace-id>" \
  -d '{
    "catalog_name": "main",
    "schema_name": "default",
    "name": "team_chat",
    "comment": "Shared chat endpoint with fallback.",
    "destinations": [
      { "name": "primary", "model": "system.ai.databricks-claude-opus-4-6" },
      { "name": "fallback", "model": "system.ai.databricks-gpt-5-2" }
    ],
    "routes": {
      "strategy": "fallback",
      "destinations": ["primary", "fallback"]
    },
    "inference_table": "main.logging.team_chat_payload",
    "rate_limits": {
      "tpm": 10000,
      "qpm": 1000
    }
  }'
Replace the following:
- <workspace-url>: Your Databricks workspace URL.
- <workspace-id>: The ID of the workspace to associate the request with. This workspace is charged for pay-per-token usage.
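When you script this request, it can help to check the HTTP status rather than assume the call succeeded. The following is a minimal sketch, not an official client: it keeps the JSON body in a file and wraps the same curl call in a shell function. The function name create_model_service, the WORKSPACE_URL and WORKSPACE_ID environment variables, and the response.json output file are hypothetical names chosen for illustration.

```shell
# Sketch only: create_model_service, WORKSPACE_URL, WORKSPACE_ID, and
# response.json are illustrative assumptions, not part of the Databricks API.
# WORKSPACE_URL is expected to hold the workspace host name without "https://".
create_model_service() {
  payload_file="$1"
  # -o saves the response body to a file; -w prints only the HTTP status code.
  http_code=$(curl -sS -o response.json -w '%{http_code}' \
    -X POST "https://${WORKSPACE_URL}/api/2.2/unity-catalog/model-services" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "x-databricks-workspace-id: ${WORKSPACE_ID}" \
    -d @"${payload_file}")
  case "$http_code" in
    2??) echo "Request succeeded (HTTP ${http_code})" ;;
    *)   echo "Request failed (HTTP ${http_code}); see response.json" >&2
         return 1 ;;
  esac
}
```

To use it, save the JSON body from the example request to a file such as payload.json, export the three environment variables, and run create_model_service payload.json. Keeping the body in a file also makes the definition easier to review and version.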
Grant access to a model service
To let others query a model service, grant them EXECUTE on the model service and USE CATALOG and USE SCHEMA on its catalog and schema. If the model service logs to an inference table, grant SELECT on the table to let them read the logged requests and responses.
GRANT USE CATALOG ON CATALOG main TO ai_team;
GRANT USE SCHEMA ON SCHEMA main.default TO ai_team;
GRANT EXECUTE ON MODEL SERVICE main.default.team_chat TO ai_team;
-- Optional: grant access to the inference table
GRANT SELECT ON TABLE main.logging.team_chat_payload TO ai_team;
For more about granting and discovering access, see Discover and govern access to model services.
Configure features on a model service
You configure features such as rate limits, inference logging, and guardrails on the model service from the Unity AI Gateway UI, the same way you configure them on a Unity AI Gateway endpoint. See:
- Configure Unity AI Gateway endpoints (legacy)
- Configure rate limits for AI services using Unity AI Gateway
- Monitor model services using inference tables
Inference logging
When you enable inference logging, Databricks creates a new, empty Unity Catalog table with a predefined schema at the location you specify. Note the following:
- You must have USE CATALOG, USE SCHEMA, and CREATE TABLE on the target catalog and schema.
- The creator of the model service is the owner of the inference table. No other users have access unless you grant it.
- If a table already exists at the specified location, creating the model service fails.
- The inference table has an independent lifecycle from the model service. If you drop the table, the model service keeps working but stops logging.
For more about inference tables, see Monitor model services using inference tables.
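The example request on this page names the inference table with a three-level Unity Catalog name (catalog.schema.table), such as main.logging.team_chat_payload. As a minimal sketch, you could sanity-check that shape locally before sending a request. The helper name is_three_level_name is hypothetical, and the pattern is a deliberately conservative assumption that only accepts unquoted names made of letters, digits, and underscores. It cannot tell you whether a table already exists at that location.

```shell
# Sketch only: is_three_level_name is a hypothetical helper. It accepts
# unquoted names made of letters, digits, and underscores, which is a
# conservative assumption and not the full Unity Catalog identifier rules.
is_three_level_name() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$'
}

if is_three_level_name "main.logging.team_chat_payload"; then
  echo "name has the form catalog.schema.table"
else
  echo "expected a name of the form catalog.schema.table" >&2
fi
```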
Delete a model service
To delete a model service, you must have at least the MANAGE privilege on it. The owner's privileges are a superset of MANAGE, so the owner can also delete it.
DROP MODEL SERVICE main.default.team_chat;
System-provided model services in system.ai cannot be deleted.