Enable inference tables on model serving endpoints using the API

Preview

This feature is in Public Preview.

Important

This article describes topics that apply to inference tables for custom models. For external models or provisioned throughput workloads, use AI Gateway-enabled inference tables.

This article explains how to use the Databricks API to enable inference tables for a model serving endpoint. For general information about using inference tables, including how to enable them using the Databricks UI, see Inference tables for monitoring and debugging models.

You can enable inference tables when you create a new endpoint or on an existing endpoint. Databricks recommends that you create the endpoint with a service principal so that the inference table is not affected if the user who created the endpoint is removed from the workspace.

The owner of the inference tables is the user who created the endpoint. All access control lists (ACLs) on the table follow the standard Unity Catalog permissions and can be modified by the table owner.

Requirements

  • Your workspace must have Unity Catalog enabled.

  • Both the creator of the endpoint and the modifier must have Can Manage permission on the endpoint. See Access control lists.

  • Both the creator of the endpoint and the modifier must have the following permissions in Unity Catalog:

    • USE CATALOG permissions on the specified catalog.

    • USE SCHEMA permissions on the specified schema.

    • CREATE TABLE permissions in the schema.
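The Unity Catalog permissions above can be granted with SQL. The following is a minimal sketch that only builds the GRANT statements as strings; the helper names `quote` and `grant_statements` are hypothetical, and the principal, catalog, and schema values are placeholders. Running the statements (for example in a notebook or SQL editor) is left to you.

```python
def quote(identifier: str) -> str:
    """Backtick-quote an identifier, escaping any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def grant_statements(principal: str, catalog: str, schema: str) -> list:
    """Build one GRANT statement per Unity Catalog requirement listed above."""
    cat = quote(catalog)
    sch = cat + "." + quote(schema)
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT CREATE TABLE ON SCHEMA {sch} TO {who}",
    ]


# Placeholder values; substitute your own principal, catalog, and schema.
for statement in grant_statements("customer@example.com", "ml", "ads"):
    print(statement)
```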

Enable inference tables at endpoint creation using the API

You can enable inference tables for an endpoint during endpoint creation using the API. For instructions on creating an endpoint, see Create custom model serving endpoints.

The API request body includes an auto_capture_config object that specifies:

  • The Unity Catalog catalog: string representing the catalog to store the table

  • The Unity Catalog schema: string representing the schema to store the table

  • (optional) table prefix: string used as a prefix for the inference table name. If this isn’t specified, the endpoint name is used.

  • (optional) enabled: boolean value used to enable or disable inference tables. This is true by default.

After you specify a catalog, schema, and optionally a table prefix, a table is created at <catalog>.<schema>.<table_prefix>_payload. The table is automatically created as a Unity Catalog managed table. The owner of the table is the user who creates the endpoint.
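As an illustration of how these fields fit together, the sketch below builds an auto_capture_config dictionary and derives the expected table name from it. The function names are hypothetical helpers, not part of any Databricks library; the field names are taken from the request examples in this article, and the fallback to the endpoint name follows the rule described above.

```python
from typing import Optional


def build_auto_capture_config(
    catalog: str,
    schema: str,
    table_prefix: Optional[str] = None,
    enabled: bool = True,
) -> dict:
    """Assemble the auto_capture_config object for the request body."""
    config = {"catalog_name": catalog, "schema_name": schema, "enabled": enabled}
    if table_prefix is not None:
        # Optional: when omitted, the endpoint name is used as the prefix.
        config["table_name_prefix"] = table_prefix
    return config


def payload_table_name(config: dict, endpoint_name: str) -> str:
    """Return <catalog>.<schema>.<table_prefix>_payload for a given config."""
    prefix = config.get("table_name_prefix", endpoint_name)
    return f'{config["catalog_name"]}.{config["schema_name"]}.{prefix}_payload'


config = build_auto_capture_config("ml", "ads", table_prefix="feed-ads-prod")
print(payload_table_name(config, endpoint_name="feed-ads"))
```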

Note

Specifying an existing table is not supported since the inference table is always automatically created on endpoint creation or endpoint updates.

Warning

The inference table could become corrupted if you do any of the following:

  • Change the table schema.

  • Change the table name.

  • Delete the table.

  • Lose permissions to the Unity Catalog catalog or schema.

In this case, the auto_capture_config of the endpoint status shows a FAILED state for the payload table. If this happens, you must create a new endpoint to continue using inference tables.
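One way to detect this condition is to inspect the endpoint status returned by the API. The sketch below assumes the payload table state is reported in a `status` field next to `name` under `auto_capture_config.state.payload_table`; that field name is an assumption, since the responses in this article show only `name`. The function `payload_table_failed` is a hypothetical helper.

```python
def payload_table_failed(endpoint: dict) -> bool:
    """Return True if any config section reports a FAILED payload table.

    Assumes the state is exposed as payload_table["status"]; adjust the
    key if your endpoint status uses a different field name.
    """
    for section in ("config", "pending_config"):
        capture = (endpoint.get(section) or {}).get("auto_capture_config") or {}
        table = (capture.get("state") or {}).get("payload_table") or {}
        if table.get("status") == "FAILED":
            return True
    return False


# Hand-written sample status, not real API output.
sample = {
    "name": "feed-ads",
    "config": {
        "auto_capture_config": {
            "state": {
                "payload_table": {"name": "feed-ads-prod_payload", "status": "FAILED"}
            }
        }
    },
}
print(payload_table_failed(sample))
```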

The following example demonstrates how to enable inference tables during endpoint creation.

POST /api/2.0/serving-endpoints

{
  "name": "feed-ads",
  "config":
  {
    "served_entities": [
      {
       "entity_name": "ads1",
       "entity_version": "1",
       "workload_size": "Small",
       "scale_to_zero_enabled": true
      }
    ],
    "auto_capture_config":
    {
       "catalog_name": "ml",
       "schema_name": "ads",
       "table_name_prefix": "feed-ads-prod"
    }
  }
}
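The same request can be prepared from Python. This is a minimal sketch using only the standard library: it builds the request object but does not send it. The workspace URL and token shown are placeholders, the helper name `build_create_request` is hypothetical, and bearer-token authentication is assumed.

```python
import json
import urllib.request


def build_create_request(host: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare a POST /api/2.0/serving-endpoints request without sending it."""
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/2.0/serving-endpoints",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "name": "feed-ads",
    "config": {
        "served_entities": [
            {
                "entity_name": "ads1",
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            }
        ],
        "auto_capture_config": {
            "catalog_name": "ml",
            "schema_name": "ads",
            "table_name_prefix": "feed-ads-prod",
        },
    },
}

# Placeholder host and token; replace with your workspace URL and credentials.
request = build_create_request("https://example.cloud.databricks.com", "example-token", body)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```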

The response looks like:

{
  "name": "feed-ads",
  "creator": "customer@example.com",
  "creation_timestamp": 1666829055000,
  "last_updated_timestamp": 1666829055000,
  "state": {
    "ready": "NOT_READY",
    "config_update": "IN_PROGRESS"
  },
  "pending_config": {
    "start_time": 1666718879000,
    "served_entities": [
      {
        "name": "ads1-1",
        "entity_name": "ads1",
        "entity_version": "1",
        "workload_size": "Small",
        "scale_to_zero_enabled": true,
        "state": {
          "deployment": "DEPLOYMENT_CREATING",
          "deployment_state_message": "Creating"
        },
        "creator": "customer@example.com",
        "creation_timestamp": 1666829055000
      }
    ],
    "config_version": 1,
    "traffic_config": {
      "routes": [
        {
          "served_model_name": "ads1-1",
          "traffic_percentage": 100
        }
      ]
    },
    "auto_capture_config": {
      "catalog_name": "ml",
      "schema_name": "ads",
      "table_name_prefix": "feed-ads-prod",
      "state": {
        "payload_table": {
          "name": "feed-ads-prod_payload"
        }
      },
      "enabled": true
    }
  },
  "id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "permission_level": "CAN_MANAGE"
}

Once logging to inference tables has been enabled, wait until your endpoint is ready. Then you can start calling it.
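Waiting for readiness can be scripted by polling the endpoint status. The sketch below is one possible approach: `fetch_status` is a hypothetical callable that you supply to return the endpoint JSON, and the value `READY` for `state.ready` is an assumption, since the response above shows only `NOT_READY`.

```python
import time
from typing import Callable


def is_ready(endpoint: dict) -> bool:
    """True when the endpoint reports ready and no config update is running."""
    state = endpoint.get("state", {})
    return state.get("ready") == "READY" and state.get("config_update") != "IN_PROGRESS"


def wait_until_ready(
    fetch_status: Callable[[], dict],
    timeout_s: float = 1200.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status until the endpoint is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if is_ready(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("endpoint did not become ready in time")
        sleep(poll_s)


# Simulated status sequence standing in for real API calls.
responses = iter(
    [
        {"state": {"ready": "NOT_READY", "config_update": "IN_PROGRESS"}},
        {"state": {"ready": "READY", "config_update": "NOT_UPDATING"}},
    ]
)
final = wait_until_ready(lambda: next(responses), sleep=lambda seconds: None)
print(final["state"]["ready"])
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.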

After you create an inference table, let the system handle schema evolution and the addition of data; do not perform these operations yourself.

The following operations do not impact the integrity of the table:

  • Running OPTIMIZE, ANALYZE, and VACUUM against the table.

  • Deleting old unused data.
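For reference, the maintenance commands named above can be generated as SQL strings. This sketch only builds the statements; the helper name is hypothetical and the table name is the example from this article. Backtick quoting is used because the example table name contains hyphens.

```python
def maintenance_statements(catalog: str, schema: str, table: str) -> list:
    """Build OPTIMIZE, ANALYZE, and VACUUM statements for an inference table."""
    # Quote each part so names containing hyphens remain valid identifiers.
    full_name = ".".join("`" + part.replace("`", "``") + "`" for part in (catalog, schema, table))
    return [
        f"OPTIMIZE {full_name}",
        f"ANALYZE TABLE {full_name} COMPUTE STATISTICS",
        f"VACUUM {full_name}",
    ]


for statement in maintenance_statements("ml", "ads", "feed-ads-prod_payload"):
    print(statement)
```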

If you don’t specify an auto_capture_config, the settings from the previous configuration version are reused by default. For example, if inference tables were already enabled, the same settings are used on the next endpoint update; if inference tables were disabled, they remain disabled.

{
  "served_entities": [
    {
      "name":"current",
      "entity_name":"model-A",
      "entity_version":"1",
      "workload_size":"Small",
      "scale_to_zero_enabled":true
    }
  ],
  "auto_capture_config": {
    "enabled": false
  }
}

Enable inference tables on an existing endpoint using the API

You can also enable inference tables on an existing endpoint using the API. After inference tables are enabled, continue specifying the same auto_capture_config body in future update endpoint API calls to continue using inference tables.

Note

Changing the table location after enabling inference tables is not supported.

PUT /api/2.0/serving-endpoints/{name}/config

{
  "served_entities": [
    {
      "name":"current",
      "entity_name":"model-A",
      "entity_version":"1",
      "workload_size":"Small",
      "scale_to_zero_enabled":true
    },
    {
      "name":"challenger",
      "entity_name":"model-B",
      "entity_version":"1",
      "workload_size":"Small",
      "scale_to_zero_enabled":true
    }
  ],
  "traffic_config":{
    "routes": [
      {
        "served_model_name":"current",
        "traffic_percentage":50
      },
      {
        "served_model_name":"challenger",
        "traffic_percentage":50
      }
    ]
  },
  "auto_capture_config":{
   "catalog_name": "catalog",
   "schema_name": "schema",
   "table_name_prefix": "my-endpoint"
  }
}
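To keep specifying the same auto_capture_config in later updates, you could copy it from the endpoint's current configuration into each new update body. The sketch below shows one way to do that. The helper `build_update_body` is hypothetical, and dropping the `state` key assumes that it is a response-only field that should not be sent back.

```python
from typing import Optional


def build_update_body(
    current_config: dict,
    served_entities: list,
    traffic_config: Optional[dict] = None,
) -> dict:
    """Build a PUT .../config body that reuses the existing capture settings."""
    body = {"served_entities": served_entities}
    if traffic_config is not None:
        body["traffic_config"] = traffic_config
    previous = current_config.get("auto_capture_config")
    if previous is not None:
        # Assumption: "state" is reported by the service and is not an input.
        body["auto_capture_config"] = {
            key: value for key, value in previous.items() if key != "state"
        }
    return body


# Hand-written example of a current configuration, shaped like the response above.
current_config = {
    "auto_capture_config": {
        "catalog_name": "ml",
        "schema_name": "ads",
        "table_name_prefix": "feed-ads-prod",
        "state": {"payload_table": {"name": "feed-ads-prod_payload"}},
        "enabled": True,
    }
}
update_body = build_update_body(
    current_config,
    served_entities=[
        {
            "name": "current",
            "entity_name": "model-A",
            "entity_version": "2",
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }
    ],
)
print(update_body["auto_capture_config"])
```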

Disable inference tables

When disabling inference tables, you do not need to specify catalog, schema, or table prefix. The only required field is enabled: false.

POST /api/2.0/serving-endpoints

{
  "name": "feed-ads",
  "config":{
    "served_entities": [
      {
       "entity_name": "ads1",
       "entity_version": "1",
       "workload_size": "Small",
       "scale_to_zero_enabled": true
      }
    ],
    "auto_capture_config":{
       "enabled": false
    }
  }
}
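Built from Python, the configuration portion of such a request might look like the following sketch. The helper name `disable_capture_config` is hypothetical; the point is that the auto_capture_config object carries only the enabled field, and that Python's False serializes to JSON false.

```python
import json


def disable_capture_config(served_entities: list) -> dict:
    """Return a config object whose auto_capture_config only sets enabled to False."""
    return {
        "served_entities": served_entities,
        "auto_capture_config": {"enabled": False},
    }


config = disable_capture_config(
    [
        {
            "entity_name": "ads1",
            "entity_version": "1",
            "workload_size": "Small",
            "scale_to_zero_enabled": True,
        }
    ]
)
print(json.dumps(config["auto_capture_config"]))
```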

To re-enable a disabled inference table, follow the instructions in Enable inference tables on an existing endpoint. You can use either the same table or specify a new table.

Next steps

After you enable inference tables, you can monitor the served models in your model serving endpoint with Databricks Lakehouse Monitoring. For details, see Workflow: Monitor model performance using inference tables.