enable-ai-gateway-features (Python)


Enable Databricks Mosaic AI Gateway features

This notebook shows how to enable and use Databricks Mosaic AI Gateway features to manage and govern models from providers such as OpenAI and Anthropic.

In this notebook, you use the Model Serving and AI Gateway API to accomplish the following tasks:

  • Create and configure an endpoint for OpenAI GPT-4o-Mini.
  • Enable AI Gateway features including usage tracking, inference tables, guardrails, and rate limits.
  • Set up invalid keywords and personally identifiable information (PII) detection for model requests and responses.
  • Implement rate limits for model serving endpoints.
  • Configure multiple models for A/B testing.

If you prefer a low-code experience, you can create an external models endpoint and configure AI Gateway features using the Serving UI (AWS|Azure).

Create a model serving endpoint for OpenAI GPT-4o-Mini

The following creates a model serving endpoint for GPT-4o Mini without AI Gateway enabled. First, you define a helper function for creating and updating the endpoint:
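A minimal sketch of such a helper follows. It assumes DATABRICKS_HOST and DATABRICKS_TOKEN are available as environment variables and uses the serving-endpoints REST API; the function name create_or_update_endpoint and the credential handling are illustrative choices, not part of the product API.

import os
import requests

# Assumption: the workspace URL and a personal access token are provided as
# environment variables. Adjust to however you manage credentials.
DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # for example, https://<workspace-url>
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

API_ROOT = f"{DATABRICKS_HOST}/api/2.0/serving-endpoints"
HEADERS = {"Authorization": f"Bearer {DATABRICKS_TOKEN}"}


def create_or_update_endpoint(name: str, config: dict) -> dict:
    """Create the serving endpoint if it doesn't exist; otherwise update its config."""
    exists = requests.get(f"{API_ROOT}/{name}", headers=HEADERS).status_code == 200
    if exists:
        response = requests.put(f"{API_ROOT}/{name}/config", headers=HEADERS, json=config)
    else:
        response = requests.post(API_ROOT, headers=HEADERS, json={"name": name, "config": config})
    response.raise_for_status()
    return response.json()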

Next, write a simple configuration to set up the endpoint. See POST /api/2.0/serving-endpoints for API details.
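A sketch of that configuration, using the helper above, might look like the following. The endpoint name dr-gateway-demo matches the one used later in this notebook; the secret scope and key names are placeholders for wherever you store your OpenAI API key.

# Assumption: the OpenAI API key is stored as a Databricks secret; replace the
# scope and key names with your own.
endpoint_config = {
    "served_entities": [
        {
            "name": "gpt-4o-mini",
            "external_model": {
                "name": "gpt-4o-mini",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {
                    "openai_api_key": "{{secrets/my_scope/openai_api_key}}",
                },
            },
        }
    ]
}

create_or_update_endpoint("dr-gateway-demo", endpoint_config)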

One of the immediate benefits of serving OpenAI models (or models from other providers) through Databricks is that you can immediately query the model using any of the following methods:

  • Databricks Python SDK
  • OpenAI Python client
  • REST API calls
  • MLflow Deployments SDK
  • Databricks SQL ai_query function

See the Query foundation models and external models article (AWS|Azure).

For example, you can use ai_query to query the model with Databricks SQL.
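As a sketch, assuming you are running in a Databricks notebook (where spark and display are available) and that the endpoint is named dr-gateway-demo, the query might look like this; the prompt is only an illustration.

# ai_query sends the prompt to the named serving endpoint and returns the response.
display(
    spark.sql(
        """
        SELECT ai_query(
          'dr-gateway-demo',
          'What is Databricks Model Serving?'
        ) AS response
        """
    )
)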

Add an AI Gateway configuration

After you set up a model serving endpoint, you can query the OpenAI model using any of the querying methods available in Databricks.

You can further enrich the model serving endpoint by enabling the Databricks Mosaic AI Gateway, which offers a variety of features for monitoring and managing your endpoint. These features include inference tables, guardrails, and rate limits, among other things.

To start, the following is a simple configuration that enables inference tables for monitoring endpoint usage. Understanding how the endpoint is being used and how often helps to determine what usage limits and guardrails are beneficial for your use case.
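A sketch of that configuration follows, sent to the AI Gateway route of the REST API (PUT /api/2.0/serving-endpoints/{name}/ai-gateway) and reusing API_ROOT and HEADERS from the helper cell above. The catalog, schema, and table name prefix are assumptions; point them at a Unity Catalog schema you can write to.

# Enable usage tracking and inference tables for the endpoint.
ai_gateway_config = {
    "usage_tracking_config": {"enabled": True},
    "inference_table_config": {
        "enabled": True,
        "catalog_name": "main",          # placeholder catalog
        "schema_name": "default",        # placeholder schema
        "table_name_prefix": "dr_gateway_demo",
    },
}

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()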

Query the inference table

The following displays the inference table that was created when you enabled it in the AI Gateway configuration. Note: For example purposes, a number of queries were run against this endpoint in the AI Playground after the update above enabled inference tables, but before the table was queried.
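Assuming the configuration above, the payload table is created in the configured catalog and schema with a _payload suffix, so displaying it might look like the following.

# Table name follows <catalog>.<schema>.<table_name_prefix>_payload.
payload_table = spark.table("main.default.dr_gateway_demo_payload")
display(payload_table)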

You can extract details such as the request messages, response messages, and token counts using SQL:
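For example, something along these lines parses the JSON request and response payloads with the colon path syntax; the column names follow the documented payload schema, but check them against your table.

display(
    spark.sql(
        """
        SELECT
          request_time,
          request:messages[0].content         AS first_request_message,
          response:choices[0].message.content AS response_message,
          response:usage.total_tokens::INT    AS total_tokens
        FROM main.default.dr_gateway_demo_payload
        ORDER BY request_time DESC
        """
    )
)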

Set up AI Guardrails

Set invalid keywords

You can investigate the inference table to see whether the endpoint is being used for inappropriate topics. From the inference table, it looks like a user is talking about SuperSecretProject! For this example, you can assume that topic is not in the scope of use for this chat endpoint.
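For instance, a quick filter over the payload table (a hypothetical query, reusing the assumed table name from above) surfaces the requests that mention the topic:

# Look for requests that mention the out-of-scope topic.
display(
    spark.sql(
        """
        SELECT request_time, request
        FROM main.default.dr_gateway_demo_payload
        WHERE request ILIKE '%SuperSecretProject%'
        """
    )
)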

The following adds SuperSecretProject to the list of invalid keywords to make sure usage stays in scope.
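A sketch of the guardrails update, extending the ai_gateway_config dictionary from the earlier cell:

# Add an input guardrail that rejects requests containing the keyword.
ai_gateway_config["guardrails"] = {
    "input": {
        "invalid_keywords": ["SuperSecretProject"],
    }
}

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()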

Now, queries referencing SuperSecretProject are not run; instead, the endpoint returns an error message, "Error: Invalid keywords detected in the prompt. Please revise your input."
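For example, querying the endpoint with the OpenAI Python client, pointed at the workspace's serving-endpoints URL and authenticated with the workspace token from earlier, now fails for a prompt like the made-up one below.

from openai import OpenAI

# Point the OpenAI client at the Databricks serving endpoints.
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url=f"{DATABRICKS_HOST}/serving-endpoints",
)

client.chat.completions.create(
    model="dr-gateway-demo",
    messages=[
        {"role": "user", "content": "Give me a status update on SuperSecretProject."}
    ],
    max_tokens=100,
)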

Error: Invalid keywords detected in the prompt. Please revise your input.

Set up PII detection

Now, the endpoint blocks messages referencing SuperSecretProject. You can also make sure the endpoint doesn't accept requests, or return responses, containing any PII.

The following updates the guardrails configuration for pii:
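A sketch of that update, again extending ai_gateway_config; the BLOCK behavior rejects any request or response in which PII is detected.

# Block PII on both the request (input) and the response (output) side.
ai_gateway_config["guardrails"]["input"]["pii"] = {"behavior": "BLOCK"}
ai_gateway_config["guardrails"]["output"] = {"pii": {"behavior": "BLOCK"}}

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()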

The following tries to prompt the model to work with PII; the request is blocked and returns the message, "Error: PII (Personally Identifiable Information) detected. Please try again."
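For example, with a made-up prompt:

client.chat.completions.create(
    model="dr-gateway-demo",
    messages=[
        {
            "role": "user",
            "content": "Draft an email to John Doe at john.doe@example.com, phone 555-010-0100.",
        }
    ],
    max_tokens=100,
)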

Error: PII (Personally Identifiable Information) detected. Please try again.

Add rate limits

Say you investigate the inference table further and see steep spikes in usage, suggesting a higher-than-expected volume of queries. Extremely high usage could be costly if not monitored and limited.
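One way to spot such spikes is to bucket the payload table by time, for example:

# Count requests per minute to see spikes in traffic.
display(
    spark.sql(
        """
        SELECT date_trunc('MINUTE', request_time) AS minute, count(*) AS requests
        FROM main.default.dr_gateway_demo_payload
        GROUP BY 1
        ORDER BY 1
        """
    )
)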

You can set a rate limit to prevent excessive queries. In this case, you can set the limit on the endpoint, but it is also possible to set per-user limits.
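A sketch of the rate limit configuration follows; ten calls per minute matches the output shown below, and changing "key" to "user" would apply the limit per user instead of per endpoint.

# Limit the endpoint to 10 calls per minute.
ai_gateway_config["rate_limits"] = [
    {"calls": 10, "key": "endpoint", "renewal_period": "minute"}
]

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()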

The following shows an example of what the output error looks like when the rate limit is exceeded.
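A loop along the following lines produces the output shown below: it sends eleven requests in quick succession, so the eleventh call exceeds the 10-calls-per-minute limit.

import time

# Send requests in a tight loop; the rate limit rejects the 11th call.
start_time = time.time()
for i in range(1, 12):
    client.chat.completions.create(
        model="dr-gateway-demo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"This is request {i}"},
        ],
        max_tokens=10,
    )
    print(f"Request {i} sent")
print(f"Total time: {time.time() - start_time:.2f} seconds")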

Request 1 sent
Request 2 sent
Request 3 sent
Request 4 sent
Request 5 sent
Request 6 sent
Request 7 sent
Request 8 sent
Request 9 sent
Request 10 sent
RateLimitError: Error code: 429 - {'error_code': 'REQUEST_LIMIT_EXCEEDED', 'message': 'REQUEST_LIMIT_EXCEEDED: User defined rate limit(s) exceeded for endpoint: dr-gateway-demo.'}

Add another model

At some point, you might want to A/B test different models, whether from the same provider or a different one. For example, you can add another OpenAI model to the endpoint configuration:
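A sketch of such a configuration follows. The second model (gpt-4o here) and the 50/50 split are assumptions; the traffic_config routes control what share of requests each served entity receives.

# Two served entities behind one endpoint, with traffic split between them.
endpoint_config = {
    "served_entities": [
        {
            "name": "gpt-4o-mini",
            "external_model": {
                "name": "gpt-4o-mini",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {"openai_api_key": "{{secrets/my_scope/openai_api_key}}"},
            },
        },
        {
            "name": "gpt-4o",
            "external_model": {
                "name": "gpt-4o",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {"openai_api_key": "{{secrets/my_scope/openai_api_key}}"},
            },
        },
    ],
    "traffic_config": {
        "routes": [
            {"served_model_name": "gpt-4o-mini", "traffic_percentage": 50},
            {"served_model_name": "gpt-4o", "traffic_percentage": 50},
        ]
    },
}

create_or_update_endpoint("dr-gateway-demo", endpoint_config)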

Now, traffic is split between these two models (you can configure the proportion of traffic going to each model). This enables you to use the inference tables to evaluate the quality of each model and make an informed decision about switching from one model to another.