enable-ai-gateway-features (Python)


Enable Databricks Mosaic AI Gateway features

This notebook shows how to enable and use Databricks Mosaic AI Gateway features to manage and govern models from providers such as OpenAI and Anthropic.

In this notebook, you use the Model Serving and AI Gateway API to accomplish the following tasks:

  • Create and configure an endpoint for OpenAI GPT-4o-Mini.
  • Enable AI Gateway features including usage tracking, inference tables, guardrails, and rate limits.
  • Set up invalid keywords and personally identifiable information (PII) detection for model requests and responses.
  • Implement rate limits for model serving endpoints.
  • Configure multiple models for A/B testing.

If you prefer a low-code experience, you can create an external models endpoint and configure AI Gateway features using the Serving UI (AWS|Azure).

Create a model serving endpoint for OpenAI GPT-4o-Mini

The following creates a model serving endpoint for GPT-4o Mini without AI Gateway enabled. First, you define a helper function for creating and updating the endpoint:
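A minimal sketch of such a helper follows. It assumes DATABRICKS_HOST and DATABRICKS_TOKEN are available as environment variables and uses the serving-endpoints REST API; the function name create_or_update_endpoint and the credential handling are illustrative choices, not part of the product API.

import os
import requests

# Assumption: the workspace URL and a personal access token are provided as
# environment variables. Adjust to however you manage credentials.
DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # for example, https://<workspace-url>
DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]

API_ROOT = f"{DATABRICKS_HOST}/api/2.0/serving-endpoints"
HEADERS = {"Authorization": f"Bearer {DATABRICKS_TOKEN}"}


def create_or_update_endpoint(name: str, config: dict) -> dict:
    """Create the serving endpoint if it doesn't exist; otherwise update its config."""
    exists = requests.get(f"{API_ROOT}/{name}", headers=HEADERS).status_code == 200
    if exists:
        response = requests.put(f"{API_ROOT}/{name}/config", headers=HEADERS, json=config)
    else:
        response = requests.post(API_ROOT, headers=HEADERS, json={"name": name, "config": config})
    response.raise_for_status()
    return response.json()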

Next, write a simple configuration to set up the endpoint. See POST /api/2.0/serving-endpoints for API details.
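A sketch of that configuration, using the helper above, might look like the following. The endpoint name dr-gateway-demo matches the one used later in this notebook; the secret scope and key names are placeholders for wherever you store your OpenAI API key.

# Assumption: the OpenAI API key is stored as a Databricks secret; replace the
# scope and key names with your own.
endpoint_config = {
    "served_entities": [
        {
            "name": "gpt-4o-mini",
            "external_model": {
                "name": "gpt-4o-mini",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {
                    "openai_api_key": "{{secrets/my_scope/openai_api_key}}",
                },
            },
        }
    ]
}

create_or_update_endpoint("dr-gateway-demo", endpoint_config)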

One of the immediate benefits of serving OpenAI models (or models from other providers) through Databricks is that you can immediately query the model using any of the following methods:

  • Databricks Python SDK
  • OpenAI Python client
  • REST API calls
  • MLflow Deployments SDK
  • Databricks SQL ai_query function

See the Query foundation models and external models article (AWS|Azure).

For example, you can use ai_query to query the model with Databricks SQL.
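As a sketch, assuming you are running in a Databricks notebook (where spark and display are available) and that the endpoint is named dr-gateway-demo, the query might look like this; the prompt is only an illustration.

# ai_query sends the prompt to the named serving endpoint and returns the response.
display(
    spark.sql(
        """
        SELECT ai_query(
          'dr-gateway-demo',
          'What is Databricks Model Serving?'
        ) AS response
        """
    )
)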

Add an AI Gateway configuration

After you set up a model serving endpoint, you can query the OpenAI model using any of the querying methods available in Databricks.

You can further enrich the model serving endpoint by enabling the Databricks Mosaic AI Gateway, which offers a variety of features for monitoring and managing your endpoint. These features include inference tables, guardrails, and rate limits, among other things.

To start, the following is a simple configuration that enables inference tables for monitoring endpoint usage. Understanding how the endpoint is being used and how often helps to determine what usage limits and guardrails are beneficial for your use case.
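A sketch of that configuration follows, sent to the AI Gateway route of the REST API (PUT /api/2.0/serving-endpoints/{name}/ai-gateway) and reusing API_ROOT and HEADERS from the helper cell above. The catalog, schema, and table name prefix are assumptions; point them at a Unity Catalog schema you can write to.

# Enable usage tracking and inference tables for the endpoint.
ai_gateway_config = {
    "usage_tracking_config": {"enabled": True},
    "inference_table_config": {
        "enabled": True,
        "catalog_name": "main",          # placeholder catalog
        "schema_name": "default",        # placeholder schema
        "table_name_prefix": "dr_gateway_demo",
    },
}

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()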

Query the inference table

The following displays the inference table that was created when you enabled it in the AI Gateway configuration. Note: For example purposes, a number of queries were run against this endpoint in the AI Playground after the update above enabled inference tables, but before the table was queried.
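Assuming the configuration above, the payload table is created in the configured catalog and schema with a _payload suffix, so displaying it might look like the following.

# Table name follows <catalog>.<schema>.<table_name_prefix>_payload.
payload_table = spark.table("main.default.dr_gateway_demo_payload")
display(payload_table)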

You can extract details such as the request messages, response messages, and token counts using SQL:
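For example, something along these lines parses the JSON request and response payloads with the colon path syntax; the column names follow the documented payload schema, but check them against your table.

display(
    spark.sql(
        """
        SELECT
          request_time,
          request:messages[0].content         AS first_request_message,
          response:choices[0].message.content AS response_message,
          response:usage.total_tokens::INT    AS total_tokens
        FROM main.default.dr_gateway_demo_payload
        ORDER BY request_time DESC
        """
    )
)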

Set up AI Guardrails

Set invalid keywords

You can investigate the inference table to see whether the endpoint is being used for inappropriate topics. From the inference table, it looks like a user is talking about SuperSecretProject! For this example, you can assume that topic is not in the scope of use for this chat endpoint.
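For instance, a quick filter over the payload table (a hypothetical query, reusing the assumed table name from above) surfaces the requests that mention the topic:

# Look for requests that mention the out-of-scope topic.
display(
    spark.sql(
        """
        SELECT request_time, request
        FROM main.default.dr_gateway_demo_payload
        WHERE request ILIKE '%SuperSecretProject%'
        """
    )
)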

The following adds SuperSecretProject to the list of invalid keywords to make sure usage stays in scope.
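A sketch of the guardrails update, extending the ai_gateway_config dictionary from the earlier cell:

# Add an input guardrail that rejects requests containing the keyword.
ai_gateway_config["guardrails"] = {
    "input": {
        "invalid_keywords": ["SuperSecretProject"],
    }
}

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()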

Now, queries referencing SuperSecretProject are not run; instead, the endpoint returns an error message, "Error: Invalid keywords detected in the prompt. Please revise your input."
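For example, querying the endpoint with the OpenAI Python client, pointed at the workspace's serving-endpoints URL and authenticated with the workspace token from earlier, now fails for a prompt like the made-up one below.

from openai import OpenAI

# Point the OpenAI client at the Databricks serving endpoints.
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url=f"{DATABRICKS_HOST}/serving-endpoints",
)

client.chat.completions.create(
    model="dr-gateway-demo",
    messages=[
        {"role": "user", "content": "Give me a status update on SuperSecretProject."}
    ],
    max_tokens=100,
)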

Error: Invalid keywords detected in the prompt. Please revise your input.

Set up PII detection

Now, the endpoint blocks messages referencing SuperSecretProject. You can also make sure the endpoint doesn't accept requests, or return responses, containing any PII.

The following updates the guardrails configuration for pii:
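A sketch of that update, again extending ai_gateway_config; the BLOCK behavior rejects any request or response in which PII is detected.

# Block PII on both the request (input) and the response (output) side.
ai_gateway_config["guardrails"]["input"]["pii"] = {"behavior": "BLOCK"}
ai_gateway_config["guardrails"]["output"] = {"pii": {"behavior": "BLOCK"}}

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()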

The following tries to prompt the model to work with PII; the request is blocked and returns the message, "Error: PII (Personally Identifiable Information) detected. Please try again."
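For example, with a made-up prompt:

client.chat.completions.create(
    model="dr-gateway-demo",
    messages=[
        {
            "role": "user",
            "content": "Draft an email to John Doe at john.doe@example.com, phone 555-010-0100.",
        }
    ],
    max_tokens=100,
)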

Error: PII (Personally Identifiable Information) detected. Please try again.

Add rate limits

Say you investigate the inference table further and see steep spikes in usage, suggesting a higher-than-expected volume of queries. Extremely high usage could be costly if not monitored and limited.
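One way to spot such spikes is to bucket the payload table by time, for example:

# Count requests per minute to see spikes in traffic.
display(
    spark.sql(
        """
        SELECT date_trunc('MINUTE', request_time) AS minute, count(*) AS requests
        FROM main.default.dr_gateway_demo_payload
        GROUP BY 1
        ORDER BY 1
        """
    )
)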

You can set a rate limit to prevent excessive queries. In this case, you can set the limit on the endpoint, but it is also possible to set per-user limits.
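A sketch of the rate limit configuration follows; ten calls per minute matches the output shown below, and changing "key" to "user" would apply the limit per user instead of per endpoint.

# Limit the endpoint to 10 calls per minute.
ai_gateway_config["rate_limits"] = [
    {"calls": 10, "key": "endpoint", "renewal_period": "minute"}
]

response = requests.put(
    f"{API_ROOT}/dr-gateway-demo/ai-gateway",
    headers=HEADERS,
    json=ai_gateway_config,
)
response.raise_for_status()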

The following shows an example of what the output error looks like when the rate limit is exceeded.
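A loop along the following lines produces the output shown below: it sends eleven requests in quick succession, so the eleventh call exceeds the 10-calls-per-minute limit.

import time

# Send requests in a tight loop; the rate limit rejects the 11th call.
start_time = time.time()
for i in range(1, 12):
    client.chat.completions.create(
        model="dr-gateway-demo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"This is request {i}"},
        ],
        max_tokens=10,
    )
    print(f"Request {i} sent")
print(f"Total time: {time.time() - start_time:.2f} seconds")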

Request 1 sent
Request 2 sent
Request 3 sent
Request 4 sent
Request 5 sent
Request 6 sent
Request 7 sent
Request 8 sent
Request 9 sent
Request 10 sent
RateLimitError: Error code: 429 - {'error_code': 'REQUEST_LIMIT_EXCEEDED', 'message': 'REQUEST_LIMIT_EXCEEDED: User defined rate limit(s) exceeded for endpoint: dr-gateway-demo.'}

Add another model

At some point, you might want to A/B test different models, whether from the same provider or a different one. For example, you can add another OpenAI model to the endpoint configuration:
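A sketch of such a configuration follows. The second model (gpt-4o here) and the 50/50 split are assumptions; the traffic_config routes control what share of requests each served entity receives.

# Two served entities behind one endpoint, with traffic split between them.
endpoint_config = {
    "served_entities": [
        {
            "name": "gpt-4o-mini",
            "external_model": {
                "name": "gpt-4o-mini",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {"openai_api_key": "{{secrets/my_scope/openai_api_key}}"},
            },
        },
        {
            "name": "gpt-4o",
            "external_model": {
                "name": "gpt-4o",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {"openai_api_key": "{{secrets/my_scope/openai_api_key}}"},
            },
        },
    ],
    "traffic_config": {
        "routes": [
            {"served_model_name": "gpt-4o-mini", "traffic_percentage": 50},
            {"served_model_name": "gpt-4o", "traffic_percentage": 50},
        ]
    },
}

create_or_update_endpoint("dr-gateway-demo", endpoint_config)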

Now, traffic is split between these two models (you can configure the proportion of traffic going to each model). This enables you to use the inference tables to evaluate the quality of each model and make an informed decision about switching from one model to another.