embedding-with-oss-models (Python)


Register and serve an OSS embedding model

This notebook sets up the open source text embedding model e5-small-v2 on a Model Serving endpoint that can be used with Vector Search.

  • Download the model from the Hugging Face Hub.
  • Register it to the MLflow Model Registry.
  • Start a Model Serving endpoint to serve the model.

The model e5-small-v2 is available at https://huggingface.co/intfloat/e5-small-v2.

For a list of library versions included in Databricks Runtime, see the release notes for your Databricks Runtime version (AWS | Azure).

Install Databricks Python SDK

This notebook uses the Databricks Python SDK to create and query Model Serving endpoints.
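A typical install cell looks like the following; restarting Python after the install picks up the new package (the absence of a version pin is an assumption):

```python
%pip install --upgrade databricks-sdk
dbutils.library.restartPython()
```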

Download model

Register model to MLflow
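A minimal sketch of the registration step using MLflow's sentence-transformers flavor. The registry name `e5-small-v2` and the sample input are assumptions:

```python
import mlflow
from mlflow.models import infer_signature
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-small-v2")

# Infer an input/output signature from a small example so the serving
# endpoint knows the expected request and response shapes.
sample_input = ["query: example sentence"]
signature = infer_signature(sample_input, model.encode(sample_input))

with mlflow.start_run():
    model_info = mlflow.sentence_transformers.log_model(
        model,
        artifact_path="model",
        signature=signature,
        input_example=sample_input,
        registered_model_name="e5-small-v2",  # assumed registry name
    )

print(model_info.model_uri)
```

Logging with `registered_model_name` both records the model in the run and registers a new version in the Model Registry in one call.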

Create model serving endpoint

For more details, see "Create foundation model serving endpoints" (AWS | Azure).

Note: This example creates a small CPU endpoint that scales down to 0. This is for quick, small tests. For more realistic use cases, consider using GPU endpoints for faster embedding computation and not scaling down to 0 if you expect frequent queries, as Model Serving endpoints have some cold start overhead.

Create Databricks SDK workspace client
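Inside a Databricks notebook, the client picks up authentication from the ambient notebook context, so no explicit host or token arguments are needed (a sketch):

```python
from databricks.sdk import WorkspaceClient

# In a Databricks notebook, credentials are resolved automatically
# from the notebook context.
w = WorkspaceClient()
```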

Create endpoint
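A sketch of the endpoint-creation cell, assuming the model was registered as `e5-small-v2` version 1. The endpoint name and the exact `ServedEntityInput` field values are assumptions against a recent databricks-sdk:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    EndpointCoreConfigInput,
    ServedEntityInput,
)

w = WorkspaceClient()

# Small CPU endpoint that scales to zero -- suitable for quick tests only.
endpoint = w.serving_endpoints.create_and_wait(
    name="e5-small-v2",  # assumed endpoint name
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="e5-small-v2",   # registered model name (assumed)
                entity_version="1",          # registered model version (assumed)
                workload_type="CPU",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
print(endpoint.state)
```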

Query endpoint
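Querying can be sketched through the same SDK client. The `inputs` payload key follows the logged model's signature, and the endpoint name and query text are assumptions:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Send a batch of texts to the endpoint; remember the E5 "query: " prefix.
response = w.serving_endpoints.query(
    name="e5-small-v2",  # assumed endpoint name
    inputs=["query: what is vector search?"],
)
print(response.predictions)
```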

The above create_and_wait command waits until the endpoint is ready. You can also check the status of the serving endpoint in the Databricks UI.

For more information, see "Query foundation models" (AWS | Azure).
