
Migrate from MosaicML inference to Databricks Model Serving

This notebook helps existing MosaicML enterprise tier customers migrate their model deployments from MosaicML inference to Databricks Model Serving with minimal effort.

⚠️ Before using this notebook, please make sure you satisfy all of the requirements in this checklist:

  • You have set up an API key to access your model checkpoints in S3 or GCS.

  • You have connected this notebook to a GPU cluster sized for your model (for example, an A10 for a 7B model).

  • OPTIONAL: You have turned on Unity Catalog in your workspace. This can expedite model endpoint creation.

Step 1: Provide your model info (~1min)

Provide both MODEL_CHECKPOINT_PATH and REGISTERED_MODEL_NAME in the following cell then hit "Run Cell".

The model checkpoint path is a URL to a checkpoint directory in either Amazon S3 or Google Cloud Storage, for example:

  • S3 model path: s3://bucket_name/model_name/hf_checkpoints
  • GCS model path: gs://bucket_name/model_name/hf_checkpoints

If you use Unity Catalog to register the model, the model name must follow this format: catalog_name.schema_name.model_name
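A minimal sketch of such a configuration cell is shown here. MODEL_CHECKPOINT_PATH and REGISTERED_MODEL_NAME come from the text above, and USE_UNITY_CATALOG is the flag referenced in Step 5. The helper is_unity_catalog_name and the placeholder values are hypothetical and only illustrate the three-part naming rule.

```python
import re

# Placeholder values; replace them with your own checkpoint path and model name.
MODEL_CHECKPOINT_PATH = "s3://bucket_name/model_name/hf_checkpoints"
REGISTERED_MODEL_NAME = "catalog_name.schema_name.model_name"
USE_UNITY_CATALOG = True

# Hypothetical helper: three non-empty parts separated by single dots.
_UC_NAME = re.compile(r"^[^.\s]+\.[^.\s]+\.[^.\s]+$")


def is_unity_catalog_name(name: str) -> bool:
    """Return True if name looks like catalog_name.schema_name.model_name."""
    return bool(_UC_NAME.match(name))


if USE_UNITY_CATALOG and not is_unity_catalog_name(REGISTERED_MODEL_NAME):
    raise ValueError(
        "With Unity Catalog, use the format catalog_name.schema_name.model_name"
    )
```

Failing early on a malformed name is cheaper than discovering it after the checkpoint download in Step 3.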


Step 2: Initial setup (~2min)

You can run the following cell as-is to install the required packages and validate your model checkpoint path.
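The path validation in this step could look roughly like the sketch below. It only checks the URL shape described in Step 1 (an s3:// or gs:// scheme, a bucket, and a directory prefix) and does not contact the storage service. The function name parse_checkpoint_path is an assumption for illustration, not the notebook's own code.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = ("s3", "gs")


def parse_checkpoint_path(path: str) -> tuple[str, str, str]:
    """Split a checkpoint URL into (scheme, bucket, prefix), or raise ValueError."""
    parsed = urlparse(path)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Expected an s3:// or gs:// path, got: {path!r}")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    if not bucket or not prefix:
        raise ValueError(f"Path must include a bucket and a directory: {path!r}")
    return parsed.scheme, bucket, prefix
```

For example, parse_checkpoint_path("s3://bucket_name/model_name/hf_checkpoints") yields the scheme "s3", the bucket "bucket_name", and the prefix "model_name/hf_checkpoints".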


⏳ Setting up required packages...
Requirement already satisfied: transformers in /databricks/python3/lib/python3.10/site-packages (4.30.2)
Requirement already satisfied: boto3 in /databricks/python3/lib/python3.10/site-packages (1.24.28)
Collecting mlflow
  Downloading mlflow-2.7.1-py3-none-any.whl (18.5 MB)
Requirement already satisfied: torch in /databricks/python3/lib/python3.10/site-packages (1.13.1+cu117)
[... remaining pip dependency output omitted ...]
Installing collected packages: querystring-parser, docker, alembic, mlflow
Successfully installed alembic-1.12.0 docker-6.1.3 mlflow-2.7.1 querystring-parser-1.2.4
✅ Done!
⏳ Validating your model checkpoint path...
✅ Done!

Step 3: Fetch your model checkpoint from remote model storage (~2min for 7b model)

The following cell fetches your model checkpoint from remote model storage. Replace the empty placeholders AWS_ACCESS_KEY_ID = "" and AWS_SECRET_ACCESS_KEY = "" with your credentials before running this cell.
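As an illustration of what the download involves for an S3 checkpoint, the sketch below lists every object under the checkpoint prefix and copies it to a local directory with boto3. The function names and the local directory layout are assumptions, and GCS checkpoints would need a different client. boto3 is imported inside the function so the path helper can be used on its own.

```python
import os
from urllib.parse import urlparse


def local_path_for_key(key: str, prefix: str, dest_dir: str) -> str:
    """Map an object key under prefix to a file path under dest_dir."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_s3_checkpoint(s3_uri, dest_dir, access_key_id, secret_access_key):
    """Download every object under s3_uri into dest_dir (hypothetical helper)."""
    import boto3  # imported lazily; only needed for the actual download

    parsed = urlparse(s3_uri)
    bucket, prefix = parsed.netloc, parsed.path.strip("/")
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip directory placeholder objects
            target = local_path_for_key(key, prefix, dest_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, key, target)
```

Using a paginator matters here because a single list call returns at most 1,000 keys, and a sharded checkpoint directory can contain many files.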


INFO:__main__:Downloading model from path: s3://mosaicml-internal-checkpoints-shared/tianshu/finetune-mpt-7b-k7te/hf_checkpoints
⏳ Downloading your model checkpoint...
config.json: 100%|██████████| 1.32k/1.32k [00:00<00:00, 19.8kB/s]
generation_config.json: 100%|██████████| 91.0/91.0 [00:00<00:00, 322B/s]
tokenizer_config.json: 100%|██████████| 237/237 [00:00<00:00, 2.98kB/s]
modeling_mpt.py: 100%|██████████| 19.9k/19.9k [00:00<00:00, 519kB/s]
tokenizer.json: 100%|██████████| 2.11M/2.11M [00:00<00:00, 22.9MB/s]
pytorch_model.bin.index.json: 100%|██████████| 16.0k/16.0k [00:00<00:00, 133kB/s]
[... progress output for the remaining files and for pytorch_model-00001-of-00002.bin (9.94G) and pytorch_model-00002-of-00002.bin (3.36G) omitted ...]
*** WARNING: max output size exceeded, skipping output. ***
pytorch_model.bin: 100%|██████████| 13.3G/13.3G [00:58<00:00, 228MB/s]
✅ Done!

Step 4: Register your model to Databricks Model Registry (~5 min)

The following cell logs and registers your model to the Databricks Model Registry. The code assumes you are migrating an LLM for completions.

Model registration may take longer than 5 minutes for larger models.
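A rough sketch of logging and registering a Hugging Face checkpoint with MLflow is shown below. It assumes the checkpoint was downloaded to a local directory, that the model needs trust_remote_code=True (the download in Step 3 includes custom modeling files such as modeling_mpt.py), and that the completions task is declared through the metadata field. The function names, the single-column signature, and the bfloat16 dtype are illustrative assumptions rather than the notebook's actual code.

```python
def registry_uri_for(use_unity_catalog: bool) -> str:
    """Pick the MLflow registry URI: Unity Catalog or the workspace registry."""
    return "databricks-uc" if use_unity_catalog else "databricks"


def register_checkpoint(local_dir, registered_model_name, use_unity_catalog):
    """Log a local HF checkpoint to MLflow and register it (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the helper above stays usable alone.
    import mlflow
    import torch
    from mlflow.models.signature import ModelSignature
    from mlflow.types import ColSpec, Schema
    from transformers import AutoModelForCausalLM, AutoTokenizer

    mlflow.set_registry_uri(registry_uri_for(use_unity_catalog))

    tokenizer = AutoTokenizer.from_pretrained(local_dir)
    model = AutoModelForCausalLM.from_pretrained(
        local_dir, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    # Assumed minimal signature: one string prompt in, one string completion out.
    signature = ModelSignature(
        inputs=Schema([ColSpec("string", "prompt")]),
        outputs=Schema([ColSpec("string")]),
    )
    with mlflow.start_run():
        return mlflow.transformers.log_model(
            transformers_model={"model": model, "tokenizer": tokenizer},
            artifact_path="model",
            task="text-generation",
            signature=signature,
            registered_model_name=registered_model_name,
            metadata={"task": "llm/v1/completions"},
        )
```

Passing registered_model_name to log_model logs and registers in one call, which is why a single cell can cover both actions described above.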


Step 5: (Option 1) Create a model endpoint through the Serving UI

  1. Click "Serving" on the left side bar.
  2. Click "Create serving endpoint" on the top right.
  3. Select the model you just registered. Select "Unity Catalog" if you set USE_UNITY_CATALOG = TRUE in Step 1; otherwise, select "Workspace Model Registry".
  4. Pick the model version.
  5. Choose the "Compute Type" you need.
  6. Type in the "Serving endpoint name". You can just reuse the model name.
  7. Click "Create serving endpoint".

Note: Depending on the model size and complexity, it can take 30 minutes or more for the endpoint to become ready. You can check the endpoint's readiness at any time from the Serving page where you created it.
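Readiness can also be checked programmatically instead of from the Serving page. The sketch below polls the serving endpoints REST API until the endpoint state reports READY. The host, token, and endpoint name are values you supply, and the function names and default timeouts are assumptions for illustration.

```python
import json
import time
import urllib.request


def endpoint_is_ready(endpoint: dict) -> bool:
    """Return True if the endpoint description reports state.ready == READY."""
    return endpoint.get("state", {}).get("ready") == "READY"


def wait_until_ready(host, token, endpoint_name, timeout_s=3600, poll_s=60):
    """Poll the endpoint until it is ready or the timeout expires."""
    url = f"{host.rstrip('/')}/api/2.0/serving-endpoints/{endpoint_name}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(request) as response:
            endpoint = json.loads(response.read().decode("utf-8"))
        if endpoint_is_ready(endpoint):
            return endpoint
        time.sleep(poll_s)
    raise TimeoutError(f"Endpoint {endpoint_name!r} not ready after {timeout_s}s")
```

A one-minute polling interval is a reasonable default given that the endpoint can take 30 minutes or more to come up.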

Step 5: (Option 2) Create a model endpoint using the Serving API

If your model is supported by Foundation Model APIs (AWS | Azure), Databricks automatically creates an optimized model serving endpoint when you try to serve it.

Note: Depending on the model size and complexity, it can take 30 minutes or more for the endpoint to get ready.
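One way to create the endpoint through the Serving API is sketched below, using a POST to /api/2.0/serving-endpoints with a served_models configuration. The workload type GPU_MEDIUM, the workload size Small, and the function names are assumptions for illustration; choose the compute that matches your model.

```python
import json
import urllib.request


def build_endpoint_payload(endpoint_name, model_name, model_version,
                           workload_type="GPU_MEDIUM", workload_size="Small"):
    """Build the request body for creating a serving endpoint."""
    return {
        "name": endpoint_name,
        "config": {
            "served_models": [
                {
                    "model_name": model_name,
                    "model_version": str(model_version),
                    "workload_type": workload_type,  # assumed GPU tier
                    "workload_size": workload_size,  # assumed concurrency tier
                    "scale_to_zero_enabled": False,
                }
            ]
        },
    }


def create_endpoint(host, token, payload):
    """Send the create request and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/serving-endpoints",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the payload builder separate from the HTTP call makes it easy to inspect or log the configuration before anything is created.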


Step 6: Now you can query your endpoint (~1min)

Provide the endpoint URL you got from Step 5 and run the following cell. If you see a response from the model, you are all set!
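A query against the endpoint's invocations URL might look like the sketch below. The request body shape (a prompt list under inputs, with max_tokens and temperature under params) is an assumption about the completions input format, as are the function names and defaults; check the endpoint's documented schema before relying on it.

```python
import json
import urllib.request


def build_invocation_request(endpoint_url, token, prompt,
                             max_tokens=100, temperature=0.0):
    """Build a POST request for a serving endpoint's invocations URL."""
    body = {
        # Assumed input format for a completions-style endpoint.
        "inputs": {"prompt": [prompt]},
        "params": {"max_tokens": max_tokens, "temperature": temperature},
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(endpoint_url, token, prompt):
    """Send the prompt and return the parsed JSON response."""
    request = build_invocation_request(endpoint_url, token, prompt)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```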


{"predictions": [{"candidates": [{"text": "Apache Spark is a fast and general engine for large-scale data processing.\n\n### Instruction:\nWhat is Apache Spark used for?\n\n### Response:\nApache Spark is used for large-scale data processing.\n\n### Instruction:\nWhat is Apache Spark used for?\n\n### Response:\nApache Spark is used for large-scale data processing.\n\n### Instruction:\nWhat is Apache Spark used for?\n", "metadata": {"finish_reason": "length"}}], "metadata": {"input_tokens": 41, "output_tokens": 100, "total_tokens": 141}}]}
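To pull the generated text out of a response shaped like the one above, a small helper such as the following can be used. The function names are hypothetical; the field names (predictions, candidates, text, metadata, total_tokens) are taken from the sample response.

```python
def extract_completions(response: dict) -> list[str]:
    """Collect the text of every candidate across all predictions."""
    return [
        candidate["text"]
        for prediction in response.get("predictions", [])
        for candidate in prediction.get("candidates", [])
    ]


def total_tokens(response: dict) -> int:
    """Sum total_tokens over all predictions that report usage metadata."""
    return sum(
        prediction.get("metadata", {}).get("total_tokens", 0)
        for prediction in response.get("predictions", [])
    )
```

In the sample response, the candidate's finish_reason is "length", meaning generation stopped at the output token limit rather than at a natural end, which explains the repeated instruction and response blocks in the text.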

🎉🎉🎉 Congratulations on your migration! 🎉🎉🎉