Tutorial: Create and deploy a Mosaic AI Model Training run

Important

This feature is in Public Preview. Reach out to your Databricks account team to enroll in the Public Preview.

This article describes how to create and configure a run using the Mosaic AI Model Training (formerly Foundation Model Training) API, and then review the results and deploy the model using the Databricks UI and Mosaic AI Model Serving.

Requirements

Step 1: Prepare your data for training

See Prepare data for Mosaic AI Model Training.
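
Before launching a run, it can help to sanity-check that your training file is valid JSONL. The following is a minimal sketch, assuming chat-style instruction fine-tuning records with `prompt` and `response` keys; your actual schema may differ, so adjust the required keys to match your data.

```python
import json

# Assumed instruction fine-tuning schema; adjust the keys to match your data.
REQUIRED_KEYS = {"prompt", "response"}

def validate_jsonl_line(line: str) -> bool:
    """Return True if the line is a JSON object containing the expected keys."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and REQUIRED_KEYS.issubset(record)

def invalid_lines(text: str) -> list:
    """Return the 1-based line numbers of malformed records."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if line.strip() and not validate_jsonl_line(line)]
```

Running `invalid_lines` over the file contents before you upload to a Unity Catalog volume catches formatting problems early, before you spend GPU time on a run that fails to parse its data.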

Step 2: Install the databricks_genai SDK

Use the following to install the databricks_genai SDK.

%pip install databricks_genai

Next, import the foundation_model library:

dbutils.library.restartPython()
from databricks.model_training import foundation_model as fm

Step 3: Create a training run

Create a training run using the Mosaic AI Model Training’s create() function. The following parameters are required:

  • model: the model you want to train.

  • train_data_path: the location of your training data.

  • register_to: the Unity Catalog catalog and schema where you want checkpoints saved.

For example:

run = fm.create(model='meta-llama/Llama-2-7b-chat-hf',
                train_data_path='dbfs:/Volumes/main/my-directory/ift/train.jsonl', # UC Volume with JSONL formatted data
                register_to='main.my-directory',
                training_duration='1ep')

run
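
The `training_duration` value in the example uses a unit suffix. As a rough sketch of how such strings decompose, assuming suffixes like `'ep'` (epochs) and `'tok'` (tokens) — check the API reference for the full set of supported units:

```python
import re

# Assumed suffixes: "ep" (epochs) and "tok" (tokens); check the API
# reference for the full set of supported units.
_DURATION_RE = re.compile(r"^(\d+)(ep|tok)$")

def parse_training_duration(value: str):
    """Split a duration string such as '1ep' into an (amount, unit) pair."""
    match = _DURATION_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized training_duration: {value!r}")
    return int(match.group(1)), match.group(2)
```

For example, `parse_training_duration("1ep")` returns `(1, "ep")`, i.e. the run in the example above trains for one full pass over the training data.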

Step 4: View the status of a run

The time it takes to complete a training run depends on the number of tokens, the model, and GPU availability. For faster training, Databricks recommends that you use reserved compute. Reach out to your Databricks account team for details.

After you launch your run, you can monitor its status using get_events().

run.get_events()
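
If you want to block a notebook cell until the run finishes, you can poll for a terminal state. The following is a generic sketch: `get_status` is any zero-argument callable returning a status string (for example, a wrapper you write around the latest event from `run.get_events()`), and the terminal state names are assumptions — the event types the SDK actually emits may differ.

```python
import time

# Assumed terminal states; the event types emitted by get_events() may differ.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}

def wait_for_terminal_state(get_status, poll_seconds=60, timeout_seconds=14400):
    """Poll get_status() until it returns a terminal state or the timeout expires.

    get_status is any zero-argument callable returning a status string, e.g. a
    wrapper that reads the latest event from run.get_events().
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {status!r} after timeout")
        time.sleep(poll_seconds)
```

A long `poll_seconds` keeps the polling loop from hammering the API; training runs typically take minutes to hours, so checking once a minute is ample.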

Step 5: View metrics and outputs

Follow these steps to view the results in the Databricks UI:

  1. In the Databricks workspace, click Experiments in the left nav bar.

  2. Select your experiment from the list.

  3. Review the metrics charts in the Charts tab.

    1. The primary training metric that shows progress is loss. You can use the evaluation loss to check whether your model is overfitting to your training data. However, don't rely on loss alone: in supervised training tasks, the evaluation loss can suggest overfitting even while the model continues to improve.

    2. Higher accuracy indicates a better model, but keep in mind that accuracy close to 100% can be a sign of overfitting.

    3. In this tab, you can also view the output of your evaluation prompts if you specified them.
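
The overfitting signal described above — training loss still falling while evaluation loss rises — can be expressed as a simple heuristic. This is an illustrative sketch over loss values you export from the metrics charts, not part of the SDK:

```python
def eval_loss_diverging(train_losses, eval_losses, window=3):
    """Heuristic overfitting signal: training loss still falling while the
    evaluation loss has risen over the last `window` recorded steps."""
    if min(len(train_losses), len(eval_losses)) < window + 1:
        return False
    return (train_losses[-1] < train_losses[-1 - window]
            and eval_losses[-1] > eval_losses[-1 - window])
```

As noted above, treat a positive result as a prompt to inspect the evaluation prompt outputs, not as a definitive verdict: evaluation loss can look like overfitting while the model is still improving on the task.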

Step 6: Evaluate multiple customized models with MLflow LLM Evaluate before deployment

See Evaluate large language models with MLflow.

Step 7: Deploy your model

The training run automatically registers your model in Unity Catalog after it completes. The model is registered to the catalog and schema you specified in the register_to field of the create() call.

To deploy the model for serving, follow these steps:

  1. Navigate to the model in Unity Catalog.

  2. Click Serve this model.

  3. Click Create serving endpoint.

  4. In the Name field, provide a name for your endpoint.

  5. Click Create.
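
Once the endpoint is ready, you can query it over REST by sending a JSON body to its invocations URL. The following sketch only builds the request body; the chat-style input schema shown here is an assumption, so confirm the expected format on your endpoint's page in the Serving UI before using it.

```python
import json

def chat_invocation_payload(prompt: str, max_tokens: int = 128) -> str:
    """Build a JSON body for a chat-style serving endpoint.

    The input schema here is an assumption; confirm the expected format on
    your endpoint's page in the Serving UI.
    """
    return json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })
```

To send the request, POST this body to your endpoint's invocations URL with an `Authorization: Bearer <token>` header, where the URL and token come from your workspace (both are placeholders here, not values this tutorial provides).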

Additional resources