Tutorial: Create and deploy a Mosaic AI Model Training run
Preview
This feature is in Public Preview in the us-east-1 and us-west-2 regions.
This article describes how to create and configure a run using the Mosaic AI Model Training (formerly Foundation Model Training) API, and then review the results and deploy the model using the Databricks UI and Mosaic AI Model Serving.
Requirements
A workspace in the us-east-1 or us-west-2 AWS region.
Databricks Runtime 12.2 LTS ML or above.
This tutorial must be run in a Databricks notebook.
Training data in the accepted format. See Prepare data for Mosaic AI Model Training.
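Before launching a run, it can help to validate your training file locally. The sketch below writes and checks a small JSONL file; the `prompt`/`response` keys and the example records are illustrative assumptions — see the data preparation article for the full accepted schema.

```python
import json

# Hypothetical example records; real training data would have many more rows.
records = [
    {"prompt": "What is MLflow?",
     "response": "MLflow is an open source platform for managing the ML lifecycle."},
    {"prompt": "What is Unity Catalog?",
     "response": "Unity Catalog is the Databricks governance layer for data and AI."},
]

# Write one JSON object per line (JSONL), the shape expected by train_data_path.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Quick validation pass: every line must parse and contain both keys.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
assert all({"prompt", "response"} <= row.keys() for row in rows)
print(f"{len(rows)} valid records")
```

A check like this catches malformed lines before you spend GPU time on a run that fails at data loading.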
Step 2: Install the databricks_genai SDK
Use the following to install the databricks_genai SDK:
%pip install databricks_genai
Next, restart Python and import the foundation_model library:
dbutils.library.restartPython()
from databricks.model_training import foundation_model as fm
Step 3: Create a training run
Create a training run using the Mosaic AI Model Training create() function. The following parameters are required:
model: the model you want to train.
train_data_path: the location of the training dataset in Unity Catalog.
register_to: the Unity Catalog catalog and schema where you want checkpoints saved.
For example:
run = fm.create(
    model='meta-llama/Meta-Llama-3.1-8B-Instruct',
    train_data_path='dbfs:/Volumes/main/my-directory/ift/train.jsonl',  # UC Volume with JSONL formatted data
    register_to='main.my-directory',
    training_duration='1ep',
)
run
Step 4: View the status of a run
The time it takes to complete a training run depends on the number of tokens, the model, and GPU availability. For faster training, Databricks recommends that you use reserved compute. Reach out to your Databricks account team for details.
After you launch your run, you can monitor its status using get_events().
run.get_events()
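get_events() returns a snapshot, so waiting for completion usually means polling. The helper below is a generic sketch, not part of the SDK: it takes any zero-argument callable that returns a status string (for example, a wrapper around run.get_events()), and the terminal state names are assumptions.

```python
import time

def wait_for_run(fetch_status, poll_seconds=60, timeout_seconds=24 * 3600):
    """Poll a status-returning callable until it reports a terminal state.

    fetch_status: zero-arg callable returning a status string. The terminal
    state names below are assumptions, not the SDK's documented values.
    """
    terminal = {"COMPLETED", "FAILED", "STOPPED"}
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = fetch_status()
        print(f"run status: {status}")
        if status in terminal:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("training run did not finish before the timeout")

# Stubbed usage: a real fetch_status would query the training run instead.
statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
final = wait_for_run(lambda: next(statuses), poll_seconds=0)
```

Injecting the fetch function keeps the loop testable without a live run and makes the timeout and poll interval explicit.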
Step 5: View metrics and outputs
Follow these steps to view the results in the Databricks UI:
In the Databricks workspace, click Experiments in the left nav bar.
Select your experiment from the list.
Review the metrics charts in the Charts tab. Training metrics are generated for each training run and evaluation metrics are only generated if an evaluation data path is provided.
The primary training metric that shows progress is loss. Evaluation loss indicates whether your model is overfitting to your training data. However, do not rely on loss alone: in supervised training tasks, the evaluation loss can appear to show overfitting while the model continues to improve.
The higher the accuracy the better your model, but keep in mind that accuracy close to 100% might demonstrate overfitting.
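One simple heuristic for the overfitting signal described above is evaluation loss trending up while training loss keeps falling. A toy illustration (the loss values are made up, not from a real run):

```python
# Made-up loss curves: training loss keeps dropping, eval loss turns upward.
train_loss = [2.1, 1.6, 1.2, 0.9, 0.7]
eval_loss = [2.2, 1.8, 1.5, 1.6, 1.8]

def looks_overfit(train, evals, window=2):
    """Flag runs where eval loss rises while train loss falls over the
    last `window` steps. A coarse heuristic, not a definitive test."""
    train_falling = train[-1] < train[-1 - window]
    eval_rising = evals[-1] > evals[-1 - window]
    return train_falling and eval_rising

print(looks_overfit(train_loss, eval_loss))
```

For these toy curves the check returns True; in practice you would read the same divergence off the Charts tab.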
The following metrics appear in MLflow after your run:
LanguageCrossEntropy computes cross entropy on language modeling outputs. A lower score is better.
LanguagePerplexity measures how well a language model predicts the next word or character in a block of text based on previous words or characters. A lower score is better.
TokenAccuracy computes token-level accuracy for language modeling. A higher score is better.
In this tab, you can also view the output of your evaluation prompts if you specified them.
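The three metrics are closely related: perplexity is the exponential of cross entropy, and token accuracy counts argmax hits. A from-scratch sketch on a toy batch (the probabilities and labels are illustrative, not from a real run):

```python
import math

# Toy per-token distributions over a 3-token vocabulary, plus true labels.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [0, 1, 0]

# LanguageCrossEntropy: mean negative log-probability of the correct token.
cross_entropy = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

# LanguagePerplexity: exp of the cross entropy (lower is better).
perplexity = math.exp(cross_entropy)

# TokenAccuracy: fraction of tokens where the argmax matches the label.
accuracy = sum(
    max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels)
) / len(labels)

print(f"cross entropy:  {cross_entropy:.3f}")
print(f"perplexity:     {perplexity:.3f}")
print(f"token accuracy: {accuracy:.2f}")
```

Here the third token is predicted wrong (argmax 2 versus label 0), so accuracy is 2/3 while the loss terms still reflect the probability mass assigned to the correct token.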
Step 7: Deploy your model
The training run automatically registers your model in Unity Catalog after it completes. The model is registered based on what you specified in the register_to field of the create() method.
To deploy the model for serving, follow these steps:
Navigate to the model in Unity Catalog.
Click Serve this model.
Click Create serving endpoint.
In the Name field, provide a name for your endpoint.
Click Create.
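After the endpoint is up, you can query it over REST. The sketch below only assembles the request pieces; the workspace URL and endpoint name are placeholders, and the chat-style payload shape is an assumption — check the endpoint's serving page for the exact schema and authentication details.

```python
import json

def build_chat_request(workspace_url, endpoint_name, user_message, max_tokens=128):
    """Assemble the URL, headers, and a chat-style JSON body for a serving
    endpoint. All names here are placeholders, and the payload assumes a
    chat-completions-style request shape."""
    url = f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"
    headers = {"Content-Type": "application/json"}  # plus your auth token header
    body = {
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return url, headers, json.dumps(body)

url, headers, body = build_chat_request(
    "https://example.cloud.databricks.com",  # placeholder workspace URL
    "my-finetuned-llama",                    # placeholder endpoint name
    "Summarize the quarterly report.",
)
```

Keeping the request assembly separate from the HTTP call makes it easy to inspect the payload before sending it with your HTTP client of choice.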
Additional resources
Create a training run using the Mosaic AI Model Training API
See the Instruction fine-tuning: Named Entity Recognition demo notebook for an instruction fine-tuning example that walks through data preparation, training run configuration, and deployment.