Package custom artifacts and files for Serverless Real-Time Inference

Preview

This feature is in Public Preview.

This article describes how to ensure your model’s file and artifact dependencies are available on your Serverless Real-Time Inference endpoint for model serving. Learn more about Serverless Real-Time Inference.

Requirements

MLflow 1.29 and above

Package artifacts with models

When your model requires files or artifacts during inference, you can package them into the model artifact when you log the model.

If you’re working with Databricks notebooks, a common practice is to keep such files in DBFS. Models are also sometimes configured to download artifacts from the internet when they load (for example, Hugging Face tokenizers). Real-time workloads at scale perform best when all required dependencies are captured statically at deployment time rather than fetched at load time. For this reason, Serverless Real-Time Inference requires that DBFS artifacts be packaged into the model artifact itself, which you do through MLflow interfaces. Whenever possible, also package any network artifacts that the model loads into the model artifact.

Use the artifacts parameter of the MLflow log_model() function to log a model together with the artifacts it depends on.

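import mlflow.pyfunc

mlflow.pyfunc.log_model(
    ...
    # Each key names an artifact; each value is the path or URI of a file or directory to copy into the model
    artifacts={"model-weights": "/dbfs/path/to/file", "tokenizer_cache": "./tokenizer_cache"},
    ...
)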

In PyFunc models, the local paths of these artifacts are available on the context object as context.artifacts, a dictionary keyed by the names you passed to the artifacts parameter. Load each file in the standard way for its type.

For example, in a custom MLflow model:

import mlflow.pyfunc
import torch
import transformers

class ModelPyfunc(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Look up each packaged artifact by the name it was logged under
        self.model = torch.load(context.artifacts["model-weights"])
        self.tokenizer = transformers.BertweetTokenizer.from_pretrained("model-base", local_files_only=True, cache_dir=context.artifacts["tokenizer_cache"])
    ...

After your files and artifacts are packaged within your model artifact, you can serve your model to a Serverless Real-Time Inference endpoint.