Databricks asset bundles for MLOps Stacks
Preview
This feature is in Public Preview.
You can use Databricks asset bundles, the Databricks CLI, and the Databricks MLOps Stack repository on GitHub to create MLOps Stacks. An MLOps Stack is an MLOps project on Databricks that follows production best practices out of the box. See What are Databricks asset bundles?
To create, deploy, and run an MLOps Stack project, complete the following steps:
Make sure that the target remote workspace has workspace files enabled. See What are workspace files?.
On your development machine, make sure that Databricks CLI version 0.205 or above and Python 3.8 or higher are installed. To check your installed Databricks CLI version, run the following command:
databricks -v
To install Databricks CLI version 0.205 or above, see Install or update the Databricks CLI. (Bundles do not work with Databricks CLI versions 0.17 and below.)
Tip
Databricks recommends that the version of Python matches the version of Python that is installed on the target Databricks clusters that you want your MLOps Stacks projects to use. This version is listed in the “System environment” section of the release notes for the version of Databricks Runtime installed on your cluster. See Databricks Runtime release notes versions and compatibility.
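The version requirements above can be checked with a small script. The following is a minimal sketch, not an official tool: the function names version_ge, extract_version, and check_tool are hypothetical, the script assumes a sort that supports the -V (version sort) option, and it assumes only that the output of databricks -v and python3 --version contains a dotted version number somewhere in the text.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes a sort that supports -V (version sort), such as GNU coreutils sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Pulls the first dotted number out of a string such as "Databricks CLI v0.205.2".
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Reports whether a tool meets its minimum version.
# Usage: check_tool <label> <found-version> <minimum-version>
check_tool() {
  if version_ge "$2" "$3"; then
    echo "$1 $2: OK (minimum $3)"
  else
    echo "$1 $2: too old (minimum $3)"
  fi
}

# Only query tools that are actually on the PATH.
if command -v databricks >/dev/null 2>&1; then
  check_tool "Databricks CLI" "$(extract_version "$(databricks -v)")" "0.205"
fi
if command -v python3 >/dev/null 2>&1; then
  check_tool "Python" "$(extract_version "$(python3 --version 2>&1)")" "3.8"
fi
```

This script checks only the minimum versions. To follow the tip above, compare the reported Python version by hand with the one listed in the release notes for your cluster's Databricks Runtime.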
In the root of an empty directory on your development machine, create and then activate a Python virtual environment, for example by using a utility such as venv.
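As an illustration of this step, the following sketch uses venv from a POSIX-style shell on Linux or macOS. The directory name .venv is an arbitrary choice, and the activation command differs on Windows.

```shell
# Create the virtual environment in a directory named .venv (the name is arbitrary).
python3 -m venv .venv

# Activate it for the current shell session (POSIX shell syntax).
. .venv/bin/activate

# Show the environment prefix; it should now point inside .venv.
python3 -c 'import sys; print(sys.prefix)'
```

Run these commands from the root of the empty directory so that the virtual environment sits alongside the project files created in the later steps.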
In your Python virtual environment, install Cookiecutter 2.1.0 or higher. You can do this by running pip as follows:
pip3 install 'cookiecutter>=2.1.0'
Use Cookiecutter to create your MLOps Stacks project’s starter files. To do this, begin by running the following command:
cookiecutter https://github.com/databricks/mlops-stack
Note
If the following error appears, try restarting your terminal and running the command again: “unable to load extension: no module named ‘local_extensions’.”
Answer Cookiecutter’s on-screen prompts. For guidance on answering these prompts, see Starting a new project in the Databricks MLOps Stack repository on GitHub.
After you answer all of the on-screen prompts, Cookiecutter creates your MLOps Stacks project’s starter files and adds them to your current working directory.
Customize your MLOps Stacks project’s starter files as desired. To do this, follow the guidance in the following files within your new project:
| Role | Goal | Docs |
| --- | --- | --- |
| First-time users of this repo | Understand the ML pipeline and code structure in this repo | docs/project-overview.md |
| Data Scientist | Get started writing ML code for a brand new project | docs/ml-developer-guide-fs.md |
| Data Scientist | Update production ML code (for example, model training logic) for an existing project | docs/ml-pull-request.md |
| Data Scientist | Modify production model ML resources (for example, model training or inference jobs) | <project-name>/databricks-resources/README.md |
| MLOps / DevOps | Set up CI/CD for the current ML project | docs/mlops-setup.md |
Deploy the project’s resources and artifacts to the desired remote workspace. To do this, run the Databricks CLI from the project’s root, where the databricks.yml is located, as follows:
databricks bundle deploy -t <target-name>
Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
The project’s deployed Databricks jobs automatically run on their predefined schedules. To run a deployed job immediately, run the Databricks CLI from the project’s root, where the databricks.yml is located, as follows:
databricks bundle run -t <target-name> <job-name>
Replace <target-name> with the name of the desired target within the databricks.yml file where the job was deployed, for example dev, test, staging, or prod.
Replace <job-name> with the name of the job in one of the .yml files within <project-name>/databricks-resources, for example batch_inference_job, write_feature_table_job, or model_training_job.
A link to the Databricks job appears, which you can copy into your web browser to open the job within the Databricks UI.
To delete a deployed project’s resources and artifacts if you no longer need them, run the Databricks CLI from the project’s root, where the databricks.yml is located, as follows:
databricks bundle destroy -t <target-name>
Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
Answer the on-screen prompts to confirm the deletion of the previously deployed resources and artifacts.
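The deploy, run, and destroy commands above share the same shape, so they can be wrapped in one small helper. The following is a sketch under stated assumptions rather than part of MLOps Stacks: the function names run and stack_bundle and the DRY_RUN variable are hypothetical. By default the helper only prints the command it would run; set DRY_RUN=0 to execute it from the project's root.

```shell
# Hypothetical helper: prints the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the three bundle commands used in this article.
# Usage: stack_bundle <deploy|run|destroy> <target-name> [job-name]
stack_bundle() {
  action="$1"
  target="$2"
  job="${3:-}"
  case "$action" in
    deploy|destroy)
      run databricks bundle "$action" -t "$target"
      ;;
    run)
      # Running a job needs the job name in addition to the target.
      if [ -z "$job" ]; then
        echo "run requires a job name" >&2
        return 1
      fi
      run databricks bundle run -t "$target" "$job"
      ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}

# Dry-run preview using the example target and job names from this article.
stack_bundle deploy dev
stack_bundle run dev model_training_job
```

With DRY_RUN left at its default, the two calls at the end print "+ databricks bundle deploy -t dev" and "+ databricks bundle run -t dev model_training_job". Previewing a command this way is most useful before a destroy, which deletes deployed resources.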