Databricks asset bundles for MLOps Stacks

Preview

This feature is in Public Preview.

You can use Databricks asset bundles, the Databricks CLI, and the Databricks MLOps Stack repository on GitHub to create MLOps Stacks. An MLOps Stack is an MLOps project on Databricks that follows production best practices out of the box. See What are Databricks asset bundles?

To create, deploy, and run an MLOps Stack project, complete the following steps:

  1. Make sure that the target remote workspace has workspace files enabled. See What are workspace files?.

  2. On your development machine, make sure that Databricks CLI version 0.205 or above and Python 3.8 or above are installed. To check your installed Databricks CLI version, run the command databricks -v. To install Databricks CLI version 0.205 or above, see Install or update the Databricks CLI. (Bundles do not work with Databricks CLI versions 0.17 and below.)

    Tip

    Databricks recommends that your local Python version match the Python version installed on the target Databricks clusters that your MLOps Stacks projects will use. That version is listed in the “System environment” section of the release notes for your cluster’s Databricks Runtime version. See Databricks Runtime release notes versions and compatibility.

  3. In the root of an empty directory on your development machine, create and then activate a Python virtual environment, for example by using a utility such as venv.
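    For example, a minimal sketch using Python’s built-in venv module (the directory name .venv here is an arbitrary choice):

```shell
# Create a virtual environment in the current (empty) project directory.
python3 -m venv .venv

# Activate it (Linux/macOS; on Windows, run .venv\Scripts\activate instead).
source .venv/bin/activate
```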

  4. In your Python virtual environment, install Cookiecutter 2.1.0 or higher. You can do this by running pip as follows:

    pip3 install 'cookiecutter>=2.1.0'
    
  5. Use Cookiecutter to create your MLOps Stacks project’s starter files. To do this, begin by running the following command:

    cookiecutter https://github.com/databricks/mlops-stack
    

    Note

    If the error “unable to load extension: no module named ‘local_extensions’” appears, try restarting your terminal and running the command again.

  6. Answer Cookiecutter’s on-screen prompts. For guidance on answering these prompts, see Starting a new project in the Databricks MLOps Stack repository on GitHub.

    After you answer all of the on-screen prompts, Cookiecutter creates your MLOps Stacks project’s starter files and adds them to your current working directory.

  7. Customize your MLOps Stacks project’s starter files as desired. To do this, follow the guidance in the following files within your new project:

    | Role | Goal | Docs |
    | --- | --- | --- |
    | First-time users of this repo | Understand the ML pipeline and code structure in this repo | docs/project-overview.md |
    | Data Scientist | Get started writing ML code for a brand new project | docs/ml-developer-guide-fs.md |
    | Data Scientist | Update production ML code (for example, model training logic) for an existing project | docs/ml-pull-request.md |
    | Data Scientist | Modify production model ML resources (for example, model training or inference jobs) | <project-name>/databricks-resources/README.md |
    | MLOps / DevOps | Set up CI/CD for the current ML project | docs/mlops-setup.md |
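    Pieced together from the files referenced above, the generated project layout includes at least the following (many other files are omitted here, and the exact layout depends on your Cookiecutter answers):

```
databricks.yml                 # bundle configuration at the project root
docs/
  project-overview.md
  ml-developer-guide-fs.md
  ml-pull-request.md
  mlops-setup.md
<project-name>/
  databricks-resources/
    README.md
    *.yml                      # job definitions such as model_training_job
```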

  8. Deploy the project’s resources and artifacts to the desired remote workspace. To do this, run the Databricks CLI from the project’s root, where the databricks.yml is located, as follows:

    databricks bundle deploy -t <target-name>
    

    Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.
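    For illustration, the targets section of a databricks.yml might look like the following sketch (the workspace URLs and the exact target names are placeholders; check your generated file for the actual values):

```yaml
# Hypothetical sketch of bundle targets; your generated databricks.yml will differ.
targets:
  dev:
    default: true
    workspace:
      host: https://my-dev-workspace.cloud.databricks.com       # placeholder
  staging:
    workspace:
      host: https://my-staging-workspace.cloud.databricks.com   # placeholder
  prod:
    workspace:
      host: https://my-prod-workspace.cloud.databricks.com      # placeholder
```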

  9. The project’s deployed Databricks jobs automatically run on their predefined schedules. To run a deployed job immediately, run the Databricks CLI from the project’s root, where the databricks.yml is located, as follows:

    databricks bundle run -t <target-name> <job-name>
    
    • Replace <target-name> with the name of the desired target within the databricks.yml file where the job was deployed, for example dev, test, staging, or prod.

    • Replace <job-name> with the name of the job in one of the .yml files within <project-name>/databricks-resources, for example batch_inference_job, write_feature_table_job, or model_training_job.
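    Each job name is a resource key in those .yml files. As a rough sketch of the shape of such a definition (the task and notebook path below are illustrative placeholders, not what the template actually generates):

```yaml
# Illustrative sketch only; the generated resource files include more settings
# (clusters, schedules, parameters, permissions, and so on).
resources:
  jobs:
    model_training_job:
      name: model-training-job        # placeholder display name
      tasks:
        - task_key: train             # placeholder task
          notebook_task:
            notebook_path: ./notebooks/Train.py   # placeholder path
```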

    The command output includes a link to the Databricks job, which you can open in your web browser to view the job in the Databricks UI.

  10. To delete a deployed project’s resources and artifacts if you no longer need them, run the Databricks CLI from the project’s root, where the databricks.yml is located, as follows:

    databricks bundle destroy -t <target-name>
    

    Replace <target-name> with the name of the desired target within the databricks.yml file, for example dev, test, staging, or prod.

    Answer the on-screen prompts to confirm the deletion of the previously deployed resources and artifacts.