Specify a run identity for a Databricks Asset Bundles workflow

This article describes how to use the run_as setting to specify the identity to use when running Databricks Asset Bundles workflows.

The run_as setting can be configured as a top-level mapping that applies to all resources, or within a target deployment mapping in a bundle configuration file. It can specify either a user_name or a service_principal_name.
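For example, a minimal sketch of the two forms (the values shown are placeholders):

# Run all bundle resources as a single user
run_as:
  user_name: "someone@example.com"

Or, for a service principal, specify its application ID:

run_as:
  service_principal_name: "<service-principal-application-id>"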

This setting separates the identity used to deploy a bundle job or pipeline from the identity that the job or pipeline workflow runs as. This increases the flexibility of bundle development and management, while also allowing guardrails to be established for deployments and runs. In particular:

  • If the identity used to deploy a bundle is the same as the identity configured in the bundle’s run_as setting, there are no restrictions. All bundle resources are supported.

  • If the identity used to deploy a bundle is different from the identity configured in the bundle’s run_as setting, only a subset of bundle resources are supported. Pipelines and model serving endpoints are not supported.
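For example, a bundle such as the following hypothetical sketch fails to deploy when the deploying identity differs from the run_as identity, because it defines a pipeline:

run_as:
  service_principal_name: "<service-principal-application-id>"

resources:
  pipelines:
    my_pipeline:
      name: Test pipeline
      libraries:
        - notebook:
            path: ./pipeline.py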

Set a bundle run identity

To set the run identity of bundle resources, specify run_as as a top-level mapping as shown in the following example:

bundle:
  name: "run_as"

# This is the identity that will be used when "databricks bundle run my_test_job" is executed.
run_as:
  service_principal_name: "5cf3z04b-a73c-4x46-9f3d-52da7999069e"

resources:
  jobs:
    my_test_job_1:
      name: Test job 1
      tasks:
        - task_key: "task_1"
          new_cluster:
            num_workers: 1
            spark_version: 13.2.x-snapshot-scala2.12
            node_type_id: i3.xlarge
            runtime_engine: PHOTON
          notebook_task:
            notebook_path: "./test.py"
    my_test_job_2:
      name: Test job 2
      run_as:
        service_principal_name: "69511ed2-zb27-444c-9863-4bc8ff497637"
      tasks:
        - task_key: "task_2"
          notebook_task:
            notebook_path: "./test.py"
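Once this bundle is deployed, jobs started through the bundle run as the configured identities. For example, with the Databricks CLI (a usage sketch, assuming the bundle above has been deployed to your workspace):

# Deploy the bundle; the deploying identity can differ from run_as
databricks bundle deploy

# Test job 2 runs as the service principal set in its own run_as mapping,
# which overrides the top-level run_as
databricks bundle run my_test_job_2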

Important

The run_as setting is not supported for pipelines or model serving endpoints. An error occurs if these resources are defined in a bundle where run_as is also configured.

Set target deployment identities

It is a best practice to configure run identities for staging and production target deployments. In addition, setting the run_as identity to a service principal for production targets is the most secure way to run a production workflow, because it:

  • Ensures that the workflow was deployed either by the same service principal or by someone with the CAN_USE permission on the service principal itself.

  • Decouples the permission to run the production workflow from the identity that created or deployed the bundle.

  • Allows users to configure and set a service principal for production with fewer permissions than the identity used to deploy the production bundle.

In the following example databricks.yml configuration file, three targets are configured: development, staging, and production. The development target is configured to run as an individual user, and the staging and production targets are configured to run as two different service principals. A service principal is always specified as an application ID, which can be retrieved from the service principal’s page in your workspace admin settings.

bundle:
  name: my_targeted_bundle

run_as:
  service_principal_name: "5cf3z04b-a73c-4x46-9f3d-52da7999069e"

targets:
  # Development deployment settings, set as the default
  development:
    mode: development
    default: true
    workspace:
      host: https://my-host.cloud.databricks.com
    run_as:
      user_name: someone@example.com

  # Staging deployment settings
  staging:
    workspace:
      host: https://my-host.cloud.databricks.com
      root_path: /Shared/staging-workspace/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: "69511ed2-zb27-444c-9863-4bc8ff497637"

  # Production deployment settings
  production:
    mode: production
    workspace:
      host: https://my-host.cloud.databricks.com
      root_path: /Shared/production-workspace/.bundle/${bundle.name}/${bundle.target}
    run_as:
      service_principal_name: "68ed9cd5-8923-4851-x0c1-c7536c67ff99"

resources:
  jobs:
    my_test_job:
      name: Test job
      tasks:
        - task_key: "task"
          new_cluster:
            num_workers: 1
            spark_version: 13.3.x-cpu-ml-scala2.12
            node_type_id: i3.xlarge
            runtime_engine: STANDARD
          notebook_task:
            notebook_path: "./test.py"
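To exercise a specific target from the example above, pass the -t flag to the Databricks CLI; for example, to deploy and run against the staging target (a sketch, assuming you are authenticated to the workspace):

databricks bundle deploy -t staging
databricks bundle run -t staging my_test_job

Because the staging target defines its own run_as mapping, this run uses the staging service principal rather than the top-level run_as identity.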