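Declarative Automation Bundles configuration

This article describes the syntax of the bundle configuration files that define Declarative Automation Bundles (formerly Databricks Asset Bundles). See "What are Declarative Automation Bundles?".

To learn how to create and work with bundles, see "Develop Declarative Automation Bundles".

For the bundle configuration reference, see "Configuration reference".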

databricks.yml

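A bundle must contain one (and only one) configuration file named databricks.yml at the root of the bundle project folder. databricks.yml is the main configuration file that defines a bundle, but it can reference other configuration files, such as resource configuration files, in the include mapping. Bundle configuration is expressed in YAML. For more information about YAML, see the official YAML specification.

The simplest databricks.yml defines the bundle name, which sits inside the required top-level bundle mapping, and a target deployment.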

YAML
bundle:
  name: my_bundle

targets:
  dev:
    default: true

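For details on all top-level mappings, see "Configuration reference".

Tip

Python support for Declarative Automation Bundles lets you define resources in Python. See "Bundle configuration in Python".

Specification

The following YAML specification lists the top-level configuration keys for Declarative Automation Bundles. For a detailed configuration reference, see "Configuration reference" and "Declarative Automation Bundles resources".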

YAML
# This is the default bundle configuration if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
  name: string # Required.
  databricks_cli_version: string
  cluster_id: string
  deployment: Map
  git:
    origin_url: string
    branch: string

# This is the identity to use to run the bundle. Set either
# user_name or service_principal_name, not both.
run_as:
  user_name: <user-name>
  service_principal_name: <service-principal-name>

# These are any additional configuration files to include.
include:
  - '<some-file-or-path-glob-to-include>'
  - '<another-file-or-path-glob-to-include>'

# These are any scripts that can be run.
scripts:
  <some-unique-script-name>:
    content: string

# These are any additional files or paths to include or exclude.
sync:
  include:
    - '<some-file-or-path-glob-to-include>'
    - '<another-file-or-path-glob-to-include>'
  exclude:
    - '<some-file-or-path-glob-to-exclude>'
    - '<another-file-or-path-glob-to-exclude>'
  paths:
    - '<some-file-or-path-to-synchronize>'

# These are the default artifact settings if not otherwise overridden in
# the targets top-level mapping.
artifacts:
  <some-unique-artifact-identifier>:
    build: string
    dynamic_version: boolean
    executable: string
    files:
      - source: string
    path: string
    type: string

# These are for any custom variables for use throughout the bundle.
variables:
  <some-unique-variable-name>:
    description: string
    default: string or complex
    lookup: Map
    type: string # The only valid value is "complex" if the variable is a complex variable, otherwise do not define this key.

# These are the workspace settings if not otherwise overridden in
# the targets top-level mapping.
workspace:
  artifact_path: string
  host: string
  profile: string
  resource_path: string
  root_path: string
  state_path: string

# These are the permissions to apply to resources defined
# in the resources mapping.
permissions:
  - level: <permission-level>
    group_name: <unique-group-name>
  - level: <permission-level>
    user_name: <unique-user-name>
  - level: <permission-level>
    service_principal_name: <unique-principal-name>

# These are the resource settings if not otherwise overridden in
# the targets top-level mapping.
resources:
  alerts:
    <unique-alert-name>:
      # alert settings
  apps:
    <unique-app-name>:
      # app settings
  catalogs:
    <unique-catalog-name>:
      # catalog settings
  clusters:
    <unique-cluster-name>:
      # cluster settings
  dashboards:
    <unique-dashboard-name>:
      # dashboard settings
  database_catalogs:
    <unique-database-catalog-name>:
      # database catalog settings
  database_instances:
    <unique-database-instance-name>:
      # database instance settings
  experiments:
    <unique-experiment-name>:
      # experiment settings
  jobs:
    <unique-job-name>:
      # job settings
  model_serving_endpoints:
    <unique-model-serving-endpoint-name>:
      # model_serving_endpoint settings
  pipelines:
    <unique-pipeline-name>:
      # pipeline settings
  postgres_branches:
    <unique-postgres-branch-name>:
      # postgres branch settings
  postgres_endpoints:
    <unique-postgres-endpoint-name>:
      # postgres endpoint settings
  postgres_projects:
    <unique-postgres-project-name>:
      # postgres project settings
  quality_monitors:
    <unique-quality-monitor-name>:
      # quality monitor settings
  registered_models:
    <unique-registered-model-name>:
      # registered model settings
  schemas:
    <unique-schema-name>:
      # schema settings
  secret_scopes:
    <unique-secret-scope-name>:
      # secret scopes settings
  sql_warehouses:
    <unique-sql-warehouse-name>:
      # sql warehouse settings
  synced_database_tables:
    <unique-synced-database-table-name>:
      # synced database table settings
  volumes:
    <unique-volume-name>:
      # volumes settings

# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      # artifact build settings for this target
    bundle:
      # bundle settings for this target
    default: boolean
    git: Map
    mode: string
    permissions:
      # permissions for this target
    presets:
      <preset>: <value>
    resources:
      # resource settings for this target
    sync:
      # sync settings for this target
    variables:
      <defined-variable-name>: <non-default-value> # value for this target
    workspace:
      # workspace settings for this target
    run_as:
      # run_as settings for this target
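
To make the placeholders above more concrete, here is a minimal sketch of two of the top-level mappings filled in. The group name, user name, artifact key, and build command are made-up values for illustration only; CAN_VIEW, CAN_MANAGE, and CAN_RUN are the levels accepted by the top-level permissions mapping.

```yaml
# Hypothetical values: replace the principals and build command with your own.
permissions:
  - level: CAN_VIEW
    group_name: data-readers # assumed group name
  - level: CAN_MANAGE
    user_name: someone@example.com # assumed user
  - level: CAN_RUN
    service_principal_name: <unique-principal-name>

artifacts:
  my_wheel: # assumed artifact identifier
    type: whl
    build: python -m build --wheel # assumed build command
    path: . # folder that contains the Python package to build
```

Settings declared this way act as defaults and can be overridden per target in the targets mapping.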

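Examples

This section contains some basic examples to help you understand how bundles work and how to structure their configuration.

Note

For configuration examples that demonstrate bundle features and common bundle use cases, see "Bundle configuration examples" and the bundle examples repository on GitHub.

The following example bundle configuration specifies a local file named hello.py that is in the same directory as the bundle configuration file databricks.yml. It runs this notebook as a job on the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller's local configuration profile named DEFAULT.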

YAML
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true

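The following example adds a target named prod that uses a different remote workspace URL and workspace authentication credentials. These are read from the host entry in the caller's .databrickscfg file that matches the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID.

Note

Databricks recommends using the host mapping instead of the default mapping wherever possible, because this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile's fields to determine which Databricks authentication type to use. If multiple profiles with a matching host field exist, you must use the --profile option on your bundle commands to specify which profile to use.

You do not need to declare the notebook_task mapping within the prod mapping. Because notebook_task is not explicitly overridden there, the prod target falls back to the notebook_task mapping in the top-level resources mapping.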

YAML
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
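
As a sketch of an alternative to repeating the task under prod, the cluster ID could be held in a custom variable and overridden per target. The variable name cluster_id is an assumption chosen for illustration; the variables mapping and the per-target variables override are the ones listed in the specification, and ${var.cluster_id} is the substitution syntax for custom variables.

```yaml
bundle:
  name: hello-bundle

variables:
  cluster_id: # hypothetical variable name
    description: ID of the existing cluster that runs hello-task.
    default: 1234-567890-abcde123

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          # Resolved per target at deploy time.
          existing_cluster_id: ${var.cluster_id}
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true # uses the variable's default value
  prod:
    workspace:
      host: https://<production-workspace-url>
    variables:
      cluster_id: 2345-678901-fabcd456 # value for this target
```

With this layout, the job definition appears only once and each target supplies just the value that differs.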

Use the following bundle commands to validate, deploy, and run this job within the dev target. For details about the bundle lifecycle, see "Develop Declarative Automation Bundles".

Bash
# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello-job

# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev hello-job

To validate, deploy, and run this job within the prod target instead:

Bash
# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod hello-job

To improve modularity and make definitions and settings easier to reuse across bundles, split the bundle configuration into separate files:

YAML
# databricks.yml

bundle:
  name: hello-bundle

include:
  - '*.yml'

YAML
# hello-job.yml

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

YAML
# targets.yml

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
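
If the split files live in subfolders rather than next to databricks.yml, the include globs can point at those folders instead. The resources/ and targets/ folder names below are hypothetical; only the include mapping and its path-glob entries come from the specification.

```yaml
# databricks.yml

bundle:
  name: hello-bundle

include:
  - 'resources/*.yml' # for example, resources/hello-job.yml (assumed layout)
  - 'targets/*.yml' # for example, targets/targets.yml (assumed layout)
```

Relative paths such as notebook_path are resolved against the file that declares them, so moving hello-job.yml into a subfolder would likely mean adjusting ./hello.py to match.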
