Declarative Automation Bundles configuration

This article describes the syntax for bundle configuration files, which define Declarative Automation Bundles (formerly known as Databricks Asset Bundles). See What are Declarative Automation Bundles?

To create and work with bundles, see Develop Declarative Automation Bundles.

For the bundle configuration reference, see Configuration reference.

databricks.yml

A bundle must contain one (and only one) configuration file named databricks.yml at the root of the bundle project folder. databricks.yml is the main configuration file that defines a bundle, but it can reference other configuration files, such as resource configuration files, in the include mapping. Bundle configuration is expressed in YAML. For more information about YAML, see the official YAML specification.
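
As a minimal sketch, assuming your resource definitions live in a hypothetical resources folder next to databricks.yml, the include mapping can pull those files into the bundle with a glob:

```yaml
# databricks.yml
bundle:
  name: my_bundle

# Paths are relative to the bundle root. Every file matched by the glob
# is merged into the bundle configuration.
# Assumption: resources/ is an illustrative folder name.
include:
  - resources/*.yml

targets:
  dev:
    default: true
```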

The simplest databricks.yml defines the bundle name, which belongs to the required top-level bundle mapping, and a deployment target.

YAML
bundle:
  name: my_bundle

targets:
  dev:
    default: true

For details about all top-level mappings, see Configuration reference.

Tip

Python support for Declarative Automation Bundles lets you define resources in Python. See Bundle configuration in Python.
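
As a hedged sketch, assuming a recent Databricks CLI version that reads a top-level python mapping, databricks.yml can point to a Python function that loads resources. The module and function names (resources:load_resources) and the .venv path are illustrative assumptions; see Bundle configuration in Python for the exact requirements.

```yaml
bundle:
  name: my_bundle

# Assumption: the virtual environment at .venv has the databricks-bundles
# package installed, and a hypothetical "resources" module defines a
# load_resources() function that returns the bundle's resources.
python:
  venv_path: .venv
  resources:
    - 'resources:load_resources'
```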

Specification

The following YAML specification provides the top-level configuration keys for Declarative Automation Bundles. For the complete configuration reference, see Configuration reference and Declarative Automation Bundles resources.

YAML
# This is the default bundle configuration if not otherwise overridden in
# the "targets" top-level mapping.
bundle: # Required.
  name: string # Required.
  databricks_cli_version: string
  cluster_id: string
  deployment: Map
  git:
    origin_url: string
    branch: string

# This is the identity to use to run the bundle. Specify either
# user_name or service_principal_name, but not both.
run_as:
  user_name: <user-name>
  # service_principal_name: <service-principal-name>

# These are any additional configuration files to include.
include:
  - '<some-file-or-path-glob-to-include>'
  - '<another-file-or-path-glob-to-include>'

# These are any scripts that can be run.
scripts:
  <some-unique-script-name>:
    content: string

# These are any additional files or paths to include or exclude.
sync:
  include:
    - '<some-file-or-path-glob-to-include>'
    - '<another-file-or-path-glob-to-include>'
  exclude:
    - '<some-file-or-path-glob-to-exclude>'
    - '<another-file-or-path-glob-to-exclude>'
  paths:
    - '<some-file-or-path-to-synchronize>'

# These are the default artifact settings if not otherwise overridden in
# the targets top-level mapping.
artifacts:
  <some-unique-artifact-identifier>:
    build: string
    dynamic_version: boolean
    executable: string
    files:
      - source: string
    path: string
    type: string

# These are for any custom variables for use throughout the bundle.
variables:
  <some-unique-variable-name>:
    description: string
    default: string or complex
    lookup: Map
    type: string # The only valid value is "complex" if the variable is a complex variable, otherwise do not define this key.

# These are the workspace settings if not otherwise overridden in
# the targets top-level mapping.
workspace:
  artifact_path: string
  host: string
  profile: string
  resource_path: string
  root_path: string
  state_path: string

# These are the permissions to apply to resources defined
# in the resources mapping.
permissions:
  - level: <permission-level>
    group_name: <unique-group-name>
  - level: <permission-level>
    user_name: <unique-user-name>
  - level: <permission-level>
    service_principal_name: <unique-principal-name>

# These are the resource settings if not otherwise overridden in
# the targets top-level mapping.
resources:
  alerts:
    <unique-alert-name>:
      # alert settings
  apps:
    <unique-app-name>:
      # app settings
  catalogs:
    <unique-catalog-name>:
      # catalog settings
  clusters:
    <unique-cluster-name>:
      # cluster settings
  dashboards:
    <unique-dashboard-name>:
      # dashboard settings
  database_catalogs:
    <unique-database-catalog-name>:
      # database catalog settings
  database_instances:
    <unique-database-instance-name>:
      # database instance settings
  experiments:
    <unique-experiment-name>:
      # experiment settings
  jobs:
    <unique-job-name>:
      # job settings
  model_serving_endpoints:
    <unique-model-serving-endpoint-name>:
      # model_serving_endpoint settings
  pipelines:
    <unique-pipeline-name>:
      # pipeline settings
  postgres_branches:
    <unique-postgres-branch-name>:
      # postgres branch settings
  postgres_endpoints:
    <unique-postgres-endpoint-name>:
      # postgres endpoint settings
  postgres_projects:
    <unique-postgres-project-name>:
      # postgres project settings
  quality_monitors:
    <unique-quality-monitor-name>:
      # quality monitor settings
  registered_models:
    <unique-registered-model-name>:
      # registered model settings
  schemas:
    <unique-schema-name>:
      # schema settings
  secret_scopes:
    <unique-secret-scope-name>:
      # secret scopes settings
  sql_warehouses:
    <unique-sql-warehouse-name>:
      # sql warehouse settings
  synced_database_tables:
    <unique-synced-database-table-name>:
      # synced database table settings
  volumes:
    <unique-volume-name>:
      # volumes settings

# These are the targets to use for deployments and workflow runs. One and only one of these
# targets can be set to "default: true".
targets:
  <some-unique-programmatic-identifier-for-this-target>:
    artifacts:
      # artifact build settings for this target
    bundle:
      # bundle settings for this target
    default: boolean
    git: Map
    mode: string
    permissions:
      # permissions for this target
    presets:
      <preset>: <value>
    resources:
      # resource settings for this target
    sync:
      # sync settings for this target
    variables:
      <defined-variable-name>: <non-default-value> # value for this target
    workspace:
      # workspace settings for this target
    run_as:
      # run_as settings for this target
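
As a minimal sketch of how the variables and targets mappings work together, the following fragment defines a hypothetical catalog variable with a default value, references it with the ${var.<name>} syntax, and overrides it for a prod target. The variable name, job name, and catalog values are illustrative assumptions.

```yaml
bundle:
  name: my_bundle

variables:
  catalog:
    description: Catalog that the job reads from and writes to
    default: dev_catalog

resources:
  jobs:
    my-job:
      name: my-job
      # Tasks omitted for brevity.
      parameters:
        # Resolves to dev_catalog by default, or prod_catalog with -t prod.
        - name: catalog
          default: ${var.catalog}

targets:
  dev:
    default: true
  prod:
    variables:
      catalog: prod_catalog
```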

Examples

This section contains some basic examples to help you understand how bundles work and how to structure their configuration.

Note

For configuration examples that demonstrate bundle features and common bundle use cases, see Bundle configuration examples and the bundle examples repository on GitHub.

The following example bundle configuration specifies a local file named hello.py that is in the same directory as the bundle configuration file databricks.yml. It runs this notebook as a job using the remote cluster with the specified cluster ID. The remote workspace URL and workspace authentication credentials are read from the caller's local configuration profile named DEFAULT.

YAML
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
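
If you prefer to make the profile explicit rather than relying on the implicit DEFAULT profile, a minimal sketch is to set workspace.profile on the target (resources omitted for brevity). See the following note for why Databricks recommends the host mapping where possible.

```yaml
bundle:
  name: hello-bundle

targets:
  dev:
    default: true
    workspace:
      # Read the workspace URL and credentials from the DEFAULT
      # profile in the caller's .databrickscfg file.
      profile: DEFAULT
```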

The following example adds a target named prod that uses a different remote workspace URL and workspace authentication credentials, which are read from the entry in the caller's .databrickscfg file whose host matches the specified workspace URL. This job runs the same notebook but uses a different remote cluster with the specified cluster ID.

Note

Databricks recommends that you use the host mapping instead of the default mapping wherever possible, because this makes your bundle configuration files more portable. Setting the host mapping instructs the Databricks CLI to find a matching profile in your .databrickscfg file and then use that profile's fields to determine which Databricks authentication type to use. If multiple profiles have a matching host field, you must use the --profile option with bundle commands to specify which profile to use.

Note that you do not need to declare the notebook_task mapping within the prod mapping: if notebook_task is not explicitly overridden within the prod mapping, the target falls back to the notebook_task mapping in the top-level resources mapping.

YAML
bundle:
  name: hello-bundle

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
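
To illustrate that fallback, the following sketch shows how hello-job resolves when deploying with -t prod, assuming the CLI matches the prod task to the top-level task by its task_key and merges the two. This is an illustrative view of the result, not a file you need to write.

```yaml
# Resolved configuration of hello-job for the prod target (illustrative).
resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          # Overridden by targets.prod.resources.jobs.hello-job:
          existing_cluster_id: 2345-678901-fabcd456
          # Inherited from the top-level resources mapping:
          notebook_task:
            notebook_path: ./hello.py
```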

Use the following bundle commands to validate, deploy, and run this job within the dev target. For details about the lifecycle of a bundle, see Develop Declarative Automation Bundles.

Bash
# Because the "dev" target is set to "default: true",
# you do not need to specify "-t dev":
databricks bundle validate
databricks bundle deploy
databricks bundle run hello-job

# But you can still explicitly specify it, if you want or need to:
databricks bundle validate -t dev
databricks bundle deploy -t dev
databricks bundle run -t dev hello-job

To instead validate, deploy, and run this job within the prod target:

Bash
# You must specify "-t prod", because the "dev" target
# is already set to "default: true":
databricks bundle validate -t prod
databricks bundle deploy -t prod
databricks bundle run -t prod hello-job

For more modularization and better reuse of definitions and settings across bundles, split your bundle configuration into separate files:

YAML
# databricks.yml

bundle:
  name: hello-bundle

include:
  - '*.yml'

YAML
# hello-job.yml

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: 1234-567890-abcde123
          notebook_task:
            notebook_path: ./hello.py

YAML
# targets.yml

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://<production-workspace-url>
    resources:
      jobs:
        hello-job:
          name: hello-job
          tasks:
            - task_key: hello-task
              existing_cluster_id: 2345-678901-fabcd456
