Migrate to the direct deployment engine

Experimental

This feature is Experimental.

Databricks Asset Bundles was originally built on top of the Databricks Terraform provider to manage deployments. To remove this dependency, Databricks CLI version 0.279.0 and above supports two deployment engines: terraform and direct. The direct deployment engine does not depend on Terraform and will soon become the default; the Terraform deployment engine will eventually be deprecated.

What are the advantages of direct deployment?

The new direct deployment engine uses the Databricks Go SDK and provides the following benefits:

  • No requirement to download Terraform and terraform-provider-databricks before deployment
  • Avoids issues with firewalls, proxies, and custom provider registries
  • Detailed diffs of changes available using bundle plan -o json
  • Faster deployment
  • Reduced time to release new bundle resources, because there is no need to align with the Terraform provider release

How do I start using direct deployment?

To start using the new direct deployment engine:

  • For existing bundles, migrate them using databricks bundle deployment migrate.
  • For new bundles, deploy them with the DATABRICKS_BUNDLE_ENGINE environment variable set to direct.

Migrate an existing bundle

The direct deployment engine uses its own JSON state file, whose schema differs from the Terraform JSON state file. The bundle deployment migrate command converts the Terraform state file (terraform.tfstate) to the direct deployment state file (resources.json), reading resource IDs from the existing deployment.
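Conceptually, the migrate command walks the Terraform state and records each resource's ID in the new file. The sketch below illustrates the idea only: the input follows the real Terraform state format (version 4), but the output shape is hypothetical, since the actual resources.json schema is internal to the CLI.

```python
import json

# Minimal Terraform state (format version 4) with one deployed job.
# In a real bundle this lives at .databricks/bundle/<target>/terraform/terraform.tfstate.
tfstate = {
    "version": 4,
    "resources": [
        {
            "type": "databricks_job",
            "name": "my_job",
            "instances": [{"attributes": {"id": "123"}}],
        }
    ],
}

# Collect IDs keyed by Terraform type and name.
# The real resources.json layout differs; this shape is illustrative only.
ids = {
    f'{r["type"]}.{r["name"]}': r["instances"][0]["attributes"]["id"]
    for r in tfstate["resources"]
}

print(json.dumps(ids))  # {"databricks_job.my_job": "123"}
```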

  1. Perform a full deployment with Terraform:

    Bash
    databricks bundle deploy -t my_target
  2. Migrate the deployment:

    Bash
    databricks bundle deployment migrate -t my_target
  3. Verify that the migration was successful. The databricks bundle plan command should succeed and show no changes.

    Bash
    databricks bundle plan -t my_target
    • If verification fails, remove the new state file:

      Bash
      rm .databricks/bundle/my_target/resources.json
    • If verification succeeds, deploy the bundle to synchronize the state file to the workspace:

      Bash
      databricks bundle deploy -t my_target

Direct deploy a new bundle

The bundle deployment migrate command does not work on bundles that have never been deployed, because there is no state file to convert. Instead, set the DATABRICKS_BUNDLE_ENGINE environment variable and deploy:

Bash
DATABRICKS_BUNDLE_ENGINE=direct databricks bundle deploy -t my_target

What are the changes in the direct deployment engine?

The new direct deployment engine mostly behaves the same as the Terraform deployment engine, but there are some differences.

Resource state diff calculation

Unlike Terraform, which maintains a single resource state (a mix of local configuration and remote state), the direct engine keeps these separate and records only the local configuration in its state file.

The resource state diff is calculated in two steps:

  1. The local bundle configuration is compared to the snapshot configuration used for the most recent deployment. The remote state plays no role.
  2. The remote state is compared to the snapshot configuration used for the most recent deployment.

The result is that:

  • databricks.yml resource changes are never ignored and will always trigger an update.
  • Resource fields not handled by the implementation do not trigger an inconsistent result error. The direct engine deploys these resources successfully, but this can result in drift; the deployed resources are updated during the next plan or deploy.
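The two comparisons above can be sketched with plain dictionaries. This is illustrative only: the field names are hypothetical, and the real engine compares full resource schemas rather than flat maps.

```python
# Sketch of the direct engine's two-step diff, using plain dicts.
# "snapshot" stands for the configuration recorded at the last deployment.

def diff(old: dict, new: dict) -> dict:
    """Return fields whose values differ between two flat configs."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

snapshot = {"name": "nightly", "max_workers": 4}          # last deployed config
local    = {"name": "nightly", "max_workers": 8}          # current databricks.yml
remote   = {"name": "nightly-renamed", "max_workers": 4}  # fetched via GET

# Step 1: local config vs. snapshot -> changes to apply (remote plays no role).
config_changes = diff(snapshot, local)

# Step 2: remote state vs. snapshot -> drift made outside the bundle.
drift = diff(snapshot, remote)

print(config_changes)  # {'max_workers': (4, 8)}
print(drift)           # {'name': ('nightly', 'nightly-renamed')}
```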

$resources substitution lookup

The most common use of $resources is resolving resource IDs (for example, $resources.jobs.my_job.id), which behaves identically between the Terraform and direct deployment engines.
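For example, a databricks.yml can pass one resource's deployed ID to another. The fragment below is a hedged sketch: the resource names and the pipeline configuration key (upstream_job_id) are hypothetical.

```yaml
# databricks.yml (fragment) -- assumes a job resource named my_job
resources:
  jobs:
    my_job:
      name: my-job
  pipelines:
    my_pipeline:
      name: my-pipeline
      configuration:
        # Resolves to the job's ID once deployed; identical under both engines.
        upstream_job_id: ${resources.jobs.my_job.id}
```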

However, the resolution of the $resources substitution in the direct deployment engine (for example, $resources.pipelines.my_pipeline.name) is performed in two steps:

  1. References pointing to fields that are present in the local config are resolved to the value provided in the local config.
  2. References that are not present in the local config are resolved from the remote state. This is the state fetched using the appropriate GET request for a given resource.
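A minimal sketch of this two-step lookup, with hypothetical field values (the real engine resolves against the full resource schema and issues the GET request itself):

```python
# Illustrative two-step $resources lookup: local config first, then the
# remote state fetched with a GET request. All values are hypothetical.

local_config = {"name": "my_pipeline"}                   # fields set in databricks.yml
remote_state = {"name": "my_pipeline", "id": "abc-123"}  # from the resource's GET response

def resolve(field: str) -> str:
    # Step 1: prefer the value provided in the local config.
    if field in local_config:
        return local_config[field]
    # Step 2: fall back to the remote state for fields not set locally.
    return remote_state[field]

print(resolve("name"))  # my_pipeline (local)
print(resolve("id"))    # abc-123 (remote only)
```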

The schema that is used for $resources resolution is available in the file out.fields.txt. The fields marked as ALL or STATE can be used for local resolution. The fields marked as ALL or REMOTE can be used for remote resolution.