Databricks Asset Bundle library dependencies

Preview

This feature is in Public Preview.

This article describes the syntax for declaring Databricks Asset Bundle library dependencies. Bundles enable programmatic management of Databricks workflows. See What are Databricks Asset Bundles?.

Your Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks likely depend on additional libraries to work as expected. Use the following information to declare these library dependencies in your bundle configuration files. See Databricks Asset Bundle configurations.

Bundles support the following library dependency types for Databricks jobs:

  • Python wheel

  • PyPI package

  • Maven package

Bundles support only Maven library dependencies for Delta Live Tables pipelines.

The following sections provide examples that show how to declare these library dependencies.

Job: Python wheel

Databricks workspace filesystem, Amazon S3, and local filesystem URIs are supported for Python wheels. If S3 is used, the cluster must have read access to the Python wheel. You might need to launch the cluster with an AWS IAM role to access the S3 URI.

Important

Do not store Python wheels in the Databricks File System (DBFS), especially not in the DBFS root. All workspace users can modify data and files stored in the DBFS root. To avoid this, upload Python wheels to workspace files or volumes, or store them in cloud object storage.

The following example shows how to install two Python wheel files.

  • The first Python wheel file is in the same local folder as this bundle configuration file.

  • The second Python wheel is in the specified workspace filesystem path in the Databricks workspace.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - whl: "./my-wheel-0.1.0.whl"
            - whl: "/Workspace/Shared/Libraries/my-wheel-0.0.1-py3-none-any.whl"

Job: PyPI package

In your job task definition, in libraries, specify a pypi mapping for each PyPI package to be installed. For each mapping, specify the following:

  • For package, specify the name of the PyPI package to install. You can optionally include an exact version specification, for example numpy==1.25.2.

  • Optionally, for repo, specify the repository where the PyPI package can be found. If not specified, the default pip index is used (https://pypi.org/simple/).

The following example shows how to install two PyPI packages.

  • The first PyPI package uses the specified package version and the default pip index.

  • The second PyPI package uses the specified package version and the explicitly specified pip index.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - pypi:
                package: "wheel==0.41.2"
            - pypi:
                package: "numpy==1.25.2"
                repo: "https://pypi.org/simple/"

Job: Maven package

In your job task definition, in libraries, specify a maven mapping for each Maven package to be installed. For each mapping, specify the following:

  • For coordinates, specify the Gradle-style Maven coordinates for the package.

  • Optionally, for repo, specify the Maven repo to install the Maven package from. If omitted, both the Maven Central Repository and the Spark Packages Repository are searched.

  • Optionally, for exclusions, specify any dependencies to explicitly exclude. See Maven dependency exclusions.

The following example shows how to install two Maven packages.

  • The first Maven package uses the specified package coordinates and searches for this package in both the Maven Central Repository and the Spark Packages Repository.

  • The second Maven package uses the specified package coordinates, searches for this package only in the specified Maven repository, and does not include any of this package’s dependencies that match the specified pattern.

resources:
  jobs:
    my_job:
      # ...
      tasks:
        - task_key: my_task
          # ...
          libraries:
            - maven:
                coordinates: "com.databricks:databricks-sdk-java:0.8.1"
            - maven:
                coordinates: "com.databricks:databricks-dbutils-scala_2.13:0.1.4"
                repo: "https://mvnrepository.com/"
                exclusions:
                  - "org.scala-lang:scala-library:2.13.0-RC*"

Pipeline: Maven package

In your pipeline definition, in libraries, specify a maven mapping for each Maven package to be installed. For each mapping, specify the following:

  • For coordinates, specify the Gradle-style Maven coordinates for the package.

  • Optionally, for repo, specify the Maven repo to install the Maven package from. If omitted, both the Maven Central Repository and the Spark Packages Repository are searched.

  • Optionally, for exclusions, specify any dependencies to explicitly exclude. See Maven dependency exclusions.

The following example shows how to install two Maven packages.

  • The first Maven package uses the specified package coordinates and searches for this package in both the Maven Central Repository and the Spark Packages Repository.

  • The second Maven package uses the specified package coordinates, searches for this package only in the specified Maven repository, and does not include any of this package’s dependencies that match the specified pattern.

resources:
  pipelines:
    my_pipeline:
      # ...
      libraries:
        - maven:
            coordinates: "com.databricks:databricks-sdk-java:0.8.1"
        - maven:
            coordinates: "com.databricks:databricks-dbutils-scala_2.13:0.1.4"
            repo: "https://mvnrepository.com/"
            exclusions:
              - "org.scala-lang:scala-library:2.13.0-RC*"