Use a private artifact in a bundle

Files and artifacts stored in third-party tools, such as JFrog Artifactory, or in private repositories may need to be included in your Databricks Asset Bundles. This article describes how to handle these files. For information about Databricks Asset Bundles, see What are Databricks Asset Bundles?.

For an example bundle that uses a private wheel, see the bundle-examples GitHub repository.

tip

If you are using notebooks, you can install Python wheels from a private repository in a notebook, then add a notebook_task to the job in your bundle. See Notebook-scoped Python libraries.
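For example, such a notebook task can be declared in the bundle alongside the other examples in this article. This is a minimal sketch; the task key and notebook path are placeholders, and the referenced notebook is assumed to run `%pip install` against your private index before executing your code:

```yaml
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        # Hypothetical notebook that installs the private wheel with %pip,
        # then runs your code
        - task_key: notebook-task
          notebook_task:
            notebook_path: ../src/install_and_run
```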

Download the artifact locally

To manage a private artifact using Databricks Asset Bundles, you first need to download it locally. Then you can reference it in your bundle and deploy it to the workspace as part of the bundle, or you can upload it to Unity Catalog and reference it in your bundle.

For example, the following command downloads a Python wheel file to the dist directory:

Shell
pip download -d dist my-wheel==1.0

You can also download a package from a private PyPI index directly into the dist directory:

Bash
export PYPI_TOKEN=<YOUR TOKEN>
pip download -d dist my-package==1.0.0 --index-url https://$PYPI_TOKEN@<package-index-url> --no-deps

(Optional) Upload the artifact to Unity Catalog

After you download the artifact, you can optionally copy it to a Unity Catalog volume using the Databricks CLI, so that the bundle references it there instead of uploading it to your workspace when the bundle is deployed. The following example copies a wheel file to a Unity Catalog volume:

Bash
databricks fs cp my-wheel-1.0-*.whl dbfs:/Volumes/myorg_test/myorg_volumes/packages

tip

Databricks Asset Bundles automatically uploads all artifacts referenced in the bundle to Unity Catalog if you set artifact_path in your bundle configuration to a Unity Catalog volumes path.
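A target configuration along these lines would make deployments upload artifacts to the volume. This is a sketch; the target name and workspace host are placeholders:

```yaml
targets:
  dev:
    workspace:
      host: https://<your-workspace-url>
      # All artifacts referenced in the bundle are uploaded here on deploy
      artifact_path: /Volumes/myorg_test/myorg_volumes/packages
```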

Reference the artifact

To include the artifact in your bundle, reference it in your configuration.

The following example bundle references a wheel file in the dist directory in a job. This configuration uploads the wheel to the workspace when the bundle is deployed.

YAML
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        - task_key: python-task
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D4s_v5
            num_workers: 1
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - whl: ../dist/my-wheel-1.0-*.whl

If you uploaded your artifact to a Unity Catalog volume, configure your job to reference it at that location:

YAML
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        - task_key: python-task
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D4s_v5
            num_workers: 1
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - whl: /Volumes/myorg_test/myorg_volumes/packages/my-wheel-1.0-py3-none-any.whl

Alternatively, a Python wheel can be referenced in a python_wheel_task for a job:

YAML
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        - task_key: wheel_task
          python_wheel_task:
            package_name: my_package
            entry_point: entry
          job_cluster_key: Job_cluster
          libraries:
            - whl: ../dist/my-wheel-1.0-*.whl
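Once the artifact is referenced in the configuration, deploying and running the bundle follows the usual Databricks CLI workflow. A sketch, assuming the job name used in the examples above:

```shell
# Check the bundle configuration for errors
databricks bundle validate

# Deploy the bundle; the referenced wheel is uploaded as part of the deployment
databricks bundle deploy

# Run the job defined in the bundle
databricks bundle run demo-job
```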