Build a Python wheel file using Databricks Asset Bundles

This article describes how to build, deploy, and run a Python wheel file as part of a Databricks Asset Bundle project. See What are Databricks Asset Bundles?.

For an example configuration that builds a JAR and uploads it to Unity Catalog, see Bundle that uploads a JAR file to Unity Catalog.

Requirements

  • Databricks CLI version 0.218.0 or above, with authentication configured for your workspace. To check your installed version of the Databricks CLI, run the command databricks -v.

Create the bundle using a template

In these steps, you create the bundle using the Databricks default bundle template for Python. This bundle consists of files to build into a Python wheel file and the definition of a Databricks job that runs this Python wheel file. You then validate and deploy the bundle, and run the deployed Python wheel file from the Python wheel job within your Databricks workspace.

note

The Databricks default bundle template for Python uses uv to build the Python wheel file. To install uv, see Installing uv.
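If uv is not already installed, one common way to get it on macOS or Linux is the standalone installer script shown below; see the uv documentation for other installation options:

Bash
# Download and run the official uv installer script (macOS and Linux).
curl -LsSf https://astral.sh/uv/install.sh | sh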

If you want to create a bundle from scratch, see Create a bundle manually.

Step 1: Create the bundle

A bundle contains the artifacts you want to deploy and the settings for the workflows you want to run.

  1. Use your terminal or command prompt to switch to a directory on your local development machine that will contain the template's generated bundle.

  2. Use the Databricks CLI to run the bundle init command:

    Bash
    databricks bundle init
  3. For Template to use, leave the default value of default-python by pressing Enter.

  4. For Unique name for this project, leave the default value of my_project, or type a different value, and then press Enter. This determines the name of the root directory for this bundle. This root directory is created within your current working directory.

  5. For Include a stub (sample) notebook, select no and press Enter. This instructs the Databricks CLI to not add a sample notebook to your bundle.

  6. For Include a stub (sample) Delta Live Tables pipeline, select no and press Enter. This instructs the Databricks CLI to not define a sample pipeline in your bundle.

  7. For Include a stub (sample) Python package, leave the default value of yes by pressing Enter. This instructs the Databricks CLI to add sample Python wheel package files and related build instructions to your bundle.

  8. For Use serverless, select yes and press Enter. This instructs the Databricks CLI to configure your bundle to run on serverless compute.
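If you script bundle creation, the bundle init command also accepts the template name as an argument and can read prompt answers from a JSON file through its --config-file flag. The key names below are assumptions inferred from the prompts above, not confirmed values; check the template's schema before relying on them:

Bash
# Write prompt answers to a file. The key names are assumptions based on the
# default-python template's prompts; the serverless prompt may map to an
# additional key not shown here.
cat > init-config.json <<'EOF'
{
  "project_name": "my_project",
  "include_notebook": "no",
  "include_dlt": "no",
  "include_python": "yes"
}
EOF
databricks bundle init default-python --config-file init-config.json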

Step 2: Explore the bundle

To view the files that the template generated, switch to the root directory of your newly created bundle and open this directory with your preferred IDE. Files of particular interest include the following:

  • databricks.yml: This file specifies the bundle's name, specifies whl build settings, includes a reference to the job configuration file, and defines settings for target workspaces.
  • resources/<project-name>_job.yml: This file specifies the Python wheel job's settings.
  • src/<project-name>: This directory includes the files that the Python wheel job uses to build the Python wheel file.
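The entry point that the job calls lives in src/<project-name>/main.py. The template's actual file differs, but a minimal sketch of the shape such a module takes looks like this (illustrative contents only):

Python
# src/<project-name>/main.py -- an illustrative sketch, not the template's exact contents.

def main():
    # The job's python_wheel_task invokes this function by its entry point name.
    print("Hello from the example wheel!")

if __name__ == "__main__":
    main()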
note

If you want to install the Python wheel file on a cluster with Databricks Runtime 12.2 LTS or below, you must add the following top-level mapping to the databricks.yml file:

YAML
# Applies to all tasks of type python_wheel_task.
experimental:
  python_wheel_wrapper: true

Step 3: Validate the project's bundle configuration file

In this step, you check whether the bundle configuration is valid.

  1. From the root directory, use the Databricks CLI to run the bundle validate command, as follows:

    Bash
    databricks bundle validate
  2. If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, and then repeat this step.

If you make any changes to your bundle after this step, you should repeat this step to check whether your bundle configuration is still valid.

Step 4: Build the Python wheel file and deploy the local project to the remote workspace

In this step, the Python wheel file is built and deployed to your remote Databricks workspace, and a Databricks job is created within your workspace.

  1. Use the Databricks CLI to run the bundle deploy command as follows:

    Bash
    databricks bundle deploy -t dev
  2. To check whether the locally built Python wheel file was deployed:

    1. In your Databricks workspace's sidebar, click Workspace.
    2. Click into the following folder: Workspace > Users > <your-username> > .bundle > <project-name> > dev > artifacts > .internal > <random-guid>.

    The Python wheel file should be in this folder.

  3. To check whether the job was created:

    1. In your Databricks workspace's sidebar, click Jobs & Pipelines.
    2. Optionally, select the Jobs and Owned by me filters.
    3. Click [dev <your-username>] <project-name>_job.
    4. Click the Tasks tab.

    There should be one task: main_task.

If you make any changes to your bundle after this step, repeat steps 3-4 to check whether your bundle configuration is still valid and then redeploy the project.
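You can also review the deployed resources from your terminal. Recent versions of the Databricks CLI include a bundle summary command for this; its output varies by CLI version:

Bash
databricks bundle summary -t dev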

Step 5: Run the deployed project

In this step, you run the Databricks job in your workspace.

  1. From the root directory, use the Databricks CLI to run the bundle run command, as follows, replacing <project-name> with the name of your project from Step 1:

    Bash
    databricks bundle run -t dev <project-name>_job
  2. Copy the value of Run URL that appears in your terminal and paste this value into your web browser to open your Databricks workspace.

  3. In your Databricks workspace, after the task completes successfully and shows a green title bar, click the main_task task to see the results.
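When you are finished with the project, you can remove the deployed job and files from your workspace with the bundle destroy command, which prompts for confirmation before deleting anything:

Bash
databricks bundle destroy -t dev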

Build the whl using Poetry or setuptools

When you use databricks bundle init with the default-python template, the generated bundle shows how to build a Python wheel using uv and pyproject.toml. However, you may want to use Poetry or setuptools instead to build the wheel.

Install Poetry or setuptools

  1. Install Poetry or setuptools:

    • If you use Poetry, install Poetry, version 1.6 or above, if it is not already installed. To check your installed version of Poetry, run the command poetry -V or poetry --version.
    • If you use setuptools, install the setuptools and wheel packages if they are not already installed, for example by running pip3 install --upgrade setuptools wheel.
    • Make sure you have Python version 3.10 or above installed. To check your version of Python, run the command python -V or python --version.
  2. If you intend to store this bundle with a Git provider, add a .gitignore file in the project's root, and add the following entries to this file:

    .databricks
    dist

Add build files

  1. In your bundle's root, create the following folders and files. Use pyproject.toml if you build with Poetry, or setup.py if you build with setuptools:

    ├── src
    │   └── my_package
    │       ├── __init__.py
    │       ├── main.py
    │       └── my_module.py
    └── pyproject.toml   # Or setup.py, if you use setuptools.
  2. If you use Poetry, add the following code to the pyproject.toml file (if you use setuptools, see the setup.py sketch after this list):

    [tool.poetry]
    name = "my_package"
    version = "0.0.1"
    description = "<my-package-description>"
    authors = ["my-author-name <my-author-name>@<my-organization>"]

    [tool.poetry.dependencies]
    python = "^3.10"

    [build-system]
    requires = ["poetry-core"]
    build-backend = "poetry.core.masonry.api"

    [tool.poetry.scripts]
    main = "my_package.main:main"
    • Replace <my-author-name> with your organization's primary contact name.
    • Replace <my-author-name>@<my-organization> with your organization's primary contact email address.
    • Replace <my-package-description> with a display description for your Python wheel file.
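If you build with setuptools instead of Poetry, a setup.py along the following lines produces an equivalent wheel. This is a sketch that assumes the same package layout as above; the console_scripts entry point mirrors the [tool.poetry.scripts] mapping:

Python
# setup.py -- a setuptools sketch equivalent to the pyproject.toml above.
from setuptools import find_packages, setup

setup(
    name="my_package",
    version="0.0.1",
    description="<my-package-description>",
    author="<my-author-name>",
    author_email="<my-author-name>@<my-organization>",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    entry_points={
        # Exposes my_package.main:main under the entry point name "main".
        "console_scripts": ["main = my_package.main:main"],
    },
)

With setuptools, also replace poetry build in the artifacts configuration below with a setuptools-compatible build command, such as python3 -m build --wheel (which requires the build package).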

Add artifacts bundle configuration

  1. Add the artifacts mapping configuration to your databricks.yml to build the whl artifact:

    This configuration runs the poetry build command and indicates that the pyproject.toml file is in the same directory as the databricks.yml file.

    note

    If you have already built a Python wheel file and just want to deploy it, then modify the following bundle configuration file by omitting the artifacts mapping. The Databricks CLI will then assume that the Python wheel file is already built and will automatically deploy the files that are specified in the libraries array's whl entries.

    YAML
    bundle:
      name: my-wheel-bundle

    artifacts:
      default:
        type: whl
        build: poetry build
        path: .

    resources:
      jobs:
        wheel-job:
          name: wheel-job
          tasks:
            - task_key: wheel-task
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: i3.xlarge
                data_security_mode: USER_ISOLATION
                num_workers: 1
              python_wheel_task:
                entry_point: main
                package_name: my_package
              libraries:
                - whl: ./dist/*.whl

    targets:
      dev:
        workspace:
          host: <workspace-url>
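
To exercise this configuration end to end, set the dev target's host to your workspace URL, then deploy the bundle and run the job it defines:

Bash
databricks bundle deploy -t dev
databricks bundle run -t dev wheel-job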
