Use custom Docker images
Custom Docker images for AI Runtime CLI workloads are in Beta.
Docker Container Services (DCS) lets you bring your own Docker container image to air workloads. Use a custom image when you need:
- Specific system library versions.
- Complex dependencies that don't fit cleanly into environment.dependencies.
- An exact environment to reproduce research results.
- Standard images built by your organization's platform or security team.
Prerequisites
- Install the AI Runtime CLI.
- For private images, have a Docker Hub account with access to your image.
Register an image
Before running a workload with a custom image, register it with air register image. Registration pulls and caches the image in the Databricks platform. Each user must register an image once per image tag. Re-register only when you push a new tag or rotate credentials. Registration takes 2–6 minutes and blocks until the image is ready.
Public images
Register public images by providing the Docker image URL and your Databricks profile:
air register image docker.io/nvidia/cuda:12.9.0-devel-ubuntu24.04 -p my-databricks-profile
The short-form image reference also works, for example, library/ubuntu:latest.
Private Docker Hub images
To register a private Docker Hub image, generate a personal access token first. In your Docker Hub account settings, click Personal access tokens → Generate new token. Read-only access is sufficient.
Choose one of the following authentication methods:
Using docker login (recommended for interactive use)
Log in to Docker Hub at the terminal. You will be prompted for your Docker Hub username and personal access token:
docker login
This stores your credentials in ~/.docker/config.json. Then register the image — air reads the credentials automatically:
air register image myorg/myrepo:mytag -p my-databricks-profile
Using interactive authentication
Authenticate and store credentials in a Databricks secret scope in one step:
air register image myorg/myrepo:mytag --interactive-authenticate -p my-databricks-profile
You will be prompted for your Docker Hub username and personal access token. Credentials are stored in your workspace secret scope for future registrations.
Using a pre-stored Databricks secret (recommended for CI/scripts)
Store credentials in a Databricks secret and reference it directly:
air register image myorg/myrepo:mytag --scope my-secret-scope --key my-docker-key -p my-databricks-profile
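For CI pipelines, the registration call can be wrapped in a small script. The following is a minimal Python sketch, assuming the air CLI is on PATH and exits with a non-zero status on failure (an assumption, not documented behavior). It uses only the flags shown above. The helper names build_register_command and register_image are hypothetical.

```python
import subprocess


def build_register_command(image, profile, scope=None, key=None):
    """Build the argv list for `air register image` (hypothetical helper)."""
    # --scope and --key only make sense as a pair.
    if (scope is None) != (key is None):
        raise ValueError("scope and key must be provided together")
    argv = ["air", "register", "image", image]
    if scope is not None:
        argv += ["--scope", scope, "--key", key]
    argv += ["-p", profile]
    return argv


def register_image(image, profile, scope=None, key=None):
    """Run the registration; raises CalledProcessError on a non-zero exit."""
    # Registration blocks until the image is ready, so no polling is done here.
    subprocess.run(build_register_command(image, profile, scope, key), check=True)
```

Passing the arguments as a list avoids shell quoting issues when image tags or scope names come from pipeline variables.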
Use a Docker image in a workload
Specify the Docker image in your workload YAML under environment.docker_image.url:
experiment_name: my-dcs-training
environment:
  docker_image:
    url: myorg/myrepo:mytag
compute:
  num_accelerators: 1
  accelerator_type: GPU_1xA10
command: python /app/train.py
When bringing your own Docker image, environment.dependencies and environment.version are not supported. Specifying environment.docker_image.url with either field triggers an error. If you have additional dependencies, install the packages in the Dockerfile instead.
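To catch this conflict before submitting, a pre-flight check can inspect the workload definition. The sketch below assumes the workload YAML has already been parsed into a plain dictionary (for example with PyYAML's yaml.safe_load). The function name find_image_conflicts is hypothetical and is not part of the AI Runtime CLI.

```python
def find_image_conflicts(workload):
    """Return the environment fields that conflict with a custom image.

    `workload` is the parsed workload YAML as a dict. An empty list means
    no conflict was found by this check.
    """
    env = workload.get("environment") or {}
    image = env.get("docker_image") or {}
    if not image.get("url"):
        # No custom image, so dependencies and version are allowed.
        return []
    return [
        f"environment.{name}"
        for name in ("dependencies", "version")
        if name in env
    ]
```

A CI job could fail fast when the returned list is non-empty, rather than waiting for the submission to be rejected.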
Submit the workload:
air run --file workload.yaml -p my-databricks-profile
Environment variables injected into your container
AI Runtime injects the following environment variables into every container at runtime:
- NUM_NODES: total number of nodes.
- LOCAL_WORLD_SIZE: GPUs per node.
- WORLD_SIZE: total number of processes.
- POD_RANK: current node rank (0-indexed). Also injected as NODE_RANK.
- LOCAL_ADDR: local node IP (multi-node only).
- MASTER_ADDR: rank-0 coordination address (multi-node only).
- MASTER_PORT: rank-0 coordination port (multi-node only).
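A training script can read these variables once at startup. The following is a minimal Python sketch: the RuntimeEnv dataclass and read_runtime_env function are hypothetical names. Because the multi-node-only variables may be absent on a single node, they are treated as optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeEnv:
    num_nodes: int
    local_world_size: int
    world_size: int
    node_rank: int
    master_addr: Optional[str]
    master_port: Optional[int]


def read_runtime_env(environ: Mapping[str, str] = os.environ) -> RuntimeEnv:
    """Parse the injected variables; raises KeyError if a required one is missing."""
    port = environ.get("MASTER_PORT")
    return RuntimeEnv(
        num_nodes=int(environ["NUM_NODES"]),
        local_world_size=int(environ["LOCAL_WORLD_SIZE"]),
        world_size=int(environ["WORLD_SIZE"]),
        node_rank=int(environ["POD_RANK"]),
        # MASTER_ADDR and MASTER_PORT are only set for multi-node workloads.
        master_addr=environ.get("MASTER_ADDR"),
        master_port=int(port) if port is not None else None,
    )
```

Accepting the environment as a parameter keeps the parser easy to exercise locally with a plain dictionary.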
Examples
Single-node A10
experiment_name: my-dcs-single-node
environment:
  docker_image:
    url: myorg/myrepo:mytag
compute:
  num_accelerators: 1
  accelerator_type: GPU_1xA10
command: python3 /app/train.py
Multi-node H100 with RDMA
For multi-node H100 jobs that need full network bandwidth on AWS p5 instances, base your image on one of the Databricks base images with NCCL and EFA preconfigured:
experiment_name: my-dcs-distributed
environment:
  docker_image:
    url: myorg/myrepo:mytag
compute:
  num_accelerators: 16 # 2 nodes × 8 H100
  accelerator_type: GPU_8xH100
command: |-
  torchrun \
    --nnodes="${NUM_NODES}" \
    --nproc_per_node="${LOCAL_WORLD_SIZE}" \
    --node_rank="${POD_RANK}" \
    --rdzv_endpoint="${MASTER_ADDR}:${MASTER_PORT}" \
    /app/train.py
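The arithmetic behind the num_accelerators comment (16 accelerators = 2 nodes × 8 H100) can be checked with a short helper. This Python sketch assumes accelerator types follow a GPU_<count>x<model> naming pattern, which is inferred only from the two values used on this page. It also assumes num_accelerators must be a whole multiple of the per-node count. Both function names are hypothetical.

```python
import re


def gpus_per_node(accelerator_type):
    """Extract the per-node GPU count from a name such as GPU_8xH100."""
    # Assumed pattern: GPU_<count>x<model>; not a documented format.
    match = re.fullmatch(r"GPU_(\d+)x\w+", accelerator_type)
    if match is None:
        raise ValueError(f"unrecognized accelerator type: {accelerator_type}")
    return int(match.group(1))


def node_count(num_accelerators, accelerator_type):
    """Return how many nodes a workload spans under the assumptions above."""
    per_node = gpus_per_node(accelerator_type)
    if num_accelerators % per_node != 0:
        raise ValueError(
            f"{num_accelerators} is not a multiple of {per_node} GPUs per node"
        )
    return num_accelerators // per_node
```

Under these assumptions, node_count(16, "GPU_8xH100") gives 2, which is the value that NUM_NODES would be expected to carry in the example above.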
Build your own image
Databricks base images
Databricks publishes base images on Docker Hub at databricksruntime/air with CUDA, NCCL, and cloud-specific networking (AWS EFA or Azure InfiniBand) preconfigured.
| Tag | Cloud | Variant | Use when |
|---|---|---|---|
| | AWS | Runtime | Installing pre-built wheels only |
| | AWS | Devel | Compiling CUDA extensions (requires |
| | Azure | Runtime | Installing pre-built wheels only |
| | Azure | Devel | Compiling CUDA extensions (requires |
Use the runtime variant unless your Dockerfile compiles CUDA extensions such as flash-attn, apex, or custom kernels.
The following example Dockerfile adds PyTorch to a Databricks base image. The base images provide Python at /opt/venv, managed by uv. By default, uv pip install targets that environment. To use a different environment, create and activate a venv before running uv pip install.
FROM databricksruntime/air:dcs-base-aws-runtime
RUN uv pip install --no-cache \
    torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
RUN uv pip install --no-cache \
    transformers==4.45.0 \
    accelerate==0.34.0 \
    'mlflow>=3.6'
COPY ./train /app/train
Build, push, and register:
docker build -t myorg/myrepo:mytag .
docker push myorg/myrepo:mytag
air register image myorg/myrepo:mytag --interactive-authenticate -p my-databricks-profile
Requirements
- Images must be hosted on Docker Hub. Amazon ECR, Google GCR, and GitHub GHCR are not supported.
- Image size must be under 20 GB.
- WORKDIR is not honored at runtime. Use absolute paths for files baked into the image. For example, use python /app/train.py, not python train.py.
- You cannot use environment.dependencies or environment.version with environment.docker_image.url. If you need extra packages beyond what is in the image, you must add them to the Dockerfile.
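The Docker Hub requirement can be checked locally before registration. The sketch below applies Docker's usual convention for image references: the first path component is a registry host only if it contains a dot or a colon, or is exactly localhost. The set of accepted Docker Hub host names is an assumption, and the function names are hypothetical.

```python
# Host names treated as Docker Hub here; this set is an assumption.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io"}


def registry_host(image_ref):
    """Return the explicit registry host of an image reference, or None."""
    first, sep, _ = image_ref.partition("/")
    # Without a "/", the whole string is a repository name (e.g. ubuntu:latest).
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return None


def is_docker_hub_image(image_ref):
    """True when the reference has no registry host or names Docker Hub."""
    host = registry_host(image_ref)
    return host is None or host in DOCKER_HUB_HOSTS
```

For instance, myorg/myrepo:mytag and docker.io/nvidia/cuda:12.9.0-devel-ubuntu24.04 pass this check, while a reference that starts with ghcr.io does not.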