Get started: Serverless GPU compute with H100 GPUs
This notebook demonstrates how to use Databricks Serverless GPU compute with H100 accelerators. You'll learn how to connect to H100 GPUs and run distributed workloads using the serverless_gpu Python library.
The serverless_gpu library enables seamless execution of GPU workloads directly from Databricks notebooks. It provides decorators and runtime utilities for distributed GPU computing. To learn more, see the Serverless GPU API documentation.
Connect to serverless GPU compute
To run this notebook, you need access to Databricks Serverless GPU compute with H100 accelerators.
- From the compute selector, select Serverless GPU.
- In the "Environment" tab on the right side, select H100 for your accelerator. This option uses 8 H100 chips on a single node.
- Click Apply.
See the Hello World example below for how to target remote GPUs to scale to more resources.
When to use H100 GPUs
Compared to A10 GPUs, H100s deliver higher floating-point throughput (FLOPS) and more high-bandwidth memory (HBM). Use H100s for large model training where high throughput and/or large GPU memory is needed.
Verify GPU connection
Use the nvidia-smi command to confirm that you're connected to 8 H100 GPUs. This command displays GPU information including model, memory, and utilization.
%sh nvidia-smi
Thu Jan 15 17:56:54 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08 Driver Version: 575.57.08 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA H100 80GB HBM3 On | 00000000:53:00.0 Off | 0 |
| N/A 26C P0 70W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA H100 80GB HBM3 On | 00000000:64:00.0 Off | 0 |
| N/A 28C P0 68W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA H100 80GB HBM3 On | 00000000:75:00.0 Off | 0 |
| N/A 26C P0 71W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA H100 80GB HBM3 On | 00000000:86:00.0 Off | 0 |
| N/A 29C P0 68W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA H100 80GB HBM3 On | 00000000:97:00.0 Off | 0 |
| N/A 27C P0 67W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA H100 80GB HBM3 On | 00000000:A8:00.0 Off | 0 |
| N/A 26C P0 67W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA H100 80GB HBM3 On | 00000000:B9:00.0 Off | 0 |
| N/A 26C P0 69W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA H100 80GB HBM3 On | 00000000:CA:00.0 Off | 0 |
| N/A 26C P0 67W / 700W | 0MiB / 81559MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
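The GPU count can also be checked programmatically. The snippet below is an illustrative sketch, not part of the serverless_gpu API: it counts H100 rows in nvidia-smi's table output with a regular expression, and for demonstration runs against a captured sample of the output above rather than invoking nvidia-smi itself.

```python
import re

def count_h100s(nvidia_smi_output: str) -> int:
    """Count H100 GPU rows in `nvidia-smi` table output."""
    # Each GPU row starts with a pipe, the GPU index, then the model name.
    return len(re.findall(r'\|\s+\d+\s+NVIDIA H100', nvidia_smi_output))

# Captured sample: two GPU rows from the nvidia-smi output above.
sample = """
|   0  NVIDIA H100 80GB HBM3          On  |   00000000:53:00.0 Off |                    0 |
|   1  NVIDIA H100 80GB HBM3          On  |   00000000:64:00.0 Off |                    0 |
"""
print(count_h100s(sample))  # 2
```

In a live notebook you could feed the function real output, for example via `subprocess.run(['nvidia-smi'], capture_output=True, text=True).stdout`, and assert the count is 8.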
Hello World example
This example demonstrates how to run a distributed function across multiple GPUs using the @distributed decorator.
The decorated function below is launched on 8 processes, one per GPU on the node the notebook is attached to. The decorator's gpus argument specifies the number of GPUs.
The function uses the runtime module to access the local and global GPU ranks.
Set remote=False to launch on the H100s attached to the notebook; set remote=True to provision remote GPU resources instead.
from serverless_gpu import distributed
from serverless_gpu import runtime as rt

@distributed(
    gpus=8,
    gpu_type='h100',
    remote=False,  # Use the GPUs the notebook is running on
)
def hello_world(name: str) -> list[int]:
    # Print only from the first process on the node
    if rt.get_local_rank() == 0:
        print('hello world', name)
    # Each process returns its global rank; the launcher gathers them
    return rt.get_global_rank()

result = hello_world.distributed('SGC')
Warning: serverless_gpu is in Beta. The API is subject to change.
Using log_dir='/tmp/SGC_logs_7_yw1eno'
Warning: serverless_gpu is in Beta. The API is subject to change.
hello world SGC
Warning: serverless_gpu is in Beta. The API is subject to change.
assert result == [0, 1, 2, 3, 4, 5, 6, 7]
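Conceptually, hello_world.distributed('SGC') runs one copy of the function per GPU process and gathers the per-process return values into a list ordered by global rank. The following is a minimal pure-Python simulation of that gather semantics, using threads in place of GPU processes; it is not the actual serverless_gpu implementation, and hello_world_sim and simulate_distributed are hypothetical names introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_distributed(fn, world_size: int, *args):
    """Run `fn(rank, *args)` once per rank and gather results in rank order."""
    with ThreadPoolExecutor(max_workers=world_size) as pool:
        futures = [pool.submit(fn, rank, *args) for rank in range(world_size)]
    # Collect results in submission (rank) order, mirroring the list
    # returned by the .distributed(...) call in the example above.
    return [f.result() for f in futures]

def hello_world_sim(rank: int, name: str) -> int:
    if rank == 0:  # print once, like local rank 0 in the real example
        print('hello world', name)
    return rank  # stands in for rt.get_global_rank()

result = simulate_distributed(hello_world_sim, 8, 'SGC')
assert result == [0, 1, 2, 3, 4, 5, 6, 7]
```

This also shows why the earlier assert holds: each of the 8 processes contributes its own global rank, so the gathered list is [0, 1, ..., 7].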
Next steps
- Best practices for Serverless GPU compute
- Troubleshoot issues on serverless GPU compute
- Multi-GPU and multi-node distributed training
- Serverless GPU API documentation