2
Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Collecting pytorch-lightning
Downloading pytorch_lightning-2.0.2-py3-none-any.whl (719 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 719.0/719.0 kB 7.0 MB/s eta 0:00:00
Requirement already satisfied: pillow in /databricks/python3/lib/python3.10/site-packages (9.2.0)
Collecting deltalake
Downloading deltalake-0.9.0-1-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29.8/29.8 MB 42.2 MB/s eta 0:00:00
Requirement already satisfied: PyYAML>=5.4 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (6.0)
Requirement already satisfied: fsspec[http]>2021.06.0 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (2022.7.1)
Requirement already satisfied: numpy>=1.17.2 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (1.21.5)
Requirement already satisfied: tqdm>=4.57.0 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (4.64.1)
Requirement already satisfied: typing-extensions>=4.0.0 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (4.3.0)
Collecting lightning-utilities>=0.7.0
Downloading lightning_utilities-0.8.0-py3-none-any.whl (20 kB)
Requirement already satisfied: packaging>=17.1 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (21.3)
Collecting torchmetrics>=0.7.0
Downloading torchmetrics-0.11.4-py3-none-any.whl (519 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 519.2/519.2 kB 57.4 MB/s eta 0:00:00
Requirement already satisfied: torch>=1.11.0 in /databricks/python3/lib/python3.10/site-packages (from pytorch-lightning) (1.13.1+cu117)
Requirement already satisfied: pyarrow>=7 in /databricks/python3/lib/python3.10/site-packages (from deltalake) (7.0.0)
Requirement already satisfied: aiohttp in /databricks/python3/lib/python3.10/site-packages (from fsspec[http]>2021.06.0->pytorch-lightning) (3.8.4)
Requirement already satisfied: requests in /databricks/python3/lib/python3.10/site-packages (from fsspec[http]>2021.06.0->pytorch-lightning) (2.28.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /databricks/python3/lib/python3.10/site-packages (from packaging>=17.1->pytorch-lightning) (3.0.9)
Requirement already satisfied: aiosignal>=1.1.2 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (1.3.1)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (2.0.4)
Requirement already satisfied: yarl<2.0,>=1.0 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (1.8.2)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (4.0.2)
Requirement already satisfied: frozenlist>=1.1.1 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (1.3.3)
Requirement already satisfied: multidict<7.0,>=4.5 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (6.0.4)
Requirement already satisfied: attrs>=17.3.0 in /databricks/python3/lib/python3.10/site-packages (from aiohttp->fsspec[http]>2021.06.0->pytorch-lightning) (21.4.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /databricks/python3/lib/python3.10/site-packages (from requests->fsspec[http]>2021.06.0->pytorch-lightning) (1.26.11)
Requirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.10/site-packages (from requests->fsspec[http]>2021.06.0->pytorch-lightning) (2022.9.14)
Requirement already satisfied: idna<4,>=2.5 in /databricks/python3/lib/python3.10/site-packages (from requests->fsspec[http]>2021.06.0->pytorch-lightning) (3.3)
Installing collected packages: torchmetrics, lightning-utilities, deltalake, pytorch-lightning
Successfully installed deltalake-0.9.0 lightning-utilities-0.8.0 pytorch-lightning-2.0.2 torchmetrics-0.11.4
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
0%| | 0.00/97.8M [00:00<?, ?B/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
strategy is <pytorch_lightning.strategies.single_device.SingleDeviceStrategy object at 0x7fe6fbffb1c0>
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------
0 | model | ResNet | 25.6 M
---------------------------------
25.6 M Trainable params
0 Non-trainable params
25.6 M Total params
102.228 Total estimated model params size (MB)
Training: 0it [00:00, ?it/s]
Epoch 0 started at 1685645442.8318713 seconds
++ [0] Epoch: 0
21
INFO:TorchDistributor:Started local training with 4 processes
23
INFO:TorchDistributor:Started distributed training with 16 executor proceses
/databricks/python/lib/python3.10/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
warnings.warn(
2023-06-01 19:24:09.634747: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.638089: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.635955: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.653088: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.657789: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.654541: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.656603: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.653706: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.655972: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.653779: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.639011: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.655034: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.731215: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.729585: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.727029: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-01 19:24:09.730542: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 204MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 229MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 247MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 253MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 300MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 199MB/s]
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f48273aa080>
Initializing distributed: GLOBAL_RANK: 15, MEMBER: 16/16
100%|██████████| 97.8M/97.8M [00:00<00:00, 175MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 170MB/s]
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f09a935eb60>
Initializing distributed: GLOBAL_RANK: 12, MEMBER: 13/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f7678db4d00>
Initializing distributed: GLOBAL_RANK: 10, MEMBER: 11/16
100%|██████████| 97.8M/97.8M [00:00<00:00, 140MB/s]
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f4883e7acb0>
Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/16
100%|██████████| 97.8M/97.8M [00:00<00:00, 140MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 120MB/s]
100%|██████████| 97.8M/97.8M [00:00<00:00, 113MB/s]
100%|██████████| 97.8M/97.8M [00:01<00:00, 91.4MB/s]
100%|██████████| 97.8M/97.8M [00:01<00:00, 93.5MB/s]
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f15549a83d0>
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7ff434a55360>
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f29f77a9ff0>
Initializing distributed: GLOBAL_RANK: 11, MEMBER: 12/16
100%|██████████| 97.8M/97.8M [00:01<00:00, 68.2MB/s]
100%|██████████| 97.8M/97.8M [00:01<00:00, 63.4MB/s]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f031857c190>
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7fd319fac2b0>
Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f2ccbfa4ca0>
Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f98707a6f20>
Initializing distributed: GLOBAL_RANK: 14, MEMBER: 15/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f2ff7aaef20>
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7ff357784190>
Initializing distributed: GLOBAL_RANK: 8, MEMBER: 9/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f35c0976b90>
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7fa469d72c80>
Initializing distributed: GLOBAL_RANK: 13, MEMBER: 14/16
strategy is <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f6c7c37e650>
Initializing distributed: GLOBAL_RANK: 9, MEMBER: 10/16
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 16 processes
----------------------------------------------------------------------------------------------------
You are using a CUDA device ('NVIDIA A10G') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
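The warning above recommends calling `torch.set_float32_matmul_precision` before training. A minimal sketch of doing so follows; the helper name `set_matmul_precision` and the guarded import (so the snippet also runs where PyTorch is not installed) are assumptions for illustration, not part of the original notebook.

```python
import importlib.util

VALID_PRECISIONS = {"highest", "high", "medium"}


def set_matmul_precision(precision: str = "high") -> bool:
    """Apply the float32 matmul precision hint; return True if it took effect.

    Hypothetical helper: returns False when torch is missing or too old
    to expose torch.set_float32_matmul_precision.
    """
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {sorted(VALID_PRECISIONS)}")
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch

    if not hasattr(torch, "set_float32_matmul_precision"):
        return False  # older PyTorch without this setting
    # 'high' or 'medium' lets Tensor Core GPUs trade some float32
    # precision for matmul speed, as the warning describes.
    torch.set_float32_matmul_precision(precision)
    return torch.get_float32_matmul_precision() == precision


if __name__ == "__main__":
    print("precision applied:", set_matmul_precision("high"))
```

With distributed training, a call like this would presumably belong at the top of the training function so that every worker process applies it.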
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
---------------------------------
0 | model | ResNet | 25.6 M
---------------------------------
25.6 M Trainable params
0 Non-trainable params
25.6 M Total params
102.228 Total estimated model params size (MB)
Epoch 0 started at 1685647520.2087996 seconds
Epoch 0 started at 1685647520.2089117 seconds
++ [5] Epoch: 0
++ [13] Epoch: 0
Epoch 0 started at 1685647520.2088752 seconds
Epoch 0 started at 1685647520.2089064 seconds
++ [11] Epoch: 0
++ [15] Epoch: 0
Epoch 0 started at 1685647520.208757 seconds
++ [2] Epoch: 0
Epoch 0 started at 1685647520.209095 seconds
Epoch 0 started at 1685647520.208909 seconds
++ [10] Epoch: 0
Epoch 0 started at 1685647520.2091427 seconds
++ [14] Epoch: 0
Epoch 0 started at 1685647520.2088459 seconds
++ [6] Epoch: 0
++ [8] Epoch: 0
Epoch 0 started at 1685647520.2086282 seconds
++ [4] Epoch: 0
Epoch 0 started at 1685647520.2088623 seconds
++ [7] Epoch: 0
Epoch 0 started at 1685647520.2086344 seconds
++ [12] Epoch: 0
Epoch 0 started at 1685647520.8743188 seconds
++ [3] Epoch: 0
Epoch 0 started at 1685647521.1677103 seconds
++ [1] Epoch: 0
Epoch 0 started at 1685647521.2961361 seconds
++ [9] Epoch: 0
Epoch 0: 0%| | 0/377 [00:00<?, ?it/s]
Epoch 0 started at 1685647521.3167934 seconds
++ [0] Epoch: 0
Epoch 0: 8%|▊ | 29/...(truncated)26, 3.01s/it, v_num=533a]
Validation: 0it [00:00, ?it/s]
Validation: 0%| | 0/14 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/14 [00:00<?, ?it/s]
Validation DataLoader 0: 7%|▋ | 1/14 [00:00<00:02, 4.89it/s]
Validation DataLoader 0: 14%|█▍ | 2/14 [00:00<00:02, 4.32it/s]
Validation DataLoader 0: 21%|██▏ | 3/14 [00:00<00:02, 4.29it/s]
Validation DataLoader 0: 29%|██▊ | 4/14 [00:01<00:02, 3.57it/s]
Validation DataLoader 0: 36%|███▌ | 5/14 [00:01<00:02, 3.59it/s]
Validation DataLoader 0: 43%|████▎ | 6/14 [00:01<00:02, 3.69it/s]
Validation DataLoader 0: 50%|█████ | 7/14 [00:01<00:01, 3.70it/s]
Validation DataLoader 0: 57%|█████▋ | 8/14 [00:02<00:01, 3.73it/s]
Validation DataLoader 0: 64%|██████▍ | 9/14 [00:02<00:01, 3.59it/s]
Validation DataLoader 0: 71%|███████▏ | 10/14 [00:02<00:01, 3.60it/s]
Validation DataLoader 0: 79%|███████▊ | 11/14 [00:03<00:00, 3.63it/s]
Validation DataLoader 0: 86%|████████▌ | 12/14 [00:03<00:00, 3.67it/s]
Validation DataLoader 0: 93%|█████████▎| 13/14 [00:03<00:00, 3.68it/s]
Epoch 0: 100%|██████████| 377/377 [15:04<00:00, 2.40s/it, v_num=533a]
Epoch 1 started at 1685648472.836523 seconds
++ [13] Epoch: 1
Epoch 1 started at 1685648472.8362768 seconds
++ [12] Epoch: 1
Epoch 1 started at 1685648472.8365521 seconds
Epoch 1 started at 1685648472.8364458 seconds
Epoch 1 started at 1685648472.8365004 seconds
Epoch 1 started at 1685648472.8364177 seconds
Epoch 1 started at 1685648472.836327 seconds
Epoch 1 started at 1685648472.8363826 seconds
Epoch 1 started at 1685648472.836241 seconds
Epoch 1 started at 1685648472.8363469 seconds
Epoch 1 started at 1685648472.840834 seconds
Epoch 1 started at 1685648472.8365378 seconds
Epoch 1: 0%| | 0/377 [00:00<?, ?it/s, v_num=533a]
Epoch 1 started at 1685648472.83661 seconds
Epoch 1 started at 1685648472.8365269 seconds
Epoch 1 started at 1685648472.8364668 seconds
++ [9] Epoch: 1
++ [15] Epoch: 1
++ [11] Epoch: 1
++ [3] Epoch: 1
++ [14] Epoch: 1
++ [6] Epoch: 1
++ [4] Epoch: 1
++ [8] Epoch: 1
++ [2] Epoch: 1
++ [1] Epoch: 1
++ [0] Epoch: 1
++ [5] Epoch: 1
++ [10] Epoch: 1
Epoch 1 started at 1685648474.21078 seconds
++ [7] Epoch: 1
Epoch 1: 8%...(truncated)9/377 [01:27<17:33, 3.03s/it, v_num=533a]
Validation: 0it [00:00, ?it/s]
Validation: 0%| | 0/14 [00:00<?, ?it/s]
Validation DataLoader 0: 0%| | 0/14 [00:00<?, ?it/s]
Validation DataLoader 0: 7%|▋ | 1/14 [00:01<00:24, 1.89s/it]
Validation DataLoader 0: 14%|█▍ | 2/14 [00:02<00:12, 1.08s/it]
Validation DataLoader 0: 21%|██▏ | 3/14 [00:02<00:08, 1.25it/s]
Validation DataLoader 0: 29%|██▊ | 4/14 [00:02<00:06, 1.44it/s]
Validation DataLoader 0: 36%|███▌ | 5/14 [00:03<00:05, 1.63it/s]
Validation DataLoader 0: 43%|████▎ | 6/14 [00:03<00:04, 1.80it/s]
Validation DataLoader 0: 50%|█████ | 7/14 [00:03<00:03, 1.95it/s]
Validation DataLoader 0: 57%|█████▋ | 8/14 [00:03<00:02, 2.07it/s]
Validation DataLoader 0: 64%|██████▍ | 9/14 [00:10<00:05, 1.20s/it]
Validation DataLoader 0: 71%|███████▏ | 10/14 [00:11<00:04, 1.10s/it]
Validation DataLoader 0: 79%|███████▊ | 11/14 [00:11<00:03, 1.03s/it]
Validation DataLoader 0: 86%|████████▌ | 12/14 [00:11<00:01, 1.04it/s]
Validation DataLoader 0: 93%|█████████▎| 13/14 [00:11<00:00, 1.09it/s]
Epoch 1: 100%|██████████| 377/377 [15:09<00:00, 2.41s/it, v_num=533a]
Epoch 1: 100%|██████████| 377/377 [15:11<00:00, 2.42s/it, v_num=533a]
`Trainer.fit` stopped: `max_epochs=2` reached.
INFO:TorchDistributor:Finished distributed training with 16 executor proceses
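The `Epoch N started at ... seconds` lines in the distributed run are Unix timestamps, so the wall-clock time between epoch starts can be recovered from the log itself. The sketch below is one way to do that, using only the standard library. The function names and the choice to take the earliest start per epoch across ranks are assumptions made for illustration; the sample lines are copied from the log above.

```python
import re
from datetime import datetime, timezone

# A few start lines copied from the distributed run above.
LOG = """\
Epoch 0 started at 1685647520.2086282 seconds
Epoch 0 started at 1685647521.3167934 seconds
Epoch 1 started at 1685648472.836241 seconds
Epoch 1 started at 1685648474.21078 seconds
"""

START_RE = re.compile(r"Epoch (\d+) started at ([0-9.]+) seconds")


def earliest_starts(text):
    """Map each epoch number to the earliest start timestamp seen in the log."""
    starts = {}
    for epoch, ts in START_RE.findall(text):
        e, t = int(epoch), float(ts)
        starts[e] = min(t, starts.get(e, t))
    return starts


def epoch_durations(starts):
    """Seconds from the start of each epoch to the start of the next one."""
    epochs = sorted(starts)
    return {a: starts[b] - starts[a]
            for a, b in zip(epochs, epochs[1:]) if b == a + 1}


starts = earliest_starts(LOG)
for epoch, ts in sorted(starts.items()):
    # Render each timestamp in UTC for comparison with other log lines.
    print(epoch, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
print(epoch_durations(starts))
```

For these lines the gap between the start of epoch 0 and the start of epoch 1 comes to roughly 953 seconds. That is longer than the 15:04 shown on the epoch 0 training progress bar, presumably because the interval also covers the validation pass and any per-epoch overhead.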