2
4
7
9
('imdb',): download_size=79.6M, dataset_size=127.0M
11
/databricks/python/lib/python3.11/site-packages/datasets/load.py:1486: FutureWarning: The repository for oscar contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/oscar
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
('oscar', 'unshuffled_deduplicated_en'): download_size=462.4G, dataset_size=1.2T
13
('tatsu-lab/alpaca',): download_size=23.1M, dataset_size=42.1M
14
/databricks/python/lib/python3.11/site-packages/datasets/load.py:1486: FutureWarning: The repository for mc4 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/mc4
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
Dataset size for mc4 is not provided by uploader
/root/.cache/huggingface/modules/datasets_modules/datasets/mc4/78f7a2b7e2524fa44ee464ef429d011c365f5fe129283869e7fd76856aacb83a/mc4.py:284: FutureWarning: Dataset 'mc4' is deprecated and will be deleted. Use 'allenai/c4' instead.
warnings.warn(
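The size lines above (and the "not provided by uploader" message for mc4) could be produced by something like the following. This is a minimal sketch, not the notebook's actual code: `format_size`, `describe` and `report` are hypothetical helper names, and the 1024-based units are an assumption inferred from the printed values. `report` relies on `datasets.load_dataset_builder`, which needs the `datasets` package and network access, and script-based datasets such as oscar may additionally need `trust_remote_code=True` on library versions that still support it.

```python
def format_size(num_bytes):
    """Render a byte count with 1024-based units, e.g. 1536 -> '1.5K' (assumed format)."""
    size = float(num_bytes)
    units = ("B", "K", "M", "G", "T", "P")
    for unit in units:
        # Stop at the first unit that keeps the number below 1024, or at the largest unit.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f}{unit}"
        size /= 1024


def describe(path_and_config, download_size, dataset_size):
    """Build one report line from a (path, config) tuple and two byte counts."""
    if download_size is None or dataset_size is None:
        # Some dataset cards carry no size metadata at all.
        return f"Dataset size for {path_and_config[0]} is not provided by uploader"
    return (
        f"{path_and_config}: download_size={format_size(download_size)}, "
        f"dataset_size={format_size(dataset_size)}"
    )


def report(*path_and_config):
    """Fetch builder metadata only (no data download) and print the report line."""
    from datasets import load_dataset_builder  # imported lazily: needs `datasets` + network

    info = load_dataset_builder(*path_and_config).info
    print(describe(path_and_config, info.download_size, info.dataset_size))
```

Calling `report("imdb")` or `report("oscar", "unshuffled_deduplicated_en")` would then query the Hub for metadata without downloading the data itself.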
17
Size  Used  Avail  Use%  Mounted on
147G   30G   110G   22%  /
492K  4.0K   488K    1%  /dev
147G   30G   110G   22%  /mnt/readonly
206G   11G   185G    6%  /local_disk0
 29G   20G   9.7G   67%  /ttyd
 16G  4.0K    16G    1%  /dev/shm
6.2G   92K   6.2G    1%  /run
5.0M     0   5.0M    0%  /run/lock
4.0M     0   4.0M    0%  /sys/fs/cgroup
 10G     0    10G    0%  /Workspace
1.0P     0   1.0P    0%  /Volumes
1.0P     0   1.0P    0%  /dbfs
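The same free-space check can be done from Python before starting a large download. The sketch below uses only the standard library; `has_room` is a hypothetical helper, and the byte count in the example is an illustrative stand-in for a dataset's combined download and prepared size, not a value taken from this log.

```python
import shutil


def has_room(path, required_bytes):
    """Return True when the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# Illustrative requirement of about 1.6 TiB (assumed figure for a very large dataset).
required = int(1.6 * 1024**4)
print(f"current directory has room: {has_room('.', required)}")
```

On a cluster with the mounts listed above, one would pass the intended cache location (for example a directory under `/local_disk0`) instead of `'.'`.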
21
23
24
True
25
/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:45: UserWarning: The cache_dir for this dataset is /local_disk0/hf_cache/, which is not a persistent path.Therefore, if/when the cluster restarts, the downloaded dataset will be lost.The persistent storage options for this workspace/cluster config are: [DBFS, UC Volumes].Please update either `cache_dir` or the environment variable `HF_DATASETS_CACHE`to be under one of the following root directories: ['/dbfs/', '/Volumes/']
warnings.warn(warning_message)
26
'/Volumes/main/default/my-volume/hf_imdb_cache'
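The warning for cell 25 lists `/dbfs/` and `/Volumes/` as the persistent roots, and the path printed above sits under the second one. A minimal sketch of checking a candidate cache directory against those roots follows; `is_persistent` is a hypothetical helper, and the root list is copied from the warning text rather than queried from the platform.

```python
import posixpath

# Persistent roots as named in the Databricks warning (assumed to be complete).
PERSISTENT_ROOTS = ("/dbfs/", "/Volumes/")


def is_persistent(cache_dir):
    """Return True when `cache_dir` lies under one of the persistent roots."""
    # Normalize first so '/Volumes/x/../y' and trailing slashes compare consistently.
    normalized = posixpath.normpath(cache_dir) + "/"
    return normalized.startswith(PERSISTENT_ROOTS)


print(is_persistent("/Volumes/main/default/my-volume/hf_imdb_cache"))
print(is_persistent("/local_disk0/hf_cache/"))
```

A directory that passes this check can then be supplied through the `cache_dir` argument of `load_dataset` or through the `HF_DATASETS_CACHE` environment variable, the two options the warning itself names.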
29
/databricks/python/lib/python3.11/site-packages/datasets/load.py:1486: FutureWarning: The repository for oscar contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/oscar
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:127: UserWarning: The dataset would be saved to both local disk and PersistentStorageType.VOLUMES for better performance.
warnings.warn(
31
/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:45: UserWarning: The cache_dir for this dataset is /root/.cache, which is not a persistent path.Therefore, if/when the cluster restarts, the downloaded dataset will be lost.The persistent storage options for this workspace/cluster config are: [DBFS, UC Volumes].Please update either `cache_dir` or the environment variable `HF_DATASETS_CACHE`to be under one of the following root directories: ['/dbfs/', '/Volumes/']
warnings.warn(warning_message)
/databricks/python/lib/python3.11/site-packages/datasets/load.py:1486: FutureWarning: The repository for oscar contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/oscar
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
/databricks/python_shell/dbruntime/huggingface_patches/datasets.py:99: UserWarning: This dataset will be stored in /root/.cache, which has a limited available space of 109.8GB, while the required size is 1.6TB. Set `cache_dir` or the environment variable `HF_DATASETS_CACHE` to be either under `/local_disk0/` to use elastic local disk or one of the available persistent storage options: [DBFS, UC Volumes].
warnings.warn(
<class 'datasets.dataset_dict.IterableDatasetDict'>