Databricks includes a variety of datasets mounted to the Databricks File System (DBFS) that you can use to either learn Apache Spark or test algorithms. You’ll see these throughout the documentation pages.
To browse these files, you can use Databricks Utilities: in a notebook, calling dbutils.fs.ls("/databricks-datasets") lists all of the Databricks datasets.
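As a minimal sketch of the listing step, the following uses plain Python instead of Databricks Utilities. It assumes the cluster exposes DBFS through the local `/dbfs` mount, which is the same assumption the README example in this section makes. The function name `list_datasets` and the constant `DATASETS_ROOT` are illustrative and not part of any Databricks API.

```python
import os

# Assumed local mount point of the datasets folder.
DATASETS_ROOT = "/dbfs/databricks-datasets"


def list_datasets(root=DATASETS_ROOT):
    """Return the sorted names of the entries directly under `root`."""
    return sorted(os.listdir(root))


# Only list when the mount exists, so the sketch is harmless elsewhere.
if os.path.isdir(DATASETS_ROOT):
    for name in list_datasets():
        print(name)
```

Passing a different `root` lets you try the helper against any local folder before running it on a cluster.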
For any of those datasets, you can then print out its README to get more information about it.
with open("/dbfs/databricks-datasets/README.md") as f:
    print(f.read())
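To do the same for an individual dataset, a hedged sketch follows. It assumes each dataset is a folder under `/dbfs/databricks-datasets` and that its README, when present, is a top-level file whose name starts with `README`; neither is guaranteed for every dataset. The helper `read_dataset_readme` is hypothetical and only illustrates the idea.

```python
import os


def read_dataset_readme(dataset, root="/dbfs/databricks-datasets"):
    """Return the text of the first README* file in a dataset folder, or None."""
    folder = os.path.join(root, dataset)
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Match README, README.md, readme.txt, and similar file names.
        if name.upper().startswith("README") and os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
    return None  # no README found at the top level of this dataset
```

Called with the name of a folder returned by the listing step, the helper returns the README text, or `None` when the dataset has no top-level README.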