Databricks datasets

Databricks includes a variety of datasets mounted to the Databricks File System (DBFS) that you can use either to learn Apache Spark or to test algorithms. These datasets appear in examples throughout the documentation.

You can browse these files with Databricks Utilities (dbutils). The following code snippet lists all of the Databricks datasets:

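# List the top-level contents of the datasets directory (display and dbutils are predefined in Databricks notebooks)
display(dbutils.fs.ls("/databricks-datasets"))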
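If you want the dataset names as a plain Python list rather than a rendered table, a minimal sketch follows. It assumes the FileInfo entries returned by dbutils.fs.ls expose a name attribute and that directory names end in "/"; list_dataset_names is a hypothetical helper, and FakeFileInfo is an illustrative stand-in so the sketch also runs outside Databricks.

```python
from collections import namedtuple


def list_dataset_names(entries):
    """Return sorted directory names (without the trailing slash).

    Assumes `entries` looks like the output of dbutils.fs.ls: objects with a
    `name` attribute, where directory names end in "/".
    """
    return sorted(e.name.rstrip("/") for e in entries if e.name.endswith("/"))


# Hypothetical stand-in for dbutils' FileInfo, for running outside a notebook.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size"])

sample = [
    FakeFileInfo("dbfs:/databricks-datasets/README.md", "README.md", 0),
    FakeFileInfo("dbfs:/databricks-datasets/example-b/", "example-b/", 0),
    FakeFileInfo("dbfs:/databricks-datasets/example-a/", "example-a/", 0),
]
print(list_dataset_names(sample))
```

In a notebook you would pass the real listing instead, for example list_dataset_names(dbutils.fs.ls("/databricks-datasets")).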

You can print the README for any dataset to learn more about it. For example, the following reads the top-level README through the local /dbfs path:

# /dbfs exposes DBFS as a local file path, so standard Python file I/O works
with open("/dbfs/databricks-datasets/README.md") as f:
    readme = f.read()

print(readme)
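README file names are not necessarily identical across dataset directories, so a small helper that looks for any README-like file can save some guessing. The sketch below is illustrative only: read_dataset_readme is a hypothetical helper, and it assumes DBFS is reachable through the local /dbfs mount and that a dataset's README sits directly inside its directory.

```python
from pathlib import Path

# Assumed local (FUSE) path for the datasets directory on the driver.
DATASETS_ROOT = "/dbfs/databricks-datasets"


def read_dataset_readme(dataset, root=DATASETS_ROOT):
    """Return the text of the first README-like file in root/dataset, or None."""
    folder = Path(root) / dataset
    if not folder.is_dir():
        return None
    # Match names such as README.md, readme.txt, or README case-insensitively.
    candidates = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.name.lower().startswith("readme")
    )
    if not candidates:
        return None
    return candidates[0].read_text(encoding="utf-8", errors="replace")
```

For example, print(read_dataset_readme(name)) with a directory name taken from the listing prints that dataset's README, or None if no README-like file is found.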