Databricks datasets

Databricks includes a variety of datasets mounted to Databricks File System (DBFS). These datasets are used in examples throughout the documentation.

To browse these files in Data Science & Engineering or Databricks Machine Learning using Python, R, or Scala, you can use Databricks Utilities (dbutils). The following Python example, run in a notebook, lists all of the Databricks datasets.

display(dbutils.fs.ls("/databricks-datasets"))
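The listing above shows only the top level. To look inside a dataset folder, the same call can be applied recursively. The following is a minimal sketch, not part of Databricks Utilities: the helper name iter_files and its max_depth parameter are hypothetical, and it assumes that directory entries returned by dbutils.fs.ls have a name ending in a slash (for example, name='COVID/').

```python
from typing import Callable, Iterator


def iter_files(ls: Callable[[str], list], root: str, max_depth: int = 2) -> Iterator[str]:
    """Yield file paths under root, descending at most max_depth directory levels.

    ls is any callable that lists a directory and returns entries exposing
    .path and .name attributes; in a notebook this would be dbutils.fs.ls.
    """
    for entry in ls(root):
        # Assumption: directory entries carry a trailing slash in their name.
        if entry.name.endswith("/"):
            if max_depth > 0:
                yield from iter_files(ls, entry.path, max_depth - 1)
        else:
            yield entry.path


# In a Databricks notebook (dbutils is predefined there, not in plain Python):
# for path in iter_files(dbutils.fs.ls, "/databricks-datasets/learning-spark-v2", max_depth=1):
#     print(path)
```

Passing the listing function as an argument keeps the helper independent of the notebook environment. Keeping max_depth small is advisable, because some dataset folders contain many files.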

To get more information about the datasets, you can use a local file API to print a README. Local file APIs reach DBFS through the /dbfs prefix, as in this example, which prints the top-level README.

with open("/dbfs/databricks-datasets/README.md", "r") as f:
    print(f.read())
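The examples on this page write the same location in different forms: /databricks-datasets for dbutils, /dbfs/databricks-datasets for local file APIs, and dbfs:/databricks-datasets in SQL. The following sketch converts between the forms. The function names to_local_path and to_dbfs_uri are hypothetical helpers, not part of any Databricks API, and the sketch assumes the cluster exposes DBFS under /dbfs.

```python
def to_local_path(path: str) -> str:
    """Return the /dbfs/... form of a DBFS path, for use with open()."""
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    # Assumption: a path already starting with /dbfs/ is in local form.
    if path.startswith("/dbfs/"):
        return path
    return "/dbfs/" + path.lstrip("/")


def to_dbfs_uri(path: str) -> str:
    """Return the dbfs:/... form of a path, for use in SQL or Spark."""
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    elif path.startswith("/dbfs/"):
        path = path[len("/dbfs"):]
    return "dbfs:/" + path.lstrip("/")


print(to_local_path("dbfs:/databricks-datasets/README.md"))
print(to_dbfs_uri("/dbfs/databricks-datasets/README.md"))
```

One caveat: a DBFS directory that is itself named dbfs at the root would be ambiguous under this scheme, so the helpers treat a leading /dbfs/ as the local mount.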

Here’s how to create a table from a Databricks dataset in a Data Science & Engineering SQL notebook or in the Databricks SQL query editor:

CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta')