Databricks datasets

Databricks includes a variety of datasets mounted to the Databricks File System (DBFS) under /databricks-datasets. These datasets are used in examples throughout the documentation.

To browse these files from Python, R, or Scala in Data Science & Engineering or Databricks Machine Learning, you can use Databricks Utilities (dbutils). Here’s a Python example that you can use in a notebook to list all of the Databricks datasets.
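
In a Databricks notebook, Databricks Utilities can list the datasets with display(dbutils.fs.ls("/databricks-datasets")). As a plain-Python alternative, the sketch below reads the local /dbfs mount with os.listdir. It assumes the /dbfs mount is available on your compute, and the function name list_databricks_datasets is hypothetical, not part of any Databricks API.

```python
import os


def list_databricks_datasets(root="/dbfs/databricks-datasets"):
    """Return the sorted names of the entries directly under `root`.

    Hypothetical helper; assumes the DBFS local mount at /dbfs exists.
    """
    # os.listdir returns bare names (not full paths) in arbitrary order,
    # so sort them for stable, readable output.
    return sorted(os.listdir(root))
```

In a notebook cell you could then run `for name in list_databricks_datasets(): print(name)`. Using the standard library keeps the helper independent of dbutils, at the cost of relying on the /dbfs mount being present.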


To get more information about a dataset, you can use a local file API, such as Python's built-in open() on the /dbfs path, to print the dataset README.

with open("/dbfs/databricks-datasets/README.md", "r") as f:
    print(f.read())
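
Individual dataset directories do not necessarily share one README filename. The following minimal sketch, assuming the /dbfs local mount is available, searches a directory for any file whose name starts with "readme" (case-insensitive) instead of assuming an exact name. The function read_readme is hypothetical.

```python
import os


def read_readme(dataset_dir="/dbfs/databricks-datasets"):
    """Return the text of the first README-like file in `dataset_dir`.

    Hypothetical helper: matches any regular file whose name starts with
    "readme" (case-insensitive) rather than assuming e.g. README.md.
    """
    # Sort so the choice is deterministic when several README files exist.
    for name in sorted(os.listdir(dataset_dir)):
        path = os.path.join(dataset_dir, name)
        if name.lower().startswith("readme") and os.path.isfile(path):
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
    raise FileNotFoundError(f"no README found in {dataset_dir}")
```

For example, `print(read_readme("/dbfs/databricks-datasets/learning-spark-v2"))` would print that directory's README if one exists there and raise FileNotFoundError otherwise.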

Here’s how to create a table from a Databricks dataset in a Data Science & Engineering SQL notebook or in the Databricks SQL query editor:

CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/')
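
From a Python notebook, the same statement can be passed to spark.sql, assuming the predefined `spark` session that Databricks notebooks provide. The minimal sketch below builds the statement string with a hypothetical helper, create_table_statement. It rejects paths containing single quotes rather than guessing at Spark SQL escaping rules.

```python
def create_table_statement(table_name: str, path: str) -> str:
    """Build a CREATE TABLE ... OPTIONS (PATH ...) statement.

    Hypothetical helper mirroring the SQL example above.
    """
    if "'" in path:
        # Refuse instead of attempting to escape the quote.
        raise ValueError("path must not contain single quotes")
    return f"CREATE TABLE {table_name} OPTIONS (PATH '{path}')"
```

In a Databricks Python notebook you might then call `spark.sql(create_table_statement("default.people10m", "dbfs:/databricks-datasets/learning-spark-v2/people/"))`. Note that this runs the statement exactly as written, including the path as given above.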