Databricks includes a variety of datasets mounted to Databricks File System (DBFS). These datasets are used in examples throughout the documentation.
To browse these files in Data Science & Engineering or Databricks Machine Learning using Python, R, or Scala, you can use Databricks Utilities. Here’s a Python example that you can use in a notebook to list all of the Databricks datasets.
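The listing can be sketched as follows, assuming the `dbutils` object that Databricks notebooks predefine and the `/databricks-datasets` root that appears in the paths on this page. The helper name `list_dataset_names` is hypothetical. It accepts any object with an `ls(path)` method, so the logic can also be exercised outside a notebook.

```python
def list_dataset_names(fs, root="/databricks-datasets"):
    """Return the sorted entry names found directly under `root`.

    `fs` is assumed to behave like `dbutils.fs`: `fs.ls(path)` returns
    objects that expose a `name` attribute.
    """
    return sorted(entry.name for entry in fs.ls(root))


# In a Databricks notebook, where `dbutils` is predefined:
# for name in list_dataset_names(dbutils.fs):
#     print(name)
```

Passing `dbutils.fs` in as an argument, rather than referring to it as a global, keeps the helper importable in environments where `dbutils` does not exist.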
To get more information about any dataset, you can use a local file API to print out the dataset README:
with open("/dbfs/databricks-datasets/README.md", "r") as f:
    print(f.read())
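The same local file approach can be extended to a single dataset directory. The following is a minimal sketch, not an official utility: the function name `read_dataset_readme` is hypothetical, and the candidate README file names are assumptions, since individual datasets may name their README differently or not include one at all.

```python
from pathlib import Path


def read_dataset_readme(dataset_dir, root="/dbfs/databricks-datasets"):
    """Return the README text for a dataset directory, or None if none is found.

    `dataset_dir` is a path relative to `root`. The file names tried below
    are assumptions, not a documented convention.
    """
    base = Path(root) / dataset_dir
    for candidate in ("README.md", "README.txt", "README"):
        path = base / candidate
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return None


# In a notebook attached to a cluster with the /dbfs mount available:
# print(read_dataset_readme(""))  # an empty string targets the top-level directory
```

Returning `None` instead of raising lets a caller loop over many dataset directories and skip those without a README.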
Here’s how to create a table from a Databricks dataset in a Data Science & Engineering SQL notebook or in the Databricks SQL query editor:
CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta')
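The examples on this page address the same storage with two different prefixes: the local file API uses `/dbfs/...`, while the SQL statement uses `dbfs:/...`. A minimal sketch of converting one form to the other, assuming the `/dbfs` mount is available on the cluster; the function name `dbfs_uri_to_local_path` is hypothetical.

```python
def dbfs_uri_to_local_path(uri):
    """Convert a `dbfs:/...` URI to its `/dbfs/...` local file path.

    Assumes the `/dbfs` mount exposes the DBFS root, as in the README
    example on this page.
    """
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}, got {uri!r}")
    # Drop the scheme and any extra leading slashes, then re-root under /dbfs.
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


print(dbfs_uri_to_local_path("dbfs:/databricks-datasets/README.md"))
# prints: /dbfs/databricks-datasets/README.md
```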