Programmatically interact with workspace files

You can interact with workspace files stored in Databricks programmatically. This enables tasks such as:

  • Storing small data files alongside notebooks and code.

  • Writing log files to directories synced with Git.

  • Importing modules using relative paths.

  • Creating or modifying an environment specification file.

  • Writing output from notebooks.

  • Writing output from libraries such as TensorBoard.
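Writing log files is one of the tasks listed above. The following is a minimal sketch using only the standard logging module. It assumes the cluster runs Databricks Runtime 11.2 or above (so writes to workspace files are allowed) and that relative paths resolve against the notebook's current working directory. The helper make_file_logger, the logs directory, and the logger name etl_run are hypothetical.

```python
import logging
import os


def make_file_logger(name: str, log_dir: str = "logs") -> logging.Logger:
    """Return a logger that appends records to <log_dir>/<name>.log.

    log_dir is resolved relative to the current working directory.
    """
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once so that re-running a notebook cell
    # does not write each record several times.
    if not logger.handlers:
        handler = logging.FileHandler(os.path.join(log_dir, f"{name}.log"))
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = make_file_logger("etl_run")
logger.info("Loaded input data")
```

Because logging.FileHandler opens the file in append mode by default, repeated runs add to the same log file rather than overwriting it.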

You can read and import workspace files in Databricks Repos in Databricks Runtime 8.4 and above. You can programmatically create, edit, and delete workspace files in Databricks Runtime 11.2 and above.
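The version thresholds above can be checked in code. This is a sketch under two assumptions: that the cluster exposes its runtime version in the DATABRICKS_RUNTIME_VERSION environment variable, and that the value begins with "major.minor" (for example "11.3"). The function names are hypothetical.

```python
import os
import re

# Minimum runtime versions stated above for each capability.
READ_MIN = (8, 4)
WRITE_MIN = (11, 2)


def parse_runtime_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as "11.3" or "11.3.x-scala2.12"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def workspace_file_capabilities(version: str) -> dict[str, bool]:
    """Compare a runtime version against the documented minimums."""
    parsed = parse_runtime_version(version)
    # Tuples compare element by element, so (10, 4) < (11, 2) < (11, 3).
    return {"read": parsed >= READ_MIN, "write": parsed >= WRITE_MIN}


# Assumption: DATABRICKS_RUNTIME_VERSION is set on the cluster; skip the check elsewhere.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if runtime is not None:
    print(workspace_file_capabilities(runtime))
```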

Note

To disable writing to workspace files, set the cluster environment variable WSFS_ENABLE_WRITE_SUPPORT=false. For more information, see Environment variables.
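A notebook can inspect this setting before attempting a write. The sketch below assumes that cluster environment variables are visible to the notebook's Python process through os.environ, and it treats only the documented value "false" as disabling writes; how the runtime interprets other values is not covered. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def workspace_writes_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether WSFS_ENABLE_WRITE_SUPPORT leaves writes to workspace files enabled."""
    env = os.environ if environ is None else environ
    # Mirrors the note above: only the exact value "false" is treated as disabled.
    return env.get("WSFS_ENABLE_WRITE_SUPPORT") != "false"


print(workspace_writes_enabled())
```

This only reflects the environment variable; writes still require Databricks Runtime 11.2 or above.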

Read data from workspace files

You can programmatically read small data files, such as .csv or .json files, from code in your notebooks. The following example uses pandas to read a file stored in a data directory relative to the root of the project repo:

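import pandas as pd

# Read a CSV file from the data directory using a relative path
df = pd.read_csv("./data/winequality-red.csv")

# Display the DataFrame (a notebook renders the last expression in a cell)
df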
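Because workspace files are read with ordinary file paths, as the pandas example above shows, the standard library csv and json modules can read the same small files without pandas. This is a sketch: the helper read_small_data_file is hypothetical, it assumes relative paths resolve against os.getcwd(), and it expects a .json file to hold a single JSON document.

```python
import csv
import json
import os


def read_small_data_file(relative_path: str):
    """Read a small .csv or .json file relative to the current working directory."""
    path = os.path.join(os.getcwd(), relative_path)
    _, ext = os.path.splitext(path)
    if ext == ".csv":
        # Each row becomes a dict keyed by the header row; values stay strings.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
    if ext == ".json":
        with open(path) as f:
            return json.load(f)
    raise ValueError(f"Unsupported file type: {ext}")
```

For example, read_small_data_file("data/config.json") would load a hypothetical JSON file from the data directory.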

You can also use Spark to read data files, but you must provide Spark with the fully qualified path. Workspace files in Repos use the path file:/Workspace/Repos/<user-folder>/<repo-name>/file.
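The path pattern above can be assembled from its parts. This is a minimal sketch; the function repo_file_uri and its argument values are hypothetical, and it only builds the string, so it does not check that the file exists.

```python
import posixpath


def repo_file_uri(user_folder: str, repo_name: str, relative_path: str) -> str:
    """Build the fully qualified file: URI of a file inside a repo."""
    # Normalize so "./data/x.csv" and "data/x.csv" produce the same URI.
    relative = posixpath.normpath(relative_path.lstrip("/"))
    # Refuse paths such as "../other" that would point outside the repo.
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"Path escapes the repo: {relative_path!r}")
    return "file:" + posixpath.join("/Workspace/Repos", user_folder, repo_name, relative)
```

For example, repo_file_uri("someone@example.com", "my-repo", "./data/my_data.csv") returns "file:/Workspace/Repos/someone@example.com/my-repo/data/my_data.csv", which can be passed to spark.read.format("csv").load(...).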

You can copy the absolute or relative path to a file in a repo from the drop-down menu next to the file.

The following example uses os.getcwd() to build the full path.

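import os
# spark is the SparkSession that Databricks notebooks create automatically;
# the file: prefix makes the path fully qualified, as Spark requires.
spark.read.format("csv").load(f"file:{os.getcwd()}/my_data.csv")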
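The f-string in the example above can be wrapped in a small helper that also handles paths such as "./data/../my_data.csv". This is a sketch; the name to_spark_file_uri is hypothetical, and it relies on os.path.abspath, which resolves relative paths against os.getcwd() and normalizes "." and ".." segments.

```python
import os


def to_spark_file_uri(path: str) -> str:
    """Return a fully qualified file: URI for a local path such as a workspace file."""
    if path.startswith("file:"):
        return path  # already fully qualified; leave unchanged
    # os.path.abspath resolves relative paths against os.getcwd(),
    # the same directory the example above interpolates.
    return "file:" + os.path.abspath(path)
```

In a notebook, spark.read.format("csv").load(to_spark_file_uri("my_data.csv")) then reads the same file as the example above.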

To learn more about files on Databricks, see How to work with files on Databricks.

Programmatically create, update, and delete files and directories

In Databricks Runtime 11.2 and above, you can directly manipulate workspace files in Databricks. The following examples use standard Python packages and functionality to create and manipulate files and directories.

import os

# Create a new directory
os.mkdir("dir1")

# Create a new file and write to it
with open("dir1/new_file.txt", "w") as f:
    f.write("new content")

# Append to a file
with open("dir1/new_file.txt", "a") as f:
    f.write(" continued")

# Delete a file
os.remove("dir1/new_file.txt")

# Delete a directory (os.rmdir only removes empty directories)
os.rmdir("dir1")
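os.rmdir in the example above only removes empty directories. As a sketch, assuming pathlib and shutil behave on workspace files like the os calls above (they are built on the same standard file operations), the same steps can be written with pathlib, and shutil.rmtree removes a directory together with everything in it. The directory name dir2 is hypothetical.

```python
import shutil
from pathlib import Path

# Create nested directories in one call; exist_ok avoids an error on re-runs.
base = Path("dir2")
(base / "nested").mkdir(parents=True, exist_ok=True)

# Write and read text without managing file handles explicitly.
target = base / "nested" / "notes.txt"
target.write_text("new content")
content = target.read_text()
print(content)

# List everything under the directory.
for path in sorted(base.rglob("*")):
    print(path)

# Remove the directory and all of its contents in one call.
shutil.rmtree(base)
```

Use shutil.rmtree with care: unlike os.rmdir, it does not stop when the directory still contains files.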