What is the default current working directory in Databricks Runtime 14.0 and above?
In Databricks Runtime 14.0 and above, the default current working directory (CWD) for code executed locally is the directory containing the notebook or script being run. This includes code such as %sh and Python or R code not using Spark.
For notebooks in the Workspace running Databricks Runtime 13.3 LTS and below, the CWD for these commands was the ephemeral storage volume attached to the driver.
This change brings notebook interaction with workspace files in line with behavior observed in Databricks Repos.
Note
Scala code cannot interact natively with workspace files and continues to use ephemeral storage attached to the driver as the CWD.
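To confirm which directory a Python notebook is actually using, it can help to print the CWD and see where a relative file name would land. The following is a minimal sketch that uses only the Python standard library; the helper name describe_cwd and the file name example.txt are illustrative and not part of any Databricks API.

```python
import os
from pathlib import Path


def describe_cwd(relative_name: str = "example.txt") -> dict:
    """Report the current working directory and where a relative path would resolve.

    describe_cwd is a hypothetical helper for illustration only.
    """
    cwd = Path(os.getcwd())
    return {
        "cwd": str(cwd),
        # A relative path is resolved against the CWD by open(), os.listdir(), etc.
        "resolved": str(cwd / relative_name),
    }


info = describe_cwd("example.txt")
print(info["cwd"])
print(info["resolved"])
```

Based on the behavior described above, the printed directory should be the notebook's own directory on Databricks Runtime 14.0 and above, and a location on the driver's ephemeral storage on 13.3 LTS and below.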
How does this impact workloads?
The biggest impacts on workloads concern file persistence and location.
In Databricks Runtime 13.3 LTS and below, many code snippets store data to a default location on an ephemeral storage volume that is permanently deleted when the cluster is terminated.
In Databricks Runtime 14.0 and above, the default behavior for these operations creates workspace files stored alongside the running notebook that persist until explicitly deleted.
For notes on performance differences and other limitations inherent in workspace files, see Limitations.
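One way to keep intermediate data from persisting as workspace files is to write it to an explicit absolute path rather than relying on the CWD. The sketch below assumes that the directory returned by tempfile.gettempdir() (typically /tmp on Linux) sits on the driver's ephemeral storage. The helper names scratch_path and write_scratch are hypothetical.

```python
import os
import tempfile


def scratch_path(filename: str) -> str:
    """Build an absolute path under the system temp directory.

    Assumption: on a cluster driver this directory is on ephemeral storage
    and is discarded when the cluster terminates.
    """
    return os.path.join(tempfile.gettempdir(), filename)


def write_scratch(filename: str, text: str) -> str:
    """Write text to the scratch location and return the path that was used."""
    path = scratch_path(filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

Because the path is absolute, a call such as write_scratch("intermediate.csv", "id,value\n1,42\n") should write to the same place regardless of the runtime version or the CWD setting.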
Where do deleted files go?
Deleting a workspace file sends it to the trash. You can either recover or permanently delete files from the trash using the UI.
See Delete an object.
Revert to legacy behavior
You can change the current working directory for any notebook using the Python method os.chdir(). If you want to ensure that each notebook uses a CWD on the ephemeral storage volumes attached to the driver, you can add the following command to the first cell of each notebook and run it before any other code:
import os
os.chdir("/tmp")
You can also revert to the legacy behavior by setting the following Spark configuration:
spark.databricks.wsfs.workspaceNotebookCwd false
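If only part of a notebook needs the legacy location, a narrower option is to change the CWD temporarily and restore it afterward. The following is a sketch built on os.chdir(); the context manager working_directory is a hypothetical helper, not something provided by Databricks.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path: str):
    """Temporarily switch the CWD to `path`, restoring the previous CWD on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original CWD even if the body raises an exception.
        os.chdir(previous)


# Usage: run only the scratch-heavy step from /tmp.
# with working_directory("/tmp"):
#     ...  # relative paths here resolve under /tmp
```

Code outside the with block continues to resolve relative paths against the notebook's directory.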
Limitations
The following limitations exist:
Executors cannot write to workspace files.
Write-heavy workloads (such as unzip and tar) have increased latency.
Workspace file size is limited to 200MB. Operations that attempt to download or create files larger than this limit fail.
You cannot use git commands when saving to workspace files. The creation of .git directories is not allowed in workspace files.
Symlinks are not supported.
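Given the latency and size limitations above, write-heavy steps such as archive extraction may be better directed to a temporary directory than to the notebook's directory. The sketch below uses the standard zipfile and tempfile modules. The helper names are hypothetical, and the byte value used for the 200MB limit (200 * 1024 * 1024) is an assumption, since the exact definition of the limit is not stated here.

```python
import os
import tempfile
import zipfile

# Assumption: "200MB" is interpreted as 200 * 1024 * 1024 bytes.
WORKSPACE_FILE_LIMIT_BYTES = 200 * 1024 * 1024


def extract_to_scratch(archive_path: str) -> str:
    """Extract a zip archive into a fresh temporary directory and return that directory."""
    target = tempfile.mkdtemp(prefix="extract_")
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return target


def fits_workspace_limit(path: str) -> bool:
    """Return True if the file at `path` is within the assumed workspace file size limit."""
    return os.path.getsize(path) <= WORKSPACE_FILE_LIMIT_BYTES
```

After extraction, fits_workspace_limit can be used to decide whether a result file is small enough to copy next to the notebook.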