In Databricks Runtime 14.0 and above, the default current working directory (CWD) for code executed locally is the directory containing the notebook or script being run. This includes code such as %sh and Python or R code that does not use Spark.
For notebooks in the Workspace running Databricks Runtime 13.3 LTS and below, the CWD for these commands was the ephemeral storage volume attached to the driver.
This change brings notebook interaction with workspace files in line with behavior observed in Databricks Repos.
Scala code cannot interact natively with workspace files and continues to use ephemeral storage attached to the driver as the CWD.
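To check which default a Python notebook is using, you can print os.getcwd() and classify the result. This is a minimal sketch that assumes workspace files appear on the driver under the /Workspace path prefix. The describe_cwd helper and the prefix check are illustrative only and are not a Databricks API.

```python
import os
import posixpath
from typing import Optional

# Assumption: workspace files are exposed on the driver under /Workspace;
# any other path is treated here as driver-local (ephemeral) storage.
WORKSPACE_PREFIX = "/Workspace"


def describe_cwd(cwd: Optional[str] = None) -> str:
    """Return "workspace" if the directory is under /Workspace, else "driver-local"."""
    path = posixpath.normpath(cwd if cwd is not None else os.getcwd())
    if path == WORKSPACE_PREFIX or path.startswith(WORKSPACE_PREFIX + "/"):
        return "workspace"
    return "driver-local"


# In a notebook cell: show the actual CWD and how it is classified.
print(os.getcwd(), describe_cwd())
```

Under the stated assumption, a workspace notebook on Databricks Runtime 14.0 and above would be expected to report "workspace", and one on 13.3 LTS and below "driver-local".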
The biggest impact on workloads concerns file persistence and location.
In Databricks Runtime 13.3 LTS and below, many code snippets store data in a default location on an ephemeral storage volume that is permanently deleted when the cluster is terminated.
In Databricks Runtime 14.0 and above, the default behavior for these operations creates workspace files stored alongside the running notebook that persist until explicitly deleted.
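The difference comes from relative paths: a relative path resolves against the CWD, so the same write lands in a different place depending on the runtime. The sketch uses a throwaway temporary directory as a stand-in for the notebook's folder, and write_relative is a hypothetical helper. In a notebook on Databricks Runtime 14.0 and above, the equivalent call would be expected to create a persistent workspace file next to the notebook.

```python
import os
import tempfile


def write_relative(filename: str, text: str) -> str:
    """Write text to a relative path and return the absolute path it landed at."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(text)
    # A relative path is resolved against the current working directory.
    return os.path.abspath(filename)


original = os.getcwd()
with tempfile.TemporaryDirectory() as stand_in_cwd:
    os.chdir(stand_in_cwd)  # stand-in for the directory containing the notebook
    try:
        landed = write_relative("results.csv", "id,value\n1,42\n")
        print(landed)
    finally:
        os.chdir(original)  # restore the previous CWD
```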
For notes on performance differences and other limitations inherent in workspace files, see Limitations.
Deleting a workspace file sends it to the trash. You can either recover or permanently delete files from the trash using the UI.
See Delete an object.
You can change the current working directory for any notebook using the Python method os.chdir(). If you want to ensure that each notebook uses a CWD on the ephemeral storage volumes attached to the driver, you can add the following command to the first cell of each notebook and run it before any other code:
import os
os.chdir("/tmp")  # use the driver's ephemeral storage as the CWD
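If only part of a notebook needs driver-local scratch space, one alternative to a global os.chdir() is to switch directories temporarily and restore the previous CWD afterwards. This is a generic Python sketch; the working_directory helper is hypothetical and not part of Databricks. On Python 3.11 and later, contextlib.chdir provides the same behavior.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def working_directory(path: str) -> Iterator[str]:
    """Temporarily change the CWD, restoring the previous one on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield os.getcwd()
    finally:
        # Restore even if the body raises, so later cells see the original CWD.
        os.chdir(previous)


# Scratch output goes to driver-local /tmp; the CWD is restored afterwards.
with working_directory("/tmp") as scratch:
    with open("scratch.txt", "w", encoding="utf-8") as f:
        f.write("temporary data\n")
```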
You can also revert to the legacy behavior by setting the following Spark configuration:
The following limitations exist:
Executors cannot write to workspace files.
Write-heavy workloads (such as tar) have increased latency.
Workspace file size is limited to 200 MB. Operations that attempt to download or create files larger than this limit fail.
You cannot use git commands when saving to workspace files. The creation of .git directories is not allowed in workspace files.
Symlinks are not supported.
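Before copying a local directory (for example, one under the driver's /tmp) into workspace files, a pre-flight check can flag entries that these limitations rule out: oversized files, symlinks, and .git directories. The find_workspace_file_issues helper is hypothetical, not a Databricks API. It assumes the 200 MB limit is checked conservatively as 200,000,000 bytes, and it does not address executor writes or write latency.

```python
import os
from typing import List

# Assumption: check the 200 MB limit conservatively as 200 * 10**6 bytes.
MAX_WORKSPACE_FILE_BYTES = 200 * 10**6


def find_workspace_file_issues(root: str, max_bytes: int = MAX_WORKSPACE_FILE_BYTES) -> List[str]:
    """List entries under root that conflict with the workspace-file limitations."""
    issues: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):  # topdown, no link following
        # Iterate over a copy so entries can be pruned from dirnames in place.
        for name in list(dirnames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                issues.append(f"symlink: {path}")
                dirnames.remove(name)
            elif name == ".git":
                issues.append(f".git directory: {path}")
                dirnames.remove(name)  # no need to descend into it
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                issues.append(f"symlink: {path}")
            elif os.path.getsize(path) > max_bytes:
                issues.append(f"too large: {path}")
    return sorted(issues)
```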