What are workspace files?
A workspace file is any file in the Databricks workspace that is not a Databricks notebook. Workspace files can be any file type. Common examples include:
.py files used in custom modules.
.md files.
.csv or other small data files.
Databricks provides functionality similar to local development for many workspace file types, including a built-in file editor. Not all use cases for all file types are supported. For example, while you can include images in an imported directory or repository, you cannot embed images in notebooks.
You can create, edit, and manage access to workspace files using familiar patterns from notebook interactions. You can use relative paths for library imports from workspace files, similar to local development.
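The import behavior described above can be illustrated with a minimal sketch. It uses a temporary directory as a stand-in for a workspace folder so that it runs anywhere. The module name `wsfiles_demo_helpers` and its `greet` function are hypothetical, and the explicit `sys.path` step is an assumption made for portability, not a statement about what a cluster requires.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a workspace directory (assumption): on Databricks this would
# be the folder that holds the notebook and its sibling .py files.
project_dir = Path(tempfile.mkdtemp())

# Hypothetical custom module stored next to the notebook.
(project_dir / "wsfiles_demo_helpers.py").write_text(
    "def greet(name):\n"
    "    return f'Hello, {name}!'\n"
)

# Python resolves imports against sys.path; adding the directory makes
# sibling modules importable by name, as in local development.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

helpers = importlib.import_module("wsfiles_demo_helpers")
message = helpers.greet("workspace")
print(message)
```

In a notebook, the import would usually be written as a plain `import` statement; `importlib.import_module` is used here only because the module is created at run time.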
Init scripts stored in workspace files have special behavior. See Store init scripts in workspace files.
Enabling workspace files
Workspace files are enabled everywhere by default for Databricks Runtime 11.2 and above. Files in Repos is enabled by default in Databricks Runtime 11.0 and above, and can be manually disabled or enabled. See Configure support for Files in Repos.
You can use workspace files to store and reference init scripts regardless of Databricks Runtime versions. See Store init scripts in workspace files.
In Databricks Runtime 8.4 and above, you can sync, import, and read non-notebook files within a Databricks repo. You can also view and edit files in the Databricks UI.
In Databricks Runtime 11.2 and above, you can programmatically write or delete workspace files within a Databricks repo.
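As a rough sketch of what programmatic write and delete might look like, the example below uses ordinary Python file I/O. It assumes that workspace files are reachable through the standard filesystem on a supported runtime. The function name `write_then_delete` and the file name `example_output.csv` are hypothetical, and a temporary directory stands in for the repo folder so the sketch runs anywhere.

```python
import os
import tempfile
from pathlib import Path


def write_then_delete(directory: str, name: str = "example_output.csv") -> bool:
    """Write a small file into `directory`, then delete it.

    Returns True if the file existed after the write and is gone after
    the delete.
    """
    path = Path(directory) / name
    path.write_text("id,value\n1,42\n")  # programmatic write
    created = path.exists()
    os.remove(path)                      # programmatic delete
    return created and not path.exists()


# Assumption: inside a repo notebook you would pass the notebook's directory
# (for example os.getcwd()); a temporary directory is used here instead.
demo_dir = tempfile.mkdtemp()
result = write_then_delete(demo_dir)
print(result)
```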
Enabling Files in Repos changes the current working directory for driver operations to the directory containing the notebook that is executing code. Notebooks outside of a repo behave differently when interacting with workspace files: their current working directory defaults to the driver block storage volume. See How to work with files on Databricks.
Configure support for Files in Repos
To work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above. You must be running Databricks Runtime 11.2 or above to programmatically create or delete workspace files.
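The version requirements above can be expressed as a small check. This is a sketch only: the helper names `parse_runtime` and `repo_file_support` are hypothetical, and reading the version from the `DATABRICKS_RUNTIME_VERSION` environment variable is an assumption about the cluster environment. The thresholds (8.4 and 11.2) are the ones stated in this section.

```python
import os


def parse_runtime(version: str) -> tuple:
    """Turn a version string such as '11.2' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def repo_file_support(version: str) -> dict:
    """Map a Databricks Runtime version to the capabilities described above."""
    parsed = parse_runtime(version)
    return {
        # Sync, import, and read non-notebook files: 8.4 and above.
        "read_non_notebook_files": parsed >= (8, 4),
        # Programmatically create or delete workspace files: 11.2 and above.
        "write_or_delete_files": parsed >= (11, 2),
    }


# Assumption: the variable is set on the cluster; it is skipped when absent.
detected = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if detected:
    print(repo_file_support(detected))
```

Comparing tuples rather than strings avoids ordering mistakes such as treating "8.4" as greater than "11.2".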
If support for Files in Repos is not enabled, you still see non-notebook files in a Databricks repo, but you cannot work with them.
An admin can configure this feature as follows:
Go to the admin settings page.
Click the Workspace settings tab.
In the Repos section, select an option from the Files in Repos dropdown.
To ensure all configurations have been applied, you must refresh your browser and restart your compute cluster.
When you enable Files in Repos for the first time, you might need to open the Git dialog and perform a pull operation to sync non-notebook files in the repo. If there are any merge conflicts, a dialog appears giving you the option to either discard your conflicting changes or push your changes to a new branch.
Confirm Files in Repos is enabled
You can use the command %sh pwd in a notebook inside a repo to check if Files in Repos is enabled.
If Files in Repos is not enabled, the response is
If Files in Repos is enabled, the response is /Workspace/Repos/<path to notebook directory>.
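The same check can be sketched in Python rather than with a shell command. This assumes that `os.getcwd()` in a repo notebook reports the same directory as `%sh pwd`. The function name `files_in_repos_enabled` is hypothetical, and the result is only meaningful when the code runs in a notebook inside a repo.

```python
import os

# Prefix reported for notebooks inside a repo when Files in Repos is enabled.
REPOS_PREFIX = "/Workspace/Repos/"


def files_in_repos_enabled(cwd: str) -> bool:
    """Return True if the working directory is under /Workspace/Repos/."""
    return cwd.startswith(REPOS_PREFIX)


# Outside a Databricks repo notebook this simply prints False.
print(files_in_repos_enabled(os.getcwd()))
```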
Access files in Repos from a cluster that is using Databricks Container Services
You can use Files in Repos by default with Databricks Container Services (DCS) on clusters running Databricks Runtime 11.3 and above.
You can access files in Repos on a cluster with DCS in Databricks Runtime versions 10.4 LTS and 9.1 LTS by configuring the Dockerfile. Refer to the following Dockerfiles for the desired Databricks Runtime version: