What are Workspace Files?

Workspace Files are non-notebook files that you can work with in Databricks Repos. They can be of any file type. Common examples include:

  • .py files used in custom modules.

  • .md files, such as README.md.

  • .csv or other small data files.

  • .txt files.

  • Log files.

Workspace Files are enabled by default in Databricks Repos for Databricks Runtime 11.0 and above. See Configure support for Workspace Files.

In Databricks Runtime 8.4 and above, you can sync, import, and read non-notebook files within a Databricks repo. You can also view and edit files in the Databricks UI.
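As a rough illustration of reading a non-notebook file, the sketch below parses a small CSV file with Python's standard library. The helper `read_rows` and the file name `sample.csv` are hypothetical, and a temporary directory stands in for a repo directory so the code runs anywhere. In a notebook inside a repo, you would presumably pass a path relative to the notebook instead.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(csv_path):
    """Return the rows of a small CSV file as a list of dictionaries."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# A temporary directory stands in for a repo directory (assumption for this sketch).
with tempfile.TemporaryDirectory() as repo_dir:
    sample = Path(repo_dir) / "sample.csv"  # hypothetical file name
    sample.write_text("id,name\n1,alpha\n2,beta\n", encoding="utf-8")
    rows = read_rows(sample)

print(rows)
```

Each row comes back as a dictionary keyed by the CSV header, which is usually enough for small lookup or configuration files.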

In Databricks Runtime 11.2 and above, you can programmatically write or delete Workspace Files within a Databricks repo.
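A minimal sketch of writing and then deleting a file using only standard Python file operations follows. The function name `write_then_delete` and the file name `example_output.txt` are hypothetical, and a temporary directory stands in for a repo directory. Whether the same calls succeed inside a repo depends on the runtime version noted above.

```python
import os
import tempfile


def write_then_delete(directory, name="example_output.txt"):
    """Create a text file, read it back, then delete it.

    Returns the text that was read back and whether the file
    still exists after deletion.
    """
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("written from a notebook\n")
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
    os.remove(path)  # programmatic delete
    return content, os.path.exists(path)


# A temporary directory stands in for a repo directory (assumption for this sketch).
with tempfile.TemporaryDirectory() as scratch_dir:
    content, still_exists = write_then_delete(scratch_dir)

print(repr(content), still_exists)
```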

Although notebooks and Workspace Files differ in functionality and support, basic file operations in the Repos UI are nearly identical for both. See Workspace Files basic usage and Manage notebooks.

Configure support for Workspace Files

To work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above. You must be running Databricks Runtime 11.2 or above to programmatically create or delete Workspace Files.
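The version thresholds above can be checked from code. The sketch below assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable as a `major.minor` string. That variable name and format are assumptions not stated on this page, and the helper names are hypothetical.

```python
import os

MIN_READ = (8, 4)    # read, sync, and import non-notebook files
MIN_WRITE = (11, 2)  # programmatically create or delete files


def parse_runtime_version(raw):
    """Parse a 'major.minor' string such as '11.3'; return None if unparseable."""
    parts = raw.strip().split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def supported_operations(raw_version):
    """Map a runtime version string to the operations this page says it supports."""
    version = parse_runtime_version(raw_version)
    if version is None:
        return None  # version unknown, so make no claim
    return {"read": version >= MIN_READ, "write": version >= MIN_WRITE}


# Assumption: the variable is set on the cluster; outside Databricks it is empty.
raw = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
print(supported_operations(raw))
```

Returning `None` for an unparseable value avoids reporting a feature as unsupported when the version simply could not be determined.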

If support for Files in Repos is not enabled, you can still see non-notebook files in a Databricks repo, but you cannot work with them.

An admin can configure this feature as follows:

  1. Go to the Admin Console.

  2. Click the Workspace settings tab.

  3. In the Repos section, select an option from the Files in Repos dropdown.

To ensure all configurations have been applied, you must refresh your browser and restart your compute cluster.

Note

When you enable Files in Repos for the first time, you might need to open the Git dialog and perform a pull operation to sync non-notebook files in the repo. If there are any merge conflicts, a dialog appears giving you the option to either discard your conflicting changes or push your changes to a new branch.

Confirm Files in Repos is enabled

You can use the command %sh pwd in a notebook inside a repo to check if Files in Repos is enabled.

  • If Files in Repos is not enabled, the response is /databricks/driver.

  • If Files in Repos is enabled, the response is /Workspace/Repos/<path to notebook directory>.
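The same check can be sketched in Python by classifying the working directory against the two responses listed above. This assumes that `os.getcwd()` in a Python notebook reports the same directory as `%sh pwd`, which this page does not confirm. The function name `files_in_repos_status` is hypothetical.

```python
import os

REPOS_PREFIX = "/Workspace/Repos/"
DRIVER_DIR = "/databricks/driver"


def files_in_repos_status(working_dir):
    """Classify a working directory using the two responses described above."""
    if working_dir.startswith(REPOS_PREFIX):
        return "enabled"
    if working_dir.rstrip("/") == DRIVER_DIR:
        return "not enabled"
    return "unknown"  # any other location, for example outside Databricks


# Assumption: os.getcwd() matches the output of `%sh pwd` in the notebook.
print(files_in_repos_status(os.getcwd()))
```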

Access files in Repos from a cluster that is using Databricks Container Services

You can use Files in Repos by default with Databricks Container Services (DCS) on clusters running Databricks Runtime 11.3 and above.

You can access files in Repos on a cluster with DCS in Databricks Runtime versions 10.4 LTS and 9.1 LTS by configuring the Dockerfile. Refer to the following Dockerfiles for the desired Databricks Runtime version:

See Customize containers with Databricks Container Services.