Recommendations for working with DBFS root

Databricks uses the DBFS root directory as the default location for some workspace actions. Databricks recommends against storing any production data or sensitive information in the DBFS root. This article focuses on recommendations that help you avoid accidentally exposing sensitive data in the DBFS root.

Note

Databricks configures a separate private storage location in customer-owned cloud storage, known as the internal DBFS, for persisting data and configurations. This location is not exposed to users.

Educate users not to store data in the DBFS root

The DBFS root is accessible to all users in a workspace, so any user can access any data stored there. Instruct users not to store sensitive data in this location. In the Hive metastore on Databricks, the default location for managed tables is the DBFS root. To prevent users who create managed tables from writing to the DBFS root, declare a location on external storage when you create databases in the Hive metastore.
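As an illustration of declaring an external location, the following minimal sketch builds a Spark SQL `CREATE DATABASE ... LOCATION` statement and refuses locations under the `dbfs:` scheme. The helper name `create_database_sql`, the database name, and the bucket path are hypothetical. In a notebook you would pass the resulting string to `spark.sql(...)`, assuming a `spark` session is available.

```python
def create_database_sql(name: str, location: str) -> str:
    """Return a Hive metastore CREATE DATABASE statement with an explicit LOCATION.

    Hypothetical helper: managed tables created in this database default to
    `location`, so they are not written to the DBFS root.
    """
    # Reject names that would break backtick quoting.
    if not name or "`" in name:
        raise ValueError(f"unsupported database name: {name!r}")
    # Require an external cloud URI (for example s3://...) and refuse dbfs: paths.
    if location.lower().startswith("dbfs:") or "://" not in location:
        raise ValueError(f"location must be external cloud storage: {location!r}")
    # Keep the sketch simple: refuse quotes instead of escaping them.
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return f"CREATE DATABASE IF NOT EXISTS `{name}` LOCATION '{location}'"


# Example (bucket name is a placeholder):
# spark.sql(create_database_sql("sales", "s3://example-external-bucket/hive/sales"))
```

Validating the path before running the statement catches the common mistake of pointing the database back at `dbfs:/user/hive/warehouse`, which is where managed tables would land by default.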

Unity Catalog managed tables use a secure storage location by default. Databricks recommends using Unity Catalog for managed tables.

Use audit logging to monitor activity

You can use cloud provider audit logs together with Databricks workspace audit logs to monitor activity and identify users who are storing data in the DBFS root.
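The following sketch shows one way to scan exported workspace audit log records for writes to the DBFS root and count them per user. It assumes the records are already loaded as Python dicts with `serviceName`, `actionName`, `userIdentity.email`, and `requestParams.path` fields, and that DBFS writes appear under the `dbfs` service with the action names listed in `WRITE_ACTIONS`. These field and action names are assumptions; verify them against the audit log schema in your own workspace.

```python
from collections import Counter
from typing import Any, Iterable

# ASSUMPTION: action names for DBFS write operations in workspace audit logs.
WRITE_ACTIONS = {"create", "put", "mkdirs"}
# Mount points resolve to external storage, so they are not counted as DBFS root writes.
EXCLUDED_PREFIXES = ("/mnt/",)


def _normalize(path: str) -> str:
    """Strip an optional dbfs: scheme so 'dbfs:/x' and '/x' compare equal."""
    return path[len("dbfs:"):] if path.startswith("dbfs:") else path


def dbfs_root_writers(records: Iterable[dict[str, Any]]) -> Counter:
    """Count DBFS root write events per user email (assumed record schema)."""
    counts: Counter = Counter()
    for rec in records:
        if rec.get("serviceName") != "dbfs":
            continue
        if rec.get("actionName") not in WRITE_ACTIONS:
            continue
        path = _normalize(rec.get("requestParams", {}).get("path", ""))
        # Skip empty/relative paths and paths under mount points.
        if not path.startswith("/") or path.startswith(EXCLUDED_PREFIXES):
            continue
        email = rec.get("userIdentity", {}).get("email", "<unknown>")
        counts[email] += 1
    return counts
```

The per-user counts give a starting list of people to follow up with; cloud provider logs can then confirm which objects were actually written.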

Databricks recommends that you enable S3 object-level logging for your DBFS root bucket to allow faster investigation of issues. Be aware that enabling S3 object-level logging can increase your AWS usage cost.
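S3 object-level logging is recorded as AWS CloudTrail data events. The sketch below builds an event selector that covers every object in a bucket and, optionally, applies it with the boto3 CloudTrail `put_event_selectors` call. The trail name and bucket name are placeholders, and the trail is assumed to exist already. Note that `put_event_selectors` replaces the trail's existing event selectors, so merge with any selectors you already use. Choosing `WriteOnly` instead of `All` is one way to limit the extra cost.

```python
from typing import Any

VALID_READ_WRITE_TYPES = {"ReadOnly", "WriteOnly", "All"}


def s3_object_logging_selectors(bucket: str, read_write_type: str = "All") -> list[dict[str, Any]]:
    """Build CloudTrail event selectors that log object-level events for all objects in `bucket`."""
    if read_write_type not in VALID_READ_WRITE_TYPES:
        raise ValueError(f"invalid ReadWriteType: {read_write_type!r}")
    return [
        {
            "ReadWriteType": read_write_type,
            "IncludeManagementEvents": True,
            # A trailing slash after the bucket ARN selects every object in the bucket.
            "DataResources": [
                {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket}/"]}
            ],
        }
    ]


def enable_s3_object_logging(trail_name: str, bucket: str, read_write_type: str = "All") -> None:
    """Apply the selectors to an existing trail (overwrites its current selectors)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("cloudtrail")
    client.put_event_selectors(
        TrailName=trail_name,
        EventSelectors=s3_object_logging_selectors(bucket, read_write_type),
    )
```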

Encrypt DBFS root data with a customer-managed key

You can encrypt DBFS root data with a customer-managed key. See Customer-managed keys for workspace storage.