Compare Auto Loader file detection modes
Auto Loader supports two modes for detecting new files: directory listing and file notification. You can switch file discovery modes across stream restarts and still obtain exactly-once data processing guarantees.
Directory listing mode
In directory listing mode, Auto Loader identifies new files by listing the input directory. Directory listing mode allows you to quickly start Auto Loader streams without any permission configurations other than access to your data on cloud storage.
In Databricks Runtime 9.1 and above, Auto Loader can automatically detect whether files arrive in your cloud storage in lexical order and significantly reduce the number of API calls needed to detect new files. See Auto Loader streams with directory listing mode for more details.
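As a rough illustration of how little configuration directory listing mode needs, the following sketch builds an Auto Loader reader from a small options dictionary. The helper names, paths, and the choice to set `cloudFiles.useNotifications` explicitly are assumptions made for this example, not requirements stated on this page. Check the option names against the Auto Loader options reference before relying on them.

```python
from typing import Dict


def directory_listing_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Return Auto Loader options that keep file discovery in directory listing mode.

    Hypothetical helper for illustration; the option keys are assumed from the
    Auto Loader options reference.
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Assumption: notifications are off unless enabled, so this line only
        # makes the chosen mode explicit.
        "cloudFiles.useNotifications": "false",
    }


def read_with_directory_listing(spark, input_path: str, file_format: str, schema_location: str):
    """Build a streaming DataFrame over input_path.

    `spark` is an existing SparkSession, such as the one a Databricks notebook provides.
    """
    options = directory_listing_options(file_format, schema_location)
    return spark.readStream.format("cloudFiles").options(**options).load(input_path)
```

In a Databricks notebook, where `spark` is already defined, a call might look like `read_with_directory_listing(spark, "/Volumes/main/default/landing", "json", "/Volumes/main/default/_schemas")`. Both paths are placeholders.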
File notification mode (recommended)
File notification mode leverages file notification and queue services in your cloud infrastructure account. Auto Loader can automatically set up a notification service and queue service that subscribe to file events from the input directory. If you enable file events on the external location that contains the files in question, you do not need to provide additional permissions when you set up the Auto Loader stream.
File notification mode with file events is more performant and scalable than directory listing. Databricks recommends file notification mode using file events instead of directory listing mode for most workloads. If you are using Auto Loader in directory listing mode today, Databricks recommends that you migrate to file notification mode using file events to see significant performance improvements. See Configure Auto Loader streams in file notification mode.
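The two flavors of file notification mode described above can be sketched as a choice between two reader options. This is a minimal sketch, not the definitive configuration: the helper name is hypothetical, and the option keys `cloudFiles.useManagedFileEvents` and `cloudFiles.useNotifications` are assumed from the Auto Loader options reference, so confirm them on the linked configuration page.

```python
from typing import Dict


def file_notification_options(
    file_format: str,
    schema_location: str,
    use_file_events: bool = True,
) -> Dict[str, str]:
    """Return Auto Loader options for file notification mode.

    Hypothetical helper for illustration; the option keys are assumptions to verify.
    """
    options = {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }
    if use_file_events:
        # Assumed option: relies on file events enabled on the external location.
        options["cloudFiles.useManagedFileEvents"] = "true"
    else:
        # Assumed option: Auto Loader sets up its own notification and queue services.
        options["cloudFiles.useNotifications"] = "true"
    return options
```

The resulting dictionary can be passed to a reader as `spark.readStream.format("cloudFiles").options(**opts).load(input_path)`. When migrating an existing directory listing stream, the usual approach is to restart the same stream with the new options, which is the scenario the exactly-once guarantee across restarts covers.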
Cloud storage supported by modes
The following table lists the Databricks Runtime versions that support each file detection mode, by cloud storage provider.
If you migrate from an external location or a DBFS mount to a Unity Catalog volume, Auto Loader continues to provide exactly-once guarantees.
Cloud storage | Directory listing | File notifications without file events | File notifications with file events |
---|---|---|---|
AWS S3 | All versions | All versions | Databricks Runtime 14.3 LTS and above |
ADLS | All versions | All versions | Databricks Runtime 14.3 LTS and above |
GCS | All versions | All versions | Databricks Runtime 14.3 LTS and above |
Azure Blob Storage | All versions | All versions | Unsupported |
DBFS | All versions | For mount points only | Databricks Runtime 14.3 LTS and above, if the DBFS mount point has an external location defined in Unity Catalog |
Unity Catalog volume | Databricks Runtime 13.3 LTS and above | Unsupported | Databricks Runtime 14.3 LTS and above |
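For pipelines that target several storage types, the table above can be encoded as a small lookup that checks a mode before a stream is configured. This is only an illustrative sketch: the dictionary keys, mode names, and function are hypothetical, and the lookup ignores the two DBFS conditions in the table (mount points only, and an external location defined in Unity Catalog).

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]
ALL_VERSIONS: Version = (0, 0)  # stands in for "All versions"

# Minimum Databricks Runtime per storage and mode; None means unsupported.
# Values are transcribed from the table above. DBFS caveats are not modeled.
MIN_RUNTIME: Dict[str, Dict[str, Optional[Version]]] = {
    "AWS S3": {"directory_listing": ALL_VERSIONS, "notifications": ALL_VERSIONS, "file_events": (14, 3)},
    "ADLS": {"directory_listing": ALL_VERSIONS, "notifications": ALL_VERSIONS, "file_events": (14, 3)},
    "GCS": {"directory_listing": ALL_VERSIONS, "notifications": ALL_VERSIONS, "file_events": (14, 3)},
    "Azure Blob Storage": {"directory_listing": ALL_VERSIONS, "notifications": ALL_VERSIONS, "file_events": None},
    "DBFS": {"directory_listing": ALL_VERSIONS, "notifications": ALL_VERSIONS, "file_events": (14, 3)},
    "Unity Catalog volume": {"directory_listing": (13, 3), "notifications": None, "file_events": (14, 3)},
}


def is_supported(storage: str, mode: str, runtime: Version) -> bool:
    """Return True if the table lists `mode` as available on `storage` for `runtime`."""
    minimum = MIN_RUNTIME[storage][mode]
    return minimum is not None and runtime >= minimum
```

For example, `is_supported("Unity Catalog volume", "notifications", (15, 4))` evaluates to `False`, because the table marks file notifications without file events as unsupported for volumes.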