Trigger jobs when new files arrive
Use file arrival triggers to start a run of your Databricks job when new files arrive in an external location such as Amazon S3, Azure storage, or Google Cloud Storage. This feature is useful when a scheduled job would be inefficient because new data arrives on an irregular schedule.
File arrival triggers make a best effort to check for new files every minute, although this can be affected by the performance of the underlying cloud storage. File arrival triggers do not incur additional costs other than cloud provider costs associated with listing files in the storage location.
A file arrival trigger can be configured to monitor the root of a Unity Catalog external location or volume, or a subpath of an external location or volume. For example, for the Unity Catalog root volume /Volumes/mycatalog/myschema/myvolume/, the following are valid paths for a file arrival trigger:
/Volumes/mycatalog/myschema/myvolume/
/Volumes/mycatalog/myschema/myvolume/mydirectory/
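As a rough illustration of the root-or-subpath rule, the following Python sketch checks whether a candidate path is the volume root or lies beneath it. The helper name is_within_volume is hypothetical, and the check is purely lexical: it does not contact Databricks or confirm that the path exists.

```python
from pathlib import PurePosixPath


def is_within_volume(candidate: str, volume_root: str) -> bool:
    """Return True if candidate is the volume root or a subpath of it.

    Hypothetical helper: compares path components only and never
    touches storage.
    """
    root = PurePosixPath(volume_root)
    path = PurePosixPath(candidate)
    # PurePosixPath does not resolve ".." segments, so reject them outright.
    if ".." in path.parts:
        return False
    return path == root or root in path.parents


VOLUME_ROOT = "/Volumes/mycatalog/myschema/myvolume/"

print(is_within_volume("/Volumes/mycatalog/myschema/myvolume/", VOLUME_ROOT))              # True
print(is_within_volume("/Volumes/mycatalog/myschema/myvolume/mydirectory/", VOLUME_ROOT))  # True
print(is_within_volume("/Volumes/mycatalog/myschema/othervolume/", VOLUME_ROOT))           # False
```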
Requirements
The following are required to use file arrival triggers:
The workspace must have Unity Catalog enabled.
You must use a storage location that’s either a Unity Catalog volume or an external location added to the Unity Catalog metastore. See Create an external location to connect cloud storage to Databricks.
You must have READ permissions to the storage location and CAN MANAGE permissions on the job. For more information about job permissions, see Job ACLs.
Limitations
Only new files trigger runs. Overwriting an existing file with a file of the same name does not trigger a run.
A maximum of 50 jobs can be configured with a file arrival trigger in a Databricks workspace.
A storage location configured for a file arrival trigger can contain at most 10,000 files. Locations with more files cannot be monitored for new file arrivals. If the configured storage location is a subpath of a Unity Catalog external location or volume, the 10,000-file limit applies to the subpath and not the root of the storage location. For example, the root of the storage location can contain more than 10,000 files across its subdirectories, but the configured subdirectory must not exceed the 10,000-file limit.
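Before pointing a trigger at a directory, it can help to estimate how close the directory is to the 10,000-file limit. The following is a minimal sketch, assuming the limit counts files recursively under the configured path and that the path can be read through the local file system (for example, a volume path on Databricks compute). The names count_files and within_limit are hypothetical, and the demo runs against a temporary directory so that it works anywhere.

```python
import os
import tempfile

FILE_LIMIT = 10_000  # limit stated in the Limitations list


def count_files(path: str, limit: int = FILE_LIMIT) -> int:
    """Count files under path recursively, stopping early once the
    running total exceeds limit (the exact total no longer matters)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(path):
        total += len(filenames)
        if total > limit:
            break
    return total


def within_limit(path: str, limit: int = FILE_LIMIT) -> bool:
    """Return True if path holds no more than limit files."""
    return count_files(path, limit) <= limit


# Demo on a temporary directory; replace with the path you plan to monitor.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "mydirectory"))
    for name in ("a.csv", "mydirectory/b.csv", "mydirectory/c.csv"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("example")
    print(count_files(demo_dir), within_limit(demo_dir))
```

If the count is near the limit, one option is to point the trigger at a narrower subpath, because the limit applies to the configured subpath rather than to the root.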
The path used for a file arrival trigger must not contain any external tables or managed locations of catalogs and schemas.
Add a file arrival trigger
To add a file arrival trigger to a job:
In the sidebar, click Workflows.
In the Name column on the Jobs tab, click the job name.
In the Job details panel on the right, click Add trigger.
In Trigger type, select File arrival.
In Storage location, enter the URL of the location to monitor: the root or a subpath of a Unity Catalog external location, or the root or a subpath of a Unity Catalog volume.
(Optional) Configure advanced options:
Minimum time between triggers in seconds: The minimum time to wait to trigger a run after a previous run completes. Files that arrive in this period trigger a run only after the waiting time expires. Use this setting to control the frequency of run creation.
Wait after last change in seconds: The time to wait to trigger a run after file arrival. Another file arrival in this period resets the timer. This setting can be used when files arrive in batches, and the whole batch needs to be processed after all files have arrived.
To validate the configuration, click Test connection.
Click Save.
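The same settings can also be expressed as a job settings fragment for programmatic use. The following Python sketch builds such a fragment. The field names (trigger, file_arrival, url, min_time_between_triggers_seconds, wait_after_last_change_seconds, pause_status) reflect the Jobs API as understood at the time of writing and should be verified against the current API reference. The helper name and the example values are hypothetical.

```python
import json
from typing import Optional


def file_arrival_trigger(
    url: str,
    min_time_between_triggers_seconds: Optional[int] = None,
    wait_after_last_change_seconds: Optional[int] = None,
) -> dict:
    """Build a job settings fragment for a file arrival trigger.

    Hypothetical helper; field names are assumed to match the Jobs API.
    """
    file_arrival = {"url": url}
    # The two advanced options are optional, so include them only when set.
    if min_time_between_triggers_seconds is not None:
        file_arrival["min_time_between_triggers_seconds"] = min_time_between_triggers_seconds
    if wait_after_last_change_seconds is not None:
        file_arrival["wait_after_last_change_seconds"] = wait_after_last_change_seconds
    return {"trigger": {"pause_status": "UNPAUSED", "file_arrival": file_arrival}}


settings = file_arrival_trigger(
    "/Volumes/mycatalog/myschema/myvolume/mydirectory/",
    min_time_between_triggers_seconds=120,  # example value only
    wait_after_last_change_seconds=60,      # example value only
)
print(json.dumps(settings, indent=2))
```

A fragment like this would typically be merged into the settings sent when creating or updating a job. The sketch only builds the payload and makes no network calls.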
Receive notifications of failed file arrival triggers
To be notified if a file arrival trigger fails to evaluate, configure email or system destination notifications on job failure. See Add email and system notifications for job events.
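For jobs managed programmatically, failure notifications can be sketched as a settings fragment as well. This example assumes the Jobs API field email_notifications with an on_failure list of addresses. The helper name and the address are placeholders, and the field names should be checked against the current API reference.

```python
import json
from typing import List


def failure_email_notifications(addresses: List[str]) -> dict:
    """Build a job settings fragment that emails addresses on job failure.

    Hypothetical helper; the field names are assumed from the Jobs API.
    """
    if not addresses:
        raise ValueError("at least one email address is required")
    return {"email_notifications": {"on_failure": list(addresses)}}


# Placeholder address; substitute a real recipient.
print(json.dumps(failure_email_notifications(["data-team@example.com"]), indent=2))
```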