Trigger jobs when source tables are updated
You can use table update triggers to run your job when source tables are updated. Use this feature to run a job as soon as new data is ready, without a continuously running cluster or knowledge of the processes that update the tables.
How do table update triggers work?
A table update trigger checks the monitored tables for updates and runs the job when one occurs. The trigger can fire when any one of the monitored tables is updated, or only after all of them have been updated. Table update triggers incur no additional costs beyond the cloud provider costs associated with listing tables and reading updates from the storage location.
A table update trigger can be configured to monitor one or more tables for data changes such as updates, merges, and deletes. Supported tables include Unity Catalog managed Delta and Iceberg tables, Unity Catalog external tables backed by Delta Lake, materialized views, and streaming tables. Views and tables shared through Delta Sharing are not supported. When you select multiple tables, you can specify whether the job is triggered when any or all of the tables are updated.
Add a table update trigger
To add a table update trigger to an existing job:
1. In the left navigation of your workspace, click Jobs & Pipelines.
2. In the list of jobs, click the name of the job to which you want to add a trigger.
3. In the right panel, under Schedules & Triggers, click Add trigger.
4. In Trigger type, select Table update.
5. Under Tables, add the tables you want to monitor for updates.
   If you select more than one table, use the Trigger when option to specify whether a job run is triggered when All tables are updated or when Any table is updated.
6. (Optional) To configure advanced options, click Advanced.
   - Minimum time between triggers in seconds: The minimum time to wait before triggering a run after a previous run completes. Databricks waits this amount of time even if monitored tables are updated sooner; table updates that occur during this period trigger a run only after the waiting time expires.
   - Wait after last change in seconds: The time to wait after a table update before triggering a run. Additional table updates during this period reset the timer. Use this setting when table updates arrive in batches and the whole batch must be processed only after all updates have landed.

   Note: If both options are set, the trigger waits the minimum time between triggers and then waits the configured time after the last change. For example, with a minimum time between triggers of 120 seconds and a wait after last change of 60 seconds, a run is not triggered until at least 120 seconds have elapsed, even if a table update happens within the first 60 seconds. And if one update arrives at 5 seconds and another at 115 seconds, the wait after last change means a run is not triggered until 175 seconds have elapsed.
7. To validate the configuration, click Test trigger.
8. Click Save.
You can also configure table update triggers from the Jobs API. Add a `trigger` object to a `jobs/create`, `jobs/update`, or `jobs/reset` operation.
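As a minimal sketch, the following call adds a table update trigger to an existing job through the `jobs/update` operation. The workspace URL, token, job ID, and table names are placeholders, and the `table_update` fields mirror the UI options described above.

```python
import requests

# Placeholders: substitute your workspace URL, a personal access token,
# and the ID of the job you want to modify.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
JOB_ID = 123

# The trigger object mirrors the UI: `table_names` lists the monitored
# tables, `condition` corresponds to "Trigger when", and the two
# *_seconds fields correspond to the advanced options above.
payload = {
    "job_id": JOB_ID,
    "new_settings": {
        "trigger": {
            "table_update": {
                "table_names": [
                    "main.ingest.orders",  # hypothetical tables
                    "main.ingest.customers",
                ],
                "condition": "ALL_UPDATED",  # or "ANY_UPDATED"
                "min_time_between_triggers_seconds": 120,
                "wait_after_last_change_seconds": 60,
            }
        }
    },
}

response = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
response.raise_for_status()
```

With `jobs/reset`, the same `trigger` object goes inside `new_settings`, but the call replaces all job settings rather than updating only the fields provided.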
Table update triggers with managed file events
For best performance and scalability, enable file events on the external location where the tables are stored. This one-time setup step improves the efficiency of table update triggers and unlocks other features, including more performant Auto Loader and file arrival triggers.
When file events are enabled, Databricks automatically tracks ingestion metadata using cloud provider change notifications, resulting in faster and more efficient detection of table updates.
If your tables are in the metastore root-level storage, first convert them to an external location, then enable file events on that location.
Job parameters associated with table update triggers
When you use table update triggers for a job, three new dynamic value references are available to use as parameter values in the job.
- `{{job.trigger.table.updated_tables}}`: A JSON list of the tables updated since the last job run.
- `{{job.trigger.table.<catalog.schema.table>.commit_timestamp.iso_datetime}}`: The most recent commit timestamp that triggered the job run.
- `{{job.trigger.table.<catalog.schema.table>.version}}`: The most recent commit version that triggered the job run.

For `commit_timestamp` and `version`, there is one dynamic value reference per monitored table: replace `<catalog.schema.table>` with the fully qualified name of the table for which you want data. If the trigger monitors only one table, the reference is available without the `<catalog.schema.table>` segment. For example, you can use `{{job.trigger.table.commit_timestamp.iso_datetime}}`.
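As an illustration, suppose the trigger monitors a hypothetical table `main.ingest.orders` and the job has a notebook task whose parameters pass these references through. Inside the notebook, the resolved values could be read with widgets, as in the sketch below; the parameter names are illustrative.

```python
# Sketch of a notebook task consuming the dynamic value references.
# Assumes the task's parameters map them to widget names, for example:
#   "base_parameters": {
#     "updated_tables": "{{job.trigger.table.updated_tables}}",
#     "orders_version": "{{job.trigger.table.main.ingest.orders.version}}"
#   }
import json

# dbutils is available in Databricks notebooks.
dbutils.widgets.text("updated_tables", "[]")
dbutils.widgets.text("orders_version", "")

# updated_tables resolves to a JSON list of table names.
updated_tables = json.loads(dbutils.widgets.get("updated_tables"))
orders_version = dbutils.widgets.get("orders_version")

for table in updated_tables:
    print(f"{table} was updated; orders table is at version {orders_version}")
```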
For more information on job parameters, see Parameterize jobs.
Receive notifications of failed table update triggers
To be notified if a table update trigger fails to evaluate, configure email or system destination notifications on job failure. See Add notifications on a job.
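For example, building on the `jobs/update` sketch shown earlier, a failure notification could be merged into the job settings; the address is a placeholder.

```python
# Sketch: notify on job failure, which also covers table update trigger
# evaluation failures. `payload` refers to the jobs/update example above;
# the address is a placeholder.
payload["new_settings"]["email_notifications"] = {
    "on_failure": ["data-team@example.com"],
}
```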
Limitations
- You can have up to 1000 table update triggers per workspace.
- You can select up to 10 managed or Delta tables per trigger.