Ingest Workday reports incrementally

Beta

Incremental ingestion is in Beta.

By default, Lakeflow Connect ingests Workday reports using snapshot mode, which reads all rows in the report on each pipeline update. Incremental ingestion instead reads only the data that has changed since the last pipeline update. This reduces the load on your Workday instance and lowers the cost and time of each pipeline run.

How incremental ingestion works

Incremental ingestion uses a cursor column (also called a sequence_by column) to track progress. The cursor column must be a date column in your Workday report. On each pipeline update, the connector uses the cursor to determine what data is new and reads only those rows.

To filter the report by date, your Workday admin defines prompts on the cursor column. Prompts are parameters in the Workday report URL that let you pass a date range into the report at query time. Your Workday admin chooses the prompt names, and they must match the parameter names you specify in your pipeline definition. For example, a Workday admin might define prompts named Date_Start and Date_End on a Date column, but the actual names depend on how the report is configured in Workday.
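
As a rough illustration of how prompts act as URL parameters, the following Python sketch appends prompt values to a report URL as query parameters. The host name, the add_prompts helper, and the date format are assumptions made for this example. The connector builds the real request itself, so this is only a mental model.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_prompts(report_url: str, prompts: dict[str, str]) -> str:
    """Return report_url with the prompt values appended as query parameters."""
    parts = urlsplit(report_url)
    query = parse_qsl(parts.query)  # keep existing parameters such as format=json
    query.extend(prompts.items())   # prompt names must match the report's prompts
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical host and prompt names, used only for illustration.
url = add_prompts(
    "https://example.workday.com/ccx/service/customreport2/Payroll_Journal?format=json",
    {"Date_Start": "2025-02-06", "Date_End": "2025-02-12"},
)
print(url)
```

If a prompt name in the request does not match a prompt defined on the report, the date filter cannot be applied as intended, which is why the names must agree on both sides.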

Initial pipeline update

When you run the pipeline for the first time, the connector ingests all data from the Date_Start value you specify in the pipeline definition up to the current date. For example, if your report contains data from 2025-01-01 but you set Date_Start to 2025-02-06, the initial update ingests data from 2025-02-06 onward.

After the initial update completes, the connector stores the maximum cursor value found in the ingested data as the new sequence_by value. This value determines where the next pipeline update begins.

Subsequent pipeline updates

On each subsequent update, the connector ingests all new rows since the last update, using the stored sequence_by value as the start of the new ingestion window. The connector then updates the stored sequence_by value to reflect the latest cursor value. For example, if the initial update ingested data through 2025-02-12, the next update reads rows with a cursor value on or after 2025-02-12.
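
The two phases can be sketched as a small Python simulation. This is a simplified model, not the connector's implementation: the run_update function, the row layout, and the Journal_Date and Journal_ID names are assumptions for illustration. It does not model how the destination table handles a row that is read twice.

```python
from datetime import date


def run_update(source_rows, stored_cursor, start_fallback, window_end):
    """Simulate one pipeline update. Returns (ingested_rows, new_stored_cursor)."""
    # First run: no stored cursor yet, so start from the configured start date.
    start = stored_cursor if stored_cursor is not None else start_fallback
    # Inclusive window on the cursor column.
    ingested = [r for r in source_rows if start <= r["Journal_Date"] <= window_end]
    # The new stored value is the maximum cursor value seen in this update.
    new_cursor = max((r["Journal_Date"] for r in ingested), default=stored_cursor)
    return ingested, new_cursor


rows = [
    {"Journal_ID": 1, "Journal_Date": date(2025, 1, 15)},
    {"Journal_ID": 2, "Journal_Date": date(2025, 2, 6)},
    {"Journal_ID": 3, "Journal_Date": date(2025, 2, 12)},
]

# Initial update: the start date is 2025-02-06, so the January row is skipped.
first, cursor = run_update(rows, None, date(2025, 2, 6), date(2025, 2, 12))
first_ids = [r["Journal_ID"] for r in first]

# A new row appears in the report before the next update.
rows.append({"Journal_ID": 4, "Journal_Date": date(2025, 2, 14)})
second, cursor2 = run_update(rows, cursor, date(2025, 2, 6), date(2025, 2, 15))
second_ids = [r["Journal_ID"] for r in second]

print(first_ids, cursor)
print(second_ids, cursor2)
```

In this model the second update reads row 3 again because the window starts on, not after, the stored cursor value of 2025-02-12.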

Keeping the pipeline a fixed number of days behind

If you want the pipeline to lag behind the current date by a fixed interval, set the Date_End prompt to an expression like current_date() - INTERVAL 2 DAY. Because the expression is evaluated on each update, the pipeline maintains that lag on every subsequent update.
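
The effect of a lag expression is plain date arithmetic. The following Python sketch, with a hypothetical lagged_end helper, shows the end of the ingestion window for a two-day lag on a few example run dates:

```python
from datetime import date, timedelta


def lagged_end(today: date, lag_days: int) -> date:
    """End of the ingestion window when the pipeline lags by lag_days."""
    return today - timedelta(days=lag_days)


# The window end moves forward with the run date and stays two days behind it.
ends = [lagged_end(d, 2) for d in (date(2025, 2, 12), date(2025, 2, 13), date(2025, 3, 1))]
for end in ends:
    print(end)
```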

Limitations

The following limitations apply to incremental ingestion for Workday reports. For additional connector limitations, see Workday Reports connector limitations.

  • Monotonically increasing cursor required. The cursor value must increase with each new or updated row. The connector ingests rows whose cursor value is greater than the last ingested cursor value, which covers both new rows and updated rows that advance the cursor. Deletions are not captured: when a row is deleted from the source report, the connector does not pick up that change. If rows in your report can be deleted after the initial ingestion, use snapshot mode instead.
  • Date cursor only. The cursor column must be a date column. Other column types are not supported as a cursor.
  • Inclusive prompts only. The Workday report prompts on the cursor column must use inclusive operators (greater than or equal to / less than or equal to). Reports with exclusive prompts (greater than / less than) can result in missing data. The report owner selects the prompt type when creating the report and its prompts in Workday.
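
To see why exclusive prompts can lose data, consider this Python sketch. It is a simplified model with hypothetical names (read_window, the id and d fields) and assumes the stored cursor is 2025-02-12 when a second row with that same date is added to the report after the previous update.

```python
from datetime import date


def read_window(rows, start, end, inclusive=True):
    """Return the ids of rows that a date-range prompt pair would select."""
    if inclusive:
        return [r["id"] for r in rows if start <= r["d"] <= end]
    return [r["id"] for r in rows if start < r["d"] < end]


rows = [
    {"id": 1, "d": date(2025, 2, 12)},  # already ingested by the previous update
    {"id": 2, "d": date(2025, 2, 12)},  # added later, same date as the stored cursor
    {"id": 3, "d": date(2025, 2, 13)},
]
stored_cursor = date(2025, 2, 12)
window_end = date(2025, 2, 14)

inclusive_ids = read_window(rows, stored_cursor, window_end, inclusive=True)
exclusive_ids = read_window(rows, stored_cursor, window_end, inclusive=False)
print(inclusive_ids)
print(exclusive_ids)
```

With exclusive operators, row 2 is skipped. Once the stored cursor advances to 2025-02-13, no later window reaches back to it, so the row is never ingested.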

Configure incremental ingestion

Incremental ingestion for Workday reports is API-only. You can't configure it using the Databricks UI. Use Declarative Automation Bundles or the Databricks CLI.

Prerequisites

Before you configure the pipeline:

  • Complete the source setup and confirm that a Unity Catalog connection to Workday exists.
  • Ask your Workday admin to define inclusive prompts on the cursor column in the Workday report. For example, the admin might define prompts named Date_Start and Date_End on a Date column. The report owner selects the "is on or after" and "is on or before" operators when creating the prompts in Workday.

Pipeline definition

To enable incremental ingestion, add the following fields to your pipeline definition:

  • In table_configuration, add a sequence_by field set to the name of the cursor column in the report.
  • In table_configuration, add a workday_report_parameters.parameters map that sets the XML aliases of your Workday prompts to filter expressions. For the start prompt, use coalesce(current_offset(), date(...)) so that the first run falls back to a fixed start date and subsequent runs resume from the last cursor value.
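
The coalesce pattern for the start prompt can be modeled in a few lines of Python. This sketch assumes that current_offset() yields no value (None here) until a cursor value has been stored. The coalesce and start_prompt functions are illustrative stand-ins, not the connector's code.

```python
from datetime import date
from typing import Optional


def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def start_prompt(current_offset: Optional[date]) -> date:
    """Model of coalesce(current_offset(), date("2025-01-01"))."""
    return coalesce(current_offset, date(2025, 1, 1))


first_run_start = start_prompt(None)               # no stored cursor yet
later_run_start = start_prompt(date(2025, 2, 12))  # resume from stored cursor
print(first_run_start, later_run_start)
```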

The following example ingests the Payroll_Journal report incrementally, using Journal_Date as the cursor column and Date_Start and Date_End as the Workday prompt XML aliases. The pipeline starts from 2025-01-01 on the first run and lags one day behind the current date:

YAML
variables:
  dest_catalog:
    default: main
  dest_schema:
    default: ingest_destination_schema

resources:
  pipelines:
    pipeline_workday:
      name: workday_incremental_pipeline
      catalog: ${var.dest_catalog}
      schema: ${var.dest_schema}
      ingestion_definition:
        connection_name: <workday-connection>
        objects:
          - report:
              source_url: https://wd2-impl-services1.workday.com/ccx/service/customreport2/Payroll_Journal?format=json
              destination_catalog: ${var.dest_catalog}
              destination_schema: ${var.dest_schema}
              destination_table: Payroll_Journal
              table_configuration:
                primary_keys:
                  - Journal_ID
                sequence_by:
                  - Journal_Date
                workday_report_parameters:
                  parameters:
                    Date_Start: '{coalesce(current_offset(), date("2025-01-01"))}'
                    Date_End: '{current_date() - INTERVAL 1 DAY}'

For more pipeline examples, including a rolling-window pattern, see Ingest incrementally, starting from a fixed date.