Reprocessing & Batching
This feature is available to Premium users only.
What it does
The reprocessing feature extends the GA4 module's incremental pipeline to support targeted date range re-runs. Normally, each table tracks its own checkpoint - the most recent event_date (or session_date) where is_final = true - and only processes new data beyond that point. Reprocessing overrides this checkpoint with an explicit date range.
How it works
Two Dataform variables control reprocessing:
| Variable | Effect |
|---|---|
| REPROCESS_START_DATE | Replaces the incremental checkpoint with this date |
| REPROCESS_END_DATE | Caps the processing window at this date (normally unlimited) |
When these variables are set, each table's pre_operations block:
- Sets date_checkpoint to REPROCESS_START_DATE (instead of computing it from the table)
- Sets date_end to REPROCESS_END_DATE (instead of 9999-12-31)
- Deletes existing rows where the date column falls between date_checkpoint and date_end
- Re-inserts freshly computed data for that range
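The steps above can be sketched as a small JavaScript helper that builds the SQL for one table. This is an illustration only, not the module's generatePreOperationsSQL(): the function name sketchReprocessPreOps, its parameters, the exact SQL text, and the inclusive reading of "between" are all assumptions made for the example.

```javascript
// Hypothetical sketch, NOT the module's generatePreOperationsSQL().
// Builds BigQuery scripting statements for the reprocess path of one table.
function sketchReprocessPreOps({ table, dateColumn, startDate, endDate = "9999-12-31" }) {
  const isoDate = /^\d{4}-\d{2}-\d{2}$/;
  for (const value of [startDate, endDate]) {
    if (!isoDate.test(String(value))) {
      throw new Error(`Expected a YYYY-MM-DD date, got: ${value}`);
    }
  }
  // ISO dates compare correctly as plain strings.
  if (startDate > endDate) {
    throw new Error("startDate must not be after endDate");
  }

  return [
    // REPROCESS_START_DATE replaces the checkpoint computed from the table.
    `DECLARE date_checkpoint DATE DEFAULT DATE '${startDate}';`,
    // REPROCESS_END_DATE caps the window; 9999-12-31 means "no cap".
    `DECLARE date_end DATE DEFAULT DATE '${endDate}';`,
    // Assumption: the delete is inclusive on both ends.
    `DELETE FROM ${table} WHERE ${dateColumn} BETWEEN date_checkpoint AND date_end;`,
    // The re-insert is done by the table's main incremental query, not here.
  ].join("\n");
}

// Example with a hypothetical dataset and table name.
console.log(
  sketchReprocessPreOps({
    table: "analytics.ga4_events",
    dateColumn: "event_date",
    startDate: "2024-01-01",
    endDate: "2024-01-31",
  })
);
```

In a real Dataform project the two dates would come from the project variables rather than from function arguments, and the table reference would be resolved by Dataform.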
This applies to all incremental tables in the pipeline:
- ga4_events (partitioned by event_date)
- int_ga4_sessions (partitioned by session_date)
- ga4_sessions (partitioned by session_date)
- Custom lineage tables when enabled (ga4_events_custom, int_ga4_sessions_custom, ga4_sessions_custom)
Interaction with other features
User Stitching
When lookAheadDays is configured, ga4_sessions extends the reprocess window to cover the look-ahead period. This ensures back-stitched sessions are correctly updated.
Custom Lineage
Custom lineage tables with "incremental" materialization participate in reprocessing automatically - they use the same generatePreOperationsSQL() helper and respect the same date range variables.
Assertions
Assertions run after reprocessing and validate the reprocessed data like any other run.
Batching
For large date ranges (months or years), a single reprocess can hit BigQuery's 6-hour query timeout. The included scripts/batch-reprocess.sh script splits the range into configurable batches (default: 30 days) and runs them sequentially. Each batch labels its execution with LABEL_EXECUTION_OPERATION=reprocessing for audit tracking.
./scripts/batch-reprocess.sh 2024-01-01 2024-12-31 30 "events,sessions"
If any batch fails, the script stops immediately. Completed batches are already persisted, so you can resume by re-running the script from the start date of the failed batch.
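To make the batching arithmetic concrete, here is a minimal JavaScript sketch of how a date range could be split and run in order. It does not reproduce scripts/batch-reprocess.sh: the names splitIntoBatches and runSequentially, the runBatch callback, and the assumption that batches are inclusive and non-overlapping are all hypothetical.

```javascript
// Hypothetical helper: splits an inclusive date range into consecutive,
// non-overlapping batches of at most batchDays days each.
function splitIntoBatches(startDate, endDate, batchDays = 30) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const toUtc = (s) => new Date(`${s}T00:00:00Z`); // UTC avoids DST drift
  const toIso = (d) => d.toISOString().slice(0, 10);

  const start = toUtc(startDate).getTime();
  const end = toUtc(endDate).getTime();
  if (Number.isNaN(start) || Number.isNaN(end)) {
    throw new Error("Dates must be valid YYYY-MM-DD strings");
  }
  if (start > end) {
    throw new Error("startDate must not be after endDate");
  }
  if (!Number.isInteger(batchDays) || batchDays < 1) {
    throw new Error("batchDays must be a positive integer");
  }

  const batches = [];
  for (let t = start; t <= end; t += batchDays * DAY_MS) {
    // The last batch is shortened so it never runs past endDate.
    const batchEnd = Math.min(t + (batchDays - 1) * DAY_MS, end);
    batches.push({ start: toIso(new Date(t)), end: toIso(new Date(batchEnd)) });
  }
  return batches;
}

// Runs batches in order and stops at the first failure.
// runBatch is a caller-supplied callback that throws when a batch fails.
function runSequentially(batches, runBatch) {
  const completed = [];
  for (const batch of batches) {
    try {
      runBatch(batch);
    } catch (err) {
      // Earlier batches stay persisted; resume from this batch's start date.
      return { completed, failed: batch, resumeFrom: batch.start };
    }
    completed.push(batch);
  }
  return { completed, failed: null, resumeFrom: null };
}
```

Under these assumptions, the range 2024-01-01 to 2024-12-31 with 30-day batches yields 13 batches: twelve full ones and a final short batch covering 2024-12-26 to 2024-12-31.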
Practical usage
For the full how-to (CLI, script parameters, UI instructions, and common scenarios), see the Reprocessing & Batching guide.