
Reprocessing & Batching

💎Premium Feature

This feature is exclusive to Premium users.

What it does

The reprocessing feature extends the GA4 module's incremental pipeline to support targeted re-runs over a specific date range. Normally, each table tracks its own checkpoint (the most recent event_date or session_date where is_final = true) and processes only data newer than that point. Reprocessing overrides this checkpoint with an explicit date range.
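As a rough illustration of the normal (non-reprocessing) behavior, the Python sketch below derives a checkpoint from rows modeled as (date, is_final) pairs and keeps only later dates. The data shape and the function names incremental_checkpoint and dates_to_process are hypothetical, and "newer than that point" is assumed to mean strictly after the checkpoint.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple


def incremental_checkpoint(rows: Iterable[Tuple[date, bool]]) -> Optional[date]:
    """Return the most recent date whose rows are marked final, or None if there are none."""
    final_dates = [d for d, is_final in rows if is_final]
    return max(final_dates) if final_dates else None


def dates_to_process(source_dates: Iterable[date], checkpoint: Optional[date]) -> List[date]:
    """Normal incremental mode (assumed): keep only dates strictly after the checkpoint.

    With no checkpoint yet (empty table), every source date is processed.
    """
    unique = set(source_dates)
    if checkpoint is None:
        return sorted(unique)
    return sorted(d for d in unique if d > checkpoint)
```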

How it works

Two Dataform variables control reprocessing:

Variable               Effect
REPROCESS_START_DATE   Replaces the incremental checkpoint with this date
REPROCESS_END_DATE     Caps the processing window at this date (normally unlimited)
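The sketch below shows how the two variables could resolve into the (date_checkpoint, date_end) pair that the pre_operations block works with. It is a hypothetical Python model: resolve_window is not part of the module, and treating each variable independently and rejecting a start date later than the end date are assumptions.

```python
from datetime import date
from typing import Optional, Tuple

# Sentinel upper bound used when no end date is supplied (the 9999-12-31 default this page mentions).
UNBOUNDED_END = date(9999, 12, 31)


def resolve_window(
    computed_checkpoint: date,
    reprocess_start: Optional[str] = None,
    reprocess_end: Optional[str] = None,
) -> Tuple[date, date]:
    """Return (date_checkpoint, date_end) for one run.

    REPROCESS_START_DATE, if set, replaces the checkpoint computed from the table;
    REPROCESS_END_DATE, if set, replaces the unbounded end.
    """
    start = date.fromisoformat(reprocess_start) if reprocess_start else computed_checkpoint
    end = date.fromisoformat(reprocess_end) if reprocess_end else UNBOUNDED_END
    # Assumed sanity check: an inverted window would delete and rebuild nothing useful.
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end
```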

When these variables are set, each table's pre_operations block:

  1. Sets date_checkpoint to REPROCESS_START_DATE (instead of computing it from the table)
  2. Sets date_end to REPROCESS_END_DATE (instead of 9999-12-31)
  3. Deletes existing rows where the date column falls between date_checkpoint and date_end
  4. Re-inserts freshly computed data for that range
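Steps 3 and 4 can be modeled on an in-memory table keyed by date partition. This is a hypothetical Python sketch (reprocess_range, the dict-based table, and the transform callback are illustrative), assuming the window bounds are inclusive as "between" suggests. Because the range is cleared before it is rebuilt, running it twice with the same source yields the same result.

```python
from datetime import date
from typing import Callable, Dict, List

Table = Dict[date, List[dict]]  # date partition -> output rows
Source = Dict[date, list]       # date partition -> raw input rows


def reprocess_range(
    table: Table,
    source: Source,
    start: date,
    end: date,
    transform: Callable[[list], List[dict]],
) -> Table:
    """Delete every partition in [start, end], then rebuild it from the source data."""
    # Step 3: keep only partitions outside the window (the input table is not mutated).
    result = {d: rows for d, rows in table.items() if not (start <= d <= end)}
    # Step 4: recompute each source partition inside the window and insert it.
    for d, raw_rows in source.items():
        if start <= d <= end:
            result[d] = transform(raw_rows)
    return result
```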

This applies to all incremental tables in the pipeline:

  • ga4_events (partitioned by event_date)
  • int_ga4_sessions (partitioned by session_date)
  • ga4_sessions (partitioned by session_date)
  • Custom lineage tables when enabled (ga4_events_custom, int_ga4_sessions_custom, ga4_sessions_custom)
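Because each table is partitioned on its own date column, the same window translates into one DELETE per table. The Python sketch below builds those statements as BigQuery SQL strings; the delete_statements function, the dataset argument, and the include_custom flag are hypothetical, and the partition columns of the *_custom tables are assumed to match their base tables. The module generates its own pre_operations SQL; this only illustrates the shape.

```python
from datetime import date
from typing import Dict, List

# Partition column per incremental table, as listed above.
# The *_custom entries are assumed to mirror their base tables.
PARTITION_COLUMNS: Dict[str, str] = {
    "ga4_events": "event_date",
    "int_ga4_sessions": "session_date",
    "ga4_sessions": "session_date",
    "ga4_events_custom": "event_date",
    "int_ga4_sessions_custom": "session_date",
    "ga4_sessions_custom": "session_date",
}


def delete_statements(dataset: str, start: date, end: date, include_custom: bool = False) -> List[str]:
    """Build one BigQuery DELETE per table covering the same inclusive date window."""
    statements = []
    for table, column in PARTITION_COLUMNS.items():
        if table.endswith("_custom") and not include_custom:
            continue  # custom lineage tables only participate when enabled
        statements.append(
            f"DELETE FROM `{dataset}.{table}` "
            f"WHERE {column} BETWEEN DATE '{start.isoformat()}' AND DATE '{end.isoformat()}'"
        )
    return statements
```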

Interaction with other features

User Stitching

When lookAheadDays is configured, ga4_sessions extends the reprocess window to cover the look-ahead period. This ensures back-stitched sessions are correctly updated.

Custom Lineage

Custom lineage tables with "incremental" materialization participate in reprocessing automatically: they use the same generatePreOperationsSQL() helper and respect the same date range variables.

Assertions

Assertions run after reprocessing and validate the reprocessed data like any other run.

Batching

For large date ranges (months or years), a single reprocess can hit BigQuery's 6-hour query timeout. The included scripts/batch-reprocess.sh script splits the range into configurable batches (default: 30 days) and runs them sequentially. Each batch labels its execution with LABEL_EXECUTION_OPERATION=reprocessing for audit tracking.

./scripts/batch-reprocess.sh 2024-01-01 2024-12-31 30 "events,sessions"
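The batch arithmetic is easy to reason about in isolation. Here is a Python sketch of the splitting, assuming inclusive start and end dates and a shorter final batch when the range does not divide evenly; split_into_batches is a hypothetical name, and the shipped shell script's exact boundary handling may differ.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_into_batches(start: date, end: date, batch_days: int = 30) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most batch_days days."""
    if batch_days < 1:
        raise ValueError("batch_days must be at least 1")
    if start > end:
        return []
    batches = []
    batch_start = start
    while True:
        remaining = (end - batch_start).days
        # Never step past `end`, which also avoids overflowing date.max.
        batch_end = batch_start + timedelta(days=min(batch_days - 1, remaining))
        batches.append((batch_start, batch_end))
        if batch_end == end:
            return batches
        batch_start = batch_end + timedelta(days=1)
```

Under these assumptions, the 2024 range in the command above (366 days, since 2024 is a leap year) splits into 13 batches, the last covering 2024-12-26 to 2024-12-31.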

If any batch fails, the script stops immediately. Batches that already completed are persisted, so you can resume from the batch that failed rather than rerunning the whole range.
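The stop-on-failure behavior amounts to a loop that runs batches in order and reports where to resume. In this hypothetical Python sketch, run_batch stands in for one pipeline run with REPROCESS_START_DATE and REPROCESS_END_DATE set to the batch bounds and returns whether it succeeded; it is not the script's actual interface.

```python
from datetime import date
from typing import Callable, List, Optional, Tuple

Batch = Tuple[date, date]


def run_sequentially(batches: List[Batch], run_batch: Callable[[date, date], bool]) -> Optional[date]:
    """Run batches in order and stop at the first failure.

    Returns None if every batch succeeded; otherwise the start date of the failed
    batch, which is where a rerun should resume (earlier batches are already persisted).
    """
    for start, end in batches:
        if not run_batch(start, end):
            return start
    return None
```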

Practical usage

For the full how-to (CLI, script parameters, UI instructions, and common scenarios), see the Reprocessing & Batching guide.