Reprocessing & Batching

💎 Premium Feature

This feature is available only to Premium users.

Sometimes you need to reprocess a specific date range: maybe you fixed a bug in your custom lineage, changed attribution logic, or GA4 revised its raw data. Instead of rebuilding the entire history, you can target exact dates.

Quick start

To reprocess a single date range, pass REPROCESS_START_DATE and REPROCESS_END_DATE as Dataform variables:

dataform run \
--vars="REPROCESS_START_DATE=2025-06-01,REPROCESS_END_DATE=2025-06-30" \
--tags="events,sessions"

This will:

  1. Override the normal incremental checkpoint with your start date
  2. Delete existing data in that date range from the target tables
  3. Re-insert freshly computed data for those dates
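Because step 2 deletes existing rows in the target range, a mistyped date can be costly. A minimal sketch of a guard you could run before the command above, assuming GNU `date` is available; `validate_range` is a hypothetical helper, not something shipped with the package:

```shell
# Hypothetical helper, not part of the Premium package.
# Assumes GNU date (the -d option).
validate_range() {
  local start="$1" end="$2" d
  for d in "$start" "$end"; do
    # Shape check: YYYY-MM-DD
    case "$d" in
      [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
      *) echo "Invalid date format: $d" >&2; return 1 ;;
    esac
    # Calendar check: round-trip through date to reject values like 2025-02-30
    if [ "$(date -u -d "$d" +%F 2>/dev/null)" != "$d" ]; then
      echo "Not a real calendar date: $d" >&2
      return 1
    fi
  done
  # ISO dates with the dashes removed compare correctly as integers
  if [ "$(printf '%s' "$start" | tr -d '-')" -gt "$(printf '%s' "$end" | tr -d '-')" ]; then
    echo "Start date $start is after end date $end" >&2
    return 1
  fi
}

# Usage (only run the reprocess when the range is valid):
#   validate_range 2025-06-01 2025-06-30 && dataform run ...
```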

Batch reprocessing script

For large date ranges (months or years of data), a single dataform run can hit BigQuery's 6-hour query timeout. The batch-reprocess.sh script splits the range into manageable batches and runs them sequentially.

Location: scripts/batch-reprocess.sh (included with Premium installations)

Prerequisites:

  • dataform CLI installed and authenticated with GCP
  • Run from the Dataform repository root
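The prerequisites above can be checked up front with a small pre-flight function. This is a sketch under assumptions: the function names are hypothetical, the repository root is detected by the presence of `workflow_settings.yaml` or `dataform.json`, and GCP authentication is not verified here.

```shell
# Hypothetical pre-flight checks; the real script may do this differently.

# True when the dataform CLI is on PATH.
have_dataform_cli() {
  command -v dataform >/dev/null 2>&1
}

# True when the directory looks like a Dataform repository root:
# workflow_settings.yaml (newer projects) or dataform.json (older projects).
in_dataform_root() {
  local dir="${1:-.}"
  [ -f "$dir/workflow_settings.yaml" ] || [ -f "$dir/dataform.json" ]
}

# Run both checks and report the first problem found.
preflight() {
  local dir="${1:-.}"
  if ! have_dataform_cli; then
    echo "dataform CLI not found on PATH" >&2
    return 1
  fi
  if ! in_dataform_root "$dir"; then
    echo "Not a Dataform repository root: $dir" >&2
    return 1
  fi
}
```

Calling `preflight` with no arguments checks the current directory, which matches the requirement to run from the repository root.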

Usage

./scripts/batch-reprocess.sh <start_date> <end_date> [batch_size_days] [tags] [extra_vars]

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| start_date | Yes | - | First date to reprocess (YYYY-MM-DD) |
| end_date | Yes | - | Last date to reprocess (YYYY-MM-DD) |
| batch_size_days | No | 30 | Number of days per batch |
| tags | No | events,sessions | Dataform tags to run |
| extra_vars | No | - | Additional Dataform variables (comma-separated key=value pairs) |
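To illustrate how the positional parameters and their defaults fit together, here is a minimal sketch. `resolve_args` is a hypothetical function written for this example; the actual script's internals are not shown in this documentation and may differ.

```shell
# Hypothetical sketch of positional-argument handling with the documented defaults.
resolve_args() {
  if [ "$#" -lt 2 ]; then
    echo "Usage: batch-reprocess.sh <start_date> <end_date> [batch_size_days] [tags] [extra_vars]" >&2
    return 1
  fi
  local start="$1" end="$2"
  local batch_size="${3:-30}"          # default: 30 days per batch
  local tags="${4:-events,sessions}"   # default: both tags
  local extra_vars="${5:-}"            # default: no extra variables
  echo "start_date=$start"
  echo "end_date=$end"
  echo "batch_size=$batch_size"
  echo "tags=$tags"
  echo "extra_vars=$extra_vars"
}
```

Because the parameters are positional, overriding a later one (for example `tags`) means also supplying the ones before it (`batch_size_days`).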

Examples

Reprocess all of 2024 in 30-day batches:

./scripts/batch-reprocess.sh 2024-01-01 2024-12-31

Reprocess with smaller batches (useful for high-traffic properties):

./scripts/batch-reprocess.sh 2024-01-01 2024-12-31 14

Reprocess only events (skip sessions):

./scripts/batch-reprocess.sh 2024-01-01 2024-06-30 30 "events"

Reprocess into a test dataset:

./scripts/batch-reprocess.sh 2024-01-01 2024-03-31 30 "events,sessions" \
"OUTPUTS_DATASET=superform_outputs_test,TRANSFORMATIONS_DATASET=superform_transformations_test"
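When choosing a batch size, it can help to know how many sequential runs a range implies. The following sketch computes the inclusive day count and the resulting number of batches. It assumes GNU `date`, and both function names are hypothetical.

```shell
# Hypothetical helpers for estimating the work involved. Assumes GNU date.

# Inclusive number of days between two YYYY-MM-DD dates.
range_days() {
  local s e
  s=$(date -u -d "$1" +%s) || return 1
  e=$(date -u -d "$2" +%s) || return 1
  echo $(( (e - s) / 86400 + 1 ))
}

# Number of batches needed: ceil(days / batch_size), batch_size defaults to 30.
batch_count() {
  local size="${3:-30}" days
  days=$(range_days "$1" "$2") || return 1
  echo $(( (days + size - 1) / size ))
}

# Usage:
#   batch_count 2024-01-01 2024-12-31      # 366 days in 30-day batches -> 13
#   batch_count 2024-01-01 2024-12-31 14   # 366 days in 14-day batches -> 27
```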

What happens during a batch run

The script loops through the date range in chunks:

--- Batch 1: 2024-01-01 -> 2024-01-30 ---
--- Batch 2: 2024-01-31 -> 2024-02-29 ---
--- Batch 3: 2024-03-01 -> 2024-03-30 ---
...
All 4 batches completed successfully.

Each batch runs a full dataform run with the date range variables. If any batch fails, the script stops immediately so you can investigate before continuing.
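The chunk-and-stop behavior described here can be sketched as a loop. This is an illustration, not the contents of batch-reprocess.sh: it assumes GNU `date`, the function names are hypothetical, and `run_batch` only prints the command it would run (a dry run) so the sketch is safe to try.

```shell
# Hypothetical hook: prints the command instead of running it (dry run).
# Replace the echo with the real dataform invocation to execute batches.
run_batch() {
  echo "dataform run --vars=\"REPROCESS_START_DATE=$1,REPROCESS_END_DATE=$2\" --tags=\"$3\""
}

# Walk start..end in inclusive chunks of `size` days. Assumes GNU date.
reprocess_in_batches() {
  local start="$1" end="$2" size="${3:-30}" tags="${4:-events,sessions}"
  local cur_s end_s batch_end_s cur batch_end n=0
  cur_s=$(date -u -d "$start" +%s) || return 1
  end_s=$(date -u -d "$end" +%s) || return 1
  while [ "$cur_s" -le "$end_s" ]; do
    n=$((n + 1))
    batch_end_s=$((cur_s + (size - 1) * 86400))
    # Clamp the final batch to the requested end date
    if [ "$batch_end_s" -gt "$end_s" ]; then
      batch_end_s=$end_s
    fi
    cur=$(date -u -d "@$cur_s" +%F)
    batch_end=$(date -u -d "@$batch_end_s" +%F)
    echo "--- Batch $n: $cur -> $batch_end ---"
    # Stop at the first failing batch so it can be investigated
    if ! run_batch "$cur" "$batch_end" "$tags"; then
      echo "Batch $n failed; stopping." >&2
      return 1
    fi
    # Next batch starts the day after this one ends
    cur_s=$((batch_end_s + 86400))
  done
  echo "All $n batches completed successfully."
}
```

Working in UTC epoch seconds keeps the day arithmetic exact, since UTC has no daylight-saving shifts. After a failure, rerunning from the failed batch's start date is safe because reprocessing deletes and re-inserts the range rather than appending.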

Reprocessing from Dataform UI

You can trigger a reprocess directly from the Dataform UI by creating a dedicated Release configuration with the reprocessing variables, and a Workflow configuration that runs it on demand.

1. Create a Release configuration

Navigate to Releases and Scheduling in your Dataform repository and create a new Release configuration:

  • Release ID: e.g. reprocess
  • Compilation variables: add the following:
    • REPROCESS_START_DATE = 2025-06-01
    • REPROCESS_END_DATE = 2025-06-30
    • LABEL_EXECUTION_OPERATION = reprocessing

2. Create a Workflow configuration

Create a new Workflow configuration attached to the release you just created:

  • Release configuration: select your reprocess release
  • Schedule frequency: select On-demand (not daily)
  • Selection of tags: select the tags you want to reprocess (e.g. events, sessions)
  • Run with full refresh: leave unchecked (the reprocess variables handle the date range)

3. Run the workflow

Click the three dots on the workflow configuration and select Run now.

Tip

The UI method works well for small, one-off reprocesses. For anything spanning more than ~60 days, use the batch script to avoid timeouts. Remember to update the compilation variables in the Release configuration each time you need a different date range.

Important notes

  • Reprocessing deletes and re-inserts data for the specified date range. It does not append.
  • Each batch is labeled with LABEL_EXECUTION_OPERATION=reprocessing for tracking in BigQuery audit logs.
  • If you have custom lineage tables enabled, they are also reprocessed within the same date range.