Reprocessing & Batching
This feature is exclusive to Premium users.
Sometimes you need to reprocess a specific date range - maybe you fixed a bug in your custom lineage, changed attribution logic, or GA4 revised its raw data. Instead of rebuilding the entire history, you can target exact dates.
Quick start
To reprocess a single date range, pass REPROCESS_START_DATE and REPROCESS_END_DATE as Dataform variables:
dataform run \
--vars="REPROCESS_START_DATE=2025-06-01,REPROCESS_END_DATE=2025-06-30" \
--tags="events,sessions"
This will:
- Override the normal incremental checkpoint with your start date
- Delete existing data in that date range from the target tables
- Re-insert freshly computed data for those dates
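Because a mistyped date changes which rows get deleted, it can help to validate the range before running the command. The sketch below is one way to do that, not part of the package: the function names is_iso_date and build_reprocess_vars are hypothetical, and only the two variable names come from the example above.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the package): check the date range,
# then print a value suitable for the --vars flag of "dataform run".

is_iso_date() {
  # Checks the YYYY-MM-DD shape only, not calendar validity.
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

build_reprocess_vars() {
  # $1: start date, $2: end date (both inclusive)
  if ! is_iso_date "$1" || ! is_iso_date "$2"; then
    echo "dates must be in YYYY-MM-DD format" >&2
    return 1
  fi
  # With the dashes removed, ISO dates compare correctly as plain integers.
  if [ "$(printf '%s' "$1" | tr -d '-')" -gt "$(printf '%s' "$2" | tr -d '-')" ]; then
    echo "start date is after end date" >&2
    return 1
  fi
  printf 'REPROCESS_START_DATE=%s,REPROCESS_END_DATE=%s\n' "$1" "$2"
}
```

With these helpers sourced, the quick-start command could be written as dataform run --vars="$(build_reprocess_vars 2025-06-01 2025-06-30)" --tags="events,sessions".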
Batch reprocessing script
For large date ranges (months or years of data), a single dataform run can hit BigQuery's 6-hour query timeout. The batch-reprocess.sh script splits the range into manageable batches and runs them sequentially.
Location: scripts/batch-reprocess.sh (included with Premium installations)
Prerequisites:
- dataform CLI installed and authenticated with GCP
- Run from the Dataform repository root
Usage
./scripts/batch-reprocess.sh <start_date> <end_date> [batch_size_days] [tags] [extra_vars]
| Parameter | Required | Default | Description |
|---|---|---|---|
| start_date | Yes | - | First date to reprocess (YYYY-MM-DD) |
| end_date | Yes | - | Last date to reprocess (YYYY-MM-DD) |
| batch_size_days | No | 30 | Number of days per batch |
| tags | No | events,sessions | Dataform tags to run |
| extra_vars | No | - | Additional Dataform variables (comma-separated key=value pairs) |
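To make the defaults in the table concrete, here is a minimal sketch of how positional arguments like these could be handled in shell. It is an assumption about the approach, not the shipped script: parse_batch_args, batch_vars, and the upper-case variable names are hypothetical, and the real script may combine extra_vars differently.

```shell
#!/bin/sh
# Hypothetical sketch of the argument handling; not the shipped script.

parse_batch_args() {
  if [ "$#" -lt 2 ]; then
    echo "usage: batch-reprocess.sh <start_date> <end_date> [batch_size_days] [tags] [extra_vars]" >&2
    return 2
  fi
  START_DATE=$1
  END_DATE=$2
  BATCH_SIZE_DAYS=${3:-30}        # default from the table: 30 days per batch
  TAGS=${4:-events,sessions}      # default from the table
  EXTRA_VARS=${5:-}               # optional, comma-separated key=value pairs
}

batch_vars() {
  # $1: batch start, $2: batch end.
  # EXTRA_VARS is appended after a comma only when it is non-empty.
  printf 'REPROCESS_START_DATE=%s,REPROCESS_END_DATE=%s%s\n' \
    "$1" "$2" "${EXTRA_VARS:+,$EXTRA_VARS}"
}
```

Using positional parameters with ${N:-default} keeps the call short, at the cost that a later optional argument (such as extra_vars) can only be passed if the earlier ones are spelled out too.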
Examples
Reprocess all of 2024 in 30-day batches:
./scripts/batch-reprocess.sh 2024-01-01 2024-12-31
Reprocess with smaller batches (useful for high-traffic properties):
./scripts/batch-reprocess.sh 2024-01-01 2024-12-31 14
Reprocess only events (skip sessions):
./scripts/batch-reprocess.sh 2024-01-01 2024-06-30 30 "events"
Reprocess into a test dataset:
./scripts/batch-reprocess.sh 2024-01-01 2024-03-31 30 "events,sessions" \
"OUTPUTS_DATASET=superform_outputs_test,TRANSFORMATIONS_DATASET=superform_transformations_test"
What happens during a batch run
The script loops through the date range in chunks:
--- Batch 1: 2024-01-01 -> 2024-01-30 ---
--- Batch 2: 2024-01-31 -> 2024-02-29 ---
--- Batch 3: 2024-03-01 -> 2024-03-30 ---
...
All 4 batches completed successfully.
Each batch runs a full dataform run with the date range variables. If any batch fails, the script stops immediately so you can investigate before continuing.
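The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of batch-reprocess.sh: the function names and the RUNNER variable are hypothetical, batches are assumed to be inclusive on both ends, and the real script may split the range differently. The date arithmetic is done with integer math so the sketch does not depend on GNU or BSD variants of the date command.

```shell
#!/bin/sh
# Sketch only: splits an inclusive date range into fixed-size batches and
# stops at the first failing run. Not the shipped batch-reprocess.sh.

# YYYY-MM-DD -> days since 1970-01-01 (proleptic Gregorian calendar).
date_to_days() {
  local y m d rest era yoe mp doy doe
  y=${1%%-*}; rest=${1#*-}; m=${rest%%-*}; d=${rest#*-}
  m=${m#0}; d=${d#0}                  # strip leading zeros (08/09 are invalid octal)
  if [ "$m" -le 2 ]; then y=$((y - 1)); fi
  era=$((y / 400)); yoe=$((y - era * 400))
  mp=$(( (m + 9) % 12 ))              # March = 0 ... February = 11
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  echo $(( era * 146097 + doe - 719468 ))
}

# Days since 1970-01-01 -> YYYY-MM-DD (inverse of date_to_days).
days_to_date() {
  local z era doe yoe y doy mp d m
  z=$(( $1 + 719468 ))
  era=$(( z / 146097 )); doe=$(( z - era * 146097 ))
  yoe=$(( (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365 ))
  y=$(( yoe + era * 400 ))
  doy=$(( doe - (365 * yoe + yoe / 4 - yoe / 100) ))
  mp=$(( (5 * doy + 2) / 153 ))
  d=$(( doy - (153 * mp + 2) / 5 + 1 ))
  m=$(( mp < 10 ? mp + 3 : mp - 9 ))
  if [ "$m" -le 2 ]; then y=$((y + 1)); fi
  printf '%04d-%02d-%02d\n' "$y" "$m" "$d"
}

# Print one "start end" pair per batch; $3 is the batch size in days (default 30).
print_batches() {
  local cur last size batch_end
  cur=$(date_to_days "$1"); last=$(date_to_days "$2"); size=${3:-30}
  while [ "$cur" -le "$last" ]; do
    batch_end=$((cur + size - 1))
    if [ "$batch_end" -gt "$last" ]; then batch_end=$last; fi   # clamp the final batch
    echo "$(days_to_date "$cur") $(days_to_date "$batch_end")"
    cur=$((batch_end + 1))
  done
}

# Run every batch in order; "exit 1" ends the loop at the first failure.
# RUNNER is a hypothetical override so the loop can be dry-run with RUNNER=echo.
run_batches() {
  print_batches "$1" "$2" "$3" | while read -r batch_start batch_end; do
    "${RUNNER:-dataform}" run \
      --vars="REPROCESS_START_DATE=$batch_start,REPROCESS_END_DATE=$batch_end" \
      --tags="${4:-events,sessions}" </dev/null || exit 1
  done
}
```

Setting RUNNER=echo before calling run_batches prints the commands instead of executing them, which is a cheap way to check the batch boundaries before deleting any data.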
Reprocessing from Dataform UI
You can trigger a reprocess directly from the Dataform UI by creating a dedicated Release configuration with the reprocessing variables, and a Workflow configuration that runs it on demand.
1. Create a Release configuration
Navigate to Releases and Scheduling in your Dataform repository and create a new Release configuration:
- Release ID: e.g. reprocess
- Compilation variables: add the following:
  - REPROCESS_START_DATE=2025-06-01
  - REPROCESS_END_DATE=2025-06-30
  - LABEL_EXECUTION_OPERATION=reprocessing
2. Create a Workflow configuration
Create a new Workflow configuration attached to the release you just created:
- Release configuration: select your reprocess release
- Schedule frequency: select On-demand (not daily)
- Selection of tags: select the tags you want to reprocess (e.g. events,sessions)
- Run with full refresh: leave unchecked (the reprocess variables handle the date range)
3. Run the workflow
Click the three dots on the workflow configuration and select Run now.
The UI method works well for small, one-off reprocesses. For anything spanning more than ~60 days, use the batch script to avoid timeouts. Remember to update the compilation variables in the Release configuration each time you need a different date range.
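If you want to apply the rough 60-day guideline mechanically, a small helper can count the days in a range and suggest a method. This is a sketch under assumptions: range_days and suggest_method are hypothetical names, the count is inclusive of both dates, and 60 is only the approximate threshold mentioned above, so it is left adjustable.

```shell
#!/bin/sh
# Hypothetical helper: count the days in an inclusive range and suggest
# whether the UI or the batch script is the better fit.

# YYYY-MM-DD -> days since 1970-01-01, using integer math only
# (avoids differences between GNU and BSD "date").
date_to_days() {
  local y m d rest era yoe mp doy doe
  y=${1%%-*}; rest=${1#*-}; m=${rest%%-*}; d=${rest#*-}
  m=${m#0}; d=${d#0}                  # strip leading zeros (08/09 are invalid octal)
  if [ "$m" -le 2 ]; then y=$((y - 1)); fi
  era=$((y / 400)); yoe=$((y - era * 400))
  mp=$(( (m + 9) % 12 ))              # March = 0 ... February = 11
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  echo $(( era * 146097 + doe - 719468 ))
}

# Number of days from $1 to $2, counting both endpoints.
range_days() {
  echo $(( $(date_to_days "$2") - $(date_to_days "$1") + 1 ))
}

# Prints "ui" or "batch-script"; $3 overrides the approximate 60-day threshold.
suggest_method() {
  if [ "$(range_days "$1" "$2")" -gt "${3:-60}" ]; then
    echo "batch-script"
  else
    echo "ui"
  fi
}
```

For high-traffic properties a lower threshold may be more appropriate, since the time a run takes depends on data volume rather than on the number of days alone.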
Important notes
- Reprocessing deletes and re-inserts data for the specified date range. It does not append.
- Each batch is labeled with LABEL_EXECUTION_OPERATION=reprocessing for tracking in BigQuery audit logs.
- If you have custom lineage tables enabled, they are also reprocessed within the same date range.