Changelog
Unreleased
Premium
- Anomaly detection module
- BigQuery utilities module
- Google Search Console module
- Adding PII detection to parameter detection module
🧰 Core
- nothing planned
🤝 Community
- nothing planned
🛠️ Installer
Strict Act-As mode update
We have updated the installer to align with the Strict Act-As design pattern for Dataform execution, improving security, isolation of responsibilities, and auditability. Previously, all permissions were assigned to the Dataform Default Service Account, combining orchestration and BigQuery data access. With this update, permissions are now explicitly split between two service accounts.
- Dataform Default Service Account
  - Dataform Service Agent
  - Secret Manager Secret Accessor
  - Service Account Token Creator
  - Service Account User
This service account is now limited to orchestration, scheduling, metadata access, and impersonation only.
It no longer has direct access to BigQuery data.
- Custom Service Account
  - BigQuery Data Editor
  - BigQuery Data Viewer
  - BigQuery Job User
This service account is used exclusively for query execution in BigQuery and is accessed only via impersonation during Dataform runs.
This change brings the installer in line with Google Cloud security best practices and Dataform Strict Act-As mode requirements.
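The split above can be written down as plain data, which makes it easy to check that no BigQuery role remains on the default account. This is a minimal sketch rather than the installer's code: the object layout and the `bigQueryRoles` helper are hypothetical, the e-mail addresses are placeholders, and the role IDs are assumed to be the standard IAM identifiers for the role names listed.

```javascript
// Role split from this entry, expressed as plain data.
// Assumption: role IDs are the standard IAM identifiers for the listed role names.
// The member e-mails are placeholders, not real accounts.
const roleAssignments = {
  dataformDefaultServiceAccount: {
    member: "service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com",
    roles: [
      "roles/dataform.serviceAgent",
      "roles/secretmanager.secretAccessor",
      "roles/iam.serviceAccountTokenCreator",
      "roles/iam.serviceAccountUser",
    ],
  },
  customServiceAccount: {
    member: "CUSTOM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com",
    roles: [
      "roles/bigquery.dataEditor",
      "roles/bigquery.dataViewer",
      "roles/bigquery.jobUser",
    ],
  },
};

// Hypothetical helper: returns the BigQuery roles held by one account.
function bigQueryRoles(account) {
  return account.roles.filter((role) => role.startsWith("roles/bigquery."));
}

console.log(bigQueryRoles(roleAssignments.dataformDefaultServiceAccount)); // []
console.log(bigQueryRoles(roleAssignments.customServiceAccount).length); // 3
```

An empty result for the default account reflects the statement that it no longer has direct access to BigQuery data.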
Released
[Premium v2.0.5] - 2025-12-18
Schema Changes
Updating to this version requires a full rebuild of some tables because new columns were added:
- `int_ga4_sessions` and `ga4_sessions` with new `first_user_id` column
- `int_ga4_transactions` and `ga4_transactions` with new `property_id` column
- `ga4_events` with new records in event_params
Added
- `unique_search_term`, `gad_campaingid`, `gad_source` as standard event params
- `gclsrc` as a param checked when classifying traffic as Google
- `first_user_id` in `int_ga4_sessions` and `ga4_sessions`
- `property_id` as column in `int_ga4_transactions` and `ga4_transactions`
- `is_multiday_session` column: flag indicating whether the session has events across more than one day
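The `is_multiday_session` flag can be pictured with a small sketch. The helper below is hypothetical and is not the package's SQL; it assumes each event carries its date as a `YYYYMMDD` string and simply checks whether a session's events cover more than one distinct date.

```javascript
// Hypothetical helper: true when a session's events fall on more than one calendar date.
// `eventDates` holds one 'YYYYMMDD' string per event in the session.
function isMultidaySession(eventDates) {
  return new Set(eventDates).size > 1;
}

console.log(isMultidaySession(["20251217", "20251217"])); // false
console.log(isMultidaySession(["20251217", "20251218"])); // true
```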
Fixed
- Added missing column documentation fields
Changed
- Standardized SQL style
- Renamed modules and tables
- Changed directory structure and moved the main config file
🧰 🤝 [Core v1.19.0] - 2025-12-18
Schema Changes
Updating to this version requires a full rebuild of some tables because new columns were added:
- `int_ga4_sessions` and `ga4_sessions` with new `first_user_id` column
- `int_ga4_transactions` and `ga4_transactions` with new `property_id` column
- `ga4_events` with new records in event_params
Added
- `unique_search_term`, `gad_campaingid`, `gad_source` as standard event params
- `gclsrc` as a param checked when classifying traffic as Google
- `first_user_id` in `int_ga4_sessions` and `ga4_sessions`
- `property_id` as column in `int_ga4_transactions` and `ga4_transactions`
- `is_multiday_session` column: flag indicating whether the session has events across more than one day
Fixed
- Added missing column documentation fields
Changed
- Standardized SQL style
- Renamed modules and tables
- Changed directory structure and moved the main config file
[Premium v2.0.4]
2025-11-11
Added
- Custom Channel Groupings
[Premium v2.0.3]
2025-11-06
Added
- Multi-property support
- Google Ads module
[Premium v2.0.2]
2025-10-23
Added
- Extend custom session-level parameter filtering options
[Premium v2.0.1]
2025-10-10
Fixed
- Superform Labs internal bugfix iteration
[Premium v2.0.0]
2025-09-22
Added
- User stitching
- Support for using 'intraday' and 'fresh' tables
- BigQuery cost monitoring + dashboard
- Session totals in ga4_sessions
- Custom parameter aggregation at session level
- Custom parameter detection + dashboard
- Items reporting table
- Conversion rate reporting table
🧰 🤝 [Core v1.18.0]
2025-08-27
Added
- AI referrers added to `source_categories.json` for improved default channel grouping logic.
- Default channel grouping in `int_ga4_sessions` now matches the source categories with the `LIKE` operator for enhanced pattern matching.
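To illustrate what `LIKE`-based matching allows compared with exact equality, here is a minimal sketch that emulates SQL `LIKE` semantics (`%` matches any run of characters, `_` matches exactly one). The sample rows, category names, and both helper functions are hypothetical, which side of the comparison holds the pattern is an assumption, and escape sequences inside patterns are not handled.

```javascript
// Convert a SQL LIKE pattern to a RegExp: % matches any run of characters, _ matches one.
function likeToRegExp(pattern) {
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/%/g, ".*").replace(/_/g, ".");
  return new RegExp(`^${body}$`, "s");
}

// Hypothetical rows; the real source_categories.json may be structured differently.
const sourceCategories = [
  { source: "%chatgpt%", category: "SOURCE_CATEGORY_AI" },
  { source: "google", category: "SOURCE_CATEGORY_SEARCH" },
];

// Return the category of the first row whose pattern matches the session source.
function categorize(source) {
  const hit = sourceCategories.find((row) => likeToRegExp(row.source).test(source));
  return hit ? hit.category : null;
}

console.log(categorize("chatgpt.com")); // SOURCE_CATEGORY_AI
console.log(categorize("google")); // SOURCE_CATEGORY_SEARCH
console.log(categorize("google.com")); // null
```

A pattern without wildcards behaves like an equality check, so existing exact entries would keep matching as before under this assumption.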
Fixed
- Added `default_config.js` warning text and reordered the file contents for better readability.
- Updated `is_measurement_protocol_hit`, because Measurement Protocol (MP) has supported device parameters since 2025-06-11.
- Added an app_info.id check to fix `assertions_sessions_validity` producing false positives for app events.
- Added an events_fresh_* filter to `assertions_tables_timeliness` to prevent false positives.
🧰 🤝 [Core v1.17.0]
2025-05-13
Added
- Labeling of tables and jobs. Five label categories are available: Operation, Brand, Licence type, Package version, Tool.
Changed
- Assertions set to `false` by default.
🛠️ [Installer]
2025-05-13
Added
- The installer now displays the licence type and the installer and models versions.
- Backfilling jobs run from the installer are now tracked with the specific label `operation: backfilling`.
🧰 🤝 [Core v1.16.0]
2025-04-09
Added
- Bluesky referrals are now classified as `Organic Social` in the default channel group.
- "Organic AI" added as a new default channel group. The most common LLM referrers are added to this group. This feature can be enabled or disabled using the config option `EXTRA_CHANNEL_GROUPS`.
🧰 🤝 [Core v1.15.0]
2025-03-11
Fixed
- Replaced `---` with `;` while ensuring no duplicate semicolons appear, as the Dataform CLI was unable to handle them correctly.
- Proper handling of sessions spanning midnight in incremental runs. Previously, sessions crossing midnight could result in duplicate session_ids. Multi-day sessions are now correctly appended to the intermediate table and properly deduplicated in the output.
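The separator change in the first item can be sketched as a small string transformation. This is an illustration under assumptions, not the package's code: the `joinStatements` helper is hypothetical, and it assumes each `---` separator sits on a line of its own.

```javascript
// Hypothetical helper: turn '---'-separated SQL statements into ';'-terminated ones,
// making sure no statement ends up with a doubled semicolon.
function joinStatements(sqlText) {
  return sqlText
    .split(/^[ \t]*---[ \t]*$/m) // split on lines that contain only '---'
    .map((stmt) => stmt.trim().replace(/;+\s*$/, "")) // drop existing trailing semicolons
    .filter((stmt) => stmt.length > 0) // skip empty chunks
    .map((stmt) => `${stmt};`) // add exactly one semicolon
    .join("\n");
}

console.log(JSON.stringify(joinStatements("select 1;\n---\nselect 2\n---\n")));
// "select 1;\nselect 2;"
```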
🧰 🤝 [Core v1.14.0]
2025-02-05
Fixed
- Refactored `LOOKBACK_MILLIS` to be defined inline instead of a separate variable.
- Removed `LAST_NON_DIRECT_LOOKBACK_MILLIS` from `default_config.js`, ensuring calculations use a single inline formula.
🛠️ [Installer]
2025-02-05
Added
- Added support for more than 50 datasets
🧰 🤝 [Core v1.13.0]
2025-01-28
Changed
- Updated `generateFilterTypeFromListSQL` function to properly handle NULL values.
- Now uses `coalesce(column, "")` to avoid NULL-related filtering issues.
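Why the `coalesce` matters: in SQL, `NULL NOT IN (...)` evaluates to NULL rather than TRUE, so an exclusion filter silently drops rows whose column is NULL. The sketch below shows the idea with a hypothetical generator; it is not the package's `generateFilterTypeFromListSQL`, and its name, signature, and the `"include"`/`"exclude"` values are assumptions made for illustration.

```javascript
// Hypothetical SQL generator (not the package's function).
// type: "include" or "exclude"; column: column expression; values: list of strings.
function filterFromListSQL(type, column, values) {
  if (values.length === 0) {
    // An empty list filters nothing out ("exclude") or everything out ("include").
    return type === "exclude" ? "true" : "false";
  }
  const list = values
    .map((value) => `'${String(value).replace(/'/g, "\\'")}'`) // escape single quotes
    .join(", ");
  const operator = type === "exclude" ? "not in" : "in";
  // coalesce turns NULL into "", so the comparison yields TRUE or FALSE, never NULL.
  return `coalesce(${column}, "") ${operator} (${list})`;
}

console.log(filterFromListSQL("exclude", "hostname", ["localhost", "staging.example.com"]));
// coalesce(hostname, "") not in ('localhost', 'staging.example.com')
```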
🧰 🤝 [Core v1.12.0]
2025-01-23
Changed
- Excluded "fresh" tables from exports in `ga4_events.sqlx`.
- Ensures that temporary fresh tables do not interfere with data processing.
🛠️ [Installer]
2025-01-23
Added
- Added user menu with logout
🧰 🤝 [Core v1.10.0]
2024-12-19
Added
- Initial commit introducing `ga4_transactions` model.
- Documentation formatting and structure for `ga4_transactions` columns.
- Supports incremental processing with nested items.
- Expanded `ga4_sessions` with new columns for traffic source tracking.
- Improved `demo_daily_sessions_report` schema with platform and stream ID tracking.
- Introduced `_run_timestamp` column in `ga4_events` for better traceability.
- Updated documentation to include newly added columns in `ga4_events`.
- Additional columns to `ga4_sessions` and `ga4_events` tables.
- Introduced `platform` in session struct for better tracking.
Fixed
- Improved logic for properties without "final" days, ensuring they now increment correctly in sessions, transactions, and events models.
Changed
- Adjusted schema definitions for `ga4_transactions` to improve refund tracking.
- Updated documentation to reflect the schema modifications.
- Refined handling of refunds in transaction roll-ups.
- Refactored function names to improve readability.
- Added JSDoc comments for key helper functions.
- Updated default channel grouping logic to ensure a "Direct" fallback for NULL values.
- Moved all event transformation logic to `outputs/ga4_events`.
- Simplified upstream intermediate models to reduce redundancy.
🧰 🤝 [Core v1.6.0]
2024-10-24
Changed
- Streamlined data partitioning to improve performance.
- Introduced detailed column documentation for `ga4_events` and `ga4_sessions`.
Removed
- Removed `stream_id` from `clusterBy` fields.
Changed
- Removed non-intraday columns from `session_traffic_source_last_click` struct.
🛠️ [Installer]
2024-10-24
Added
- Installer version
🧰 🤝 [Core v1.5.0]
2024-10-16
Added
- Introduced `exit_content_group` to the `exit_page` struct.
- Added all `click_ids` to last non-direct session attribution logic.
- Two new columns in the events table.
- Expanded `session_traffic_source_last_click` struct to include additional ad tracking fields.
- Added `publisher` struct to store publisher-level ad attributes.
Fixed
- Updated default channel grouping logic to replace `NULL` values with "Direct".
Changed
- Improved content categorization for session exits.
- Refactored configuration settings to be more user-friendly.
- Updated upstream tables to be compatible with newly introduced columns.
- Reclassified `newsletter` traffic under "Email" in default channel grouping.
- Ensured `cpc` values are labeled as "Other Advertising" rather than "Other".
- Updated GA4 start date to `"2020-01-01"` for improved backfilling capabilities.
- Adjusted transformation dataset names to align with a more structured workflow.
🛠️ [Installer]
2024-10-07
Added
- Added popup for checking package
🧰 🤝 [Core v1]
2024-09-01
Fixed
- Resolved an issue with session attribution logic by adjusting lookback window calculations.
- Corrected logic for determining the last non-direct traffic source.
- Fixed assertions referencing outdated schema names.
- Prevented session breakage due to incrementality by ensuring each session ID retains only the first occurrence within a day.
- Resolved an issue where `time.event_date` did not exist, replacing references with `event_date`.
- Ensured incremental deletion logic uses the correct column reference.
- Updated dataset references in SQL configurations to use `workflow_settings.yaml`.
- Ensured transformation and output datasets are correctly referenced.
- Resolved compilation issues across multiple files.
- Ensured validation checks pass successfully.
- Ensured `transaction_id` is correctly included in the demo table.
- Adjusted filtering logic to capture relevant purchase events.
- Addressed minor technical issues in diagnostic queries.
- Adjusted assertions schema references.
- Fixed issues with `source` and `medium` classification to improve accuracy.
- Ensured backward compatibility with older data structures.
- Resolved duplicate column issue in session data processing.
- Corrected `first_click_ids` reference to `last_click_ids` to align with intended logic.
- Updated `generateClickIdTrafficSourceSQL` and `generateTrafficSourceSQL` functions to properly handle `NULL` values in traffic source structs.
- Ensured `has_source` and `is_direct_session` fields are always `TRUE` or `FALSE`, never `NULL`.
- Resolved `NULL` values appearing in `last_non_direct_default_channel_grouping`.
- Resolved inconsistencies in how session source and medium were assigned.
- Resolved duplicate column issue related to `batch_page_id` and `batch_ordering_id`.
- Corrected classification of `organic shopping` in channel grouping logic.
- Ensured column name handling is consistent across custom parameters.
- Corrected `page_number` calculation to use `batch_page_id` ordering.
- Corrected incremental processing logic by ensuring `table_suffix` is treated as a date where necessary.
- Corrected typo in variable name (`EVENTS_TO_EXLUDE` → `EVENTS_TO_EXCLUDE`).
Added
- Implemented detection of Measurement Protocol (MP) hits.
- Introduced `session_duration_s` to track session lengths.
- Added descriptions and comments for better maintainability.
- Introduced `ga4_int_events.sqlx` to process and transform GA4 event data.
- Introduced `ga4_int_sessions.sqlx` for session processing and modeling.
- Included engagement time in event ID calculations to improve deduplication.
- Introduced logic to remove duplicate event IDs.
- Created `daily_sessions_report.sqlx` for aggregated session insights.
- Enabled better clustering by adding `stream_id` as a clustering key.
- Assertion checks for event ID uniqueness, session duration validity, and table timeliness.
- Added logging table to store assertion results.
- Introduced `ga4_events.sqlx`, a new events table for better tracking without intermediate dependencies.
- Updated session processing to reference `ga4_events` instead of `ga4_int_events`.
- Implemented a new diagnostics table to track GA4 data quality and anomalies.
- Added monitoring for self-referrals, duplicate transactions, and empty ecommerce item arrays.
- Introduced reporting on unique page and traffic source cardinality over time.
- Added timestamp conversion for local time zone handling.
- Ensured proper assignment of `cpc` to paid traffic sources when necessary.
- Added `click_id` extraction to improve attribution tracking.
- Added assertion checks for session duration validity and event ID uniqueness.
- Introduced new staging files for cleaner data ingestion.
- Added tagging for assertions to improve tracking and filtering.
- Introduced new columns: `is_active_user`, `batch_event_index`, `batch_page_id`, `batch_ordering_id`.
- Explicitly named all struct fields in `collected_traffic_source` and `items` to avoid schema changes breaking incremental builds.
- Introduced `session_traffic_source_last_click` in both `events` and `sessions` tables.
- Added `last_non_direct_default_channel_grouping` for more comprehensive reporting.
- Added `shopping_free` as a recognized medium in Organic Shopping classification.
- Introduced `user_properties` support for event tracking.
- Moved `source_categories` configuration to a JSON file for easier maintenance.
- Added a `package.json` file to manage dependencies and versioning.
- Introduced hostname filters in `custom_config.js`.
- Added exclusion and inclusion lists for hostname filtering in event processing.
- Implemented `is_final` logic for incremental processing.
Changed
- Modified incremental logic to correctly process events without session identifiers.
- Refactored session attribution logic to ensure correct traffic source tracking.
- Updated incremental logic to improve event tracking accuracy.
- Adjusted incremental table configuration for performance improvements.
- Corrected logic for handling `gclid`, `wbraid`, and `gbraid` in traffic source identification.
- Replaced hardcoded values with variables in `custom_config.js` for better flexibility.
- Improved lookback window calculations for session attribution.
- Updated misattribution handling logic.
- Schema changes for `int_ga4_events` to improve structure.
- Updated event and session processing structure for better maintainability.
- Renamed `source_categories.source_category` to `sc.source_category` for clarity and consistency.
- Renamed models to follow consistent naming conventions.
- Updated reports to reference correct field names.
- Changed references from `stg_ga4_events` to `ga4_events` in session processing queries.
- Updated function calls to correctly extract event parameters.
- Improved `click_id` extraction logic.
- Updated dataset suffix naming conventions to improve consistency.
- Adjusted assertion logic to use new dataset variables for better clarity.
- Improved first and last session attribution logic for traffic sources.
- Refactored logic to improve handling of click-based attribution.
- Defaulted missing values to `'Direct'` for better reporting.
- Ensured `source_categories` are consistently applied using `first/last` logic for accurate session attribution.
- Adjusted event processing logic to ensure proper handling of UTM parameters across event batches.
- Replaced `event_date` and `session_date` references with `table_suffix` for incremental processing.
- Removed unnecessary `lower()` function from `page_path` processing.
- Renamed variables in configuration to better reflect their actual purpose.
- Cleaned up unused code blocks and added clarifying comments.
- Adjusted last non-direct logic to ensure accurate attribution.
- Updated classification for mobile push notifications to `Mobile Push Notifications`.
- Ensured older events are only updated when necessary, reducing unnecessary recomputation.
- Ensured `shopping_free` medium and `Shopping Free Listings` campaigns are classified properly.
- Moved session logic into `int_ga4_sessions` for better modularity.
- Refactored `last_non_direct_traffic_source` fields and added `session_traffic_source_last_click` fields.
- Updated delete statement to use `parse_date('%Y%m%d', table_suffix)` instead of `date(table_suffix)`.
- Updated `generateParamsSQL` to properly handle `user_properties` in event processing.
- Replaced temporary deduplication column with direct `row_number()` approach.
- Standardized `date_checkpoint` declaration across multiple queries.
- Renamed session columns for clarity and consistency.
- Moved page parameters into a structured `page` field.
- Applied `LOWER()` function to standardize traffic source data.
- Converted `date(table_suffix)` to `parse_date('%Y%m%d', table_suffix)` for consistency and improved performance.
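The `parse_date('%Y%m%d', table_suffix)` items rely on the suffix being an eight-digit `YYYYMMDD` string with no separators. The sketch below mirrors that parsing step in JavaScript purely as an illustration; the `parseTableSuffix` helper is hypothetical and is not part of the package.

```javascript
// Hypothetical helper: parse a table suffix such as '20240901' into a UTC Date.
// Returns null for anything that is not a valid YYYYMMDD calendar date.
function parseTableSuffix(suffix) {
  const match = /^(\d{4})(\d{2})(\d{2})$/.exec(suffix);
  if (!match) return null;
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject overflowed dates such as 20240931, which Date would roll into October.
  const valid =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;
  return valid ? date : null;
}

console.log(parseTableSuffix("20240901").toISOString().slice(0, 10)); // 2024-09-01
console.log(parseTableSuffix("2024-09-01")); // null
console.log(parseTableSuffix("20240931")); // null
```

Stating the format explicitly means a suffix in any other shape is rejected instead of being interpreted loosely.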
Deprecated
- Deprecated older logic for event and session processing, replacing it with a streamlined approach.
- Deprecated `package.json` in favor of `workflow_settings.yaml`.
- Deleted `stg_ga4_sessions` table.
🛠️ [Installer]
2024-09-01
Added
- Added link to doc
🛠️ [Installer]
2024-08-01
Added
- Added a repository state check during the install process
🧰 🤝 [Core v0]
2024-03-10