
Changelog

Unreleased​

💎 Premium

  • Anomaly detection module
  • BigQuery utilities module
  • Google Search Console module
  • Adding PII detection to parameter detection module

🧰 Core​

  • nothing planned

🤝 Community

  • nothing planned

🛠️ Installer

Strict Act-As mode update

We have updated the installer to align with the Strict Act-As design pattern for Dataform execution, improving security, isolation of responsibilities, and auditability. Previously, all permissions were assigned to the Dataform Default Service Account, combining orchestration and BigQuery data access. With this update, permissions are now explicitly split between two service accounts.

  • Dataform Default Service Account
    • Dataform Service Agent
    • Secret Manager Secret Accessor
    • Service Account Token Creator
    • Service Account User

This service account is now limited to orchestration, scheduling, metadata access, and impersonation only.
It no longer has direct access to BigQuery data.

  • Custom Service Account
    • BigQuery Data Editor
    • BigQuery Data Viewer
    • BigQuery Job User

This service account is used exclusively for query execution in BigQuery and is accessed only via impersonation during Dataform runs.

This change brings the installer in line with Google Cloud security best practices and Dataform Strict Act-As mode requirements.
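The split can be summarised as a mapping from service account to IAM roles. The sketch below is illustrative only: the role IDs are the standard Google Cloud identifiers for the roles listed above, while the account keys (`dataformDefault`, `custom`) and the helper `hasBigQueryAccess` are hypothetical names and not part of the installer.

```javascript
// Illustrative role split; the account keys are hypothetical labels.
const ROLE_SPLIT = {
  // Orchestration, scheduling, metadata access, and impersonation only.
  dataformDefault: [
    "roles/dataform.serviceAgent",
    "roles/secretmanager.secretAccessor",
    "roles/iam.serviceAccountTokenCreator",
    "roles/iam.serviceAccountUser",
  ],
  // Query execution in BigQuery, reached only via impersonation.
  custom: [
    "roles/bigquery.dataEditor",
    "roles/bigquery.dataViewer",
    "roles/bigquery.jobUser",
  ],
};

// Returns true if any role in the list is a BigQuery role.
function hasBigQueryAccess(roles) {
  return roles.some((role) => role.startsWith("roles/bigquery."));
}

console.log(hasBigQueryAccess(ROLE_SPLIT.dataformDefault)); // false
console.log(hasBigQueryAccess(ROLE_SPLIT.custom)); // true
```

A check like this could be used to confirm that the default account holds no BigQuery role after the update.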


Released​

💎 [Premium v2.0.5] - 2025-12-18

Schema Changes​

Updating to this version requires some tables to be fully rebuilt because new columns were added:

  • int_ga4_sessions and ga4_sessions with new first_user_id column
  • int_ga4_transactions and ga4_transactions with new property_id column
  • ga4_events with new records in event_params

🚀 Added

  • unique_search_term, gad_campaignid, gad_source as standard event params
  • gclsrc as a parameter checked when classifying traffic as Google
  • first_user_id in int_ga4_sessions and ga4_sessions
  • property_id as column in int_ga4_transactions and ga4_transactions
  • is_multiday_session column: flag indicating whether the session has events on more than one day
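As a rough illustration of what the is_multiday_session flag expresses, the sketch below checks whether a session's events fall on more than one calendar date. It is written in JavaScript for readability; the package computes the flag in its own models, and the helper name `isMultidaySession` and the `YYYYMMDD` date strings are assumptions made for this example.

```javascript
// Hypothetical helper: true when a session's events fall on more than one date.
// eventDates is a list of date strings, one per event, e.g. "20251217".
function isMultidaySession(eventDates) {
  return new Set(eventDates).size > 1;
}

console.log(isMultidaySession(["20251217", "20251217"])); // false
console.log(isMultidaySession(["20251217", "20251218"])); // true
```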

πŸ› Fixed​

  • Add missing column documentation fields

Changed​

  • Standardized SQL style
  • Renamed modules and tables
  • Changed directory structure and moved the main config file

🧰 🤝 [Core v1.19.0] - 2025-12-18

Schema Changes​

Updating to this version requires some tables to be fully rebuilt because new columns were added:

  • int_ga4_sessions and ga4_sessions with new first_user_id column
  • int_ga4_transactions and ga4_transactions with new property_id column
  • ga4_events with new records in event_params

🚀 Added

  • unique_search_term, gad_campaignid, gad_source as standard event params
  • gclsrc as a parameter checked when classifying traffic as Google
  • first_user_id in int_ga4_sessions and ga4_sessions
  • property_id as column in int_ga4_transactions and ga4_transactions
  • is_multiday_session column: flag indicating whether the session has events on more than one day

πŸ› Fixed​

  • Add missing column documentation fields

Changed​

  • Standardized SQL style
  • Renamed modules and tables
  • Changed directory structure and moved the main config file

💎 [Premium v2.0.4]

2025-11-11

🚀 Added

  • Custom Channel Groupings

💎 [Premium v2.0.3]

2025-11-06

🚀 Added

  • Multi-property support
  • Google Ads module

💎 [Premium v2.0.2]

2025-10-23

🚀 Added

  • Extend custom session-level parameter filtering options

💎 [Premium v2.0.1]

2025-10-10

🐛 Fixed

  • Superform Labs internal bugfix iteration

💎 [Premium v2.0.0]

2025-09-22

🚀 Added

  • User stitching
  • Support for using 'intraday' and 'fresh' tables
  • BigQuery cost monitoring + dashboard
  • Session totals in ga4_sessions
  • Custom parameter aggregation at session level
  • Custom parameter detection + dashboard
  • Items reporting table
  • Conversion rate reporting table

🧰 🤝 [Core v1.18.0]

2025-08-27

🚀 Added

  • AI referrers added to source_categories.json for improved default channel grouping logic.
  • Default channel grouping in int_ga4_sessions now matches the source categories with the LIKE operator for enhanced pattern matching.
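To illustrate what LIKE-based matching allows compared with strict equality, the sketch below emulates SQL LIKE semantics in JavaScript: `%` matches any sequence of characters and `_` matches exactly one character. The helper `likeToRegExp` and the example patterns are hypothetical; the actual entries in source_categories.json are not shown in this changelog.

```javascript
// Hypothetical helper that emulates SQL LIKE: % = any sequence, _ = one character.
function likeToRegExp(pattern) {
  // Escape regular-expression metacharacters so they match literally.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Translate the two LIKE wildcards into their regular-expression equivalents.
  const body = escaped.replace(/%/g, "[\\s\\S]*").replace(/_/g, "[\\s\\S]");
  return new RegExp(`^${body}$`);
}

// Made-up patterns, for illustration only.
console.log(likeToRegExp("%chatgpt%").test("chatgpt.com")); // true
console.log(likeToRegExp("chatgpt.com").test("chatgptXcom")); // false
console.log(likeToRegExp("m.example.___").test("m.example.com")); // true
```

With equality, only an exact source value would match a category; with LIKE, a single pattern can cover subdomains and variants.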

πŸ› Fixed​

  • Added default_config.js warning text + reordered file contents for better readability.
  • Updated is_measurement_protocol_hit since MP supports device parameters since 2025-06-11.
  • Added app_info.id check to fix assertions_sessions_validity resulting in false positive for app events.
  • Added events_fresh_* filter to assertions_tables_timeliness to prevent false positives.

🧰 🤝 [Core v1.17.0]

2025-05-13

🚀 Added

  • Labeling of tables and jobs. Five label categories are available: Operation, Brand, Licence type, Package version, Tool.

Changed​

  • Assertions set to false by default.

🛠️ [Installer]

2025-05-13

🚀 Added

  • Installer now displays the licence type, installer version, and models version.
  • Backfill jobs run from the installer are now tracked with the label operation: backfilling.

🧰 🤝 [Core v1.16.0]

2025-04-09

🚀 Added

  • Bluesky referrals are now classified as Organic Social in the default channel group.
  • “Organic AI” added as a new default channel group. The most common LLM referrers are added to this group. This feature can be enabled/disabled using the config option EXTRA_CHANNEL_GROUPS.

🧰 🤝 [Core v1.15.0]

2025-03-11

🐛 Fixed

  • Replaced --- with ; while ensuring no duplicate semicolons appear, as the Dataform CLI was unable to handle them correctly.
  • Proper handling of sessions spanning midnight in incremental runs. Previously, sessions crossing midnight could result in duplicate session_ids. Multi-day sessions are now correctly appended to the intermediate table and properly deduplicated in the output.
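The separator fix can be pictured as a two-step string rewrite: turn each `---` separator into `;`, then collapse any semicolons that end up adjacent. The sketch below is a hypothetical reconstruction, assuming the separator sits on its own line; the helper name `replaceSeparators` is not taken from the package and the actual implementation may differ.

```javascript
// Hypothetical sketch: replace --- separator lines with ; and avoid duplicate semicolons.
function replaceSeparators(sql) {
  return sql
    // A line containing only --- (optionally padded with spaces or tabs) becomes ;
    .replace(/^[ \t]*---[ \t]*$/gm, ";")
    // Semicolons separated only by whitespace collapse into a single one.
    .replace(/;(\s*;)+/g, ";");
}

console.log(JSON.stringify(replaceSeparators("select 1;\n---\nselect 2")));
// "select 1;\nselect 2"
console.log(JSON.stringify(replaceSeparators("select 1\n---\nselect 2")));
// "select 1\n;\nselect 2"
```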

🧰 🤝 [Core v1.14.0]

2025-02-05

🐛 Fixed

  • Refactored LOOKBACK_MILLIS to be defined inline instead of a separate variable.
  • Removed LAST_NON_DIRECT_LOOKBACK_MILLIS from default_config.js, ensuring calculations use a single inline formula.
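The changelog does not spell out the inline formula. A common way to express a lookback window in milliseconds is to derive it from a number of days, as in the sketch below; `LOOKBACK_DAYS` and `lookbackMillis` are hypothetical names used only for this example.

```javascript
// LOOKBACK_DAYS is a hypothetical setting name used only for this sketch.
const LOOKBACK_DAYS = 30;

// Inline conversion from days to milliseconds:
// days * hours per day * minutes per hour * seconds per minute * ms per second.
function lookbackMillis(days) {
  return days * 24 * 60 * 60 * 1000;
}

console.log(lookbackMillis(LOOKBACK_DAYS)); // 2592000000
```

Deriving the value from one setting avoids keeping a second, separately maintained millisecond constant in sync.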

🛠️ [Installer]

2025-02-05

🚀 Added

  • Added support for more than 50 datasets

🧰 🤝 [Core v1.13.0]

2025-01-28

Changed​

  • Updated generateFilterTypeFromListSQL function to properly handle NULL values.
  • Now uses coalesce(column, "") to avoid NULL-related filtering issues.
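Why coalescing matters: in SQL, `NULL NOT IN (...)` evaluates to NULL rather than TRUE, so an exclusion filter in a WHERE clause silently drops rows whose column is NULL. The sketch below shows one way a filter generator could wrap the column in coalesce. It is a hypothetical stand-in: the name `filterFromListSQL`, its parameters, and the `"include"`/`"exclude"` values are assumptions, and the real generateFilterTypeFromListSQL signature may differ.

```javascript
// Hypothetical sketch of a NULL-safe list filter; not the package's actual function.
function filterFromListSQL(filterType, column, values) {
  if (values.length === 0) {
    throw new Error("values must contain at least one entry");
  }
  // Quote each value as a SQL string literal, escaping single quotes.
  const list = values
    .map((value) => `'${String(value).replace(/'/g, "\\'")}'`)
    .join(", ");
  const operator = filterType === "exclude" ? "not in" : "in";
  // coalesce(column, "") turns NULL into an empty string so the comparison
  // returns TRUE or FALSE instead of NULL.
  return `coalesce(${column}, "") ${operator} (${list})`;
}

console.log(filterFromListSQL("exclude", "hostname", ["a.com", "b.com"]));
// coalesce(hostname, "") not in ('a.com', 'b.com')
```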

🧰 🤝 [Core v1.12.0]

2025-01-23

Changed

  • Excluded "fresh" tables from exports in ga4_events.sqlx.
  • Ensures that temporary fresh tables do not interfere with data processing.

🛠️ [Installer]

2025-01-23

🚀 Added

  • Added user menu with logout

🧰 🤝 [Core v1.10.0]

2024-12-19

🚀 Added

  • Initial commit introducing ga4_transactions model.
  • Documentation formatting and structure for ga4_transactions columns.
  • Supports incremental processing with nested items.
  • Expanded ga4_sessions with new columns for traffic source tracking.
  • Improved demo_daily_sessions_report schema with platform and stream ID tracking.
  • Introduced _run_timestamp column in ga4_events for better traceability.
  • Updated documentation to include newly added columns in ga4_events.
  • Additional columns to ga4_sessions and ga4_events tables.
  • Introduced platform in session struct for better tracking.

πŸ› Fixed​

  • Improved logic for properties without "final" days, ensuring they now increment correctly in sessions, transactions, and events models.

Changed​

  • Adjusted schema definitions for ga4_transactions to improve refund tracking.
  • Updated documentation to reflect the schema modifications.
  • Refined handling of refunds in transaction roll-ups.
  • Refactored function names to improve readability.
  • Added JSDoc comments for key helper functions.
  • Default channel grouping logic to ensure "Direct" fallback for NULL values.
  • Moved all event transformation logic to outputs/ga4_events.
  • Simplified upstream intermediate models to reduce redundancy.

🧰 🤝 [Core v1.6.0]

2024-10-24

Changed​

  • Streamlined data partitioning to improve performance.
  • Introduced detailed column documentation for ga4_events and ga4_sessions.

Removed​

  • Removed stream_id from clusterBy fields.

Changed​

  • Removed non-intraday columns from session_traffic_source_last_click struct.

🛠️ [Installer]

2024-10-24

🚀 Added

  • Installer version

🧰 🤝 [Core v1.5.0]

2024-10-16

🚀 Added

  • Introduced exit_content_group to the exit_page struct.
  • Added all click_ids to last non-direct session attribution logic.
  • Two new columns in the events table.
  • Expanded session_traffic_source_last_click struct to include additional ad tracking fields.
  • Added publisher struct to store publisher-level ad attributes.

πŸ› Fixed​

  • Updated default channel grouping logic to replace NULL values with "Direct".

Changed​

  • Improved content categorization for session exits.
  • Refactored configuration settings to be more user-friendly.
  • Updated upstream tables to be compatible with newly introduced columns.
  • Reclassified newsletter traffic under "Email" in default channel grouping.
  • Ensured cpc values are labeled as "Other Advertising" rather than "Other".
  • Updated GA4 start date to "2020-01-01" for improved backfilling capabilities.
  • Adjusted transformation dataset names to align with a more structured workflow.

🛠️ [Installer]

2024-10-07

🚀 Added

  • Added popup for checking package

🧰 🤝 [Core v1]

2024-09-01

🐛 Fixed

  • Resolved an issue with session attribution logic by adjusting lookback window calculations.
  • Corrected logic for determining the last non-direct traffic source.
  • Fixed assertions referencing outdated schema names.
  • Prevented session breakage due to incrementality by ensuring each session ID retains only the first occurrence within a day.
  • Resolved an issue where time.event_date did not exist, replacing references with event_date.
  • Ensured incremental deletion logic uses the correct column reference.
  • Updated dataset references in SQL configurations to use workflow_settings.yaml.
  • Ensured transformation and output datasets are correctly referenced.
  • Resolved compilation issues across multiple files.
  • Ensured validation checks pass successfully.
  • Ensured transaction_id is correctly included in the demo table.
  • Adjusted filtering logic to capture relevant purchase events.
  • Addressed minor technical issues in diagnostic queries.
  • Adjusted assertions schema references.
  • Fixed issues with source and medium classification to improve accuracy.
  • Ensured backward compatibility with older data structures.
  • Resolved duplicate column issue in session data processing.
  • Corrected first_click_ids reference to last_click_ids to align with intended logic.
  • Updated generateClickIdTrafficSourceSQL and generateTrafficSourceSQL functions to properly handle NULL values in traffic source structs.
  • Ensured has_source and is_direct_session fields are always TRUE or FALSE, never NULL.
  • Resolved NULL values appearing in last_non_direct_default_channel_grouping.
  • Resolved inconsistencies in how session source and medium were assigned.
  • Resolved duplicate column issue related to batch_page_id and batch_ordering_id.
  • Corrected classification of organic shopping in channel grouping logic.
  • Ensured column name handling is consistent across custom parameters.
  • Corrected page_number calculation to use batch_page_id ordering.
  • Corrected incremental processing logic by ensuring table_suffix is treated as a date where necessary.
  • Corrected typo in variable name (EVENTS_TO_EXLUDE → EVENTS_TO_EXCLUDE).

🚀 Added

  • Implemented detection of Measurement Protocol (MP) hits.
  • Introduced session_duration_s to track session lengths.
  • Added descriptions and comments for better maintainability.
  • Introduced ga4_int_events.sqlx to process and transform GA4 event data.
  • Introduced ga4_int_sessions.sqlx for session processing and modeling.
  • Included engagement time in event ID calculations to improve deduplication.
  • Introduced logic to remove duplicate event IDs.
  • Created daily_sessions_report.sqlx for aggregated session insights.
  • Enabled better clustering by adding stream_id as a clustering key.
  • Assertion checks for event ID uniqueness, session duration validity, and table timeliness.
  • Added logging table to store assertion results.
  • Introduced ga4_events.sqlx, a new events table for better tracking without intermediate dependencies.
  • Updated session processing to reference ga4_events instead of ga4_int_events.
  • Implemented a new diagnostics table to track GA4 data quality and anomalies.
  • Added monitoring for self-referrals, duplicate transactions, and empty ecommerce item arrays.
  • Introduced reporting on unique page and traffic source cardinality over time.
  • Added timestamp conversion for local time zone handling.
  • Ensured proper assignment of cpc to paid traffic sources when necessary.
  • Added click_id extraction to improve attribution tracking.
  • Added assertion checks for session duration validity and event ID uniqueness.
  • Introduced new staging files for cleaner data ingestion.
  • Added tagging for assertions to improve tracking and filtering.
  • Introduced new columns: is_active_user, batch_event_index, batch_page_id, batch_ordering_id.
  • Explicitly named all struct fields in collected_traffic_source and items to avoid schema changes breaking incremental builds.
  • Introduced session_traffic_source_last_click in both events and sessions tables.
  • Added last_non_direct_default_channel_grouping for more comprehensive reporting.
  • Added shopping_free as a recognized medium in Organic Shopping classification.
  • Introduced user_properties support for event tracking.
  • Moved source_categories configuration to a JSON file for easier maintenance.
  • Added a package.json file to manage dependencies and versioning.
  • Introduced hostname filters in custom_config.js.
  • Added exclusion and inclusion lists for hostname filtering in event processing.
  • Implemented is_final logic for incremental processing.

Changed​

  • Modified incremental logic to correctly process events without session identifiers.
  • Refactored session attribution logic to ensure correct traffic source tracking.
  • Updated incremental logic to improve event tracking accuracy.
  • Adjusted incremental table configuration for performance improvements.
  • Corrected logic for handling gclid, wbraid, and gbraid in traffic source identification.
  • Replaced hardcoded values with variables in custom_config.js for better flexibility.
  • Improved lookback window calculations for session attribution.
  • Updated misattribution handling logic.
  • Schema changes for int_ga4_events to improve structure.
  • Updated event and session processing structure for better maintainability.
  • Renamed source_categories.source_category to sc.source_category for clarity and consistency.
  • Renamed models to follow consistent naming conventions.
  • Updated reports to reference correct field names.
  • Changed references from stg_ga4_events to ga4_events in session processing queries.
  • Updated function calls to correctly extract event parameters.
  • Improved click_id extraction logic.
  • Updated dataset suffix naming conventions to improve consistency.
  • Adjusted assertion logic to use new dataset variables for better clarity.
  • Improved first and last session attribution logic for traffic sources.
  • Refactored logic to improve handling of click-based attribution.
  • Defaulted missing values to 'Direct' for better reporting.
  • Ensured source_categories are consistently applied using first/last logic for accurate session attribution.
  • Adjusted event processing logic to ensure proper handling of UTM parameters across event batches.
  • Replaced event_date and session_date references with table_suffix for incremental processing.
  • Removed unnecessary lower() function from page_path processing.
  • Renamed variables in configuration to better reflect their actual purpose.
  • Cleaned up unused code blocks and added clarifying comments.
  • Adjusted last non-direct logic to ensure accurate attribution.
  • Updated classification for mobile push notifications to Mobile Push Notifications.
  • Ensured older events are only updated when necessary, reducing unnecessary recomputation.
  • Ensured shopping_free medium and Shopping Free Listings campaigns are classified properly.
  • Moved session logic into int_ga4_sessions for better modularity.
  • Refactored last_non_direct_traffic_source fields and added session_traffic_source_last_click fields.
  • Updated delete statement to use parse_date('%Y%m%d', table_suffix) instead of date(table_suffix).
  • Updated generateParamsSQL to properly handle user_properties in event processing.
  • Replaced temporary deduplication column with direct row_number() approach.
  • Standardized date_checkpoint declaration across multiple queries.
  • Renamed session columns for clarity and consistency.
  • Moved page parameters into a structured page field.
  • Applied LOWER() function to standardize traffic source data.
  • Converted date(table_suffix) to parse_date('%Y%m%d', table_suffix) for consistency and improved performance.

Deprecated​

  • Deprecated older logic for event and session processing, replacing it with a streamlined approach.
  • Deprecated package.json in favor of workflow_settings.yaml.
  • Deleted stg_ga4_sessions table.

🛠️ [Installer]

2024-09-01

🚀 Added

  • Added link to the documentation

🛠️ [Installer]

2024-08-01

🚀 Added

  • Added a repository state check during the install process

🧰 🤝 [Core v0]

2024-03-10

🌟 Init by Artem Korneev​