Unreleased
💎 Premium
🚀 Added
- User stitching
- Intraday support tables
🛠️ Installer
Released
[v14] - 2025-02-05
🐛 Fixed
- Refactored LOOKBACK_MILLIS to be defined inline instead of as a separate variable.
- Removed LAST_NON_DIRECT_LOOKBACK_MILLIS from default_config.js, ensuring calculations use a single inline formula.
🛠️ Installer
🚀 Added
- Added support for more than 50 datasets
[v13] - 2025-01-28
Changed
- Updated generateFilterTypeFromListSQL function to properly handle NULL values.
- Now uses coalesce(column, "") to avoid NULL-related filtering issues.
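Why coalesce matters: in BigQuery, a comparison such as NULL NOT IN ("a") evaluates to NULL rather than TRUE, so rows whose column is NULL silently drop out of an exclusion filter. The sketch below assumes a hypothetical helper, filterFromList(column, values, mode), that builds such a condition; the real generateFilterTypeFromListSQL signature is not shown in this changelog.

```javascript
// Hypothetical helper, not the package's actual generateFilterTypeFromListSQL.
// Builds a BigQuery SQL condition that includes or excludes a list of values.
function filterFromList(column, values, mode = "exclude") {
  if (values.length === 0) {
    return "true"; // nothing to filter on
  }
  // Escape backslashes and double quotes so each value is a valid string literal.
  const list = values
    .map((v) => `"${String(v).replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`)
    .join(", ");
  // coalesce() turns NULL into "", so `not in` yields TRUE instead of NULL,
  // which would otherwise drop rows where the column is NULL.
  const operator = mode === "exclude" ? "not in" : "in";
  return `coalesce(${column}, "") ${operator} (${list})`;
}

console.log(filterFromList("event_name", ["session_start", "first_visit"]));
// coalesce(event_name, "") not in ("session_start", "first_visit")
```

In "include" mode the same coalesce makes NULL rows compare as "" and fall out of the list, which is usually the intended behavior for an allow-list.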
[v13] - 2025-01-23
Changed
- Excluded "fresh" tables from exports in ga4_events.sqlx.
- Ensures that temporary fresh tables do not interfere with data processing.
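A minimal sketch of the exclusion, assuming the fresh tables sit next to the daily export and a wildcard query over events_* therefore sees _TABLE_SUFFIX values such as "fresh_20250123" alongside "20250123" and "intraday_20250123". The helper name and the predicate are illustrative, not taken from ga4_events.sqlx.

```javascript
// Hypothetical sketch, not the package's actual code.
// Assumes suffixes look like "20250123", "intraday_20250123", "fresh_20250123".
function isFreshSuffix(suffix) {
  return /^fresh_\d{8}$/.test(suffix);
}

// Equivalent BigQuery predicate to append to the WHERE clause of a
// wildcard query, so temporary fresh tables do not interfere with processing.
const EXCLUDE_FRESH_SQL = "not starts_with(_table_suffix, 'fresh_')";

const suffixes = ["20250122", "fresh_20250123", "intraday_20250123"];
const kept = suffixes.filter((s) => !isFreshSuffix(s));
console.log(kept); // [ '20250122', 'intraday_20250123' ]
```

starts_with is used instead of LIKE 'fresh_%' because "_" is a single-character wildcard in LIKE patterns.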
🛠️ Installer
🚀 Added
- Added user menu with logout
[v10] - 2024-12-19
🐛 Fixed
- Improved logic for properties without "final" days, ensuring they now increment correctly in sessions, transactions, and events models.
[v10] - 2024-12-10
Changed
- Adjusted schema definitions for ga4_transactions to improve refund tracking.
- Updated documentation to reflect the schema modifications.
- Refined handling of refunds in transaction roll-ups.
[v10] - 2024-11-27
🚀 Added
- Initial commit introducing ga4_transactions model.
- Documentation formatting and structure for ga4_transactions columns.
- Supports incremental processing with nested items.
[v10] - 2024-11-25
Changed
- Refactored function names to improve readability.
- Added JSDoc comments for key helper functions.
[v10] - 2024-11-07
🚀 Added
- Expanded ga4_sessions with new columns for traffic source tracking.
- Improved demo_daily_sessions_report schema with platform and stream ID tracking.
Changed
- Updated default channel grouping logic to ensure a "Direct" fallback for NULL values.
[v10] - 2024-10-31
🚀 Added
- Introduced _run_timestamp column in ga4_events for better traceability.
- Updated documentation to include newly added columns in ga4_events.
- Added columns to the ga4_sessions and ga4_events tables.
- Introduced platform in session struct for better tracking.
Changed
- Moved all event transformation logic to outputs/ga4_events.
- Simplified upstream intermediate models to reduce redundancy.
[v6] - 2024-10-24
Changed
- Streamlined data partitioning to improve performance.
- Introduced detailed column documentation for ga4_events and ga4_sessions.
Removed
- Removed stream_id from clusterBy fields.
🛠️ Installer
🚀 Added
[v6] - 2024-10-21
Changed
- Removed non-intraday columns from session_traffic_source_last_click struct.
[v5] - 2024-10-18
Changed
- Updated GA4 start date to "2020-01-01" for improved backfilling capabilities.
- Adjusted transformation dataset names to align with a more structured workflow.
[v5] - 2024-10-16
Changed
- Reclassified newsletter traffic under "Email" in default channel grouping.
- Ensured cpc values are labeled as "Other Advertising" rather than "Other".
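The channel grouping rules named in these entries (NULL falls back to "Direct", newsletter counts as "Email", leftover cpc becomes "Other Advertising") can be sketched as an ordered list of checks. This is a deliberately simplified, hypothetical function covering only those rules; the package's real default channel grouping has many more branches, and the email pattern here is an assumption.

```javascript
// Simplified, hypothetical channel grouping covering only the rules named in
// this changelog; it is not the package's full default channel grouping.
const EMAIL_PATTERN = /email|e-mail|e_mail|e mail|newsletter/;

function defaultChannelGroup(source, medium) {
  const s = (source ?? "").toLowerCase();
  const m = (medium ?? "").toLowerCase();
  // Missing source and medium fall back to "Direct" instead of NULL.
  if (s === "" && m === "") return "Direct";
  if (s === "(direct)" && (m === "(none)" || m === "(not set)")) return "Direct";
  // "newsletter" traffic is grouped under "Email".
  if (EMAIL_PATTERN.test(s) || EMAIL_PATTERN.test(m)) return "Email";
  // cpc traffic not matched by earlier rules is "Other Advertising", not "Other".
  if (m === "cpc") return "Other Advertising";
  return "Other";
}

console.log(defaultChannelGroup(null, null)); // Direct
console.log(defaultChannelGroup("mailchimp", "newsletter")); // Email
console.log(defaultChannelGroup("bing", "cpc")); // Other Advertising
```

Order matters: because the checks run top to bottom, a medium such as "newsletter" is classified before the generic cpc and "Other" fallbacks are reached.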
[v5] - 2024-10-10
🚀 Added
- Two new columns in the events table.
- Expanded session_traffic_source_last_click struct to include additional ad tracking fields.
- Added publisher struct to store publisher-level ad attributes.
[v5] - 2024-10-09
🚀 Added
- Introduced exit_content_group to the exit_page struct.
Changed
- Improves content categorization for session exits.
[v5] - 2024-10-07
🐛 Fixed
- Updated default channel grouping logic to replace NULL values with "Direct".
Changed
- Refactored configuration settings to be more user-friendly.
- Updated upstream tables to be compatible with newly introduced columns.
🚀 Added
- Added all click_ids to last non-direct session attribution logic.
🛠️ Installer
🚀 Added
- Added a popup for checking the package
[v0] - 2024-09-01
🐛 Fixed
- Corrected classification of organic shopping in channel grouping logic.
- Ensured column name handling is consistent across custom parameters.
- Corrected page_number calculation to use batch_page_id ordering.
- Corrected incremental processing logic by ensuring table_suffix is treated as a date where necessary.
- Corrected typo in variable name (EVENTS_TO_EXLUDE → EVENTS_TO_EXCLUDE).
🚀 Added
- Introduced user_properties support for event tracking.
- Moved source_categories configuration to a JSON file for easier maintenance.
- Added a package.json file to manage dependencies and versioning.
- Introduced hostname filters in custom_config.js.
- Added exclusion and inclusion lists for hostname filtering in event processing.
- Implemented is_final logic for incremental processing.
Changed
- Ensures older events are only updated when necessary, reducing unnecessary recomputation.
- Ensured shopping_free medium and Shopping Free Listings campaigns are classified properly.
- Moved session logic into int_ga4_sessions for better modularity.
- Refactored last_non_direct_traffic_source fields and added session_traffic_source_last_click fields.
- Updated delete statement to use parse_date('%Y%m%d', table_suffix) instead of date(table_suffix).
- Updated generateParamsSQL to properly handle user_properties in event processing.
- Replaced temporary deduplication column with a direct row_number() approach.
- Standardized date_checkpoint declaration across multiple queries.
- Renamed session columns for clarity and consistency.
- Moved page parameters into a structured page field.
- Applied LOWER() function to standardize traffic source data.
- Converted date(table_suffix) to parse_date('%Y%m%d', table_suffix) for consistency and improved performance.
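GA4 table suffixes are YYYYMMDD strings, and parse_date('%Y%m%d', table_suffix) states that format explicitly. The sketch below shows a hypothetical helper that builds the incremental delete statement in that form, plus a JavaScript mirror of the same parse for local checks. The helper names, the table name, and the use of date_checkpoint as a comparison variable are assumptions for illustration.

```javascript
// Hypothetical sketch of an incremental pre-operation; the helper name,
// table name and date_checkpoint usage are assumptions for illustration.
function buildIncrementalDelete(table, checkpointVar = "date_checkpoint") {
  return [
    `delete from ${table}`,
    // table_suffix is a YYYYMMDD string, so parse it with an explicit format.
    `where parse_date('%Y%m%d', table_suffix) >= ${checkpointVar}`,
  ].join("\n");
}

// JavaScript mirror of parse_date('%Y%m%d', suffix), returning "YYYY-MM-DD".
function parseTableSuffix(suffix) {
  const match = /^(\d{4})(\d{2})(\d{2})$/.exec(suffix);
  if (!match) throw new Error(`Not a YYYYMMDD suffix: ${suffix}`);
  const [, y, m, d] = match.map(Number);
  const date = new Date(Date.UTC(y, m - 1, d));
  // Reject impossible dates such as 20240231, which Date.UTC would roll over.
  if (date.getUTCMonth() !== m - 1 || date.getUTCDate() !== d) {
    throw new Error(`Invalid calendar date: ${suffix}`);
  }
  return date.toISOString().slice(0, 10);
}

console.log(buildIncrementalDelete("ga4_events"));
console.log(parseTableSuffix("20240901")); // 2024-09-01
```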
Removed
- Deleted stg_ga4_sessions table.
🛠️ Installer
🚀 Added
[v0] - 2024-08-01
🐛 Fixed
- Ensured backward compatibility with older data structures.
- Resolved duplicate column issue in session data processing.
- Corrected first_click_ids reference to last_click_ids to align with intended logic.
- Updated generateClickIdTrafficSourceSQL and generateTrafficSourceSQL functions to properly handle NULL values in traffic source structs.
- Ensured has_source and is_direct_session fields are always TRUE or FALSE, never NULL.
- Resolved NULL values appearing in last_non_direct_default_channel_grouping.
- Resolved inconsistencies in how session source and medium were assigned.
- Resolved duplicate column issue related to batch_page_id and batch_ordering_id.
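Keeping has_source and is_direct_session strictly TRUE or FALSE means every branch must produce a boolean, even when the source is missing. A minimal sketch, assuming a session counts as having a source when the source string is non-empty and as direct when it has no source or its source is "(direct)"; both definitions are assumptions, not the package's documented rules.

```javascript
// Hypothetical flag derivation; the definitions of "has a source" and
// "direct" are assumptions. Both flags are always true/false, never null,
// mirroring a coalesce(..., false) guard in SQL.
function sessionFlags(source) {
  const hasSource = source != null && source !== "";
  const isDirect = !hasSource || source === "(direct)";
  return { has_source: hasSource, is_direct_session: isDirect };
}

console.log(sessionFlags(null)); // { has_source: false, is_direct_session: true }
console.log(sessionFlags("google")); // { has_source: true, is_direct_session: false }
```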
🚀 Added
- Introduced new columns: is_active_user, batch_event_index, batch_page_id, batch_ordering_id.
- Explicitly named all struct fields in collected_traffic_source and items to avoid schema changes breaking incremental builds.
- Introduced session_traffic_source_last_click in both events and sessions tables.
- Added last_non_direct_default_channel_grouping for more comprehensive reporting.
- Added shopping_free as a recognized medium in Organic Shopping classification.
Changed
- Improved first and last session attribution logic for traffic sources.
- Refactored logic to improve handling of click-based attribution.
- Defaulted missing values to 'Direct' for better reporting.
- Ensured source_categories are consistently applied using first/last logic for accurate session attribution.
- Adjusted event processing logic to ensure proper handling of UTM parameters across event batches.
- Replaced event_date and session_date references with table_suffix for incremental processing.
- Removed unnecessary lower() function from page_path processing.
- Renamed variables in configuration to better reflect their actual purpose.
- Cleaned up unused code blocks and added clarifying comments.
- Adjusted last non-direct logic to ensure accurate attribution.
- Updated classification for mobile push notifications to Mobile Push Notifications.
🛠️ Installer
🚀 Added
- Added a repository state check during the install process
[v0] - 2024-07-01
🐛 Fixed
- Updated dataset references in SQL configurations to use workflow_settings.yaml.
- Ensured transformation and output datasets are correctly referenced.
- Resolved compilation issues across multiple files.
- Ensured validation checks pass successfully.
- Ensured transaction_id is correctly included in the demo table.
- Adjusted filtering logic to capture relevant purchase events.
- Addressed minor technical issues in diagnostic queries.
- Adjusted assertions schema references.
- Fixed issues with source and medium classification to improve accuracy.
Changed
- Improved click_id extraction logic.
- Updated dataset suffix naming conventions to improve consistency.
- Adjusted assertion logic to use new dataset variables for better clarity.
[v0] - 2024-06-01
🐛 Fixed
- Fixed assertions referencing outdated schema names.
- Prevented session breakage due to incrementality by ensuring each session ID retains only the first occurrence within a day.
- Resolved an issue where time.event_date did not exist, replacing references with event_date.
- Ensured incremental deletion logic uses the correct column reference.
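Keeping only the first occurrence of each session ID within a day is the same pattern as filtering on row_number() over (partition by session_id, event_date order by event_timestamp) = 1 in SQL. The sketch below reproduces that rule in JavaScript on plain row objects; the field names follow GA4 export conventions but the row shape is an assumption.

```javascript
// Hypothetical JS equivalent of keeping rows where
//   row_number() over (partition by session_id, event_date
//                      order by event_timestamp) = 1
// i.e. only the earliest row for each session ID within a day.
function firstPerSessionPerDay(rows) {
  const best = new Map();
  for (const row of rows) {
    const key = `${row.session_id}|${row.event_date}`;
    const current = best.get(key);
    if (current === undefined || row.event_timestamp < current.event_timestamp) {
      best.set(key, row);
    }
  }
  return [...best.values()];
}

const rows = [
  { session_id: "s1", event_date: "20240601", event_timestamp: 200 },
  { session_id: "s1", event_date: "20240601", event_timestamp: 100 },
  { session_id: "s1", event_date: "20240602", event_timestamp: 50 },
];
console.log(firstPerSessionPerDay(rows).map((r) => r.event_timestamp)); // [ 100, 50 ]
```

Ties on event_timestamp keep whichever row was seen first, which matches row_number()'s arbitrary ordering among ties.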
🚀 Added
- Added click_id extraction to improve attribution tracking.
- Added assertion checks for session duration validity and event ID uniqueness.
- Introduced new staging files for cleaner data ingestion.
- Added tagging for assertions to improve tracking and filtering.
Changed
- Updated misattribution handling logic.
- Schema changes for int_ga4_events to improve structure.
- Updated event and session processing structure for better maintainability.
- Renamed source_categories.source_category to sc.source_category for clarity and consistency.
- Renamed models to follow consistent naming conventions.
- Updated reports to reference correct field names.
- Changed references from stg_ga4_events to ga4_events in session processing queries.
- Updated function calls to correctly extract event parameters.
[v0] - 2024-05-01
🐛 Fixed
- Resolved an issue with session attribution logic by adjusting lookback window calculations.
- Corrected logic for determining the last non-direct traffic source.
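Last non-direct attribution with a lookback window means a direct session inherits the most recent non-direct source, but only if that source was seen recently enough. A minimal sketch, assuming a 30-day window measured between session start times; the LOOKBACK_MILLIS value and the session shape here are illustrative, not the package's settings.

```javascript
// Hypothetical sketch of last non-direct attribution with a lookback window.
// This LOOKBACK_MILLIS value is illustrative, not the package's setting.
const LOOKBACK_MILLIS = 30 * 24 * 60 * 60 * 1000; // 30 days

// sessions: one user's sessions, each { start: epochMillis, source, medium }.
// Returns each session with last_non_direct = { source, medium } or null.
function lastNonDirect(sessions, lookbackMillis = LOOKBACK_MILLIS) {
  const sorted = [...sessions].sort((a, b) => a.start - b.start);
  let lastSeen = null; // most recent non-direct session so far
  return sorted.map((s) => {
    const isDirect = s.source == null || s.source === "(direct)";
    if (!isDirect) {
      lastSeen = s;
      return { ...s, last_non_direct: { source: s.source, medium: s.medium } };
    }
    // A direct session inherits the previous non-direct source only when it
    // started within the lookback window of that session.
    const withinWindow = lastSeen !== null && s.start - lastSeen.start <= lookbackMillis;
    return {
      ...s,
      last_non_direct: withinWindow ? { source: lastSeen.source, medium: lastSeen.medium } : null,
    };
  });
}

const DAY = 24 * 60 * 60 * 1000;
const attributed = lastNonDirect([
  { start: 0, source: "google", medium: "organic" },
  { start: 10 * DAY, source: null, medium: null },
  { start: 50 * DAY, source: "(direct)", medium: "(none)" },
]);
console.log(attributed.map((r) => r.last_non_direct && r.last_non_direct.source)); // [ 'google', 'google', null ]
```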
🚀 Added
- Implemented detection of Measurement Protocol (MP) hits.
- Introduced session_duration_s to track session lengths.
- Added descriptions and comments for better maintainability.
- Introduced ga4_int_events.sqlx to process and transform GA4 event data.
- Introduced ga4_int_sessions.sqlx for session processing and modeling.
- Included engagement time in event ID calculations to improve deduplication.
- Introduced logic to remove duplicate event IDs.
- Created daily_sessions_report.sqlx for aggregated session insights.
- Enabled better clustering by adding stream_id as a clustering key.
- Assertion checks for event ID uniqueness, session duration validity, and table timeliness.
- Added logging table to store assertion results.
- Introduced ga4_events.sqlx, a new events table for better tracking without intermediate dependencies.
- Updated session processing to reference ga4_events instead of ga4_int_events.
- Implemented a new diagnostics table to track GA4 data quality and anomalies.
- Added monitoring for self-referrals, duplicate transactions, and empty ecommerce item arrays.
- Introduced reporting on unique page and traffic source cardinality over time.
- Added timestamp conversion for local time zone handling.
- Ensured proper assignment of cpc to paid traffic sources when necessary.
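Including engagement time in the event ID helps separate events that share a user, timestamp, and name but are not true duplicates, while exact repeats still collapse to one ID. A minimal sketch, assuming the ID is a SHA-256 hash over user_pseudo_id, event_timestamp, event_name and engagement_time_msec; the exact field list and hash are assumptions, not the package's definition.

```javascript
const crypto = require("crypto");

// Hypothetical event ID: a hash over fields that together identify an event.
// The field list (including engagement_time_msec) is an assumption.
function eventId(e) {
  const parts = [
    e.user_pseudo_id,
    e.event_timestamp,
    e.event_name,
    e.engagement_time_msec ?? "",
  ];
  return crypto.createHash("sha256").update(parts.join("|")).digest("hex");
}

// Remove duplicate event IDs, keeping the first occurrence.
function dedupeByEventId(events) {
  const seen = new Set();
  return events.filter((e) => {
    const id = eventId(e);
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
}

const events = [
  { user_pseudo_id: "u1", event_timestamp: 1, event_name: "page_view", engagement_time_msec: 10 },
  { user_pseudo_id: "u1", event_timestamp: 1, event_name: "page_view", engagement_time_msec: 10 },
  { user_pseudo_id: "u1", event_timestamp: 1, event_name: "page_view", engagement_time_msec: 25 },
];
console.log(dedupeByEventId(events).length); // 2
```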
Changed
- Modified incremental logic to correctly process events without session identifiers.
- Refactored session attribution logic to ensure correct traffic source tracking.
- Updated incremental logic to improve event tracking accuracy.
- Adjusted incremental table configuration for performance improvements.
- Corrected logic for handling gclid, wbraid, and gbraid in traffic source identification.
- Replaced hardcoded values with variables in custom_config.js for better flexibility.
- Improved lookback window calculations for session attribution.
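gclid, wbraid, and gbraid are Google Ads click identifiers, so an untagged hit carrying one of them can be attributed to google / cpc. A minimal sketch, assuming UTM-tagged traffic keeps its own source and medium and the click ID only fills in when the source is missing; this precedence rule and the field names are assumptions, and the package's logic may differ.

```javascript
// Hypothetical sketch; field names and precedence are assumptions.
const GOOGLE_ADS_CLICK_IDS = ["gclid", "wbraid", "gbraid"];

function resolveTrafficSource(hit) {
  // First non-empty Google Ads click ID on the hit, if any.
  const clickIdType = GOOGLE_ADS_CLICK_IDS.find((k) => hit[k]) ?? null;
  // UTM-tagged traffic keeps its own source/medium in this sketch.
  if (hit.source) {
    return { source: hit.source, medium: hit.medium ?? null, click_id_type: clickIdType };
  }
  // Untagged traffic with a Google Ads click ID is attributed to google / cpc.
  if (clickIdType) {
    return { source: "google", medium: "cpc", click_id_type: clickIdType };
  }
  return { source: null, medium: null, click_id_type: null };
}

console.log(resolveTrafficSource({ gbraid: "abc" })); // { source: 'google', medium: 'cpc', click_id_type: 'gbraid' }
console.log(resolveTrafficSource({ source: "newsletter", medium: "email", gclid: "x" }).source); // newsletter
```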
Deprecated
- Deprecated older logic for event and session processing, replacing it with a streamlined approach.
- Deprecated package.json in favor of workflow_settings.yaml
[v0] - 2024-03-10
🌟 Init by Artem Korneev