Product Architecture

We organize files and directories into two distinct buckets: core and custom. When we release new versions of GA4Dataform, the installer only updates the contents of the core directories, leaving custom untouched. We designed this approach to prevent accidental overwrites of any customizations you've made after the initial installation.

File and Directory Management

Do not add new files to the core directory or modify existing core files directly. Any changes made to core may be lost during updates. Place your custom files outside the core directory.

You can safely add new directories anywhere in the repository outside of core, but we recommend placing them within the custom directory structure. For example, to add a new reporting directory, create it as custom/reporting.
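The rule above can be sketched as a small check, for example in a pre-commit script. This is a minimal illustration and not part of GA4Dataform: the helper names `isCorePath` and `findCoreChanges` are hypothetical, and the sketch assumes the installer-managed directories are definitions/core and includes/core, as listed in the directory table.

```javascript
// Hypothetical helper, not shipped with GA4Dataform.
// Assumption: the installer-managed directories are these two.
const CORE_DIRECTORIES = ["definitions/core", "includes/core"];

// Returns true when a repository-relative path lies inside a core directory.
function isCorePath(filePath) {
  // Normalize Windows separators and strip a leading "./".
  const normalized = filePath.replace(/\\/g, "/").replace(/^\.\//, "");
  return CORE_DIRECTORIES.some(
    (dir) => normalized === dir || normalized.startsWith(dir + "/")
  );
}

// Filters a list of changed files down to those an update may overwrite.
function findCoreChanges(changedFiles) {
  return changedFiles.filter(isCorePath);
}

// Example paths; the reporting file name is made up for illustration.
const risky = findCoreChanges([
  "definitions/core/03_outputs/ga4_sessions.sqlx",
  "definitions/custom/reporting/daily_report.sqlx",
  "includes/custom/config.js",
]);
console.log(risky);
```

Only the first path is reported, because it is the only one inside a core directory.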

Dataform Directories

| Directory | Description |
| --- | --- |
| definitions | Contains all directories and files related to building models |
| core/01_sources | Contains declarations.js and (future) staging models |
| core/02_intermediate | Contains intermediate models |
| core/03_outputs | Contains output models that should be used for downstream queries |
| core/assertions | Contains all the assertions that check the data quality of our model |
| core/extra | Contains any extra files that fulfill an individual purpose |
| definitions/custom | Contains all the custom models that are not part of the core package |
| includes | Contains all JS files with reusable variables and functions that help manage the repository |
| includes/core/documentation | Contains the JSON files of table fields and descriptions of the output tables |
| includes/core/extra | Contains all extra files that fulfill an individual purpose |
| includes/custom | Contains all JS files that can be used to customize your setup (config.js) |

Model Descriptions

| Model | Description |
| --- | --- |
| int_ga4_sessions | GA4 intermediate sessions table that incrementally queries the ga4_events table and creates session-level dimensions and metrics |
| ga4_events | GA4 output events table that incrementally queries the raw GA4 export and applies partitioning, clustering, cleaning, and several fixes |
| ga4_sessions | GA4 output sessions table that adds last non-direct click attribution and can be used for further transformations or aggregations |
| demo_daily_sessions_report | Demo daily session aggregate table that can be connected to Looker Studio for reporting |
| demo_diagnostics | Demo diagnostics table that checks for several issues in the past 64 days |
| source_categories | Materializes the core/extra/source_categories.json file into a table. Used for Default Channel Grouping |

JavaScript Files

| File | Description |
| --- | --- |
| core/default_config.js | Contains default configuration options that are used as a fallback if custom/config.js is not populated |
| core/helpers.js | Contains all helper functions that are used to produce SQL code for different use cases |
| custom/config.js | Contains all configuration options that can be used to customize how and what data gets queried. It will always take precedence over core/default_config.js |
| core/extra/source_categories.json | Contains which source category a domain should be treated as. Used for Default Channel Grouping |
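The precedence between custom/config.js and core/default_config.js can be pictured as a merge in which custom values win and unset options fall back to the defaults. The sketch below only illustrates that idea: the option names (`lookbackDays`, `outputs`, `includeDemoTables`, `datasetPrefix`) and the `mergeConfig` helper are hypothetical, and the package's actual merge logic may differ.

```javascript
// Stand-ins for core/default_config.js and custom/config.js.
// All option names are hypothetical placeholders, not real GA4Dataform options.
const defaultConfig = {
  lookbackDays: 3,
  outputs: { datasetPrefix: "superform", includeDemoTables: true },
};

const customConfig = {
  outputs: { includeDemoTables: false },
};

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Recursive merge: values set in `custom` win; anything unset falls back to `defaults`.
function mergeConfig(defaults, custom) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(custom || {})) {
    merged[key] =
      isPlainObject(value) && isPlainObject(defaults[key])
        ? mergeConfig(defaults[key], value)
        : value;
  }
  return merged;
}

const config = mergeConfig(defaultConfig, customConfig);
console.log(config);
```

Here `includeDemoTables` takes the custom value, while `lookbackDays` and `datasetPrefix` keep their defaults because the custom file leaves them unset.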

Dataform Repository Structure

definitions
├── core
│ ├── 01_sources
│ │ ├── declarations.js
│ ├── 02_intermediate
│ │ ├── int_ga4_sessions.sqlx
│ ├── 03_outputs
│ │ ├── ga4_events.sqlx
│ │ ├── ga4_sessions.sqlx
│ ├── assertions
│ │ ├── assertion_logs.sqlx
│ │ ├── assertions_event_id_uniqueness.sqlx
│ │ ├── assertions_session_duration_validity.sqlx
│ │ ├── assertions_session_id_uniqueness.sqlx
│ │ ├── assertions_sessions_validity.sqlx
│ │ ├── assertions_tables_timeliness.sqlx
│ │ ├── assertions_transaction_id_completeness.sqlx
│ │ ├── assertions_user_pseudo_id_completeness.sqlx
│ ├── extra
│ │ ├── ga4
│ │ │ ├── source_categories.js
├── custom
│ ├── demo_daily_sessions_report.sqlx
│ ├── demo_diagnostics.sqlx
includes
├── core
│ ├── documentation
│ │ ├── ga4_events.json
│ │ ├── ga4_sessions.json
│ ├── extra
│ │ ├── source_categories.json
│ ├── default_config.js
│ ├── helpers.js
├── custom
│ ├── config.js
.gitignore
package-lock.json
package.json
workflow_settings.yaml

BigQuery Output

GA4Dataform writes tables to three datasets in BigQuery:

superform_outputs_123456: used for storing the output tables that should be used for downstream queries
superform_quality_123456: used for storing the quality control results (assertions)
superform_transformations_123456: used for storing the intermediate and staging tables that are used during the build process

If you leave the default dataset names untouched, you will see the following structure:

Dataform-package (project)
├── superform_outputs_123456 (dataset)
│ ├── demo_daily_sessions_report (tables)
│ ├── demo_diagnostics
│ ├── ga4_events
│ ├── ga4_sessions
├── superform_quality_123456
│ ├── assertion_logs
│ ├── assertions_event_id_uniqueness
│ ├── assertions_session_duration_validity
│ ├── assertions_session_id_uniqueness
│ ├── assertions_sessions_validity
│ ├── assertions_tables_timeliness
│ ├── assertions_transaction_id_completeness
│ ├── assertions_user_pseudo_id_completeness
├── superform_transformations_123456
│ ├── int_ga4_sessions
│ ├── source_categories
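When referencing these tables from downstream tooling, it can help to resolve a table name to its dataset. The sketch below copies the default layout from the tree above into a lookup; the `qualifiedName` helper and the project ID `my-project` are placeholders for illustration, and the dataset names will differ if you changed the defaults.

```javascript
// Default dataset layout, copied from the structure above.
const DATASETS = {
  superform_outputs_123456: [
    "demo_daily_sessions_report",
    "demo_diagnostics",
    "ga4_events",
    "ga4_sessions",
  ],
  superform_quality_123456: [
    "assertion_logs",
    "assertions_event_id_uniqueness",
    "assertions_session_duration_validity",
    "assertions_session_id_uniqueness",
    "assertions_sessions_validity",
    "assertions_tables_timeliness",
    "assertions_transaction_id_completeness",
    "assertions_user_pseudo_id_completeness",
  ],
  superform_transformations_123456: ["int_ga4_sessions", "source_categories"],
};

// Hypothetical helper: builds "project.dataset.table" for a known table name.
function qualifiedName(projectId, table) {
  for (const [dataset, tables] of Object.entries(DATASETS)) {
    if (tables.includes(table)) {
      return `${projectId}.${dataset}.${table}`;
    }
  }
  throw new Error(`Unknown table: ${table}`);
}

// "my-project" is a placeholder project ID.
console.log(qualifiedName("my-project", "ga4_sessions"));
```

Downstream queries would normally target the outputs dataset only; the quality and transformations datasets are listed here for completeness.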