
Product Architecture

We organize files and directories into two distinct buckets: core and custom. When we release new versions of GA4Dataform, the installer only updates the contents of the core directories, leaving custom untouched. We designed this approach to prevent accidental overwrites of any customizations you've made after the initial installation.

File and Directory Management

Do not add new files to the core directory or modify existing core files directly. Any changes made to core may be lost during updates. Place your custom files outside the core directory.

You can safely add new directories anywhere in the repository outside of core. We recommend placing them within the custom directory structure. For example, to add a new reporting directory, create it as custom/reporting.
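The core/custom rule can be expressed as a small path check. The sketch below is illustrative only: `isUpdateSafe` is a hypothetical helper, not part of GA4Dataform, and it assumes that updates rewrite exactly the `definitions/core` and `includes/core` trees and nothing else.

```javascript
// Hypothetical helper, not part of GA4Dataform.
// Assumption: updates overwrite only definitions/core and includes/core.
const CORE_ROOTS = ["definitions/core", "includes/core"];

function isUpdateSafe(filePath) {
  // Normalize Windows separators and strip a leading "./".
  const normalized = filePath.replace(/\\/g, "/").replace(/^\.\//, "");
  // A path is unsafe if it is a core root or lives underneath one.
  return !CORE_ROOTS.some(
    (root) => normalized === root || normalized.startsWith(root + "/")
  );
}

console.log(isUpdateSafe("definitions/custom/reporting/weekly.sqlx")); // true
console.log(isUpdateSafe("definitions/core/03_outputs/ga4_events.sqlx")); // false
```

A check like this could run in a pre-commit hook to flag edits that an update might overwrite. The file name `weekly.sqlx` is made up for the example.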

Dataform Directories

| Directory | Description |
| --- | --- |
| definitions | Contains all directories and files related to building models |
| core/01_sources | Contains declarations.js and (future) staging models |
| core/02_intermediate | Contains intermediate models |
| core/03_outputs | Contains output models that should be used for downstream queries |
| core/assertions | Contains all the assertions that check the data quality of our model |
| core/extra | Contains any extra files that fulfill an individual purpose |
| core/modules 💎 | Contains modules (currently only Premium) |
| core/modules/bq_cost_monitoring | Contains BigQuery cost analysis models with intermediate processing and comprehensive reporting |
| core/modules/cvr_per_event | Contains conversion rate analysis models for event-level performance metrics |
| core/modules/ga4_parameter_detection | Contains automated parameter discovery models with configuration suggestions |
| core/modules/items_funnel_report | Contains e-commerce funnel analysis models with session-level item tracking |
| core/utility | Contains utility operations for maintenance tasks like label management |
| definitions/custom | Contains all the custom models that are not part of the core package |
| includes | Contains all JS files with reusable variables and functions that help manage the repository |
| includes/core/documentation | Contains the JSON files of table fields and descriptions of the output tables |
| includes/core/extra | Contains all extra files that fulfill an individual purpose |
| includes/core/modules 💎 | Contains configuration files and helpers for premium module functionality |
| includes/custom | Contains all JS files that can be used to customize your setup (config.js) |
| includes/custom/modules | Contains customizable configuration files for premium modules (JSON/YAML formats) |

Model Descriptions

| Model | Description |
| --- | --- |
| int_ga4_sessions | GA4 intermediate sessions table that incrementally queries ga4_events table and creates session-level dimensions and metrics |
| int_ga4_transactions | GA4 intermediate transactions table that incrementally queries ga4_events table and prepares the needed dimensions and metrics |
| int_info_schema_jobs 💎 | Intermediate table processing BigQuery INFORMATION_SCHEMA.JOBS for cost monitoring and query analysis |
| int_info_schema_table_options 💎 | Intermediate table processing table metadata and options for cost tracking |
| int_info_schema_table_storage_usage_timeline 💎 | Intermediate table tracking storage usage over time for cost optimization |
| int_tables_labels 💎 | Intermediate table managing table labels for cost allocation and organization |
| ga4_events | GA4 output events table that incrementally queries the raw GA4 export and applies partitioning, clustering, cleaning, and several fixes |
| ga4_sessions | GA4 output sessions table that adds last non-direct click attribution and can be used for further transformations or aggregations |
| ga4_transactions | GA4 output transactions table that holds nested items, transaction totals and running totals from purchase and refund events |
| bq_cost_overview 💎 | BigQuery cost overview table providing comprehensive cost analysis across projects and datasets |
| bq_cost_processing 💎 | BigQuery processing cost table tracking query execution costs and resource usage |
| bq_cost_reporting 💎 | BigQuery cost reporting table with aggregated cost metrics for dashboards |
| bq_cost_storage 💎 | BigQuery storage cost table monitoring data storage costs and trends |
| info_schema_jobs_queries 💎 | Query-level job information table for detailed query performance and cost analysis |
| info_schema_jobs_scripts 💎 | Script-level job information table tracking batch operations and scheduled queries |
| report_cvr_per_event 💎 | Conversion rate analysis table providing event-level performance metrics and optimization insights |
| ga4_parameter_detection_agg 💎 | Aggregated parameter detection table that aggregates information from the ga4_parameter_detection_daily table |
| ga4_parameter_detection_config_suggestions 💎 | Configuration suggestions view that returns a config.js compatible JavaScript object with discovered parameters |
| ga4_parameter_detection_daily 💎 | Daily parameter detection table that queries from the raw GA4 export to monitor parameter usage patterns and trends |
| ga4_items_sessions 💎 | Items-level session table aggregating product data per session for e-commerce analysis |
| report_items_funnel 💎 | E-commerce funnel report table tracking product metrics from view to purchase |
| update_labels | Utility table managing label updates and maintenance operations across datasets |
| demo_daily_sessions_report | Demo daily session aggregate table that can be connected to Looker Studio for reporting |
| demo_diagnostics | Demo diagnostics table that checks for several issues in the past 64 days |
| source_categories | Materializes the core/extra/source_categories.json file into a table. Used for Default Channel Grouping |

JavaScript Files

| File | Description |
| --- | --- |
| core/default_config.js | Contains default configuration options that are used as a fallback if custom/config.js is not populated |
| core/helpers.js | Contains all helper functions that are used to produce SQL code for different use cases |
| core/documentation/helpers.js | Contains helper functions for generating and maintaining table documentation |
| custom/config.js | Contains all configuration options that can be used to customize how and what data gets queried. It will always take precedence over core/default_config.js |
| core/extra/source_categories.json | Contains which source category a domain should be treated as. Used for Default Channel Grouping |
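The precedence between custom/config.js and core/default_config.js can be pictured as a merge in which custom values win and defaults fill the gaps. The following is a minimal sketch, not the actual GA4Dataform merge logic: `mergeConfig`, `isPlainObject`, and the option names (`START_DATE`, `sessions`, `lookbackDays`, `timeoutMinutes`) are hypothetical, and it assumes nested objects are merged key by key.

```javascript
// Illustrative only: function and option names are hypothetical,
// not taken from GA4Dataform's real config files.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function mergeConfig(defaults, custom) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(custom || {})) {
    if (value === undefined) continue; // option not populated: keep the default
    merged[key] =
      isPlainObject(value) && isPlainObject(defaults[key])
        ? mergeConfig(defaults[key], value) // merge nested options key by key
        : value; // custom value takes precedence
  }
  return merged;
}

// Stand-ins for core/default_config.js and custom/config.js.
const defaultConfig = {
  START_DATE: "20200101",
  sessions: { timeoutMinutes: 30, lookbackDays: 30 },
};
const customConfig = { sessions: { lookbackDays: 90 } };

const effective = mergeConfig(defaultConfig, customConfig);
console.log(effective.START_DATE); // 20200101
console.log(effective.sessions.timeoutMinutes); // 30
console.log(effective.sessions.lookbackDays); // 90
```

Only `lookbackDays` is set in the custom object, so every other value falls back to its default.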

Dataform Repository Structure

definitions/
├── core/
│   ├── 01_sources/
│   │   └── declarations.js
│   ├── 02_intermediate/
│   │   ├── int_ga4_sessions.sqlx
│   │   └── int_ga4_transactions.sqlx
│   ├── 03_outputs/
│   │   ├── ga4_events.sqlx
│   │   ├── ga4_sessions.sqlx
│   │   └── ga4_transactions.sqlx
│   ├── assertions/
│   │   ├── assertion_logs.sqlx
│   │   ├── assertions_event_id_uniqueness.sqlx
│   │   ├── assertions_session_duration_validity.sqlx
│   │   ├── assertions_session_id_uniqueness.sqlx
│   │   ├── assertions_sessions_validity.sqlx
│   │   ├── assertions_tables_timeliness.sqlx
│   │   ├── assertions_transaction_id_completeness.sqlx
│   │   └── assertions_user_pseudo_id_completeness.sqlx
│   ├── extra/
│   │   └── ga4/
│   │       └── source_categories.js
│   ├── modules/ 💎
│   │   ├── bq_cost_monitoring/
│   │   │   ├── 02_intermediate/
│   │   │   │   ├── int_info_schema_jobs.sqlx
│   │   │   │   ├── int_info_schema_table_options.sqlx
│   │   │   │   ├── int_info_schema_table_storage_usage_timeline.sqlx
│   │   │   │   └── int_tables_labels.sqlx
│   │   │   └── 03_outputs/
│   │   │       ├── bq_cost_overview.sqlx
│   │   │       ├── bq_cost_processing.sqlx
│   │   │       ├── bq_cost_reporting.sqlx
│   │   │       ├── bq_cost_storage.sqlx
│   │   │       ├── info_schema_jobs_queries.sqlx
│   │   │       └── info_schema_jobs_scripts.sqlx
│   │   ├── cvr_per_event/
│   │   │   └── report_cvr_per_event.sqlx
│   │   ├── ga4_parameter_detection/
│   │   │   ├── ga4_parameter_detection_agg.sqlx
│   │   │   ├── ga4_parameter_detection_config_suggestions.sqlx
│   │   │   └── ga4_parameter_detection_daily.sqlx
│   │   └── items_funnel_report/
│   │       ├── ga4_items_sessions.sqlx
│   │       └── report_items_funnel.sqlx
│   └── utility/
│       └── update_labels.sqlx
└── custom/
    ├── demo_daily_sessions_report.sqlx
    └── demo_diagnostics.sqlx

includes/
├── core/
│   ├── documentation/
│   │   ├── ga4_events.json
│   │   ├── ga4_sessions.json
│   │   ├── ga4_transactions.json
│   │   ├── helpers.js
│   │   └── modules/ 💎
│   │       ├── bq_cost_monitoring/
│   │       │   ├── bq_cost_overview.json
│   │       │   ├── bq_cost_processing.json
│   │       │   ├── bq_cost_reporting.json
│   │       │   ├── bq_cost_storage.json
│   │       │   ├── info_schema_jobs_queries.json
│   │       │   └── info_schema_jobs_scripts.json
│   │       ├── cvr_per_event/
│   │       │   └── report_cvr_per_event.json
│   │       ├── ga4_parameter_detection/
│   │       │   ├── ga4_parameter_detection_agg.json
│   │       │   └── ga4_parameter_detection_daily.json
│   │       └── items_funnel_report/
│   │           ├── ga4_items_sessions.json
│   │           └── report_items_funnel.json
│   ├── extra/
│   │   └── source_categories.json
│   ├── default_config.js
│   └── helpers.js
├── custom/
│   ├── modules/ 💎
│   │   ├── bq_cost_monitoring/
│   │   │   └── config.json
│   │   ├── cvr_per_event/
│   │   │   └── config.json
│   │   ├── ga4_parameter_detection/
│   │   │   └── config.yaml
│   │   └── items_funnel_report/
│   │       └── config.json
│   └── config.js
├── .gitignore
├── package-lock.json
├── package.json
└── workflow_settings.yaml

BigQuery Output

GA4Dataform writes its tables to three datasets in BigQuery:

- superform_outputs_123456: used for storing the output tables that should be used for downstream queries
- superform_quality_123456: used for storing the quality control results (assertions)
- superform_transformations_123456: used for storing the intermediate and staging tables that are used during the build process
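Downstream queries need the fully qualified name of an output table. As a rough sketch, the helper below assembles one from the three default dataset names listed above. `tableRef`, the project ID `my-gcp-project`, and the treatment of `123456` as an interchangeable suffix are assumptions made for illustration, not GA4Dataform functionality.

```javascript
// Hypothetical helper; the prefixes mirror the default dataset names above.
const DATASET_PREFIXES = new Map([
  ["outputs", "superform_outputs"],
  ["quality", "superform_quality"],
  ["transformations", "superform_transformations"],
]);

function tableRef(projectId, kind, suffix, tableName) {
  const prefix = DATASET_PREFIXES.get(kind);
  if (prefix === undefined) {
    throw new Error(`Unknown dataset kind: ${kind}`);
  }
  // Backticks quote the reference for use in BigQuery Standard SQL.
  return `\`${projectId}.${prefix}_${suffix}.${tableName}\``;
}

const sessionsTable = tableRef("my-gcp-project", "outputs", "123456", "ga4_sessions");
console.log(sessionsTable);
// `my-gcp-project.superform_outputs_123456.ga4_sessions`

// Example downstream query string built from the reference.
console.log(`SELECT COUNT(*) AS sessions FROM ${sessionsTable}`);
```

If you have changed the default dataset names in your configuration, the prefixes in the map would need to change accordingly.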

If you leave the default dataset names untouched and enable all modules, you will see the following structure:

GCP project
├── superform_outputs_123456 (dataset)
│   ├── demo_daily_sessions_report (tables)
│   ├── demo_diagnostics
│   ├── ga4_events
│   ├── ga4_sessions
│   ├── ga4_transactions
│   ├── bq_cost_overview 💎
│   ├── bq_cost_processing 💎
│   ├── bq_cost_reporting 💎
│   ├── bq_cost_storage 💎
│   ├── info_schema_jobs_queries 💎
│   ├── info_schema_jobs_scripts 💎
│   ├── report_cvr_per_event 💎
│   ├── ga4_parameter_detection_agg 💎
│   ├── ga4_parameter_detection_config_suggestions 💎
│   ├── ga4_parameter_detection_daily 💎
│   ├── ga4_items_sessions 💎
│   └── report_items_funnel 💎
│
├── superform_quality_123456
│   ├── assertion_logs
│   ├── assertions_event_id_uniqueness
│   ├── assertions_session_duration_validity
│   ├── assertions_session_id_uniqueness
│   ├── assertions_sessions_validity
│   ├── assertions_tables_timeliness
│   ├── assertions_transaction_id_completeness
│   └── assertions_user_pseudo_id_completeness
│
└── superform_transformations_123456
    ├── int_ga4_sessions
    ├── int_ga4_transactions
    ├── source_categories
    ├── int_info_schema_jobs 💎
    ├── int_info_schema_table_options 💎
    ├── int_info_schema_table_storage_usage_timeline 💎
    └── int_tables_labels 💎