Custom Configuration
The configuration system is arguably the most important element built into GA4Dataform. With a proper configuration, you can manage variables, modularize features, and control your data transformation pipeline without modifying core SQLX files.
Overviewβ
Think of the configuration as a control panel of JavaScript variables that determine what the final SQL output will look like. You can customize elements such as:
- Start date for data processing
- User properties to extract
- Event and item parameters to unnest
- Events to filter out
- URL parameters to capture
- User stitching π
- Session totals π
- Session-level custom parameters π
- Support for
fresh
andintraday
tables π - ... and many more!
This approach enables distribution of the same Dataform repository without hardcoded variables, making your implementation portable and maintainable.
Configuration Architectureβ
GA4Dataform uses a dual-configuration approach with two separate files:
Core Configuration (includes/core/default_config.js
)β
- Contains default settings for standard event parameters and click IDs
- Provides fallback variables to prevent compilation errors
- Subject to updates when new features are introduced
- Should not be modified by users
Custom Configuration (includes/custom/config.js
)β
- Your customization layer
- Takes precedence over default configuration
- Safe from updates and overwrites
- Where all your specific settings should be defined
Configuration Mergingβ
The two configuration objects are merged using a JavaScript spread operator that prioritizes your custom settings:
const getConfig = () => {
const { coreConfig } = require("./default_config");
const { customConfig } = require("../custom/config");
return { ...coreConfig, ...customConfig };
};
Your values in config.js
will always take precedence over the defaults, ensuring your customizations are preserved during updates.
If you are interested in seeing how these can be implemented after the install, take a look at the Post-Installation Guide!
Click on each card below to see more details!
GA4 Start Date
GA4_START_DATE: "2020-01-01"
Description: Defines the starting date for the GA4 data that the model should process.
Customization: Change this date to reflect the earliest data you want to include in your reports. For example, if you started collecting GA4 data on a different date, modify this to match that date.
Use Fresh Events π
USE_FRESH_EVENTS: true
Description: If you set this to true, ga4_events
will query from fresh
table shards if they are available. If events_*
table also exists for the same date, that will be used. Priority for same date shards: 1. regular events_
2. fresh_
(if enabled) 3. intraday_
(if enabled)
Use Intraday Events π
USE_INTRADAY_EVENTS: false
Description: If you set this to true, ga4_events
will query from intraday
table shards if they are available. Priority for same date shards: 1. regular events_
2. fresh_
(if enabled) 3. intraday_
(if enabled)
Custom Event Parameters
CUSTOM_EVENT_PARAMS_ARRAY: []
Description: This allows you to specify custom event parameters that are not part of the standard GA4 data. The custom parameters will be added to the event_params_custom
column in the ga4_events
table.
Customization: Add parameters using the format { name: "paramname", type: "TYPE", renameTo: "outputcolumnname", description: "This parameter means XYZ" }
.
Field | Description |
---|---|
name | Required. The original event parameter name to extract |
type | Required. Data type for extraction and storage. int : Extracts from int_value, stores as INT; string : Extracts from all values, stores as STRING; decimal : Extracts from all numerical values, stores as FLOAT |
renameTo | Optional. New name for the output column. If not specified, original parameter name is used |
cleaningMethod | Optional. Transformation method to apply: lowerSQL : Converts string outputs to lowercase |
description | Optional. Add a description for a column that will show up in the table schema |
Example:
CUSTOM_EVENT_PARAMS_ARRAY: [
{ name: "gtm_info", type: "string", description: "GTM container ID + container version"},
{ name: "consent", type: "string", description: "The current consent status of the user"},
{ name: "page_type", type: "string", description: "The type of page: category, listing, product, home, checkout" }
]
User Properties
CUSTOM_USER_PROPERTIES_ARRAY: []
Description: Allows you to extract user properties into their own columns.
Customization: Define user properties you want to extract in the format { name: "param_name", cleaningMethod: method }
. These will be stored in the user_properties
column in the ga4_events
table.
Field | Description |
---|---|
name | Required. The original user property name to extract |
type | Required. Data type for extraction and storage. int : Extracts from int_value, stores as INT; string : Extracts from all values, stores as STRING; decimal : Extracts from all numerical values, stores as FLOAT |
renameTo | Optional. New name for the output column. If not specified, original parameter name is used |
cleaningMethod | Optional. Transformation method to apply: lowerSQL : Converts string outputs to lowercase |
description | Optional. Add a description for a column that will show up in the table schema |
Example:
CUSTOM_USER_PROPERTIES_ARRAY: [
{ name: "user_tier", type: "string", description: "The loyalty tier of the user (e.g., 'gold', 'silver')" },
{ name: "lifetime_value", type: "decimal", renameTo: "ltv", description: "The calculated lifetime value of the user" },
{ name: "first_purchase_date", type: "string", description: "The date of the user's first purchase" }
]
Custom Item Parameters
CUSTOM_ITEM_PARAMS_ARRAY: []
Description: Similar to event parameters, but for item custom dimensions and metrics. These will be stored in the items.item_params_custom.*
column in the ga4_events
table.
Customization: Add custom item parameters in the same format.
Field | Description |
---|---|
name | Required. The original item parameter name to extract |
type | Required. Data type for extraction and storage. int : Extracts from int_value, stores as INT; string : Extracts from all values, stores as STRING; decimal : Extracts from all numerical values, stores as FLOAT |
renameTo | Optional. New name for the output column. If not specified, original parameter name is used |
cleaningMethod | Optional. Transformation method to apply: lowerSQL : Converts string outputs to lowercase |
description | Optional. Add a description for a column that will show up in the table schema |
Example:
CUSTOM_ITEM_PARAMS_ARRAY: [
{ name: "stock_status", type: "string", description: "Whether the item is currenly in_stock/low_stock/out_of_stock", cleaningMethod: lowerSQL },
{ name: "stock_amount", type: "string", description: "The amount of items currently in stock" },
{ name: "margin", type: "decimal", description: "Encoded margin for the item. It can be decoded for readability." }
]
Custom URL Parameters
CUSTOM_URL_PARAMS_ARRAY: []
Description: Allows you to extract custom URL parameters into their own columns. These will be stored in the url_params_custom.*
column in the ga4_events
table.
Customization: Define custom URL parameters you want to extract in the format { name: "q", renameTo: "query_parameter", cleaningMethod: lowerSQL }
.
Note that URL params will always be strings.
Field | Description |
---|---|
name | Required. The original item parameter name to extract |
renameTo | Optional. New name for the output column. If not specified, original parameter name is used |
cleaningMethod | Optional. Transformation method to apply: lowerSQL : Converts string outputs to lowercase |
description | Optional. Add a description for a column that will show up in the table schema |
Example:
CUSTOM_URL_PARAMS_ARRAY: [
{ name: "q", cleaningMethod: lowerSQL },
{ name: "product-size", renameTo: "size" }
]
Custom Session Parameters π
CUSTOM_SESSION_PARAMS_ARRAY: []
Description: Include session-level custom parameters in the sessions table. These will appear in int_ga4_sessions
and ga4_sessions
tables in the session_params_custom
column.
Customization: Add parameters with aggregation methods to select a single value per session.
Field | Description |
---|---|
name | Required. The full path to the parameter to aggregate (e.g., event_params_custom.lead_id ). It should exist in the ga4_events table. |
pick | Required. The method to select a single value: first (first non-null value based on timestamp), last (last non-null value), min , max , boolean (true if the parameter exists with any value at least once). |
renameTo | Optional. New name for the output column. If not specified, original parameter name is used |
cleaningMethod | Optional. Transformation method to apply: lowerSQL : Converts string outputs to lowercase |
description | Optional. Add a description for a column that will show up in the table schema |
Example:
CUSTOM_SESSION_PARAMS_ARRAY: [
{ name: "event_params_custom.lead_id", pick: "first", renameTo: "initial_lead_id", description: "First lead ID captured in session" },
{ name: "event_params_custom.lead_status", pick: "last", renameTo: "final_lead_status", description: "Final lead qualification status in session" },
{ name: "safe_cast(event_params_custom.engagement_score as INT64)", pick: "max", renameTo: "peak_engagement_score", description: "Highest user engagement score achieved in session" },
{ name: "event_params_custom.feature_tier", pick: "boolean", renameTo: "viewed_premium_features", description: "Whether user explored premium feature tiers" }
]
This will throw an error:
{ name: "items.item_params_custom.stock_status", pick: "first" }
Custom Session Totals π
CUSTOM_SESSION_TOTALS: {
eventsToCount: [],
uniqueFields: [],
sumFields: [],
}
Description: Provide a list of fields you want to aggregate at the session level. These will end up in the int_ga4_sessions
ga4_sessions table in the corresponding struct within totals (e.g., totals.count, totals.distinct_count, totals.has_events, totals.sum).
Customization: Configure the eventsToCount
, uniqueFields
, and sumFields
arrays. You can use a simple string for the name or an object for more complex logic like renaming, filtering, or grouping events.
Column descriptions are auto-generated for session total fields based on the configuration!
Each array in CUSTOM_SESSION_TOTALS
supports multiple format options to accommodate different use cases. Choose the format that best fits your needs!
Events to Count (eventsToCount
)β
Counts occurrences of specific events per session. Results are stored in totals.count.*
and totals.has_event.*
in int_ga4_sessions
and ga4_sessions
tables.
Supported Formats:
Format | Use Case | Example |
---|---|---|
String | Simple event counting | 'page_view' |
Object with metadata | Event counting with custom naming | { name: 'sign_up', renameTo: 'registrations' } |
Object with event array | Count multiple related events as one metric | { name: ['video_start', 'video_progress'], renameTo: 'video_interactions' } |
Available Fields:
name
(required): Event name(s) to count - string or array of stringsrenameTo
(optional): Custom output column name
Unique Fields (uniqueFields
)β
Counts distinct values of specified fields per session. Results are stored in totals.distinct_count.*
in int_ga4_sessions
and ga4_sessions
tables.
Supported Formats:
Format | Use Case | Example |
---|---|---|
Object with field path | Count unique values across all events | { name: 'ecommerce.transaction_id', renameTo: 'unique_transactions' } |
Object with event filter | Count unique values only from specific events | { name: 'event_params.currency', eventFilter: ['purchase', 'refund'], renameTo: 'transaction_currencies' } |
Available Fields:
name
(required): Field path to count unique values fromeventFilter
(optional): Limit counting to specific event types - string or array of stringsrenameTo
(optional): Custom output column name
Sum Fields (sumFields
)β
Sums numeric values of specified fields per session. Results are stored in totals.sum.*
in int_ga4_sessions
and ga4_sessions
tables.
Supported Formats:
Format | Use Case | Example |
---|---|---|
String | Simple field summation across all events | 'event_params.video_duration' |
Object with metadata | Field summation with custom naming | { name: 'event_params.engagement_time_msec', renameTo: 'total_engagement' } |
Object with event filter | Sum values only from specific events | { name: 'ecommerce.purchase_revenue', eventFilter: ['purchase', 'refund'], renameTo: 'net_purchase_revenue' }, |
Available Fields:
name
(required): Field path to count unique values fromeventFilter
(optional): Limit counting to specific event types - string or array of stringsrenameTo
(optional): Custom output column name
Example:
CUSTOM_SESSION_TOTALS: {
eventsToCount: [
'page_view',
{ name: 'sign_up', renameTo: 'registrations', description: 'User registration events in session' },
{ name: ['video_start', 'video_progress', 'video_complete'], renameTo: 'video_engagement', description: 'All video interaction events' }
],
uniqueFields: [
{ name: 'ecommerce.transaction_id', renameTo: 'unique_transactions', description: 'Number of unique transaction IDs in session' },
{ name: 'event_params.currency', renameTo: 'funnel_currencies', eventFilter: ['begin_checkout', 'purchase'], description: 'Unique currencies used in checkout funnel' },
],
sumFields: [
{ name: 'event_params.engagement_time_msec', renameTo: 'total_engagement_time', description: 'Sum of engagement time in milliseconds' },
{ name: 'ecommerce.purchase_revenue', eventFilter: ['purchase', 'refund'], renameTo: 'net_purchase_revenue' },
]
}
This will throw an error:
{ name: "items.item_params_custom.item_sku", eventFilter: ['purchase'] }
Event and Hostname Filters
Events to Exclude
EVENTS_TO_EXCLUDE: []
Description: List the event names that should be excluded from the events table.
Customization: Add event names you donβt want to process.
Example:
EVENTS_TO_EXCLUDE: ["user_engagement", "scroll"]
Hostname Exclude/Include
HOSTNAME_EXCLUDE: []
HOSTNAME_INCLUDE_ONLY: []
Description: Exclude or include specific hostnames from the data.
Customization: Add hostnames to either list based on whether you want to include or exclude them from the data.
Example:
HOSTNAME_EXCLUDE: []
HOSTNAME_INCLUDE_ONLY: [ "www.ga4dataform.com", "ga4dataform.com" ]
Last Non-Direct Lookback Days
LAST_NON_DIRECT_LOOKBACK_DAYS: 90
Description: Defines the number of days to look back when assigning a source for a user who lands on your site without any known sources.
Customization: Change the number of days to your liking.
Extra Channel Grouping
EXTRA_CHANNEL_GROUPS: true
Description: If you enable this, session_source.first/last.default_channel_grouping
in int_ga4_sessions
and session_source.default_channel_grouping
in ga4_sessions
will use an Organic AI grouping based on a set list of matching AI-related referrer domains listed in includes/core/extra/source_categories.json
.
User stitching π
USER_STITCHING: {
enabled: false,
dimensions: [],
lookBackDays: 90,
lookAheadDays: 10
}
Description: The user stitching functionality looks at the cookie of the user (user_pseudo_id) and "fills in gaps" of certain column(s) - most notably the user_id. It does this on the session level.
How it works:
- Lookback: we look back in time to find if that visitor in the past has had a session with a
user_id
, and assign thatuser_id
to the current session. The last seenuser_id
is used to fill in the blanks. - Lookahead: if there is no session in the past with a
user_id
, we look in the future to find one. This effectively rewrites the history and de-anonimises sessions leading up to a login (caution is advised not to breach privacy regulations, policies). - Results are stored in the
stitched_ids
struct column in thega4_sessions
table. - The stitching is not restricted to
user_id
only: if you have (custom) parameters, user_properties, url_params, you can stitch them via the same procedure using thedimensions
array.
Configuration Options:
Field | Description |
---|---|
enabled | Enable/disable user stitching functionality |
dimensions | Array of objects that contain additional fields to stitch beyond user_id |
lookBackDays | Days to search backwards for user identification (default: 90) |
lookAheadDays | Days to search forward for retroactive identification (default: 10) |
Dimensions Format:
name
: the column name as it appears inga4_events
renameTo
: the column name as it will appear in thestitched_ids
struct inga4_sessions
If no renameTo
field is used, the full (sanitized) name is used. So event_params_custom.lead_id
will become event_params_custom_lead_id
.
Example:
USER_STITCHING: {
enabled: true,
dimensions: [
{ name: "user_properties.datatrics_segment", renameTo: "datatrics_segment" },
{ name: "event_params.showroom_name" }
],
lookBackDays: 90,
lookAheadDays: 3
}
Be mindul of user choices and privacy when stitching data. If you use stitched data to enrich customer profiles with information gathered while they were logged out, it may not be what they expect. Or even what you promised in your own privacy policy.
Custom Event ID Parameter π
CUSTOM_EVENT_ID_PARAMETER: ''
Description: Use CUSTOM_EVENT_ID_PARAMETER
to add another custom event parameter that makes an event unique. This field will be concatenated with the default event_id fields and will ensure event uniqueness. We usually recommend using a UNIX timestamp or a custom event_id.
Example:
CUSTOM_EVENT_ID_PARAMETER: 'event_params_custom.hit_timestamp'
Which will return this SQL in ga4_events
:
farm_fingerprint(
event_name || '-' ||
time.event_timestamp || '-' ||
coalesce(user_pseudo_id, 'no') || '-' ||
coalesce(event_params.ga_session_id, 0) || '-' ||
coalesce(batch.batch_page_id, 0) || '-' ||
coalesce(batch.batch_ordering_id,0) || '-' ||
coalesce(batch.batch_event_index,0) || '-' ||
coalesce(ecommerce.ecommerce_transaction_id, "no") || '-' ||
coalesce(event_params.engagement_time_msec, 0)
|| safe_cast(coalesce(event_params_custom.hit_timestamp,'no') as string)
) as event_id
Make sure to add the custom field you are using as CUSTOM_EVENT_ID_PARAMETER
to the CUSTOM_EVENT_PARAMS_ARRAY
as well!
Deduplicating Transactions
TRANSACTIONS_DEDUPE: true
Description: By default, we leave duplicate transaction_ids alone in ga4_transactions
, but you can enable deduplication with this setting.
Setting this to true will still keep all transactions with NULL transaction_id
Customization: enable/disable
Transaction Totals User Identifier
TRANSACTION_TOTALS_UID: 'user_pseudo_id'
Description: Specifies the user identifier field for calculating running transaction counts in ga4_transactions
.
Customization: Default identifier is user_pseudo_id
. Change to your preferred user identifier (e.g., user_id
for authenticated users). Ensure the specified column exists in your data.
Data Quality Assertions
These assertions check the data for consistency and quality. You can enable or disable specific checks by setting them to true or false.
Event ID Uniqueness
ASSERTIONS_EVENT_ID_UNIQUENESS: true
Ensures that each event has a unique event ID.
Session ID Uniqueness
ASSERTIONS_SESSION_ID_UNIQUENESS: true
Ensures that each session has a unique session ID.
Session Duration Validity
ASSERTIONS_SESSION_DURATION_VALIDITY: true
Ensures that session durations are valid and within reasonable limits.
Session Validity
ASSERTIONS_SESSIONS_VALIDITY: true
Validates that session data is correct.
Tables Timeliness
ASSERTIONS_TABLES_TIMELINESS: true
Checks if the GA4 tables are up to date.
Transaction ID Completeness
ASSERTIONS_TRANSACTION_ID_COMPLETENESS: false
Checks if transaction IDs are present for purchase events.
User Pseudo ID Completeness
ASSERTIONS_USER_PSEUDO_ID_COMPLETENESS: false
Ensures that the "user_pseudo_id" is present for all hits.
If you use Advanced Consent Mode, this assertion will always fail.