Skip to main content

Custom Configuration

The configuration system is arguably the most important element built into GA4Dataform. With a proper configuration, you can manage variables, modularize features, and control your data transformation pipeline without modifying core SQLX files.

Overview​

Think of the configuration as a control panel of JavaScript variables that determine what the final SQL output will look like. You can customize elements such as:

  • Start date for data processing
  • User properties to extract
  • Event and item parameters to unnest
  • Events to filter out
  • URL parameters to capture
  • User stitching πŸ’Ž
  • Session totals πŸ’Ž
  • Session-level custom parameters πŸ’Ž
  • Support for fresh and intraday tables πŸ’Ž
  • ... and many more!

This approach enables distribution of the same Dataform repository without hardcoded variables, making your implementation portable and maintainable.

Configuration Architecture​

GA4Dataform uses a dual-configuration approach with two separate files:

Core Configuration (includes/core/default_config.js)​

  • Contains default settings for standard event parameters and click IDs
  • Provides fallback variables to prevent compilation errors
  • Subject to updates when new features are introduced
  • Should not be modified by users

Custom Configuration (includes/custom/config.js)​

  • Your customization layer
  • Takes precedence over default configuration
  • Safe from updates and overwrites
  • Where all your specific settings should be defined

Configuration Merging​

The two configuration objects are merged using a JavaScript spread operator that prioritizes your custom settings:

const getConfig = () => {
const { coreConfig } = require("./default_config");
const { customConfig } = require("../custom/config");
return { ...coreConfig, ...customConfig };
};

Your values in config.js will always take precedence over the defaults, ensuring your customizations are preserved during updates.

If you are interested in seeing how these can be implemented after the install, take a look at the Post-Installation Guide!

Click on each card below to see more details!

GA4 Start Date
GA4_START_DATE: "2020-01-01"

Description: Defines the starting date for the GA4 data that the model should process.
Customization: Change this date to reflect the earliest data you want to include in your reports. For example, if you started collecting GA4 data on a different date, modify this to match that date.

Use Fresh Events πŸ’Ž
USE_FRESH_EVENTS: true

Description: If you set this to true, ga4_events will query from fresh table shards if they are available. If events_* table also exists for the same date, that will be used. Priority for same date shards: 1. regular events_ 2. fresh_ (if enabled) 3. intraday_ (if enabled)

Use Intraday Events πŸ’Ž
USE_INTRADAY_EVENTS: false

Description: If you set this to true, ga4_events will query from intraday table shards if they are available. Priority for same date shards: 1. regular events_ 2. fresh_ (if enabled) 3. intraday_ (if enabled)

Custom Event Parameters
CUSTOM_EVENT_PARAMS_ARRAY: []

Description: This allows you to specify custom event parameters that are not part of the standard GA4 data. The custom parameters will be added to the event_params_custom column in the ga4_events table.
Customization: Add parameters using the format { name: "paramname", type: "TYPE", renameTo: "outputcolumnname", description: "This parameter means XYZ" }.

FieldDescription
nameRequired. The original event parameter name to extract
typeRequired. Data type for extraction and storage. int: Extracts from int_value, stores as INT; string: Extracts from all values, stores as STRING; decimal: Extracts from all numerical values, stores as FLOAT
renameToOptional. New name for the output column. If not specified, original parameter name is used
cleaningMethodOptional. Transformation method to apply: lowerSQL: Converts string outputs to lowercase
descriptionOptional. Add a description for a column that will show up in the table schema

Example:

CUSTOM_EVENT_PARAMS_ARRAY: [
{ name: "gtm_info", type: "string", description: "GTM container ID + container version"},
{ name: "consent", type: "string", description: "The current consent status of the user"},
{ name: "page_type", type: "string", description: "The type of page: category, listing, product, home, checkout" }
]
User Properties
CUSTOM_USER_PROPERTIES_ARRAY: []

Description: Allows you to extract user properties into their own columns.
Customization: Define user properties you want to extract in the format { name: "param_name", cleaningMethod: method }. These will be stored in the user_properties column in the ga4_events table.

FieldDescription
nameRequired. The original user property name to extract
typeRequired. Data type for extraction and storage. int: Extracts from int_value, stores as INT; string: Extracts from all values, stores as STRING; decimal: Extracts from all numerical values, stores as FLOAT
renameToOptional. New name for the output column. If not specified, original parameter name is used
cleaningMethodOptional. Transformation method to apply: lowerSQL: Converts string outputs to lowercase
descriptionOptional. Add a description for a column that will show up in the table schema

Example:

  CUSTOM_USER_PROPERTIES_ARRAY: [
{ name: "user_tier", type: "string", description: "The loyalty tier of the user (e.g., 'gold', 'silver')" },
{ name: "lifetime_value", type: "decimal", renameTo: "ltv", description: "The calculated lifetime value of the user" },
{ name: "first_purchase_date", type: "string", description: "The date of the user's first purchase" }
]
Custom Item Parameters
CUSTOM_ITEM_PARAMS_ARRAY: []

Description: Similar to event parameters, but for item custom dimensions and metrics. These will be stored in the items.item_params_custom.* column in the ga4_events table. Customization: Add custom item parameters in the same format.

FieldDescription
nameRequired. The original item parameter name to extract
typeRequired. Data type for extraction and storage. int: Extracts from int_value, stores as INT; string: Extracts from all values, stores as STRING; decimal: Extracts from all numerical values, stores as FLOAT
renameToOptional. New name for the output column. If not specified, original parameter name is used
cleaningMethodOptional. Transformation method to apply: lowerSQL: Converts string outputs to lowercase
descriptionOptional. Add a description for a column that will show up in the table schema

Example:

CUSTOM_ITEM_PARAMS_ARRAY: [
{ name: "stock_status", type: "string", description: "Whether the item is currenly in_stock/low_stock/out_of_stock", cleaningMethod: lowerSQL },
{ name: "stock_amount", type: "string", description: "The amount of items currently in stock" },
{ name: "margin", type: "decimal", description: "Encoded margin for the item. It can be decoded for readability." }
]
Custom URL Parameters
CUSTOM_URL_PARAMS_ARRAY: []

Description: Allows you to extract custom URL parameters into their own columns. These will be stored in the url_params_custom.* column in the ga4_events table. Customization: Define custom URL parameters you want to extract in the format { name: "q", renameTo: "query_parameter", cleaningMethod: lowerSQL }. Note that URL params will always be strings.

FieldDescription
nameRequired. The original item parameter name to extract
renameToOptional. New name for the output column. If not specified, original parameter name is used
cleaningMethodOptional. Transformation method to apply: lowerSQL: Converts string outputs to lowercase
descriptionOptional. Add a description for a column that will show up in the table schema

Example:

CUSTOM_URL_PARAMS_ARRAY: [
{ name: "q", cleaningMethod: lowerSQL },
{ name: "product-size", renameTo: "size" }
]
Custom Session Parameters πŸ’Ž
  CUSTOM_SESSION_PARAMS_ARRAY: []

Description: Include session-level custom parameters in the sessions table. These will appear in int_ga4_sessions and ga4_sessions tables in the session_params_custom column. Customization: Add parameters with aggregation methods to select a single value per session.

FieldDescription
nameRequired. The full path to the parameter to aggregate (e.g., event_params_custom.lead_id). It should exist in the ga4_events table.
pickRequired. The method to select a single value: first (first non-null value based on timestamp), last (last non-null value), min, max, boolean (true if the parameter exists with any value at least once).
renameToOptional. New name for the output column. If not specified, original parameter name is used
cleaningMethodOptional. Transformation method to apply: lowerSQL: Converts string outputs to lowercase
descriptionOptional. Add a description for a column that will show up in the table schema

Example:

  CUSTOM_SESSION_PARAMS_ARRAY: [
{ name: "event_params_custom.lead_id", pick: "first", renameTo: "initial_lead_id", description: "First lead ID captured in session" },
{ name: "event_params_custom.lead_status", pick: "last", renameTo: "final_lead_status", description: "Final lead qualification status in session" },
{ name: "safe_cast(event_params_custom.engagement_score as INT64)", pick: "max", renameTo: "peak_engagement_score", description: "Highest user engagement score achieved in session" },
{ name: "event_params_custom.feature_tier", pick: "boolean", renameTo: "viewed_premium_features", description: "Whether user explored premium feature tiers" }
]
Item fields are not (yet) supported!

This will throw an error:
{ name: "items.item_params_custom.stock_status", pick: "first" }

Custom Session Totals πŸ’Ž
CUSTOM_SESSION_TOTALS: {
eventsToCount: [],
uniqueFields: [],
sumFields: [],
}

Description: Provide a list of fields you want to aggregate at the session level. These will end up in the int_ga4_sessions ga4_sessions table in the corresponding struct within totals (e.g., totals.count, totals.distinct_count, totals.has_events, totals.sum).

Customization: Configure the eventsToCount, uniqueFields, and sumFields arrays. You can use a simple string for the name or an object for more complex logic like renaming, filtering, or grouping events.

note

Column descriptions are auto-generated for session total fields based on the configuration!

Each array in CUSTOM_SESSION_TOTALS supports multiple format options to accommodate different use cases. Choose the format that best fits your needs!

Events to Count (eventsToCount)​

Counts occurrences of specific events per session. Results are stored in totals.count.* and totals.has_event.* in int_ga4_sessions and ga4_sessions tables.

Supported Formats:

FormatUse CaseExample
StringSimple event counting'page_view'
Object with metadataEvent counting with custom naming{ name: 'sign_up', renameTo: 'registrations' }
Object with event arrayCount multiple related events as one metric{ name: ['video_start', 'video_progress'], renameTo: 'video_interactions' }

Available Fields:

  • name (required): Event name(s) to count - string or array of strings
  • renameTo (optional): Custom output column name
Unique Fields (uniqueFields)​

Counts distinct values of specified fields per session. Results are stored in totals.distinct_count.* in int_ga4_sessions and ga4_sessions tables.

Supported Formats:

FormatUse CaseExample
Object with field pathCount unique values across all events{ name: 'ecommerce.transaction_id', renameTo: 'unique_transactions' }
Object with event filterCount unique values only from specific events{ name: 'event_params.currency', eventFilter: ['purchase', 'refund'], renameTo: 'transaction_currencies' }

Available Fields:

  • name (required): Field path to count unique values from
  • eventFilter (optional): Limit counting to specific event types - string or array of strings
  • renameTo (optional): Custom output column name
Sum Fields (sumFields)​

Sums numeric values of specified fields per session. Results are stored in totals.sum.* in int_ga4_sessions and ga4_sessions tables.

Supported Formats:

FormatUse CaseExample
StringSimple field summation across all events'event_params.video_duration'
Object with metadataField summation with custom naming{ name: 'event_params.engagement_time_msec', renameTo: 'total_engagement' }
Object with event filterSum values only from specific events{ name: 'ecommerce.purchase_revenue', eventFilter: ['purchase', 'refund'], renameTo: 'net_purchase_revenue' },

Available Fields:

  • name (required): Field path to count unique values from
  • eventFilter (optional): Limit counting to specific event types - string or array of strings
  • renameTo (optional): Custom output column name

Example:

CUSTOM_SESSION_TOTALS: {
eventsToCount: [
'page_view',
{ name: 'sign_up', renameTo: 'registrations', description: 'User registration events in session' },
{ name: ['video_start', 'video_progress', 'video_complete'], renameTo: 'video_engagement', description: 'All video interaction events' }
],
uniqueFields: [
{ name: 'ecommerce.transaction_id', renameTo: 'unique_transactions', description: 'Number of unique transaction IDs in session' },
{ name: 'event_params.currency', renameTo: 'funnel_currencies', eventFilter: ['begin_checkout', 'purchase'], description: 'Unique currencies used in checkout funnel' },
],
sumFields: [
{ name: 'event_params.engagement_time_msec', renameTo: 'total_engagement_time', description: 'Sum of engagement time in milliseconds' },
{ name: 'ecommerce.purchase_revenue', eventFilter: ['purchase', 'refund'], renameTo: 'net_purchase_revenue' },
]
}
Item fields are not (yet) supported!

This will throw an error:
{ name: "items.item_params_custom.item_sku", eventFilter: ['purchase'] }

Event and Hostname Filters

Events to Exclude

EVENTS_TO_EXCLUDE: []

Description: List the event names that should be excluded from the events table.
Customization: Add event names you don’t want to process.
Example:

EVENTS_TO_EXCLUDE: ["user_engagement", "scroll"]

Hostname Exclude/Include

HOSTNAME_EXCLUDE: []
HOSTNAME_INCLUDE_ONLY: []

Description: Exclude or include specific hostnames from the data.
Customization: Add hostnames to either list based on whether you want to include or exclude them from the data. Example:

HOSTNAME_EXCLUDE: []
HOSTNAME_INCLUDE_ONLY: [ "www.ga4dataform.com", "ga4dataform.com" ]
Last Non-Direct Lookback Days
LAST_NON_DIRECT_LOOKBACK_DAYS: 90

Description: Defines the number of days to look back when assigning a source for a user who lands on your site without any known sources.
Customization: Change the number of days to your liking.

Extra Channel Grouping
EXTRA_CHANNEL_GROUPS: true

Description: If you enable this, session_source.first/last.default_channel_grouping in int_ga4_sessions and session_source.default_channel_grouping in ga4_sessions will use an Organic AI grouping based on a set list of matching AI-related referrer domains listed in includes/core/extra/source_categories.json.

User stitching πŸ’Ž
USER_STITCHING: {
enabled: false,
dimensions: [],
lookBackDays: 90,
lookAheadDays: 10
}

Description: The user stitching functionality looks at the cookie of the user (user_pseudo_id) and "fills in gaps" of certain column(s) - most notably the user_id. It does this on the session level.

How it works:

  • Lookback: we look back in time to find if that visitor in the past has had a session with a user_id, and assign that user_id to the current session. The last seen user_id is used to fill in the blanks.
  • Lookahead: if there is no session in the past with a user_id, we look in the future to find one. This effectively rewrites the history and de-anonimises sessions leading up to a login (caution is advised not to breach privacy regulations, policies).
  • Results are stored in the stitched_ids struct column in the ga4_sessions table.
  • The stitching is not restricted to user_id only: if you have (custom) parameters, user_properties, url_params, you can stitch them via the same procedure using the dimensions array.

Configuration Options:

FieldDescription
enabledEnable/disable user stitching functionality
dimensionsArray of objects that contain additional fields to stitch beyond user_id
lookBackDaysDays to search backwards for user identification (default: 90)
lookAheadDaysDays to search forward for retroactive identification (default: 10)

Dimensions Format:

  • name: the column name as it appears in ga4_events
  • renameTo: the column name as it will appear in the stitched_ids struct in ga4_sessions
Field naming

If no renameTo field is used, the full (sanitized) name is used. So event_params_custom.lead_id will become event_params_custom_lead_id.

Example:

  USER_STITCHING: {
enabled: true,
dimensions: [
{ name: "user_properties.datatrics_segment", renameTo: "datatrics_segment" },
{ name: "event_params.showroom_name" }
],
lookBackDays: 90,
lookAheadDays: 3
}
Respect user privacy

Be mindul of user choices and privacy when stitching data. If you use stitched data to enrich customer profiles with information gathered while they were logged out, it may not be what they expect. Or even what you promised in your own privacy policy.

Custom Event ID Parameter πŸ’Ž
CUSTOM_EVENT_ID_PARAMETER: ''

Description: Use CUSTOM_EVENT_ID_PARAMETER to add another custom event parameter that makes an event unique. This field will be concatenated with the default event_id fields and will ensure event uniqueness. We usually recommend using a UNIX timestamp or a custom event_id.

Example:

CUSTOM_EVENT_ID_PARAMETER: 'event_params_custom.hit_timestamp'

Which will return this SQL in ga4_events:

farm_fingerprint(
event_name || '-' ||
time.event_timestamp || '-' ||
coalesce(user_pseudo_id, 'no') || '-' ||
coalesce(event_params.ga_session_id, 0) || '-' ||
coalesce(batch.batch_page_id, 0) || '-' ||
coalesce(batch.batch_ordering_id,0) || '-' ||
coalesce(batch.batch_event_index,0) || '-' ||
coalesce(ecommerce.ecommerce_transaction_id, "no") || '-' ||
coalesce(event_params.engagement_time_msec, 0)
|| safe_cast(coalesce(event_params_custom.hit_timestamp,'no') as string)
) as event_id
warning

Make sure to add the custom field you are using as CUSTOM_EVENT_ID_PARAMETER to the CUSTOM_EVENT_PARAMS_ARRAY as well!

Deduplicating Transactions
TRANSACTIONS_DEDUPE: true

Description: By default, we leave duplicate transaction_ids alone in ga4_transactions, but you can enable deduplication with this setting.

note

Setting this to true will still keep all transactions with NULL transaction_id

Customization: enable/disable

Transaction Totals User Identifier
TRANSACTION_TOTALS_UID: 'user_pseudo_id'

Description: Specifies the user identifier field for calculating running transaction counts in ga4_transactions. Customization: Default identifier is user_pseudo_id. Change to your preferred user identifier (e.g., user_id for authenticated users). Ensure the specified column exists in your data.

Data Quality Assertions

These assertions check the data for consistency and quality. You can enable or disable specific checks by setting them to true or false.

Event ID Uniqueness

ASSERTIONS_EVENT_ID_UNIQUENESS: true

Ensures that each event has a unique event ID.

Session ID Uniqueness

ASSERTIONS_SESSION_ID_UNIQUENESS: true

Ensures that each session has a unique session ID.

Session Duration Validity

ASSERTIONS_SESSION_DURATION_VALIDITY: true

Ensures that session durations are valid and within reasonable limits.

Session Validity

ASSERTIONS_SESSIONS_VALIDITY: true

Validates that session data is correct.

Tables Timeliness

ASSERTIONS_TABLES_TIMELINESS: true

Checks if the GA4 tables are up to date.

Transaction ID Completeness

ASSERTIONS_TRANSACTION_ID_COMPLETENESS: false

Checks if transaction IDs are present for purchase events.

User Pseudo ID Completeness

ASSERTIONS_USER_PSEUDO_ID_COMPLETENESS: false

Ensures that the "user_pseudo_id" is present for all hits.

Advanced Consent Mode

If you use Advanced Consent Mode, this assertion will always fail.