Custom Lineage

💎Premium Feature

This feature is exclusive to Premium users.

What it does

Custom Lineage lets you inject your own SQL transformations between the core GA4 pipeline tables. Instead of forking or patching core models, you get dedicated interception points where you can add columns, fix data, or apply business logic. Every downstream table then automatically reads from your custom version.

Think of it as middleware for your data pipeline: the core model produces a table, your custom layer transforms it, and everything downstream picks up the result.

Interception points

There are three interception points, each sitting between a core table and its consumers:

ga4_events
  |
  +---> ga4_events_custom -------+---> int_ga4_sessions
  |                              +---> int_ga4_transactions
  |                              +---> ga4_item_funnel_sessions
  |                              +---> ga4_event_conversion_rates_report
  |
  +---> int_ga4_sessions
        |
        +---> int_ga4_sessions_custom ---> ga4_sessions
                                           |
                                           +---> ga4_sessions_custom ---> ga4_item_funnel_report
                                                                    +---> ga4_item_list_attribution_report
                                                                    +---> ga4_event_conversion_rates_report

| Interception point | Sits between | Dataset |
| --- | --- | --- |
| ga4_events_custom | ga4_events and its downstream consumers | outputs |
| int_ga4_sessions_custom | int_ga4_sessions and ga4_sessions | transformations |
| ga4_sessions_custom | ga4_sessions and its downstream consumers | outputs |

When a custom lineage point is enabled, every core and premium module that reads from the parent table automatically switches to reading from the custom version. No manual rewiring needed.
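
The switching behaviour can be pictured with a small sketch. The function `resolveSource` and the `INTERCEPTION_POINTS` map are hypothetical names introduced here for illustration and are not part of the package; the table names and the `CUSTOM_LINEAGE` values (described under Configuration) come from this page. The sketch assumes that only "view" and "incremental" enable an interception point.

```javascript
// Hypothetical sketch: which table a downstream model would read from.
// Maps each core table to its interception point (names from the table above).
const INTERCEPTION_POINTS = {
  ga4_events: "ga4_events_custom",
  int_ga4_sessions: "int_ga4_sessions_custom",
  ga4_sessions: "ga4_sessions_custom",
};

function resolveSource(parentTable, customLineage = {}) {
  const customTable = INTERCEPTION_POINTS[parentTable];
  const mode = customTable ? customLineage[customTable] : false;
  // Assumption: only "view" and "incremental" enable an interception point.
  return mode === "view" || mode === "incremental" ? customTable : parentTable;
}

console.log(resolveSource("ga4_events", { ga4_events_custom: "view" })); // ga4_events_custom
console.log(resolveSource("ga4_sessions", { ga4_events_custom: "view" })); // ga4_sessions
```

In this sketch a disabled or unknown point falls back to the parent table, which mirrors the "no manual rewiring" behaviour described above.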

Configuration

In your includes/custom/modules/ga4/config.js, add the CUSTOM_LINEAGE object:

CUSTOM_LINEAGE: {
  ga4_events_custom: false,        // disabled (default)
  int_ga4_sessions_custom: false,  // disabled (default)
  ga4_sessions_custom: false       // disabled (default)
},

Each key accepts one of three values:

| Value | Type | Cost | Use when |
| --- | --- | --- | --- |
| false | Disabled | None | You don't need this interception point |
| "view" | BigQuery view | Zero storage, zero processing overhead | Scalar transforms: REPLACE, EXCEPT, JOIN, CASE/IF |
| "incremental" | Incremental table | Duplicates data, processes new partitions only | Window functions (ROW_NUMBER, LAST_VALUE, etc.) or transforms that prevent predicate pushdown |
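
Because the `type` expression in the templates further down compares only against "incremental", a misspelled value would evaluate to "view" without any warning. A small check like the following could catch that early. It is a sketch, not part of the package: `validateCustomLineage` is a hypothetical helper name, and the allowed keys and values are taken from the table above.

```javascript
// Hypothetical sketch: validate a CUSTOM_LINEAGE object before relying on it.
const ALLOWED_KEYS = [
  "ga4_events_custom",
  "int_ga4_sessions_custom",
  "ga4_sessions_custom",
];
const ALLOWED_VALUES = [false, "view", "incremental"];

function validateCustomLineage(customLineage) {
  const errors = [];
  for (const [key, value] of Object.entries(customLineage)) {
    if (!ALLOWED_KEYS.includes(key)) {
      errors.push(`Unknown interception point: ${key}`);
    } else if (!ALLOWED_VALUES.includes(value)) {
      errors.push(
        `${key}: expected false, "view" or "incremental", got ${JSON.stringify(value)}`
      );
    }
  }
  return errors; // an empty array means the object looks valid
}

console.log(validateCustomLineage({ ga4_events_custom: "incremntal" }));
```

An empty result means every key is a known interception point and every value is one of the three documented options.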

Template files

The SQL templates live in definitions/custom/modules/ga4/:

  • ga4_events_custom.sqlx
  • int_ga4_sessions_custom.sqlx
  • ga4_sessions_custom.sqlx

Each template ships as a pass-through (SELECT * FROM source). You edit the SQL to add your logic. The config block and pre_operations are already wired up - you only touch the final SELECT.
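
On an installation that predates this feature, some of these files may be absent. The following Node.js sketch lists which template files are missing. It assumes it is run from the Dataform project root; `findMissingTemplates` is a hypothetical helper name, while the directory and file names are the ones listed above.

```javascript
// Hypothetical sketch: report which custom lineage templates are missing.
const fs = require("fs");
const path = require("path");

const TEMPLATE_DIR = "definitions/custom/modules/ga4";
const TEMPLATE_FILES = [
  "ga4_events_custom.sqlx",
  "int_ga4_sessions_custom.sqlx",
  "ga4_sessions_custom.sqlx",
];

function findMissingTemplates(projectRoot) {
  // Keep only the files that do not exist under the template directory.
  return TEMPLATE_FILES.filter(
    (file) => !fs.existsSync(path.join(projectRoot, TEMPLATE_DIR, file))
  );
}

// Assumption: the current working directory is the project root.
console.log(findMissingTemplates(process.cwd()));
```

An empty array means all three templates are already in place.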

Existing installations

These files live in definitions/custom/modules/ga4/. Because that directory is preserved during updates, the installer adds the files only on fresh installations. If you're updating an existing installation, create the files manually using the code below.

ga4_events_custom.sqlx

Create at definitions/custom/modules/ga4/ga4_events_custom.sqlx:

config {
  type: require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.ga4_events_custom === "incremental" ? "incremental" : "view",
  schema: dataform.projectConfig.vars.OUTPUTS_DATASET,
  tags: ["module_ga4", "events"],
  description: "Custom lineage: intercept ga4_events before it flows into downstream tables. Edit to add/modify columns.",
  ...(require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.ga4_events_custom === "incremental" ? {
    onSchemaChange: "EXTEND",
    bigquery: {
      partitionBy: "event_date",
      clusterBy: [...((require("includes/core/modules/ga4/helpers").helpers.getModuleConfig("ga4").CLUSTER_BY || {}).ga4_events || []).slice(0, 2), "event_name", "session_id"],
      labels: require("includes/core/helpers.js").helpers.storageLabels()
    }
  } : {}),
  ...require("includes/core/helpers.js").helpers.isModuleEnabled('ga4')
}

js {
  const { helpers } = require("includes/core/modules/ga4/helpers");
  const config = helpers.getModuleConfig('ga4');
  const isIncrementalTable = config.CUSTOM_LINEAGE.ga4_events_custom === "incremental";
}

pre_operations {
  ${isIncrementalTable ? helpers.generatePreOperationsSQL("event_date", when, self, incremental) : ""}
}

/*
  Default: pass-through. Edit below to add your custom logic.
  Examples:
    SELECT * REPLACE(<your_struct> AS fixed_traffic_source) FROM ...
    SELECT * EXCEPT(col), <expr> AS col FROM ...

  MATERIALIZATION MODES (set in CUSTOM_LINEAGE config):
    "view"        - zero cost, scalar transforms only (REPLACE, EXCEPT, JOIN).
                    Predicates from downstream tables push through automatically.
    "incremental" - materializes data with own date_checkpoint.
                    Required for window functions (ROW_NUMBER, LAST_VALUE, etc.)
                    to avoid full table scans.
*/
WITH source_ga4_events AS (
  SELECT *
  FROM ${ref({"database": dataform.projectConfig.defaultProject, "schema": dataform.projectConfig.vars.OUTPUTS_DATASET, "name": "ga4_events"})}
  -- noqa: disable=all
  ${isIncrementalTable ? "WHERE event_date BETWEEN date_checkpoint AND date_end" : ""}
  -- noqa: enable=all
)

SELECT *
FROM source_ga4_events

int_ga4_sessions_custom.sqlx

Create at definitions/custom/modules/ga4/int_ga4_sessions_custom.sqlx:

config {
  type: require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.int_ga4_sessions_custom === "incremental" ? "incremental" : "view",
  schema: dataform.projectConfig.vars.TRANSFORMATIONS_DATASET,
  tags: ["module_ga4", "sessions"],
  description: "Custom lineage: intercept int_ga4_sessions before it flows into ga4_sessions. Edit to add/modify columns.",
  ...(require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.int_ga4_sessions_custom === "incremental" ? {
    onSchemaChange: "EXTEND",
    bigquery: {
      partitionBy: "session_date",
      clusterBy: [...((require("includes/core/modules/ga4/helpers").helpers.getModuleConfig("ga4").CLUSTER_BY || {}).ga4_sessions || []).slice(0, 3), "session_id"],
      labels: require("includes/core/helpers.js").helpers.storageLabels()
    }
  } : {}),
  ...require("includes/core/helpers.js").helpers.isModuleEnabled('ga4')
}

js {
  const { helpers } = require("includes/core/modules/ga4/helpers");
  const config = helpers.getModuleConfig('ga4');
  const isIncrementalTable = config.CUSTOM_LINEAGE.int_ga4_sessions_custom === "incremental";
}

pre_operations {
  ${isIncrementalTable ? helpers.generatePreOperationsSQL("session_date", when, self, incremental) : ""}
}

/*
  Default: pass-through. Edit below to add your custom logic.
  Examples:
    SELECT * REPLACE(<your_struct> AS session_info) FROM ...
    SELECT * EXCEPT(col), <expr> AS col FROM ...

  MATERIALIZATION MODES (set in CUSTOM_LINEAGE config):
    "view"        - zero cost, scalar transforms only (REPLACE, EXCEPT, JOIN).
                    Predicates from downstream tables push through automatically.
    "incremental" - materializes data with own date_checkpoint.
                    Required for window functions (ROW_NUMBER, LAST_VALUE, etc.)
                    to avoid full table scans.
*/
WITH source_ga4_sessions AS (
  SELECT *
  FROM ${ref({"database": dataform.projectConfig.defaultProject, "schema": dataform.projectConfig.vars.TRANSFORMATIONS_DATASET, "name": "int_ga4_sessions"})}
  -- noqa: disable=all
  ${isIncrementalTable ? "WHERE session_date BETWEEN date_checkpoint AND date_end" : ""}
  -- noqa: enable=all
)

SELECT *
FROM source_ga4_sessions

ga4_sessions_custom.sqlx

Create at definitions/custom/modules/ga4/ga4_sessions_custom.sqlx:

config {
  type: require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.ga4_sessions_custom === "incremental" ? "incremental" : "view",
  schema: dataform.projectConfig.vars.OUTPUTS_DATASET,
  tags: ["module_ga4", "sessions"],
  description: "Custom lineage: intercept ga4_sessions before it flows into downstream tables. Edit to add/modify columns.",
  ...(require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.ga4_sessions_custom === "incremental" ? {
    onSchemaChange: "EXTEND",
    bigquery: {
      partitionBy: "session_date",
      clusterBy: [...((require("includes/core/modules/ga4/helpers").helpers.getModuleConfig("ga4").CLUSTER_BY || {}).ga4_sessions || []).slice(0, 3), "session_id"],
      labels: require("includes/core/helpers.js").helpers.storageLabels()
    }
  } : {}),
  ...require("includes/core/helpers.js").helpers.isModuleEnabled('ga4')
}

js {
  const { helpers } = require("includes/core/modules/ga4/helpers");
  const config = helpers.getModuleConfig('ga4');
  const isIncrementalTable = config.CUSTOM_LINEAGE.ga4_sessions_custom === "incremental";
}

pre_operations {
  ${isIncrementalTable ? helpers.generatePreOperationsSQL("session_date", when, self, incremental) : ""}
}

/*
  Default: pass-through. Edit below to add your custom logic.
  Examples:
    SELECT * REPLACE(<your_struct> AS last_non_direct_traffic_source) FROM ...
    SELECT * EXCEPT(col), <expr> AS col FROM ...

  MATERIALIZATION MODES (set in CUSTOM_LINEAGE config):
    "view"        - zero cost, scalar transforms only (REPLACE, EXCEPT, JOIN).
                    Predicates from downstream tables push through automatically.
    "incremental" - materializes data with own date_checkpoint.
                    Required for window functions (ROW_NUMBER, LAST_VALUE, etc.)
                    to avoid full table scans.
*/
WITH source_ga4_sessions AS (
  SELECT *
  FROM ${ref({"database": dataform.projectConfig.defaultProject, "schema": dataform.projectConfig.vars.OUTPUTS_DATASET, "name": "ga4_sessions"})}
  -- noqa: disable=all
  ${isIncrementalTable ? "WHERE session_date BETWEEN date_checkpoint AND date_end" : ""}
  -- noqa: enable=all
)

SELECT *
FROM source_ga4_sessions

Materialization modes

View mode ("view")

Creates a BigQuery view. Costs nothing to store and nothing to run on its own - the query only executes when a downstream table reads from it. BigQuery pushes predicates (like date filters) through the view automatically, so downstream incremental processing still works efficiently.

Safe for:

  • SELECT * REPLACE(...) - override specific columns
  • SELECT * EXCEPT(col), expr AS col - drop and re-add columns
  • LEFT JOIN to enrich with external data
  • CASE/IF expressions, scalar functions

Not safe for:

  • Window functions (ROW_NUMBER, LAST_VALUE, LEAD, LAG) - these block predicate pushdown, causing full table scans on every downstream run

Incremental mode ("incremental")

Creates a materialized incremental table with its own date_checkpoint. Processes only new partitions on each run. Uses the same partitioning and clustering as the parent table.

Use when:

  • You need window functions
  • Your transformation prevents BigQuery from pushing date predicates through
  • You want to pre-compute expensive logic once rather than on every downstream read

Trade-off: Duplicates data storage and adds an extra processing step per run.

Example

The template files are pass-throughs by default. To add your custom logic, edit the final SELECT statement. You don't need to touch the config, js, or pre_operations blocks.

Here's an example that runs ga4_events_custom as a view and adds a customer_segment column by joining events to an external table (customer_lookup stands in for your own table):

Config:

CUSTOM_LINEAGE: {
  ga4_events_custom: "view",
  int_ga4_sessions_custom: false,
  ga4_sessions_custom: false
},

ga4_events_custom.sqlx - only edit the SQL after the pre_operations block:

/* ... keep config, js, pre_operations as-is ... */

WITH source_ga4_events AS (
  SELECT *
  FROM ${ref({"database": dataform.projectConfig.defaultProject, "schema": dataform.projectConfig.vars.OUTPUTS_DATASET, "name": "ga4_events"})}
  -- noqa: disable=all
  ${isIncrementalTable ? "WHERE event_date BETWEEN date_checkpoint AND date_end" : ""}
  -- noqa: enable=all
)

SELECT
  source.*,
  lookup.customer_segment
FROM source_ga4_events AS source
LEFT JOIN ${ref("customer_lookup")} AS lookup
  ON source.user_id = lookup.user_id

Common patterns for the final SELECT:

  • Add columns: SELECT *, <expr> AS new_col FROM source_ga4_events
  • Replace columns: SELECT * REPLACE(<expr> AS existing_col) FROM source_ga4_events
  • Drop and re-add: SELECT * EXCEPT(col), <expr> AS col FROM source_ga4_events
  • Join external data: SELECT source.*, lookup.field FROM source_ga4_events AS source LEFT JOIN ...

If your transformation uses window functions (ROW_NUMBER, LAST_VALUE, LEAD, LAG), switch the config value from "view" to "incremental" to avoid full table scans on every downstream run.

How to choose

  1. Do I need this at all? If your transforms can be done via config (custom params, session totals, channel groupings), prefer config. Custom lineage is for logic that config can't express.

  2. View or incremental? Start with "view". Switch to "incremental" only if you use window functions or notice performance degradation from blocked predicate pushdown.

  3. Which interception point? Pick the one closest to where the data you need lives:

    • Event-level transforms: ga4_events_custom
    • Session intermediate transforms: int_ga4_sessions_custom
    • Final session-level transforms: ga4_sessions_custom
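
The second question above can be approximated in code. The sketch below suggests a mode by looking for an `OVER (` clause in the final SELECT. This is a rough text heuristic under stated assumptions, not a SQL parser: it ignores named windows and can be fooled by comments or string literals. `suggestMode` is a hypothetical helper name, not something the package provides.

```javascript
// Hypothetical sketch: suggest "view" or "incremental" for a final SELECT.
// Heuristic only: a window function is assumed whenever "OVER (" appears.
function suggestMode(finalSelectSql) {
  const usesWindowFunction = /\bOVER\s*\(/i.test(finalSelectSql);
  return usesWindowFunction ? "incremental" : "view";
}

const scalarOnly =
  "SELECT * REPLACE(LOWER(event_name) AS event_name) FROM source_ga4_events";
const windowed =
  "SELECT *, ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_timestamp) AS rn " +
  "FROM source_ga4_events";

console.log(suggestMode(scalarOnly)); // view
console.log(suggestMode(windowed)); // incremental
```

Treat the result as a starting point: a transform without window functions can still block predicate pushdown, in which case "incremental" remains the safer choice.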

Limitations

  • Custom lineage files live in definitions/custom/modules/ga4/, so they are not overwritten during installer updates. Your customizations are safe.
  • When using "incremental" mode, the custom table has its own checkpoint and processes only new partitions. If you change the SQL logic, partitions that were already processed keep the old logic, so you may need to rebuild the custom table to backfill.
  • Enabling a lineage point creates a view or table in BigQuery even if you leave the pass-through template (SELECT *) unedited. Enable a point only when you actually add your logic to it.