User Stitching
This feature is exclusive for Premium users
What it does​
The user stitching functionality will look at the cookie of the user (user_pseudo_id) and “fill in gaps” of certain column - most notably the user_id. It does this on the session level.
Concrete:
if a visitor (user_pseudo_id) has a session without a user_id…
- we will look back in time to find if that visitor in the past has had a session with a
user_id
, and assign theuser_id
to that session, and we will use the last seenuser_id
to fill in the blanks - if there is no session in the past with a
user_id
, we will look in the future to find one. This effectively rewrites the history and de-anonimises sessions leading up to a login (so be careful)
The stitching is not restricted to user_id
only: if you have (custom) parameters, user_properties, url_params, you can stitch them via the same procedure.
For example, you can stitch columns like url_params_custom.lead_id
, event_params_custom.email_hash
, user_properties.store_location
, etc.
The results are stored in the column stitched_ids
in the ga4_sessions
table, as a struct
Configuration​
Default config has 1 object with 4 keys:
USER_STITCHING: {
enabled: true,
dimensions: [],
lookBackDays: 90,
lookAheadDays: 10
},
enabled
can be
- true: this will stitch
user_id
(the built in column), and optional other dimensions - false: this will skip user stitching logic
dimensions
- this is an array of objects, the object must have 2 keys
name
: the column name as it appears inga4_events
renameTo
: the column name as it will appear in thestitched_ids
struct inga4_sessions
Note: If no renameTo
field is used, the full (sanitized) name is used. So event_params_custom.lead_id
will become event_params_custom_lead_id
lookBackDays
is the number of days to search for a previous value.
lookAheadDays
is the number of days to “rewrite the past” with a possible future value.
Example config:
USER_STITCHING: {
enabled: true,
dimensions: [
{ name: "user_properties.datatrics_segment", renameTo: "datatrics_segment" },
{ name: "event_params.showroom_name" }
],
lookBackDays: 90,
lookAheadDays: 3
},
How it works in the SQL​
int_ga4_sessions
will store the first and last values of the dimension in a struct stitched_ids
These will be used in ga4_sessions
to fetch the values. Of course: only when the stitching is enabled.
In ga4_sessions
lookBackDays
of days will be read from the intermediate table- like the “last non direct” session source, the last found value of the dimension will be filled in and saved
- backstitching will be applied afterwards using the
lookAheadDays
time window
Not that this order matters: leading NULL values will be filled in after trailing NULL values.
Let’s say you configure
- 1 day of lookback
- 1 day of lookahead
In the below diagram, “day 4” will be filled in with A
- because this is processed in the first round.
Had the configuration be
- 0 days of lookback
- 1 day of lookahead
The result would be to skip the first round, and backstich in round 2, like this:
Limitations and caveats​
Check if you lose too much information​
The stitching is done on session level, not on event level. This means that only 1 stitched value per session is stored.
If you have a lot of cases where one browser logs in to many user accounts multiple times per session, the stitching might lose some of its value, since more values get dropped.
A query to check if your user identifier is “session-stitchable” (or: less than 5% of sessions have multiple identifiers)
with freq as (
SELECT
session_id,
count(distinct event_params_custom.email_hash) as n_user_ids -- replace this with user_id or other identifier
FROM
`superform_outputs.ga4_events`
WHERE
event_date > "2025-09-01" -- choose date range
and session_id is not null
and user_pseudo_id is not null
group by session_id
),
agg as (
select
count(*) as sessions,
countif(n_user_ids = 0) as no_login,
countif(n_user_ids = 1) as one_login,
countif(n_user_ids > 1) as multiple_logins,
from freq
)
select
sessions as sessions,
"Of the " || round(100 * (1 - no_login/sessions),1) || "% of sessions where a user_id is found, " ||
round(100 * safe_divide(multiple_logins , one_login),1) || "% have multiple user_ids." as text,
if(round(100 * safe_divide(multiple_logins , one_login),1) < 5, "yes", "reconsider") as should_I_stitch
from agg
lookAheadDays rewrites history​
The main difference between lookback and lookahead is the following:
- lookback: I see an anonymous session today with a login 10 days ago: let’s assign a user_id to the session of today
- lookahead: the anonymous user where I stored an anonymous session 10 days ago, has logged in session today, so let’s change the user_id field for the session from 10 days ago
In the lookahead case, this means that a query you execute in week 22 to get data from “week 21”, can have a different output if you run it in week 23.
Respect user privacy​
As a final note, be mindul of user choices and privacy when stitching data. If you use stitched data to enrich customer profiles with information gathered while they were logged out, it may not be what they expect. Or even what you promised in your own privacy policy.