Skip to main content

User Stitching

đź’ŽPremium Feature

This feature is exclusive for Premium users

What it does​

The user stitching functionality will look at the cookie of the user (user_pseudo_id) and “fill in gaps” of certain column - most notably the user_id. It does this on the session level.

Concrete:

if a visitor (user_pseudo_id) has a session without a user_id…

  • we will look back in time to find if that visitor in the past has had a session with a user_id, and assign the user_id to that session, and we will use the last seen user_id to fill in the blanks
  • if there is no session in the past with a user_id, we will look in the future to find one. This effectively rewrites the history and de-anonimises sessions leading up to a login (so be careful)

The stitching is not restricted to user_id only: if you have (custom) parameters, user_properties, url_params, you can stitch them via the same procedure.

For example, you can stitch columns like url_params_custom.lead_id , event_params_custom.email_hash, user_properties.store_location, etc.

The results are stored in the column stitched_ids in the ga4_sessions table, as a struct

screenshot

Configuration​

Default config has 1 object with 4 keys:


USER_STITCHING: {
enabled: true,
dimensions: [],
lookBackDays: 90,
lookAheadDays: 10
},

enabled can be

  • true: this will stitch user_id (the built in column), and optional other dimensions
  • false: this will skip user stitching logic

dimensions - this is an array of objects, the object must have 2 keys

  • name: the column name as it appears in ga4_events
  • renameTo: the column name as it will appear in the stitched_ids struct in ga4_sessions

Note: If no renameTo field is used, the full (sanitized) name is used. So event_params_custom.lead_id will become event_params_custom_lead_id

lookBackDays is the number of days to search for a previous value.

lookAheadDays is the number of days to “rewrite the past” with a possible future value.

Example config:


USER_STITCHING: {
enabled: true,
dimensions: [
{ name: "user_properties.datatrics_segment", renameTo: "datatrics_segment" },
{ name: "event_params.showroom_name" }
],
lookBackDays: 90,
lookAheadDays: 3
},

How it works in the SQL​

int_ga4_sessions will store the first and last values of the dimension in a struct stitched_ids

These will be used in ga4_sessions to fetch the values. Of course: only when the stitching is enabled.

In ga4_sessions

  • lookBackDays of days will be read from the intermediate table
  • like the “last non direct” session source, the last found value of the dimension will be filled in and saved
  • backstitching will be applied afterwards using the lookAheadDays time window

Not that this order matters: leading NULL values will be filled in after trailing NULL values.

Let’s say you configure

  • 1 day of lookback
  • 1 day of lookahead

In the below diagram, “day 4” will be filled in with A - because this is processed in the first round.

screenshot

Had the configuration be

  • 0 days of lookback
  • 1 day of lookahead

The result would be to skip the first round, and backstich in round 2, like this:

screenshot

Limitations and caveats​

Check if you lose too much information​

The stitching is done on session level, not on event level. This means that only 1 stitched value per session is stored.

If you have a lot of cases where one browser logs in to many user accounts multiple times per session, the stitching might lose some of its value, since more values get dropped.

A query to check if your user identifier is “session-stitchable” (or: less than 5% of sessions have multiple identifiers)

with freq as (
SELECT
session_id,
count(distinct event_params_custom.email_hash) as n_user_ids -- replace this with user_id or other identifier
FROM
`superform_outputs.ga4_events`
WHERE
event_date > "2025-09-01" -- choose date range
and session_id is not null
and user_pseudo_id is not null
group by session_id
),

agg as (
select
count(*) as sessions,
countif(n_user_ids = 0) as no_login,
countif(n_user_ids = 1) as one_login,
countif(n_user_ids > 1) as multiple_logins,
from freq
)

select
sessions as sessions,
"Of the " || round(100 * (1 - no_login/sessions),1) || "% of sessions where a user_id is found, " ||
round(100 * safe_divide(multiple_logins , one_login),1) || "% have multiple user_ids." as text,
if(round(100 * safe_divide(multiple_logins , one_login),1) < 5, "yes", "reconsider") as should_I_stitch

from agg

lookAheadDays rewrites history​

The main difference between lookback and lookahead is the following:

  • lookback: I see an anonymous session today with a login 10 days ago: let’s assign a user_id to the session of today
  • lookahead: the anonymous user where I stored an anonymous session 10 days ago, has logged in session today, so let’s change the user_id field for the session from 10 days ago

In the lookahead case, this means that a query you execute in week 22 to get data from “week 21”, can have a different output if you run it in week 23.

Respect user privacy​

As a final note, be mindul of user choices and privacy when stitching data. If you use stitched data to enrich customer profiles with information gathered while they were logged out, it may not be what they expect. Or even what you promised in your own privacy policy.