Unified package - user identifier stitching

Hi there,

I’m totally new to Snowplow. I’ve installed the Unified package in my dbt project. It’s for a mobile app.

I’m trying to set a field from a custom entity, contexts_com_theapp_user_1.value:email_hash, as the primary user identifier, and to fall back to contexts_com_snowplowanalytics_snowplow_client_session_1[0]:userId::string if the email hash is null.

I presume I should be updating the following variables?
snowplow__user_identifiers
snowplow__user_stitching_id
snowplow__session_identifiers

Do the following vars also need to be updated?
snowplow__user_sql
snowplow__session_sql

Any help is much appreciated.

Here are my settings:
snowplow__user_identifiers: [
  {'schema': 'contexts_com_theapp_user_1', 'field': 'email_hash'},
  {'schema': 'contexts_com_snowplowanalytics_snowplow_client_session_1[0]', 'field': 'userId'}
]

contexts_com_snowplowanalytics_snowplow_client_session_1 is an array containing a JSON object, and one of the fields in that object is userId.

dbt run --selector snowplow_unified completed successfully. How and where should I check the results in the data platform?

Thanks


The configs are rendered into the SQL shown in the image.
The derived tables (sessions, users and views) should have the correct stitched user identifier.

Hi Phoebe_Chen, welcome!

In this case you may not need snowplow__user_identifiers. It’s intended more for cases where you need a custom “device ID” or session ID to group events together even when the user isn’t authenticated, and you want to do that differently from how the tracker SDKs do it (e.g. if your app maintains its own session ID and you want to use that to count sessions instead of the Snowplow SDK cookies).

For something like a hashed email, which is probably only present on certain events, you likely want the stitching feature from snowplow__user_stitching_id instead.
The default for this is user_id (the top level field, not the one in the client_session entity), so if you’re setting your email hash as the user ID already it should just work. If you’re not using that as the user_id, try setting it to something like coalesce(contexts_com_theapp_user_1[0]:email_hash::string, user_id) to prefer the value from your entity instead. The docs have more info.
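The preference order in that coalesce can be illustrated outside the warehouse. The Python sketch below is not the package’s implementation; it assumes each event is a dict keyed by the column names above, with each context column holding a list of JSON objects (as described for client_session), and the helper names first_entity and resolve_stitching_id are hypothetical.

```python
from typing import Any, Optional


def first_entity(event: dict[str, Any], column: str) -> Optional[dict[str, Any]]:
    """Return element [0] of a context column (a list of JSON objects), or None."""
    contexts = event.get(column) or []
    return contexts[0] if contexts else None


def resolve_stitching_id(event: dict[str, Any]) -> Optional[str]:
    """Mimic coalesce(contexts_com_theapp_user_1[0]:email_hash::string, user_id)."""
    user_entity = first_entity(event, "contexts_com_theapp_user_1")
    email_hash = user_entity.get("email_hash") if user_entity else None
    # Like SQL coalesce: return the first candidate that is not null
    for candidate in (email_hash, event.get("user_id")):
        if candidate is not None:
            return str(candidate)
    return None


# Hypothetical events
print(resolve_stitching_id({"contexts_com_theapp_user_1": [{"email_hash": "abc123"}], "user_id": "u-1"}))  # abc123
print(resolve_stitching_id({"contexts_com_theapp_user_1": [], "user_id": "u-1"}))  # u-1
print(resolve_stitching_id({"user_id": None}))  # None
```

The entity value wins whenever it is present, and the top-level user_id is only used on events that lack the entity.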


Thanks for your response.

If I have snowplow__user_stitching_id: coalesce(contexts_com_theapp_user_1[0]:email_hash::string, contexts_com_snowplowanalytics_snowplow_client_session_1[0]:userId, user_id), will the value in the user_id column be replaced by email_hash or userId (the device identifier)? Also, I won’t need snowplow__user_identifiers if I have snowplow__user_stitching_id configured, right?

What’s the difference between snowplow__user_identifiers and snowplow__user_stitching_id?

Many thanks

will the value in the user_id column be replaced by email_hash or userId (device identifier)?

Almost! The user_id column in the model tables would become whichever of email_hash or user_id is set in the original events. Note that user_id here is different from the userId in client_session, which, confusingly, is actually the device ID; on web it will be the same value as domain_userid. The stitched_user_id should then be the most recent of those values across all rows with the same user_identifier (which is domain_userid or client_session.userId by default).
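A minimal Python sketch of that stitching behaviour, assuming a simplified event shape of (user_identifier, user_id, timestamp) tuples. The function names are hypothetical and this is not how the package’s SQL is written: each user_identifier is mapped to the user_id on its most recent event that has one, and every row for that identifier receives that value as stitched_user_id.

```python
from typing import Optional

# Hypothetical simplified event: (user_identifier, user_id, timestamp)
Event = tuple[str, Optional[str], int]
StitchedEvent = tuple[str, Optional[str], int, Optional[str]]


def build_user_mapping(events: list[Event]) -> dict[str, str]:
    """Map each user_identifier to the user_id on its most recent event that has one."""
    latest: dict[str, tuple[int, str]] = {}
    for user_identifier, user_id, ts in events:
        if user_id is None:
            continue  # anonymous events don't contribute to the mapping
        seen = latest.get(user_identifier)
        if seen is None or ts > seen[0]:
            latest[user_identifier] = (ts, user_id)
    return {ident: uid for ident, (_, uid) in latest.items()}


def stitch(events: list[Event]) -> list[StitchedEvent]:
    """Append stitched_user_id to every event; None if the device never had a user_id."""
    mapping = build_user_mapping(events)
    return [(ident, uid, ts, mapping.get(ident)) for ident, uid, ts in events]


events = [
    ("device-A", None, 1),
    ("device-A", "hash-1", 2),
    ("device-A", None, 3),
    ("device-B", None, 1),
    ("device-C", "old-hash", 1),
    ("device-C", "new-hash", 5),
]
for row in stitch(events):
    print(row)
```

Here all of device-A’s events, including the anonymous ones, get "hash-1"; device-C takes its most recent value, "new-hash"; and device-B, which never carried a user_id, is left as None in this sketch.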

Also, I won’t be needing snowplow__user_identifiers if I have snowplow__user_stitching_id configured? What’s the difference between snowplow__user_identifiers and snowplow__user_stitching_id?

snowplow__user_identifiers controls user_identifier (not user_id) and is more for custom device identifiers; for example you might prefer identifying user devices by network_userid instead of domain_userid because it can be more resilient to ITP in some configurations.

snowplow__user_stitching_id is for deduping users that are the same but have multiple device identifiers; e.g. they have the same account ID in your site, but access via multiple devices/browsers or clear their cookies often. This lets you aggregate the behaviour from all those different device IDs together.
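To show what that aggregation buys you, here is a small Python sketch over hypothetical session rows of (session_identifier, user_identifier, stitched_user_id). It counts sessions per device ID versus per stitched user. Falling back to the device ID when a row has no stitched_user_id is an assumption of this sketch, not documented package behaviour.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical session rows: (session_identifier, user_identifier, stitched_user_id)
Session = tuple[str, str, Optional[str]]


def count_sessions(sessions: list[Session], by_stitched: bool) -> dict[str, int]:
    """Count sessions per device ID, or per stitched user when by_stitched is True."""
    counts: dict[str, int] = defaultdict(int)
    for _session_id, user_identifier, stitched_user_id in sessions:
        # Sketch assumption: use the device ID when no stitched ID exists
        key = (stitched_user_id or user_identifier) if by_stitched else user_identifier
        counts[key] += 1
    return dict(counts)


sessions = [
    ("s1", "phone", "hash-1"),
    ("s2", "laptop", "hash-1"),
    ("s3", "laptop", "hash-1"),
    ("s4", "tablet", None),
]
print(count_sessions(sessions, by_stitched=False))  # {'phone': 1, 'laptop': 2, 'tablet': 1}
print(count_sessions(sessions, by_stitched=True))   # {'hash-1': 3, 'tablet': 1}
```

Grouping by user_identifier treats the phone and laptop as two separate users; grouping by stitched_user_id folds them into one user with three sessions.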
