High availability questions

Hi team, I have some quick questions.

  1. Is “duid” in Snowplow the equivalent of a “universally unique ID / visitor ID”?
  2. Can we use our own website visitor ID in Snowplow? We already have a visitor ID on our website. In other words, does Snowplow give us the flexibility to bring our own visitor ID?
  3. “sid” is the session ID, so are all events performed by the visitor bound to the session ID that is currently running for that visitor?
  4. Can we terminate or close this session ID through some method, so that I can call it from my Log Out button?
  5. How can we get an automatic trigger when the visitor abandons the session, meaning the user has been inactive for the last 30 minutes? Is there a native method to override so that my backend gets a real-time trigger when the visitor abandons the session?

Thanks

Why don’t you just set the user_id value to your website ID and allow the tracker to do its thing? You can specify the session length in the initialisation settings, and even set user_id from a cookie.

If you start messing around with IDs and attempting to override the tracker functionality, you’ll just back yourself into a corner. Plus the models, like the Snowplow dbt web model, work best “out of the box”. Yes, you can edit them, but that’s a lot of work, and you end up with a fork that you must maintain yourself.

@pkr2 ,

Is “duid” in Snowplow the equivalent of a “universally unique ID / visitor ID”?

It is the user ID set by the tracker. While it is typically stored in a first-party cookie, it is set by JavaScript. This means that browsers like Safari can cut its lifetime to as little as 1 day. Alternatively, there is nuid (network_userid), which is set by the collector rather than by JavaScript. If the collector is on the same domain as your web application (and preferably in the same IP network), nuid will also be stored in a first-party cookie and would thus be considered the most reliable (it is not subject to browser vendors’ tracking prevention mechanisms).

Can we use our own website visitor ID in Snowplow? We already have a visitor ID on our website. In other words, does Snowplow give us the flexibility to bring our own visitor ID?

By all means. There is a dedicated field for that, uid (user_id). If you have a designated user ID set by your application, do use it - see https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/javascript-trackers/javascript-tracker/javascript-tracker-v3/tracker-setup/additional-options/#setuserid.
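
A minimal sketch of the suggestion above, assuming the tracker has been loaded under the global name `snowplow` as in the standard loader snippet. The helper name `identifyVisitor` and the example ID are hypothetical; `setUserId` is the tracker call described in the linked documentation.

```javascript
// Hypothetical helper: forwards your site's own visitor ID to the tracker.
// `tracker` is the global `snowplow` function created by the loader snippet.
function identifyVisitor(tracker, visitorId) {
  if (visitorId === undefined || visitorId === null || visitorId === "") {
    return false; // anonymous visitor: nothing to set
  }
  tracker("setUserId", String(visitorId)); // populates the user_id field on later events
  return true;
}

// In the browser, once your application knows who the visitor is:
// identifyVisitor(window.snowplow, "visitor-12345");
```

Passing the tracker function in as an argument keeps the helper easy to exercise outside the browser.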

“sid” is the session ID, so are all events performed by the visitor bound to the session ID that is currently running for that visitor?

Yes. Actually, both duid (domain_userid) and sid (domain_sessionid) are stored in the same cookie, as per https://docs.snowplow.io/docs/collecting-data/collecting-from-own-applications/javascript-trackers/browser-tracker/cookies-and-local-storage/. By default, the session lasts for 30 minutes, but this can be changed to whatever duration is more suitable for your application.
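
As a rough illustration of changing the session length, the sketch below builds a tracker configuration with a custom timeout. It assumes the `sessionCookieTimeout` initialisation option of the JavaScript tracker, expressed in seconds; please check the option name against the documentation for your tracker version. The helper `buildTrackerConfig` is hypothetical, and the `appId` is taken from the sample later in this thread.

```javascript
// Hypothetical helper: returns newTracker options with a custom session length.
// Assumption: `sessionCookieTimeout` is given in seconds (default 1800 = 30 minutes).
function buildTrackerConfig(sessionTimeoutMinutes) {
  return {
    appId: "try-snowplow",
    platform: "web",
    sessionCookieTimeout: sessionTimeoutMinutes * 60,
  };
}

// In the browser, after the loader snippet:
// snowplow("newTracker", "try", "http://127.0.0.1:9090", buildTrackerConfig(15));
```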

Can we terminate or close this session ID by some kind of method so that I can set this method on my LogOut button?

Yes, you can. Depending on the version of the tracker, it looks like this:

snowplow('clearUserData', { preserveSession: false, preserveUser: true});

See the thread “Need to generate new session_id if user_id changes” for more details.
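
One way to wire this to a Log Out button is sketched below. As far as I know, `clearUserData` does not take a success callback, so the sketch simply notifies your own server right after queuing the call. The names `handleLogout` and `notifyBackend`, and the shape of the notification payload, are hypothetical.

```javascript
// Hypothetical logout handler: ends the Snowplow session, then tells your own backend.
// `tracker` is the global `snowplow` function; `notifyBackend` is any function
// that delivers the payload to your server (for example a fetch() wrapper).
function handleLogout(tracker, notifyBackend) {
  tracker("clearUserData", { preserveSession: false, preserveUser: true });
  const payload = { reason: "logout", endedAtMs: Date.now() };
  notifyBackend(payload);
  return payload;
}

// In the browser:
// document.getElementById("btnlogoutsnowplow").addEventListener("click", () =>
//   handleLogout(window.snowplow, (p) => navigator.sendBeacon("/session-ended", JSON.stringify(p)))
// );
```

The `/session-ended` endpoint in the usage comment is a placeholder for whatever route your backend exposes.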

How can we get an automatic trigger when the visitor abandons the session, meaning the user has been inactive for the last 30 minutes? Is there a native method to override so that my backend gets a real-time trigger when the visitor abandons the session?

If the user is inactive for longer than the session duration, the session will be reset automatically. I’m not aware of any native tracker mechanism to detect that the session has expired. However, you should be able to do it client-side yourself, as you have access to the session cookie with getDomainUserInfo.

@ihor Thanks for taking the time to reply.
Regarding the last answer, can you please give me more detail? How can I get an automatic notification in my Python backend code that the session has expired?
How can I make use of getDomainUserInfo to get a real-time notification that the session has expired?
I am stuck on only one case:
“When the user leaves the screen open for a long time and does not perform any event, the session is automatically logged out / deleted.”
Other cases, like a forced window close or a click on Log Out,
will call this:
snowplow('clearUserData', { preserveSession: false, preserveUser: true });
This works perfectly and closes the session of the current user,
and in the success callback of the “clearUserData” method I will tell my backend that the user terminated the session.
So, to be on the safe side, I also want a real-time notification that this session ID has expired in Snowplow for this duid (visitor ID).

Below is my code configuration (I am using Snowplow on the web platform):

 <script>
 (function (p, l, o, w, i, n, g) {
        if (!p[i]) {
          p.GlobalSnowplowNamespace = p.GlobalSnowplowNamespace || [];
          p.GlobalSnowplowNamespace.push(i);
          p[i] = function () {
            (p[i].q = p[i].q || []).push(arguments);
          };
          p[i].q = p[i].q || [];
          n = l.createElement(o);
          g = l.getElementsByTagName(o)[0];
          n.async = 1;
          n.src = w;
          g.parentNode.insertBefore(n, g);
        }
      })(
        window,
        document,
        "script",
        "https://55d89c18-05af-482b-b9ce-ce4bc59a9bb9.app.try-snowplow.com/v3/try.js",
        "snowplow"
      );

      snowplow("newTracker", "try", "http://127.0.0.1:9090", {
        appId: "try-snowplow", // you can specify your own app name here
        platform: "web",
        stateStorageStrategy: "localStorage",
        contexts: {
          webPage: true, // this sets a unique id for each page view
          performanceTiming: true, // this captures performance metrics like load times
        },
      });

      // send first heartbeat after 10 seconds, and every 10 seconds thereafter
      snowplow("enableActivityTracking", {
        minimumVisitLength: 10,
        heartbeatDelay: 10,
      });
      snowplow("enableLinkClickTracking");
      snowplow("setOptOutCookie", "optoutcookie");
      // snowplow("trackPageView");
      
</script>
<body>
     <script>

             $(document).ready(function () {

        $("#btnSnowPlow").click(function () {
          window.snowplow("trackSelfDescribingEvent", {
            event: {
              schema: "iglu:com.example/my-schema/jsonschema/1-0-0",
              data: {
                name: "John",
                job_role: "CEO",
                promo_code: "3306330877",
              },
            },
          });
          console.log("Snowplow success");
        });

        $("#btnlogoutsnowplow").click(function () {
          window.snowplow("clearUserData", {
            preserveSession: false,
            preserveUser: true,
          });
          console.log("Snowplow logged out");
        });
      });
    </script>
</body>

@pkr2 , to start with, I would like to point out that your tracking code sample stores the session data in Local Storage rather than in a cookie (stateStorageStrategy: "localStorage"). I’m not sure whether the function getDomainUserInfo can read data from Local Storage. Assuming it has access (whether cookie or Local Storage), the data itself has the following format at the minimum (as per the “Cookies & Local Storage” page of the Snowplow documentation):

{domainUserId}.{createdTime}.{visitCount}.{nowTime}.{lastVisitTime}.{sessionId}.{previousSessionId}.{firstEventId}.{firstEventTsInMs}.{eventIndex}

For better understanding, here’s an example of the real cookie value

c2c93b78-8a62-41f3-bd28-a8a976fca7f9.1661886633.29.1682008057.1681412562.eb3dfcf5-45e3-4748-8a4d-f4f3363fe143.c9fda6ae-4001-4254-b3f6-bde6b90d8c75.88f772c8-16e9-48d3-bc77-4e1ce67e0172.1682007626015.23

When using getDomainUserInfo you will get an array like below

[0]: c2c93b78-8a62-41f3-bd28-a8a976fca7f9
[1]: 1661886633
[2]: 29
[3]: 1682008057
[4]: 1681412562
[5]: eb3dfcf5-45e3-4748-8a4d-f4f3363fe143
[6]: c9fda6ae-4001-4254-b3f6-bde6b90d8c75
[7]: 88f772c8-16e9-48d3-bc77-4e1ce67e0172
[8]: 1682007626015
[9]: 23

which corresponds to the following mapping

  1. The domain user ID
  2. The timestamp at which the cookie was created
  3. The number of times the user has visited the site
  4. The timestamp for the current visit
  5. The timestamp of the last visit
  6. The session id
  7. ID of the previous session (since version 3.5)
  8. ID of the first event in the current session (since version 3.5)
  9. Device created timestamp of the first event in the current session (since version 3.5)
  10. Index of the last event in the session (used to inspect order of events) (since version 3.5)

The element with index 3 (item 4 in the list above) is the timestamp of the current visit. If you have enableActivityTracking in your tracking code (which your sample has), you will get “ping” events at the interval set by that function, and they will keep updating the timestamp of the current visit as long as the user is actively engaged with the page. Thus, if you detect that the time elapsed since that timestamp is larger than the ping interval plus the session timeout, you can assume the session has ended and notify your backend. This should cover the scenario where the page is still open but the user is not interacting with it. For more context, the ping events would typically be used at the data modeling stage to determine the session duration.
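
The check described above could be sketched as follows. This works on the raw dot-separated cookie value in the format quoted earlier, rather than on the array returned by getDomainUserInfo, so please verify the field positions against your tracker version. The function names are hypothetical, the timestamps are assumed to be in seconds (as in the example value), and the defaults of 10 seconds for the ping interval and 1800 seconds for the session timeout come from the sample code and the default session length mentioned in this thread.

```javascript
// Field order as in the cookie format quoted above.
const ID_COOKIE_FIELDS = [
  "domainUserId", "createdTime", "visitCount", "nowTime", "lastVisitTime",
  "sessionId", "previousSessionId", "firstEventId", "firstEventTsInMs", "eventIndex",
];
const NUMERIC_FIELDS = new Set([
  "createdTime", "visitCount", "nowTime", "lastVisitTime", "firstEventTsInMs", "eventIndex",
]);

// Turns the raw dot-separated value into a named object.
// Fields missing from older, shorter cookies are simply left out.
function parseIdCookie(rawValue) {
  const parts = rawValue.split(".");
  const parsed = {};
  ID_COOKIE_FIELDS.forEach((name, i) => {
    if (parts[i] === undefined || parts[i] === "") return;
    parsed[name] = NUMERIC_FIELDS.has(name) ? Number(parts[i]) : parts[i];
  });
  return parsed;
}

// True when more than (ping interval + session timeout) seconds have passed
// since the last recorded activity. All arguments are in seconds.
function isSessionAbandoned(parsed, nowSec, pingIntervalSec = 10, sessionTimeoutSec = 1800) {
  return nowSec - parsed.nowTime > pingIntervalSec + sessionTimeoutSec;
}

// In the browser you might poll this with setInterval and, once it returns true,
// send parsed.sessionId and parsed.domainUserId to your own server.
```

Keeping the parsing and the time comparison as pure functions means the same logic can be reused wherever the raw value is available.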

Hopefully, this gives you more ideas and you can figure out how to cover all use cases.

@ihor
I got it: with page pings enabled we get the timestamp of the current visit and the timestamp of the last visit, from which we can notify our backend that the session is abandoned. Below is my code for this:

snowplow("newTracker", "try", "http://127.0.0.1:9090", {
        appId: "try-snowplow", // you can specify your own app name here
        platform: "web",
        // stateStorageStrategy: "localStorage",
        contexts: {
          webPage: true, // this sets a unique id for each page view
          performanceTiming: true, // this captures performance metrics like load times
        },
      });

      function getPagePingContext() {
        console.log("page ping domain info going....")
        return {
          schema: "iglu:com.example/page_ping_context/jsonschema/1-0-0",
          data: {
            domainUserInfo: getDomainUserInfo(),
          },
        };
      }
      snowplow('setCustomContextGenerator', getPagePingContext);

With this setCustomContextGenerator I tried to send getDomainUserInfo with the page ping event,
but I am getting an error that setCustomContextGenerator is not available in this version.
I am using Snowplow Micro in Docker:
docker run -p 9090:9090 snowplow/snowplow-micro:1.6.0

I also tried the latest version of Snowplow Micro:
docker run -p 9090:9090 snowplow/snowplow-micro:latest
But I am still facing the same issue.

I also tried with the Docker image below:

FROM snowplow/snowplow-micro:1.6.0
USER root
# Create the snowplow user with UID 1000 and GID 1000
RUN groupadd -g 1000 snowplow \
  && useradd -u 1000 -g 1000 -s /bin/bash -m snowplow

# Install Node.js and npm
RUN apt-get update && apt-get install -y nodejs npm

USER snowplow
WORKDIR /usr/src/app

COPY index.js /usr/src/app
USER root
RUN npm install snowplow-tracker

@pkr2 , what is setCustomContextGenerator? I couldn’t locate this function in our tracker. The error is related to the tracker, not Snowplow Micro.

It is also not clear to me what you are trying to do here. Did you mean to add a custom context (session cookie values) to the event that generates “pings”? What does it have to do with passing the data over to your backend?

Also, bear in mind that as you introduced a new custom context, page_ping_context, the corresponding JSON schema should also be hosted in your Iglu repository. Without it, any event that has this context attached will be rejected in the pipeline (at the Enrich component).

Hi @ihor
I got “setCustomContextGenerator” from ChatGPT.
“the corresponding JSON schema should also be hosted in your Iglu repository”
Yes, I created a separate schema for that:

"type": "object",
  "properties": {
    "domainUserInfo": {
      "type": "string"
    }
  },
  "required": [
    "domainUserInfo"
  ]

and I am assuming that we need to send the output of “getDomainUserInfo” with the page ping events. Is that the right understanding? That is why I searched for how to send a custom context with page ping events.
But it throws this error:
setCustomContextGenerator is not an available function.

I tried to find the domain user info in the page ping events, but it is not there.

[
    {
        "rawEvent": {
            "api": {
                "vendor": "com.snowplowanalytics.snowplow",
                "version": "tp2"
            },
            "parameters": {
                "e": "pp",
                "duid": "502b5559-b365-4709-8d56-c51792f0a6b1",
                "vid": "2",
                "eid": "e7b46ef2-d3e8-47cd-9e71-71a44ffcb909",
                "url": "http://localhost/snowplow/",
                "aid": "try-snowplow",
                "cx": "eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9jb250ZXh0cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6W3sic2NoZW1hIjoiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvd2ViX3BhZ2UvanNvbnNjaGVtYS8xLTAtMCIsImRhdGEiOnsiaWQiOiIxMTNjZTZjZC05NWNlLTQ2ZjItYTVhNS1mODY4NTg4ZjE0MjAifX0seyJzY2hlbWEiOiJpZ2x1Om9yZy53My9QZXJmb3JtYW5jZVRpbWluZy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJuYXZpZ2F0aW9uU3RhcnQiOjE2ODM0ODQzNjQzNjEsInJlZGlyZWN0U3RhcnQiOjE2ODM0ODQzNjQzNzEsInJlZGlyZWN0RW5kIjoxNjgzNDg0MzY0Mzg1LCJmZXRjaFN0YXJ0IjoxNjgzNDg0MzY0Mzg1LCJkb21haW5Mb29rdXBTdGFydCI6MTY4MzQ4NDM2NDM4NSwiZG9tYWluTG9va3VwRW5kIjoxNjgzNDg0MzY0Mzg1LCJjb25uZWN0U3RhcnQiOjE2ODM0ODQzNjQzODUsInNlY3VyZUNvbm5lY3Rpb25TdGFydCI6MCwiY29ubmVjdEVuZCI6MTY4MzQ4NDM2NDM4NSwicmVxdWVzdFN0YXJ0IjoxNjgzNDg0MzY0Mzg4LCJyZXNwb25zZVN0YXJ0IjoxNjgzNDg0MzY0NDk1LCJyZXNwb25zZUVuZCI6MTY4MzQ4NDM2NDQ5NiwidW5sb2FkRXZlbnRTdGFydCI6MCwidW5sb2FkRXZlbnRFbmQiOjAsImRvbUxvYWRpbmciOjE2ODM0ODQzNjQ1MDIsImRvbUludGVyYWN0aXZlIjoxNjgzNDg0MzY0OTIwLCJkb21Db250ZW50TG9hZGVkRXZlbnRTdGFydCI6MTY4MzQ4NDM2NDkyMCwiZG9tQ29udGVudExvYWRlZEV2ZW50RW5kIjoxNjgzNDg0MzY0OTI0LCJkb21Db21wbGV0ZSI6MTY4MzQ4NDM2NjU4NywibG9hZEV2ZW50U3RhcnQiOjE2ODM0ODQzNjY1ODgsImxvYWRFdmVudEVuZCI6MTY4MzQ4NDM2NjU5OX19XX0",
                "tna": "try",
                "cs": "UTF-8",
                "cd": "24",
                "page": "Sitecore CDP - Boxever",
                "stm": "1683484436610",
                "tz": "Asia/Calcutta",
                "tv": "js-3.3.1",
                "vp": "1422x632",
                "ds": "1415x3278",
                "res": "1280x720",
                "cookie": "1",
                "p": "web",
                "dtm": "1683484436602",
                "lang": "en-US",
                "sid": "058aad71-c270-4696-8799-4d78bb04052d"
            },
            "contentType": "application/json",
            "source": {
                "name": "snowplow-micro-1.6.0-stdout$",
                "encoding": "UTF-8",
                "hostname": "127.0.0.1"
            },
            "context": {
                "timestamp": "2023-05-07T18:33:56.294Z",
                "ipAddress": "172.17.0.1",
                "useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
                "refererUri": "http://localhost/",
                "headers": [
                    "Timeout-Access: <function1>",
                    "Host: 127.0.0.1:9090",
                    "Connection: keep-alive",
                    "sec-ch-ua: \"Chromium\";v=\"112\", \"Google Chrome\";v=\"112\", \"Not:A-Brand\";v=\"99\"",
                    "sec-ch-ua-platform: \"Windows\"",
                    "sec-ch-ua-mobile: ?0",
                    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
                    "Accept: */*",
                    "Origin: http://localhost",
                    "Sec-Fetch-Site: cross-site",
                    "Sec-Fetch-Mode: cors",
                    "Sec-Fetch-Dest: empty",
                    "Referer: http://localhost/",
                    "Accept-Encoding: gzip, deflate, br",
                    "Accept-Language: en-US, en;q=0.9",
                    "Cookie: _sp_id.dc78=92abbc19-37fb-460d-aed9-75b5b7c69e3c.1681358903.1.1681361795.1681358903.a7e4f055-747c-4c7a-9f1c-ca5eb51c76a6",
                    "application/json"
                ],
                "userId": "9ef197ec-7033-45d9-bdef-8625a50ab7b0"
            }
        },
        "eventType": "page_ping",
        "schema": "iglu:com.snowplowanalytics.snowplow/page_ping/jsonschema/1-0-0",
        "contexts": [
            "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
            "iglu:org.w3/PerformanceTiming/jsonschema/1-0-0"
        ],
        "event": {
            "app_id": "try-snowplow",
            "platform": "web",
            "etl_tstamp": "2023-05-07T18:33:56.329Z",
            "collector_tstamp": "2023-05-07T18:33:56.294Z",
            "dvce_created_tstamp": "2023-05-07T18:33:56.602Z",
            "event": "page_ping",
            "event_id": "e7b46ef2-d3e8-47cd-9e71-71a44ffcb909",
            "txn_id": null,
            "name_tracker": "try",
            "v_tracker": "js-3.3.1",
            "v_collector": "snowplow-micro-1.6.0-stdout$",
            "v_etl": "snowplow-micro-1.6.0",
            "user_id": null,
            "user_ipaddress": "172.17.0.1",
            "user_fingerprint": null,
            "domain_userid": "502b5559-b365-4709-8d56-c51792f0a6b1",
            "domain_sessionidx": 2,
            "network_userid": "9ef197ec-7033-45d9-bdef-8625a50ab7b0",
            "geo_country": null,
            "geo_region": null,
            "geo_city": null,
            "geo_zipcode": null,
            "geo_latitude": null,
            "geo_longitude": null,
            "geo_region_name": null,
            "ip_isp": null,
            "ip_organization": null,
            "ip_domain": null,
            "ip_netspeed": null,
            "page_url": "http://localhost/snowplow/",
            "page_title": "Sitecore CDP - Boxever",
            "page_referrer": null,
            "page_urlscheme": "http",
            "page_urlhost": "localhost",
            "page_urlport": 80,
            "page_urlpath": "/snowplow/",
            "page_urlquery": null,
            "page_urlfragment": null,
            "refr_urlscheme": null,
            "refr_urlhost": null,
            "refr_urlport": null,
            "refr_urlpath": null,
            "refr_urlquery": null,
            "refr_urlfragment": null,
            "refr_medium": null,
            "refr_source": null,
            "refr_term": null,
            "mkt_medium": null,
            "mkt_source": null,
            "mkt_term": null,
            "mkt_content": null,
            "mkt_campaign": null,
            "contexts": {
                "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
                "data": [
                    {
                        "schema": "iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0",
                        "data": {
                            "id": "113ce6cd-95ce-46f2-a5a5-f868588f1420"
                        }
                    },
                    {
                        "schema": "iglu:org.w3/PerformanceTiming/jsonschema/1-0-0",
                        "data": {
                            "navigationStart": 1683484364361,
                            "redirectStart": 1683484364371,
                            "redirectEnd": 1683484364385,
                            "fetchStart": 1683484364385,
                            "domainLookupStart": 1683484364385,
                            "domainLookupEnd": 1683484364385,
                            "connectStart": 1683484364385,
                            "secureConnectionStart": 0,
                            "connectEnd": 1683484364385,
                            "requestStart": 1683484364388,
                            "responseStart": 1683484364495,
                            "responseEnd": 1683484364496,
                            "unloadEventStart": 0,
                            "unloadEventEnd": 0,
                            "domLoading": 1683484364502,
                            "domInteractive": 1683484364920,
                            "domContentLoadedEventStart": 1683484364920,
                            "domContentLoadedEventEnd": 1683484364924,
                            "domComplete": 1683484366587,
                            "loadEventStart": 1683484366588,
                            "loadEventEnd": 1683484366599
                        }
                    }
                ]
            },
            "se_category": null,
            "se_action": null,
            "se_label": null,
            "se_property": null,
            "se_value": null,
            "unstruct_event": null,
            "tr_orderid": null,
            "tr_affiliation": null,
            "tr_total": null,
            "tr_tax": null,
            "tr_shipping": null,
            "tr_city": null,
            "tr_state": null,
            "tr_country": null,
            "ti_orderid": null,
            "ti_sku": null,
            "ti_name": null,
            "ti_category": null,
            "ti_price": null,
            "ti_quantity": null,
            "pp_xoffset_min": null,
            "pp_xoffset_max": null,
            "pp_yoffset_min": null,
            "pp_yoffset_max": null,
            "useragent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
            "br_name": null,
            "br_family": null,
            "br_version": null,
            "br_type": null,
            "br_renderengine": null,
            "br_lang": "en-US",
            "br_features_pdf": null,
            "br_features_flash": null,
            "br_features_java": null,
            "br_features_director": null,
            "br_features_quicktime": null,
            "br_features_realplayer": null,
            "br_features_windowsmedia": null,
            "br_features_gears": null,
            "br_features_silverlight": null,
            "br_cookies": true,
            "br_colordepth": "24",
            "br_viewwidth": 1422,
            "br_viewheight": 632,
            "os_name": null,
            "os_family": null,
            "os_manufacturer": null,
            "os_timezone": "Asia/Calcutta",
            "dvce_type": null,
            "dvce_ismobile": null,
            "dvce_screenwidth": 1280,
            "dvce_screenheight": 720,
            "doc_charset": "UTF-8",
            "doc_width": 1415,
            "doc_height": 3278,
            "tr_currency": null,
            "tr_total_base": null,
            "tr_tax_base": null,
            "tr_shipping_base": null,
            "ti_currency": null,
            "ti_price_base": null,
            "base_currency": null,
            "geo_timezone": null,
            "mkt_clickid": null,
            "mkt_network": null,
            "etl_tags": null,
            "dvce_sent_tstamp": "2023-05-07T18:33:56.610Z",
            "refr_domain_userid": null,
            "refr_dvce_tstamp": null,
            "derived_contexts": {},
            "domain_sessionid": "058aad71-c270-4696-8799-4d78bb04052d",
            "derived_tstamp": "2023-05-07T18:33:56.286Z",
            "event_vendor": "com.snowplowanalytics.snowplow",
            "event_name": "page_ping",
            "event_format": "jsonschema",
            "event_version": "1-0-0",
            "event_fingerprint": null,
            "true_tstamp": null
        }
    }
]

Then how can I send the domainUserInfo with page ping events (to run my abandoned-session logic)?

Thanks

@ihor
Sorry, my bad, it is already there, in the headers section of the context.

Thanks

@pkr2 , you cannot add custom context (entity) to enableActivityTracking event by design (refer to this document).

It sounds like by “backend” you mean the Snowplow pipeline (in your case Snowplow Micro)? I assumed you wanted to have it relayed to your web server.

What you see in your last screenshot is not the custom entity. It is the request to the collector made by the tracker. As it is an HTTP request, the cookies are also passed over to the server (here the pipeline collector). However, you will not see the cookie itself in your data: only some of its values are captured in the canonical fields, such as domain_userid, domain_sessionidx, and domain_sessionid.

If you want to have the cookie in your data pipeline, you could enable the Cookie extractor enrichment in your Enrich component. Note that in your case, the cookie is called _sp_id.dc78. The last 4 characters depend on your web application domain name.
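
If the same cookie also reaches your own web server, picking it out of the Cookie header could look like the sketch below. Since the suffix varies with the domain, it matches on the `_sp_id.` prefix seen in the request above, which assumes the tracker's cookie name has not been customised. The function name `findSnowplowIdCookie` is hypothetical, and the input is the header value without the leading "Cookie: ".

```javascript
// Hypothetical helper: finds the Snowplow ID cookie in a Cookie header value.
// Returns { name, value } for the first cookie whose name starts with "_sp_id.",
// or null when no such cookie is present.
function findSnowplowIdCookie(cookieHeader) {
  for (const pair of cookieHeader.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip malformed fragments without "="
    const name = pair.slice(0, eq).trim();
    if (name.startsWith("_sp_id.")) {
      return { name, value: pair.slice(eq + 1).trim() };
    }
  }
  return null;
}
```

The returned value is the dot-separated string whose format is described earlier in this thread.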

Secondly, on a side note, your custom JSON schema does not look right. As I mentioned, the function getDomainUserInfo returns an array. If I’m not mistaken, it is an array of strings and integers (I’m not sure). To declare it as the value in the JSON schema, with every element allowed to be of either type, it would look like

{
  "type": "array",
  "items": {
    "type": ["string", "integer"]
  }
}

That said, you do not need all of the elements and could relay only those that you need for your session expiration exercise. For example, the timestamp for the current visit is getDomainUserInfo()[3].