Iglu Validation

Hey,

I’m trying to use addGlobalContexts. The goal is to add our business userId to trackPageEvents + siteSearchEvents. However, when I add a global context, my events go into the bad events queue and fail schema validation. Is this error referring to my Iglu URL being incorrect? All I want is to pass userId inside the payload.

{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0","data":{"processor":{"artifact":"snowplow-enrich-kinesis","version":"3.2.3"},"failure":{"timestamp":"2022-09-29T14:52:18.447008Z","messages":[{"schemaKey":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","error":{"error":"ValidationError","dataReports":[{"message":"$[1].schema: does not match the regex pattern ^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+$","path":"$[1].schema","keyword":"pattern","targets":["^iglu:[a-zA-Z0-9-_.]+/[a-zA-Z0-9-_]+/[a-zA-Z0-9-_]+/[0-9]+-[0-9]+-[0-9]+$"]}]}}]},"payload":{"enriched":{"app_id":null,"platform":"web","etl_tstamp":"2022-09-29 14:52:18.444","collector_tstamp":"2022-09-29 14:52:16.061","dvce_created_tstamp":"2022-09-29 14:52:16.010","event":"page_view","event_id":"5d6fc5d5-9838-4661-b550-0c7721594518","txn_id":null,"name_tracker":"sp1","v_tracker":"js-3.6.0","v_collector":"ssc-2.7.0-kinesis","v_etl":"snowplow-enrich-kinesis-3.2.3-common-3.2.3","user_id":null,"user_ipaddress":"172.17.0.1","user_fingerprint":null,"domain_userid":"14c1a08c-4991-42ce-81b8-dba18bb4d848","domain_sessionidx":5,"network_userid":"27c22481-e49a-49c8-939d-34bb99754f05","geo_country":null,"geo_region":null,"geo_city":null,"geo_zipcode":null,"geo_latitude":null,"geo_longitude":null,"geo_region_name":null,"ip_isp":null,"ip_organization":null,"ip_domain":null,"ip_netspeed":null,"page_url":"http://localhost:3000/shop/cutting-tools/hss-co-end-mills-regular-series/5-0-weldon-std-end-mill-8%25-co/p/ZT1150617X","page_title":"(Unbranded) 5.0 WELDON STD END MILL-8% CO E2595050| at 
Zoro","page_referrer":"http://localhost:3000/shop/cutting-tools/hss-co-end-mills-regular-series/5-0-weldon-std-end-mill-8%25-co/p/ZT1150617X","page_urlscheme":null,"page_urlhost":null,"page_urlport":null,"page_urlpath":null,"page_urlquery":null,"page_urlfragment":null,"refr_urlscheme":null,"refr_urlhost":null,"refr_urlport":null,"refr_urlpath":null,"refr_urlquery":null,"refr_urlfragment":null,"refr_medium":null,"refr_source":null,"refr_term":null,"mkt_medium":null,"mkt_source":null,"mkt_term":null,"mkt_content":null,"mkt_campaign":null,"contexts":"{\"schema\":\"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0\",\"data\":[{\"schema\":\"iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0\",\"data\":{\"id\":\"6e43bb04-fbe5-4c75-94d2-ae809d690652\"}},{\"schema\":\"iglucentral.com/schemas/com.mparticle.snowplow/session_context/jsonschema/1-0-0\",\"data\":{\"id\":\"test-id\"}}]}","se_category":null,"se_action":null,"se_label":null,"se_property":null,"se_value":null,"unstruct_event":null,"tr_orderid":null,"tr_affiliation":null,"tr_total":null,"tr_tax":null,"tr_shipping":null,"tr_city":null,"tr_state":null,"tr_country":null,"ti_orderid":null,"ti_sku":null,"ti_name":null,"ti_category":null,"ti_price":null,"ti_quantity":null,"pp_xoffset_min":null,"pp_xoffset_max":null,"pp_yoffset_min":null,"pp_yoffset_max":null,"useragent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 
Safari/537.36","br_name":null,"br_family":null,"br_version":null,"br_type":null,"br_renderengine":null,"br_lang":"en-GB","br_features_pdf":null,"br_features_flash":null,"br_features_java":null,"br_features_director":null,"br_features_quicktime":null,"br_features_realplayer":null,"br_features_windowsmedia":null,"br_features_gears":null,"br_features_silverlight":null,"br_cookies":1,"br_colordepth":"24","br_viewwidth":1918,"br_viewheight":890,"os_name":null,"os_family":null,"os_manufacturer":null,"os_timezone":null,"dvce_type":null,"dvce_ismobile":null,"dvce_screenwidth":1920,"dvce_screenheight":1080,"doc_charset":"UTF-8","doc_width":1918,"doc_height":917,"tr_currency":null,"tr_total_base":null,"tr_tax_base":null,"tr_shipping_base":null,"ti_currency":null,"ti_price_base":null,"base_currency":null,"geo_timezone":null,"mkt_clickid":null,"mkt_network":null,"etl_tags":null,"dvce_sent_tstamp":"2022-09-29 14:52:16.012","refr_domain_userid":null,"refr_dvce_tstamp":null,"derived_contexts":null,"domain_sessionid":"499bd285-8792-4a49-894c-a2b2e03a5aef","derived_tstamp":null,"event_vendor":null,"event_name":null,"event_format":null,"event_version":null,"event_fingerprint":null,"true_tstamp":null},"raw":{"vendor":"com.snowplowanalytics.snowplow","version":"tp2","parameters":[{"name":"e","value":"pv"},{"name":"duid","value":"14c1a08c-4991-42ce-81b8-dba18bb4d848"},{"name":"vid","value":"5"},{"name":"eid","value":"5d6fc5d5-9838-4661-b550-0c7721594518"},{"name":"url","value":"http://localhost:3000/shop/cutting-tools/hss-co-end-mills-regular-series/5-0-weldon-std-end-mill-8%25-co/p/ZT1150617X"},{"name":"refr","value":"http://localhost:3000/shop/cutting-tools/hss-co-end-mills-regular-series/5-0-weldon-std-end-mill-8%25-co/p/ZT1150617X"},{"name":"cx","value":"eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5zbm93cGxvdy9jb250ZXh0cy9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6W3sic2NoZW1hIjoiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cvd2ViX3BhZ2UvanNvbnNjaGVtYS8xLTAtMCIsImRhdGEiOnsiaWQiOiI2ZTQz
YmIwNC1mYmU1LTRjNzUtOTRkMi1hZTgwOWQ2OTA2NTIifX0seyJzY2hlbWEiOiJpZ2x1Y2VudHJhbC5jb20vc2NoZW1hcy9jb20ubXBhcnRpY2xlLnNub3dwbG93L3Nlc3Npb25fY29udGV4dC9qc29uc2NoZW1hLzEtMC0wIiwiZGF0YSI6eyJpZCI6InRlc3QtaWQifX1dfQ"},{"name":"tna","value":"sp1"},{"name":"cs","value":"UTF-8"},{"name":"cd","value":"24"},{"name":"page","value":"(Unbranded) 5.0 WELDON STD END MILL-8% CO E2595050| at Zoro"},{"name":"stm","value":"1664463136012"},{"name":"tv","value":"js-3.6.0"},{"name":"vp","value":"1918x890"},{"name":"ds","value":"1918x917"},{"name":"res","value":"1920x1080"},{"name":"cookie","value":"1"},{"name":"p","value":"web"},{"name":"dtm","value":"1664463136010"},{"name":"lang","value":"en-GB"},{"name":"sid","value":"499bd285-8792-4a49-894c-a2b2e03a5aef"}],"contentType":"application/json","loaderName":"ssc-2.7.0-kinesis","encoding":"UTF-8","hostname":"localhost","timestamp":"2022-09-29T14:52:16.061Z","ipAddress":"172.17.0.1","useragent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","refererUri":"http://localhost:3000/","headers":["Timeout-Access: <function1>","Host: localhost:8080","Connection: keep-alive","sec-ch-ua: \"Google Chrome\";v=\"105\", \"Not)A;Brand\";v=\"8\", \"Chromium\";v=\"105\"","sec-ch-ua-mobile: ?0","User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","sec-ch-ua-platform: \"macOS\"","Accept: */*","Origin: http://localhost:3000","Sec-Fetch-Site: same-site","Sec-Fetch-Mode: cors","Sec-Fetch-Dest: empty","Referer: http://localhost:3000/","Accept-Encoding: gzip, deflate, br","Accept-Language: en-GB, en-US;q=0.9, en;q=0.8","Cookie: _ga=GA1.1.1231600890.1644595024; _uetvid=42ab58608b5311ecad393519e84363d4; OptanonAlertBoxClosed=2022-06-27T10:18:00.902Z; scarab.visitor=%2225BDCF605FF1A2C7%22; ajs_user_id=b954451e-b34e-4855-b79f-60e052f6aedd; ajs_anonymous_id=bcca0ffe-b613-453f-b043-0511d07f7616; 
scarab.profile=%22ZT1012656X%7C1663666389%22; jwtTokenLongSession=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjdXN0b21lcklkIjoiVDIzOGIxYzZjLTQ5MzItNDQ0Mi04ZTIyLTg2ZTg0NjliODdlMyIsInVzZXJDb2RlIjoiUzEyMjEzMCIsImVtYWlsIjoiY3JvbXdlbGwuYXV0b21hdGlvbitwbGF0aW51bUBnbWFpbC5jb20iLCJuYW1lIjoiS2FtcmFuIEtoYW4iLCJpc1JlZ2lzdGVyZWQiOnRydWUsImhhc0FjdGl2ZVRyYWRlQWNjb3VudCI6dHJ1ZSwicm9sZXMiOlsiRXZlcnlvbmUiXSwic3ViIjoidGVzdFN1YiIsImlhdCI6MTY2MzY2NjY3MiwiZXhwIjoxNjcxNDQyNjcyfQ.oRSkWacSzXsPDJormU5PpAYl-XvK8NEdUcjf2EZtFec; _gcl_au=1.1.1243491189.1663675896; sp=27c22481-e49a-49c8-939d-34bb99754f05; _gid=GA1.1.1143039725.1664449324; _uetsid=65a428c03fe611ed8376b7409656c702; _clck=1os5rx3|1|f5a|0; _sp_ses.1fff=*; _clsk=a6v1qp|1664463133530|5|1|j.clarity.ms/collect; _ga_TVHS9YQME1=GS1.1.1664457308.19.1.1664463135.0.0.0; OptanonConsent=geolocation=GB%3BENG&datestamp=Thu+Sep+29+2022+15%3A52%3A15+GMT%2B0100+(British+Summer+Time)&version=6.31.0&isIABGlobal=false&hosts=&consentId=3567a2e4-b015-4b34-82bc-ec307793b7c8&interactionCount=1&landingPath=NotLandingPage&groups=C0001%3A1%2CC0002%3A1%2CC0003%3A1%2CC0004%3A1&AwaitingReconsent=false; _sp_id.1fff=14c1a08c-4991-42ce-81b8-dba18bb4d848.1663146598.5.1664463136.1664451144.499bd285-8792-4a49-894c-a2b2e03a5aef.5d22215c-8f51-4f90-8f3e-10774801c92f.83e0d456-d266-42de-8bbc-3caf171f952e.1664456284841.752","application/json"],"userId":"27c22481-e49a-49c8-939d-34bb99754f05"}}}}
const contextEntity = {
  schema: 'iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1',
  data: { id: 'test-id' }
};

addGlobalContexts([contextEntity], [snowplowAppId]);

iglucentral.com/schemas/com.mparticle.snowplow/session_context/jsonschema/1-0-0 is not a valid Iglu URI. It needs to be of the format iglu:vendor/name/format/version.
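
As a quick sanity check before sending events, a schema URI can be tested against the same pattern the pipeline reports in the bad row above. This is only a sketch: isIgluUri is a made-up helper name, not part of the tracker, and it checks the shape of the string only, not whether the schema exists in any registry.

```javascript
// Pattern taken from the bad row's error message. The hyphen is moved to the
// end of each character class so it is unambiguous; the allowed characters
// are unchanged.
const IGLU_URI = /^iglu:[a-zA-Z0-9_.-]+\/[a-zA-Z0-9_-]+\/[a-zA-Z0-9_-]+\/[0-9]+-[0-9]+-[0-9]+$/;

// Hypothetical helper: true when the string has the
// iglu:vendor/name/format/version shape.
function isIgluUri(uri) {
  return typeof uri === 'string' && IGLU_URI.test(uri);
}

// The URI from the failing event is missing the "iglu:" prefix and carries a host.
console.log(isIgluUri('iglucentral.com/schemas/com.mparticle.snowplow/session_context/jsonschema/1-0-0')); // false
console.log(isIgluUri('iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0')); // true
```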

iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1 is valid, but points to the metaschema for contexts - it doesn’t contain an id field. You wouldn’t ever need to refer to this schema in tracking - it’s used under the hood in the pipeline.

If you want to track a custom context, you need to create your own schema first - you can find a guide to schemas in our documentation.

You would need to create a schema, upload it to your own Iglu server, and point the tracking to that schema.
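
To make that concrete, here is a rough sketch of what a custom schema and the matching entity could look like. The vendor com.example, the name business_user and both helper functions are placeholders invented for illustration. The schema is written as a JavaScript object only so the example stays in one language; in practice it would be a JSON file uploaded to your Iglu server.

```javascript
// Hypothetical self-describing schema. "com.example" and "business_user" are
// placeholders, not schemas that exist anywhere.
const businessUserSchema = {
  $schema: 'http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#',
  description: 'Business user identifier attached to every event',
  self: { vendor: 'com.example', name: 'business_user', format: 'jsonschema', version: '1-0-0' },
  type: 'object',
  properties: { userId: { type: 'string' } },
  required: ['userId'],
  additionalProperties: false,
};

// Derive the iglu: URI that tracking should reference from the "self" block.
function igluUriFor(schema) {
  const { vendor, name, format, version } = schema.self;
  return `iglu:${vendor}/${name}/${format}/${version}`;
}

// Build the entity whose data matches the schema above.
function buildUserEntity(userId) {
  return { schema: igluUriFor(businessUserSchema), data: { userId } };
}

console.log(buildUserEntity('test-id'));
```

The returned object is what would be passed to the tracker, for example addGlobalContexts([buildUserEntity('test-id')]).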

However it seems like your use case is only to set a userid, so if you’re not already using it, you can save some effort and just use the setUserId method in the tracker - that will set the user_id atomic field (and it’s intended for exactly this purpose).

Hey, thanks - I tried the setUserId method and I see the value comes through in the pii_transformation event, not in the page view event. The page view event has a key-value pair userId: 1db0b39ade839d8543622d2025efc12df6c7e034 - I can see that’s associated with the string you pass to the setUserId method.

We have a service downstream that will want to link the pageViewEvent to a userId inside the pii_transformation event. Will the kinesis-enrichment app always send these two events in the same message group, so that a Lambda downstream can group them together in the DB where we save the data?

OK, I understand what’s happening here, bear with me while I walk through the few different things involved.

PII transformation events relate to the PII pseudonymization enrichment. So your user_id value is being anonymised by that enrichment.

Just to preface talking about PII - we can’t advise what your strategy should be and what values should be anonymised - we’re not lawyers and it’s always use case dependent. So obviously whatever you do here should be compliant. I can lay out the various different technical things involved:

Disabling the PII enrichment for this field

The PII enrichment is aimed at cases where a given field can’t be stored in the database unhashed. It seems like in this case you might actually be fine with storing the raw value (this often would be the case where, for example, the value is only tracked after the user has consented to this level of identification, or more broadly, only when collecting the value is compliant).

If that’s the scenario, you can simply change the PII enrichment configuration to not hash this field.
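
As an illustration of that change, the sketch below takes a PII enrichment config and drops user_id from the list of fields to hash. The config shape shown is my assumption of a typical file (the second field, hash function and salt are made-up values), and withoutPojoField is a hypothetical helper, so check it against your own config before relying on it.

```javascript
// Assumed shape of a PII enrichment config; the values are illustrative only.
const piiConfig = {
  schema: 'iglu:com.snowplowanalytics.snowplow.enrichments/pii_enrichment_config/jsonschema/2-0-0',
  data: {
    vendor: 'com.snowplowanalytics.snowplow.enrichments',
    name: 'pii_enrichment_config',
    emitEvent: true,
    enabled: true,
    parameters: {
      pii: [
        { pojo: { field: 'user_id' } },
        { pojo: { field: 'user_ipaddress' } },
      ],
      strategy: { pseudonymize: { hashFunction: 'SHA-1', salt: 'example-salt' } },
    },
  },
};

// Hypothetical helper: returns a copy of the config that no longer hashes the
// given atomic field. The input config is left untouched.
function withoutPojoField(config, field) {
  const pii = config.data.parameters.pii.filter(
    (entry) => !(entry.pojo && entry.pojo.field === field)
  );
  return {
    ...config,
    data: { ...config.data, parameters: { ...config.data.parameters, pii } },
  };
}

const updated = withoutPojoField(piiConfig, 'user_id');
console.log(JSON.stringify(updated.data.parameters.pii)); // [{"pojo":{"field":"user_ipaddress"}}]
```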

On this theme, the more popular trackers (eg JS) have anonymous tracking features that are helpful for managing that more broadly.

This is usually the simplest approach, if suitable.

Keeping the PII enrichment for this field

If you do need to hash this field, then it’s important to note that the PII transformation event contains the raw value. This event is emitted only if the PII pseudonymization enrichment is configured with emitEvent set to true, and is provided with a stream to receive these events.

The intention is that this is separate and distinct from the enriched stream - if you’ve configured the enriched stream here then the sensitive values will be mixed in with anonymised data (and if you’re using our loaders it’ll all be loaded to the same place) - this is likely not the desired behaviour, as it seems to defeat the purpose of anonymising.

The idea behind these events is that they’re processed into a separate, more locked-down datastore, and that can subsequently be used to reverse the hash or access the original values if required.

Obviously care also needs to be taken to ensure you’re compliant if implementing that kind of use case.

So if you do have some scenario where sometimes the value can be accessed, and sometimes not, then you could set up a separate stream for PII transformation events, store that in a separate, more locked down location, and use that to join hashed values to original values (both of which are in PII transformation events).
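
A rough sketch of that join, assuming the data of each PII transformation event lists the hashed fields under pii.pojo with fieldName, originalValue and modifiedValue. That shape, the sample values and both function names are assumptions for illustration, and the lookup would live in the locked-down store rather than in memory.

```javascript
// Assumed shape of the data in PII transformation events; values are made up.
const piiEvents = [
  {
    pii: {
      pojo: [
        { fieldName: 'user_id', originalValue: 'business-user-42', modifiedValue: 'hash-aaa' },
      ],
    },
  },
];

// Build a lookup from (field, hashed value) to the original value.
function buildReverseLookup(events) {
  const lookup = new Map();
  for (const event of events) {
    const pojo = (event.pii && event.pii.pojo) || [];
    for (const { fieldName, originalValue, modifiedValue } of pojo) {
      lookup.set(`${fieldName}:${modifiedValue}`, originalValue);
    }
  }
  return lookup;
}

// Resolve the original user_id for an enriched event, or null if unknown.
function resolveUserId(enrichedEvent, lookup) {
  const key = `user_id:${enrichedEvent.user_id}`;
  return lookup.has(key) ? lookup.get(key) : null;
}

const lookup = buildReverseLookup(piiEvents);
console.log(resolveUserId({ event: 'page_view', user_id: 'hash-aaa' }, lookup)); // business-user-42
```

Because the join is on the hashed value itself, it does not depend on the two events arriving together or in any particular order.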

Note that since this feature was added, there has been basically no demand for it - at least not from anyone who’s been vocal to us about their use cases. Everyone has opted for the above option of managing it at collection, and has been happy to irreversibly hash when dealing with PII that does need masking (i.e. disable emitEvent in the PII enrichment config). So just a heads up that we may be limited in our ability to help iron out the kinks on this approach (but we’ll always do our best).

Misc

Finally, a quick note:

Will the kinesis-enrichment app always send these two events in the same message group, so that a Lambda downstream can group them together in the DB where we save the data?

I think the above answers the use case, but in general it might be helpful to know that there are no guarantees like this about ordering or batching of data. The only guarantee is at-least-once delivery.
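
In practice, at-least-once delivery means a downstream consumer may see the same event more than once, so writes are usually made idempotent. A minimal sketch, using an in-memory Map as a stand-in for a database table keyed by event_id (createEventStore is a hypothetical name; a real Lambda would do an upsert against its own datastore):

```javascript
// In-memory stand-in for a table with event_id as the primary key.
function createEventStore() {
  const rows = new Map();
  return {
    // Upsert: saving the same event_id twice still leaves a single row.
    save(event) {
      rows.set(event.event_id, event);
    },
    count() {
      return rows.size;
    },
  };
}

const store = createEventStore();
const batch = [
  { event_id: '5d6fc5d5-9838-4661-b550-0c7721594518', event: 'page_view' },
  // The same event delivered a second time.
  { event_id: '5d6fc5d5-9838-4661-b550-0c7721594518', event: 'page_view' },
];
batch.forEach((event) => store.save(event));
console.log(store.count()); // 1
```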

Apologies for such an essay, hope this makes sense and is useful!


Ah I see, I had user_id preconfigured in the PII settings! @Colm - great answer, it helped me understand a lot more about the process. Thanks for your help!
