Stream collector

I'm exploring Snowplow for an enterprise use case (beginner).

Collector Version: 1.0.1
Proxy Server: Nginx
Sink: Kafka
Error: The headers in the Kafka message look broken to me, and I can't figure out why.

Kafka message:

d183.82.109.224
�tHb�
�UTF-8
�ssc-1.0.1-kafka
,iMozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36
6jhttps://findmynewgame.myshopify.com/collections/all?_sp=a4060cf2-efb0-4cb3-9054-9a3673046fa1.1597844924071
@#/com.snowplowanalytics.snowplow/tp2
Th{“schema”:“iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4”,“data”:[{“e”:“pv”,“url”:“https://findmynewgame.myshopify.com/collections/all?_sp=a4060cf2-efb0-4cb3-9054-9a3673046fa1.1597844924071",“page”:"Products – findmynewgame”,“refr”:“https://findmynewgame.myshopify.com/collections/ps4-summer-collection?_sp=19ea6c6b-7922-4fcd-be73-ba75fbfc8da6.1597844738848",“tv”:“js-2.10.2”,“tna”:“cf”,“aid”:“staff”,“p”:“web”,“tz”:“Asia/Kolkata”,“lang”:“en-GB”,“cs”:“UTF-8”,“f_pdf”:“1”,“f_qt”:“0”,“f_realp”:“0”,“f_wma”:“0”,“f_dir”:“0”,“f_fla”:“0”,“f_java”:“0”,“f_gears”:“0”,“f_ag”:“0”,“res”:“1366x768”,“cd”:“24”,“cookie”:“1”,“eid”:“3192ace8-864d-4e94-b07c-c505406d2567”,“dtm”:“1597850018279”,“co”:"{“schema”:“iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0”,“data”:[{“schema”:“iglu:com.snowplowanalytics.snowplow/web_page/jsonschema/1-0-0”,“data”:{“id”:“7d673c93-fe57-4a96-a6bb-0de813ce58c4”}}]}”,“vp”:“1366x278”,“ds”:“1351x1758”,“vid”:“1”,“sid”:“ca0fbd77-ff87-4741-bf05-84fbfb8b2e1e”,“duid”:“5f3979b0-7d49-4082-a3f2-bbe0c7eba49d”,“fp”:“57401181”,“uid”:“28064579655”,“stm”:“1597850018281”}]}^
imeout-Access: X-Forwarded-For: 183.82.109.224’Host: collector.snowplow.gameopedia.comX-Forwarded_Proto: httpsuUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36
Accept: /+Origin: https://findmynewgame.myshopify.com�Sec-Fetch-Site: cross-siteSec-Fetch-Mode: corsSec-Fetch-Dest: emptysReferer: https://findmynewgame.myshopify.com/collections/all?_sp=a4060cf2-efb0-4cb3-9054-9a3673046fa1.1597844924071"Accept-Encoding: gzip, deflate, br-Accept-Language: en-GB, en-US;q=0.9, en;q=0.8fCookie: userEvent=a129db0d-ab50-47e3-ae5b-6be5127b1359; userEvent=a129db0d-ab50-47e3-ae5b-6be5127b1359application/json
happlication/json
�!collector.snowplow.gameopedia.com
�$a129db0d-ab50-47e3-ae5b-6be5127b1359
ziAiglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0

Running over: AWS
Thanks in advance for any help.

The raw Snowplow events are Thrift encoded, so this looks mostly correct to me. To get enriched events out, you'll also need to have the enrichment process set up to read from your raw Kafka topic and push into an enriched topic.
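To illustrate why the dump above shows readable strings mixed with stray bytes, here is a minimal Python sketch of a Thrift binary-protocol reader. It assumes the payload is written with the plain binary protocol, which is what the dump appears to show. The names in FIELD_NAMES are assumptions inferred from the values visible in the message (for example, "zi" is 0x7A69, field id 31337, right before the schema URI). It only handles the types seen there (i64, string, list of strings) and is not a replacement for the official Thrift library.

```python
import struct

# Thrift binary-protocol type codes handled by this sketch.
T_STOP, T_I64, T_STRING, T_LIST = 0, 10, 11, 15

# Field ids mapped to names. The names are assumptions inferred from the dump.
FIELD_NAMES = {
    100: "ipAddress", 200: "timestamp", 210: "encoding", 220: "collector",
    300: "userAgent", 310: "refererUri", 320: "path", 340: "body",
    350: "headers", 360: "contentType", 400: "hostname",
    410: "networkUserId", 31337: "schema",
}


def read_value(buf, pos, ttype):
    """Decode one value of the given Thrift type and return (value, new_pos)."""
    if ttype == T_I64:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if ttype == T_STRING:
        (length,) = struct.unpack_from(">i", buf, pos)
        start = pos + 4
        text = buf[start:start + length].decode("utf-8", "replace")
        return text, start + length
    if ttype == T_LIST:
        # A list is: element type (1 byte), element count (i32), elements.
        elem_type, count = struct.unpack_from(">bi", buf, pos)
        pos += 5
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos, elem_type)
            items.append(item)
        return items, pos
    raise ValueError(f"unsupported Thrift type {ttype}")


def decode_payload(buf):
    """Decode a flat Thrift struct into a dict keyed by field name or id."""
    pos, out = 0, {}
    while True:
        ttype = buf[pos]
        pos += 1
        if ttype == T_STOP:
            return out
        # Each field header is: type (1 byte) followed by field id (i16).
        (field_id,) = struct.unpack_from(">h", buf, pos)
        pos += 2
        value, pos = read_value(buf, pos, ttype)
        out[FIELD_NAMES.get(field_id, field_id)] = value
```

Called on the raw bytes of one Kafka record (not the text rendering pasted above), decode_payload should return a dict with entries such as ipAddress, body and headers, which makes it easier to see that the "broken" characters are just length prefixes and field ids.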

Thanks @mike for reaching out :)

I have an enricher in place, and it was working fine initially. Once I enabled the http_header_extractor_config.json enrichment and restarted the enricher, I got the following log:

Aug 20 06:16:49 ip-172-31-33-8 java[2542]: [main] WARN com.networknt.schema.JsonMetaSchema - Unknown keyword exclusiveMinimum - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
Aug 20 06:16:49 ip-172-31-33-8 java[2542]: An error occured: NonEmptyList({"error":"ValidationError","dataReports":[{"message":"[12].schema: is missing but it is required","path":"[12]","keyword":"required","targets":["schema"]},{"message":"[12].data: is missing but it is required","path":"[12]","keyword":"required","targets":["data"]},{"message":"[12].email: is not defined in the schema and the schema does not allow additional properties","path":"[12]","keyword":"additionalProperties","targets":["email"]},{"message":"[12].unknown: is not defined in the schema and the schema does not allow additional properties","path":"[12]","keyword":"additionalProperties","targets":["unknown"]},{"message":"[12].social: is not defined in the schema and the schema does not allow additional properties","path":"[12]","keyword":"additionalProperties","targets":["social"]},{"message":"[12].paid: is not defined in the schema and the schema does not allow additional properties","path":"[12]","keyword":"additionalProperties","targets":["paid"]},{"message":"[12].search: is not defined in the schema and the schema does not allow additional properties","path":"[12]","keyword":"additionalProperties","targets":["search"]}]})

After getting that error I undid the changes, but I am still getting the same error.
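The validation report says that item [12] has no schema or data field and instead carries keys such as email, social, paid and search, which look more like a lookup data file than an enrichment configuration. One way to narrow this down is to check every JSON file that the enricher is pointed at. The sketch below is only an assumption about the setup: it supposes the enrichment configs live as *.json files in a single directory and that each one must be a self-describing JSON with top-level schema and data keys. The function name and directory layout are hypothetical.

```python
import json
from pathlib import Path


def find_invalid_enrichment_files(directory):
    """Return {filename: problem} for JSON files lacking 'schema' or 'data'."""
    problems = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems[path.name] = f"not valid JSON: {exc.msg}"
            continue
        if not isinstance(doc, dict):
            problems[path.name] = "top level is not a JSON object"
            continue
        # A self-describing JSON needs both of these top-level keys.
        missing = [key for key in ("schema", "data") if key not in doc]
        if missing:
            problems[path.name] = "missing " + ", ".join(missing)
    return problems
```

Running it against the enrichments directory and moving any reported file elsewhere would show whether a stray file, rather than the enrichment that was switched on and off, is what keeps triggering the error.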

Any idea?

The error seems to indicate missing schemas. Are you sure that all the schemas for the events you're pushing to the collector are in your Iglu repo?

Thanks @evaldas for reaching out.

As of now I am running the default setup as per the GitHub documentation. Snowplow setup steps:

  1. Loading the JS tracker from d1fc8wv8zag5ca.cloudfront.net/2.10.2/sp.js
  2. Built the collector and enricher from source.
  3. Nginx reverse proxy for the collector
  4. Kafka

The collector, enricher and Kafka are all on different machines.

Have I missed something in the process? Please enlighten me. Thanks!

Well, when it comes to schema resolution, the default repo config is defined here: https://github.com/snowplow/enrich/blob/master/config/iglu_resolver.json. If you're using standard events, they should pass schema validation with the default Iglu setup, but if you're using custom schemas you would need to add your own Iglu repo and change the config to point to it, for example:

          {
            "name": "Custom Schemas",
            "priority": 0,
            "vendorPrefixes": [ "com.company" ],
            "connection": {
              "http": {
                "uri": "http://{iglu-host}/api"
              }
            }
          }
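As a rough illustration of "change the config to point to it", the Python sketch below appends such a repository entry to a resolver config that has already been loaded as a dict. The function name add_repository and the example host are hypothetical; only the field names (name, priority, vendorPrefixes, connection.http.uri) come from the snippet above.

```python
import copy


def add_repository(resolver, name, vendor_prefixes, uri, priority=0):
    """Return a copy of the resolver config with one extra HTTP repository."""
    updated = copy.deepcopy(resolver)  # leave the caller's config untouched
    updated["data"]["repositories"].append({
        "name": name,
        "priority": priority,
        "vendorPrefixes": list(vendor_prefixes),
        "connection": {"http": {"uri": uri}},
    })
    return updated
```

For example, add_repository(config, "Custom Schemas", ["com.company"], "http://iglu.example/api") returns a new config whose repositories list ends with the custom entry. The result can then be written back with json.dumps before restarting the enricher.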

I don't have any custom schemas as of now; I'm just using the default ones.

$ vi iglu_resolver.json

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://iglucentral.com"
          }
        }
      },
      {
        "name": "Iglu Central - GCP Mirror",
        "priority": 1,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "http://mirror01.iglucentral.com"
          }
        }
      }
    ]
  }
}

Hi @mike @evaldas
I was able to figure out the problem and find a solution as well. Thanks a lot, guys, for all the support and help.