Bad event with content-type text/plain; charset=utf-8

We’re using Snowplow JavaScript Tracker v2.10.2 to send events to our Snowplow instance, and we seem to be getting errors processing some events.

The error is: “Content type of text/plain; charset=utf-8 provided, expected one of: application/json, application/json; charset=utf-8, application/json; charset=UTF-8”.

Based on my reading of GitHub issues, it seems that text/plain should be supported.

Is this a known issue? Is there anything I can do to fix it? I apologize in advance if I haven’t provided enough information; I’m not familiar with Snowplow internals as we’re using a hosted version.

Any help would be appreciated. Thanks!

@wookasz, how do you track the events that end up in the bad bucket/index? Are they legitimate Snowplow events? What are they? Could you provide a sample of a bad event with that error? Where do you process the events (Stream Enrich, Spark Enrich, etc.)?

@ihor thanks for the quick reply. The events are sent using the JavaScript tracker; we’re on version 2.10.2 in our app. The events use a custom schema. The same event type actually comes in as application/json most of the time, so I’m not sure why a small subset appears as text/plain.

When I look at the events the data all seems valid. The Content-Type header seems to be what’s causing the issue.

Here’s a sample of the bad event, broken out by CollectorPayload field. I obfuscated some UIDs, IP addresses, hostnames, parameters, and URLs for security, and base64-decoded fields where necessary.

ipAddress: XXX.XXX.XXX.XXX
encoding: UTF-8
collector: ssc-0.15.0-kinesis
userAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
refererUri: https://www.site.com/c/sale/
path: /com.snowplowanalytics.snowplow/tp2
body: {"schema":"iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4","data":[{"e":"ue","ue_px":"{"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.gladly.sidekick/event_widget_loaded/jsonschema/1-0-0","data":{}}}","tv":"js-2.10.2","tna":"cf","aid":"sidekick","p":"web","tz":"America/New_York","lang":"en-US","cs":"UTF-8","f_pdf":"1","f_qt":"0","f_realp":"0","f_wma":"0","f_dir":"0","f_fla":"0","f_java":"0","f_gears":"0","f_ag":"0","res":"1536x864","cd":"24","cookie":"1","eid":"8983fcda-2b21-468f-80d7-05d102933fbf","dtm":"1565217818372","cx":"{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:com.gladly.sidekick/context_widget/jsonschema/2-0-0","data":{"screen":"home","visibility":"hidden","isInOfficeHours":true,"isThrottled":true,"someOtherField":"thisisaguid","isCustomerAuthenticated":null}},{"schema":"iglu:com.gladly/context_gladly/jsonschema/1-0-0","data":{"orgId":"thisisaguid","stage":"production","site":"somesite.example.com"}},{"schema":"iglu:com.gladly.sidekick/context_ab_test/jsonschema/1-0-1","data":{"onboardingType":"interactive"}}]}","vp":"1536x731","ds":"1519x43406","vid":"7","sid":"sid","duid":"duid","fp":"3405116619","refr":"https://www.site.com/","url":"https://www.site.com/c/sale","stm":"1565217818380"}]}
headers:
  Host: host.hosty.com
  Accept: */*
  Accept-Encoding: gzip, deflate, br
  Accept-Language: en-US, en;q=0.9
  Origin: https://www.site.com
  Referer: https://www.site.com/c/sale/
  Sec-Fetch-Mode: cors
  Sec-Fetch-Site: cross-site
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
  X-Forwarded-For: 1.1.1.1
  X-Forwarded-Port: 443
  X-Forwarded-Proto: https
  Connection: keep-alive
  Timeout-Access: <function1>
  text/plain; charset=utf-8
contentType: text/plain; charset=utf-8
hostname: host.hosty.com
networkUserId: thisisaguid
schema: iglu:com.snowplowanalytics.snowplow/CollectorPayload/thrift/1-0-0

I’ll have to reach out to the company hosting our snowplow instance to find out about processing. I’ll update here when they respond.

@wookasz, I’m as confused as you are.

Where do you see this error? I can’t imagine the tracker treating (building) the same (presumably self-describing) events differently.

@ihor the error is showing up as the error message in our snowplow_bad bad events table in Snowflake.

To me, it seems that it’s not necessarily the event that is being built differently, but rather the request headers sent with it? That doesn’t make the matter any clearer though :smiley:

Hi @wookasz,

It seems like you have something in front of your collector modifying headers (like a load balancer). The JavaScript tracker sends data with a Content-Type: application/json; charset=UTF-8 header. Please verify whether the Content-Type header is forwarded to the collector as-is, or whether something rewrites it to text/plain.

I have an AWS-based implementation, with a load balancer behind a CloudFront distribution. CloudFront is configured to pass all headers to the load balancer (which passes them on to the collector instances).

Although I wasn’t able to find the particular line in the source code, I did a couple of experiments with my pipeline and it does require the correct Content-Type header.
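
If it helps, here is a rough way to test it end to end (just a sketch - collector.example.com is a placeholder for your collector host, and the payload below is a minimal stand-in, not your real event): POST the same body with both content types, then compare what lands in your enriched stream versus the bad stream.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object ContentTypeCheck {

  // Placeholder collector endpoint - swap in your own collector hostname.
  val collectorUri = "https://collector.example.com/com.snowplowanalytics.snowplow/tp2"

  // Minimal payload_data envelope, the same shape the JS tracker POSTs.
  val body =
    """{"schema":"iglu:com.snowplowanalytics.snowplow/payload_data/jsonschema/1-0-4","data":[{"e":"pv","tv":"js-2.10.2","p":"web","url":"https://www.site.com/"}]}"""

  private val client = HttpClient.newHttpClient()

  def send(contentType: String): Int = {
    val request = HttpRequest
      .newBuilder(URI.create(collectorUri))
      .header("Content-Type", contentType)
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()
    client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode()
  }

  def main(args: Array[String]): Unit = {
    // The collector will likely return 200 for both requests; the difference
    // shows up downstream, where only the application/json one should pass enrichment.
    println(s"application/json; charset=UTF-8 -> HTTP ${send("application/json; charset=UTF-8")}")
    println(s"text/plain; charset=utf-8       -> HTTP ${send("text/plain; charset=utf-8")}")
  }
}

In my experiments, both requests were accepted by the collector, but only the application/json one made it through enrichment - the text/plain one surfaced as a bad row, which matches the error you are seeing.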

Hope this helps,
Cheers

Just thinking - is this all of the events, or only particular ones?

If all of them - then something is modifying your requests. If only a couple/a negligible number - maybe automated tests, pentesting, or some crawlers or robots?

Thanks @grzegorzewald!

@wookasz - the hosted version of Snowplow you are using sounds somewhat broken/non-standard. What version is it, so we can verify this isn’t a Snowplow open-source issue?

@grzegorzewald thanks. It’s just one event type so far, but it’s the most common event we have (sent when a widget is loaded on a page), and we’ve seen only a handful of these failures (a very small fraction of the total). It does appear to be a valid (non-bot) event, since the event data looks valid.

@alex I’ll find out and get back to you.

The hosted instance uses the stream enricher and the Clojure collector.

@wookasz, this doesn’t sound right. I wouldn’t expect this combination - Clojure collector + Stream Enrich. If you use Stream Enrich (which you do, as can be seen from the event), the expected collector is the Scala Stream Collector. How would you send the events collected by the Clojure collector (meant for batch) to real-time processing? If this is really the case, then I would indeed expect a mediator in between, as mentioned by @alex.

@ihor I’ll inquire more into the internals of how this is set up.

@alex based on information from the vendor, the collector is 0.15.0 and the enricher is 0.21.0.

Hey @wookasz - thanks for sharing. Really, this forum is for open-source users of Snowplow; if you are working with a commercial hosting provider of a non-standard Snowplow, well, hopefully they have a support ticketing system :slight_smile:

If the hosting vendor is no help and Snowplow is an important part of your operational analytics, two suggestions:

  1. Switch to running open-source Snowplow yourself. As you’ve seen in this thread, the open-source community is pretty helpful!
  2. Switch to Snowplow Insights, where we run Snowplow for you, https://snowplowanalytics.com/pricing/

I won’t lock this thread - feel free to share back what you find out from your hosting vendor!


@alex thanks, and of course. I was hoping this might be a known issue, but it seems it is not :smiley: I’ll share what I find in case anyone else runs into it.

@alex just to follow up - it turns out that it’s actually the Scala Stream Collector and Stream Enrich (both Scala). It does appear that the enricher indeed does not accept text/plain (https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-common-enrich/src/test/scala/com.snowplowanalytics.snowplow.enrich.common/adapters/registry/SnowplowAdapterSpec.scala#L162).
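
For anyone else who runs into this: the enrichment step effectively applies an allow-list on the content type of tp2 payloads, roughly along these lines (a simplified sketch for illustration, not the actual snowplow/enrich source):

// Simplified sketch of the content-type validation applied to tp2 payloads
// during enrichment - illustrative only, not the actual snowplow/enrich code.
object Tp2ContentTypeCheck {

  val expected = List(
    "application/json",
    "application/json; charset=utf-8",
    "application/json; charset=UTF-8"
  )

  def validate(contentType: String): Either[String, Unit] =
    if (expected.contains(contentType)) Right(())
    else Left(s"Content type of $contentType provided, expected one of: ${expected.mkString(", ")}")
}

// Tp2ContentTypeCheck.validate("text/plain; charset=utf-8") reproduces the
// error message we see on the bad rows.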

That still leaves the question of why some events are being sent with this header, but oh well, I’m sure we can figure that out :smiley:

Thanks for all the help.