[RESOLVED] Issues upgrading to r118

Hello there!

I am really excited about the latest release with the new bad row format!
Good job on that!

Today I was looking into how to upgrade from r117 to r118.
Before deploying in our AWS environment I always try to run everything locally using the NSQ stack.

Basically I am using the example from the snowplow-docker repository, which I update on a regular basis.

I am running into an issue with the referer parser enrichment.

Following the upgrade manual, I got the following error when my stream-enrich container starts:

stream-enrich_1           | An error occured: Scheme s3 for file s3://snowplow-hosted-assets/third-party/referer-parser/referer-tests.json not supported
example_stream-enrich_1 exited with code

By looking at the code, my first guess is that it happens because scala.io.Source does not support that scheme.

To work around that I tried to mount a local volume to my container and use a file URI such as:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/2-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "referer_parser",
    "enabled": true,
    "parameters": {
      "database": "referer-parser.json",
      "internalDomains": [
        "www.subdomain1.snowplowanalytics.com"
      ],
      "uri": "file:///snowplow/bin"
    }
  }
}

But it also fails with the same error, this time for the file scheme.

After some more testing, I managed to get something running thanks to the tests, by using the following configuration:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/2-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "referer_parser",
    "enabled": true,
    "parameters": {
      "database": "referer-tests.json",
      "internalDomains": [
        "www.subdomain1.snowplowanalytics.com"
      ],
      "uri": "https://s3-eu-west-1.amazonaws.com/snowplow-hosted-assets/third-party/referer-parser/"
    }
  }
}

So apparently HTTP(s) schemes are the only ones considered valid.
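
For anyone hitting the same error, the workaround can be sketched as a small pre-flight check in Python. This is only an illustration under assumptions: the helper name to_supported_uri, the SUPPORTED_SCHEMES set and the eu-west-1 endpoint (copied from the working configuration above) are hypothetical and not part of Snowplow.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are accepted, as observed in this thread.
SUPPORTED_SCHEMES = {"http", "https"}

# Assumption: regional endpoint copied from the working configuration above.
S3_HTTPS_ENDPOINT = "https://s3-eu-west-1.amazonaws.com"


def to_supported_uri(uri: str) -> str:
    """Return a URI the enrichment should accept.

    http(s) URIs are returned unchanged, s3://bucket/key is rewritten to
    the path-style HTTPS form, and any other scheme raises ValueError.
    """
    parts = urlsplit(uri)
    if parts.scheme in SUPPORTED_SCHEMES:
        return uri
    if parts.scheme == "s3":
        # s3://bucket/key -> https://<endpoint>/bucket/key
        return f"{S3_HTTPS_ENDPOINT}/{parts.netloc}{parts.path}"
    raise ValueError(f"Scheme {parts.scheme} for {uri} not supported")
```

Calling to_supported_uri on the s3:// location of the referer database yields the HTTPS URI used in the working configuration, while a file:// URI is rejected before the container is ever started.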

I also noticed that this new version of the enrichment configuration points to a referer-tests.json database which is significantly smaller than the referers-latest.yml.

Is there any plan to deploy the full referers-latest.yml as JSON in this S3 bucket?

Both of these issues are blockers for my upgrade as they seem to be regressions compared to r117.

Thanks in advance for your support and keep up the good work!

Hi @AcidFlow,

Thanks for getting back to us.

When using stream-enrich with NSQ, you don’t get the dependency needed to read from S3, which explains why you can’t read from s3://. Using https instead, as you did, points to the same file and solves the issue.

You’re right that referer-tests.json is not up-to-date with referer-latest.yml. It’s on our list to update it automatically via CI so that they always have the same content. This should be done early next week, and we’ll keep you updated.

Meanwhile if you have any other remarks or questions regarding R118, please do not hesitate to contact us.

Ben
Data Engineer @ Snowplow Analytics

Hi @BenB,

Thanks for your quick response.

I missed the dependency part, and the fact that the error was coming from the Enrich::download() method (and its overridden versions), sorry about that.

Thanks for keeping me updated on the referers database synchronization.

I’ll upgrade to R118 today. Apart from this “issue”, everything was fine :slight_smile:

Hello!

I tried to upgrade to r118 this morning.

I am running the real-time pipeline in AWS using Fargate and docker images from Docker Hub.
The stack I’m using is the Kinesis one.

The stream-enrich-kinesis upgrade went fine. However, the scala-stream-collector-kinesis container failed on startup with the following error:

Exception in thread "main" com.amazonaws.SdkClientException: Unable to marshall request to JSON: Jackson jackson-core/jackson-dataformat-cbor incompatible library version detected.
You have two possible resolutions:
		1) Ensure the com.fasterxml.jackson.core:jackson-core & com.fasterxml.jackson.dataformat:jackson-dataformat-cbor libraries on your classpath have the same version number
		2) Disable CBOR wire-protocol by passing the -Dcom.amazonaws.sdk.disableCbor property or setting the AWS_CBOR_DISABLE environment variable (warning this may affect performance)
	at com.amazonaws.services.kinesis.model.transform.DescribeStreamRequestProtocolMarshaller.marshall(DescribeStreamRequestProtocolMarshaller.java:59)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.executeDescribeStream(AmazonKinesisClient.java:861)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:846)
	at com.amazonaws.services.kinesis.AmazonKinesisClient.describeStream(AmazonKinesisClient.java:887)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.streamExists(KinesisSink.scala:125)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$2(KinesisSink.scala:52)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:50)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$2(KinesisCollector.scala:38)
	at scala.util.Either.flatMap(Either.scala:341)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.main(KinesisCollector.scala:30)
	at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector.main(KinesisCollector.scala)
Caused by: java.lang.RuntimeException: Jackson jackson-core/jackson-dataformat-cbor incompatible library version detected.
You have two possible resolutions:
		1) Ensure the com.fasterxml.jackson.core:jackson-core & com.fasterxml.jackson.dataformat:jackson-dataformat-cbor libraries on your classpath have the same version number
		2) Disable CBOR wire-protocol by passing the -Dcom.amazonaws.sdk.disableCbor property or setting the AWS_CBOR_DISABLE environment variable (warning this may affect performance)
	at com.amazonaws.protocol.json.SdkCborGenerator.getBytes(SdkCborGenerator.java:69)
	at com.amazonaws.protocol.json.internal.JsonProtocolMarshaller.finishMarshalling(JsonProtocolMarshaller.java:190)
	at com.amazonaws.protocol.json.internal.NullAsEmptyBodyProtocolRequestMarshaller.finishMarshalling(NullAsEmptyBodyProtocolRequestMarshaller.java:53)
	at com.amazonaws.services.kinesis.model.transform.DescribeStreamRequestProtocolMarshaller.marshall(DescribeStreamRequestProtocolMarshaller.java:57)
	... 11 more
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.dataformat.cbor.CBORGenerator.getOutputContext()Lcom/fasterxml/jackson/core/json/JsonWriteContext;
	at com.fasterxml.jackson.dataformat.cbor.CBORGenerator.close(CBORGenerator.java:903)
	at com.amazonaws.protocol.json.SdkJsonGenerator.close(SdkJsonGenerator.java:267)
	at com.amazonaws.protocol.json.SdkJsonGenerator.getBytes(SdkJsonGenerator.java:282)
	at com.amazonaws.protocol.json.SdkCborGenerator.getBytes(SdkCborGenerator.java:67)
	... 14 more

For now I have set the AWS_CBOR_DISABLE environment variable to work around the issue.

Is this part of your normal deployment, or is it something that should be fixed in the collector dependencies?

Thanks in advance!

Hi @AcidFlow,

Thanks a lot for your feedback!

Indeed, the environment variable AWS_CBOR_DISABLE or the Java option -Dcom.amazonaws.sdk.disableCbor now needs to be set when running the Docker image of the collector.

We missed this in the upgrade guide, sorry about that. We’re adding it now.
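
For Fargate deployments like the one described above, the variable can be added to the collector's container definition. The following Python sketch shows one way to do that on a definition loaded as a dictionary. The function name with_cbor_disabled and the value "true" are assumptions for illustration; the thread only states that the variable must be set.

```python
import copy

CBOR_ENV_NAME = "AWS_CBOR_DISABLE"


def with_cbor_disabled(container_def: dict) -> dict:
    """Return a copy of an ECS container definition with AWS_CBOR_DISABLE set.

    The input is left untouched. Any existing entry for the variable is
    replaced so the result contains it exactly once.
    """
    updated = copy.deepcopy(container_def)
    env = [
        entry
        for entry in updated.get("environment", [])
        if entry.get("name") != CBOR_ENV_NAME
    ]
    # Assumption: "true" is used as the value; the thread does not specify one.
    env.append({"name": CBOR_ENV_NAME, "value": "true"})
    updated["environment"] = env
    return updated
```

The returned dictionary can then be written back into the task definition before registering a new revision. Outside of ECS, the equivalent is passing the same variable to the container at start time.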

Exciting to hear that you’re running R118! Please keep us updated with any other issues or questions that you might have.

Ben
Data Engineer @ Snowplow Analytics

Hi @BenB,

No problem, I hope this will be useful for anyone upgrading to the latest release!

Everything is running now, and I love the new bad row format :slight_smile:

May I just ask why CBOR must be disabled starting from this release, and what impact it could have / has on performance?

Thanks!

Hi @AcidFlow,

Great to hear that you love the new format!

By default, the AWS SDK uses CBOR to serialize the JSON payloads it exchanges with Kinesis. Under the hood, this relies on the Jackson library. Following the version bumps of many libraries in R118, it seems that the versions of jackson-core and jackson-dataformat-cbor on the classpath of the collector are different, creating an incompatibility that makes it impossible to use CBOR. With CBOR disabled, the JSON is transmitted as is.

These JSON payloads are used only to communicate with Kinesis about the metadata, not for the actual serialization of the events being sent (those are serialized with Thrift). We didn’t notice any impact on performance, but if you do, please let us know. Nevertheless, we created an issue to fix the dependencies.
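
To illustrate the kind of mismatch described above, here is a minimal Python sketch that compares the two Jackson versions from a list of jar file names. It assumes the dependencies are available as individual files named artifact-version.jar, which is not the case for a fat jar, and the names jackson_versions and cbor_compatible are hypothetical.

```python
import re

# Assumption: jars follow the usual "<artifact>-<version>.jar" naming.
JAR_PATTERN = re.compile(r"^(?P<artifact>[a-z][a-z0-9-]*?)-(?P<version>\d[\w.]*)\.jar$")

JACKSON_ARTIFACTS = ("jackson-core", "jackson-dataformat-cbor")


def jackson_versions(jar_names):
    """Map each of the two Jackson artifacts found in jar_names to its version."""
    versions = {}
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match and match.group("artifact") in JACKSON_ARTIFACTS:
            versions[match.group("artifact")] = match.group("version")
    return versions


def cbor_compatible(jar_names):
    """True only if both artifacts are present with the same version number."""
    versions = jackson_versions(jar_names)
    return (
        len(versions) == len(JACKSON_ARTIFACTS)
        and versions["jackson-core"] == versions["jackson-dataformat-cbor"]
    )
```

The check mirrors the first resolution suggested by the SDK error message, namely that both libraries should have the same version number.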

Ben
Data Engineer @ Snowplow Analytics

Hi @BenB

I see, thanks for the detailed explanation! :slight_smile:

Hi @AcidFlow,

We have now completed the setup of our CI to automatically push the updates of the referers database, after converting it to JSON.

We updated the upgrade guide. Please note that the filename is now referers-latest.json.
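
As a rough sketch of that change, the following Python snippet rewrites the database field of an existing referer_parser enrichment configuration. It assumes that only the filename changes and that the uri stays as configured; please check the upgrade guide for the authoritative values. The function name use_latest_referers is hypothetical.

```python
import json

NEW_DATABASE = "referers-latest.json"


def use_latest_referers(config_text: str) -> str:
    """Return the enrichment configuration with the database filename updated.

    Assumption: every other field, including "uri", is kept as is.
    """
    config = json.loads(config_text)
    config["data"]["parameters"]["database"] = NEW_DATABASE
    return json.dumps(config, indent=2)
```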

Ben
Data Engineer @ Snowplow Analytics

Hey @BenB!

Thanks for the update. I forgot to reply earlier but I updated my referer-parser enrichment configuration and everything went fine!
