JavaScript enrichment configuration

Hi all,
I’m playing around with a JavaScript enrichment (in a test environment) using beam-enrich, but I’m running into a rather terse error message when starting up the enricher.

If I turn this enrichment off in the configuration, everything runs fine, so I’m certain it is this enrichment that is causing the error.

Initially I thought maybe there was something weird with my script, so I changed it to match the example script here, hoping for at least a different error, but no luck. I also tried the simplest possible script (just a process function returning a JavaScript object string).
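
For reference, the simplest script I tried was along these lines (a rough sketch only; the getApp_id() accessor and the schema URI are illustrative placeholders, not my actual script):

    // Minimal JavaScript enrichment: attach one derived context to every event.
    function process(event) {
        // Read a field from the enriched event (accessor name is illustrative).
        var appId = event.getApp_id();

        // Return an array of self-describing JSON contexts to add to the event.
        return [{
            schema: "iglu:com.acme/derived_app_id/jsonschema/1-0-0",
            data: { appId: appId }
        }];
    }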

The enrichment configuration I have looks like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/javascript_script_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "javascript_script_config",
    "enabled": true,
    "parameters": {
      "script": "base64 encoded script here"
    }
  }
}
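
(For completeness: the script value is just the base64 of the raw JavaScript source. I encoded it with something like the following Node snippet, assuming the script is saved as enrichment.js:)

    // Encode the enrichment script as base64 for the "script" parameter.
    // Assumes the source is saved as enrichment.js alongside this snippet.
    const fs = require("fs");

    const source = fs.readFileSync("enrichment.js", "utf8");
    console.log(Buffer.from(source, "utf8").toString("base64"));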

The error I get when starting the Beam Enrich job (via its Docker container) looks like this:

[main] WARN com.networknt.schema.JsonMetaSchema - Unknown keyword exclusiveMinimum - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
[main] INFO org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory - No stagingLocation provided, falling back to gcpTempLocation
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
[main] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Uploading 250 files from PipelineOptions.filesToStage to staging location to prepare for execution.
[main] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Staging files complete: 250 files cached, 0 files newly uploaded in 3 seconds
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding raw-from-pubsub/PubsubUnboundedSource as step s1
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding raw-from-pubsub/MapElements/Map as step s2
Exception in thread "main" java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field com.spotify.scio.util.Functions$$anon$7.g of type scala.Function1 in instance of com.spotify.scio.util.Functions$$anon$7
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
        at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:71)
        at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:610)
        at org.apache.beam.runners.core.construction.ParDoTranslation.getSchemaInformation(ParDoTranslation.java:314)
        at org.apache.beam.runners.core.construction.ParDoTranslation.getSchemaInformation(ParDoTranslation.java:299)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$8.translateSingleHelper(DataflowPipelineTranslator.java:1003)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$8.translate(DataflowPipelineTranslator.java:995)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$8.translate(DataflowPipelineTranslator.java:992)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:494)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
        at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
        at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:433)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:192)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:797)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:188)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
        at com.spotify.scio.ScioContext.execute(ScioContext.scala:598)
        at com.spotify.scio.ScioContext$$anonfun$run$1.apply(ScioContext.scala:586)
        at com.spotify.scio.ScioContext$$anonfun$run$1.apply(ScioContext.scala:574)
        at com.spotify.scio.ScioContext.requireNotClosed(ScioContext.scala:694)
        at com.spotify.scio.ScioContext.run(ScioContext.scala:574)
        at com.snowplowanalytics.snowplow.enrich.beam.Enrich$.main(Enrich.scala:94)
        at com.snowplowanalytics.snowplow.enrich.beam.Enrich.main(Enrich.scala)

Any hints much appreciated.

Hi @rbkn,

This is an issue that was recently identified with the JavaScript enrichment for Beam Enrich; it is not related to your configuration.

You can find more details here.

We will investigate this early next week.

Hi @rbkn,

We rewrote the JS enrichment (a89c9922) to use Nashorn instead of Rhino under the hood, which fixes the issue.

We will release it as soon as the migration of all enrich projects (Common Enrich, Stream Enrich and Beam Enrich) to snowplow/enrich is completed.

@rbkn, just in case - it’s been published as 1.3.0-rc6 already, but we haven’t had a chance to test it yet, and the changes to the build configuration are substantial.

If you’re okay taking the risk (worst case the job doesn’t launch at all, nothing like losing data), feel free to try that asset.

Thanks for this!
I’m in the early stages of testing configurations and planning a deployment, so I will definitely try this out when I have the chance.

Just FYI, @rbkn, we recently released 1.3.0-rc16, which has been carefully tested and fixes this and many other bugs. It will become the final release early next week, so feel free to use it.

Thanks for the update.
The binaries beyond rc3 are not yet available on Bintray - possibly because GitHub is having issues at the moment. I will check again later in the day.

UPDATE: just checked - you’re updating the repo location for these, yes?
Is there anywhere I can get my hands on the later versions as binaries to fit my current setup, or would I need to compile them manually?
I looked here: https://bintray.com/snowplow/snowplow-generic

Hi @rbkn!

Sorry, we’ve recently migrated enrich to a new home and I just realised we disabled fatjar publishing. We have been using Docker images for a long time and I didn’t expect anybody was still using fatjars, but I’ll try to bring them back in the next release.

I didn’t expect anybody was still using fatjars

:) TBH it’s just easier unless there is a k8s pod available. For GCP, I had this triggered on a container-optimized VM, but then you need to manage service account keys etc. (since gcloud/gsutil don’t come with the Docker-capable images, auth is not set up by default). If you just use a normal VM, all you need is to grab the jar (and install JDK 8, I guess) and use the configured default credentials, without passing encrypted keys around to containers.

Hi @anton, I think the fatjars are no longer published. Is there a description somewhere of how to start beam-enrich containers on Google Compute Engine?

Hi @Alexander_Simonow, the only guide we have is quite platform-agnostic. We recommend using Docker images as described there, but if you really need fatjars, you can assemble them yourself:

  1. Clone the repo
  2. Add addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0") to project/plugins.sbt
  3. Install SBT
  4. Run sbt assembly