Javascript enrichment configuration

rbkn · June 11, 2020, 5:15pm

Hi all,
I’m playing around with a javascript enrichment (in a test environment) using beam-enrich but am running into a bit of a terse error message when starting up the enricher.

If I turn off this enrichment in the configuration, everything runs ok, so I’m certain it is just this enrichment causing the error.

Initially I thought maybe there was something weird with my script, so I changed it to be the same as the example script here, hoping for at least a different error, but no luck. I also tried the simplest possible script (with just a process function returning a JavaScript object string).

The enrichment configuration I have looks like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/javascript_script_config/jsonschema/1-0-0",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow",
    "name": "javascript_script_config",
    "enabled": true,
    "parameters": {
      "script": "base64 encoded script here"
    }
  }
}
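
For readers following along, here is a minimal sketch of how the "script" value above might be produced, assuming Node.js is available. The script body and the `com.acme` schema URI are hypothetical placeholders, not the poster's actual script. The array-of-self-describing-JSON return shape is an assumption based on my understanding of the enrichment's documented contract, so check it against the docs for your Enrich version.

```javascript
// Hypothetical minimal enrichment script. The schema URI below is a
// placeholder (assumption), not a schema that exists in any registry.
const script = `
function process(event) {
  return [{
    schema: "iglu:com.acme/example_context/jsonschema/1-0-0",
    data: { greeting: "hello" }
  }];
}
`;

// Base64-encode the script source so it can be placed in the
// "script" parameter of the enrichment configuration.
const encoded = Buffer.from(script, "utf8").toString("base64");

// Same structure as the configuration shown in the post.
const config = {
  schema: "iglu:com.snowplowanalytics.snowplow/javascript_script_config/jsonschema/1-0-0",
  data: {
    vendor: "com.snowplowanalytics.snowplow",
    name: "javascript_script_config",
    enabled: true,
    parameters: { script: encoded }
  }
};

// Print the JSON to save as the enrichment configuration file.
console.log(JSON.stringify(config, null, 2));
```

Decoding the value again with `Buffer.from(encoded, "base64").toString("utf8")` is a quick way to confirm that the script survived the round trip unchanged before deploying the configuration.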

The error I get when starting the beam enrichment job (via its Docker container) looks like this:

[main] WARN com.networknt.schema.JsonMetaSchema - Unknown keyword exclusiveMinimum - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
[main] INFO org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory - No stagingLocation provided, falling back to gcpTempLocation
[main] INFO org.apache.beam.runners.dataflow.DataflowRunner - Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
[main] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Uploading 250 files from PipelineOptions.filesToStage to staging location to prepare for execution.
[main] INFO org.apache.beam.runners.dataflow.util.PackageUtil - Staging files complete: 250 files cached, 0 files newly uploaded in 3 seconds
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding raw-from-pubsub/PubsubUnboundedSource as step s1
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineTranslator - Adding raw-from-pubsub/MapElements/Map as step s2
Exception in thread "main" java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field com.spotify.scio.util.Functions$$anon$7.g of type scala.Function1 in instance of com.spotify.scio.util.Functions$$anon$7
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2291)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
        at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:71)
        at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:610)
        at org.apache.beam.runners.core.construction.ParDoTranslation.getSchemaInformation(ParDoTranslation.java:314)
        at org.apache.beam.runners.core.construction.ParDoTranslation.getSchemaInformation(ParDoTranslation.java:299)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$8.translateSingleHelper(DataflowPipelineTranslator.java:1003)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$8.translate(DataflowPipelineTranslator.java:995)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$8.translate(DataflowPipelineTranslator.java:992)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:494)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
        at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
        at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
        at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:433)
        at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:192)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:797)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:188)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)
        at com.spotify.scio.ScioContext.execute(ScioContext.scala:598)
        at com.spotify.scio.ScioContext$$anonfun$run$1.apply(ScioContext.scala:586)
        at com.spotify.scio.ScioContext$$anonfun$run$1.apply(ScioContext.scala:574)
        at com.spotify.scio.ScioContext.requireNotClosed(ScioContext.scala:694)
        at com.spotify.scio.ScioContext.run(ScioContext.scala:574)
        at com.snowplowanalytics.snowplow.enrich.beam.Enrich$.main(Enrich.scala:94)
        at com.snowplowanalytics.snowplow.enrich.beam.Enrich.main(Enrich.scala)

Any hints much appreciated.

BenB · June 11, 2020, 8:01pm

Hi @rbkn,

This is an issue that was recently identified with the JavaScript enrichment for Beam Enrich; it is not related to your configuration.

You can find more details here.

We will investigate this early next week.

BenB · June 30, 2020, 7:48am

Hi @rbkn,

We rewrote the JS enrichment (a89c9922) to use Nashorn instead of Rhino under the hood, and that fixes the issue.

We will release it as soon as the migration of all enrich projects (Common Enrich, Stream Enrich and Beam Enrich) to snowplow/enrich is complete.

anton · June 30, 2020, 10:23am

@rbkn, just in case: it’s been published as 1.3.0-rc6 already, but we haven’t had a chance to test it yet and the changes in the build configuration are huge.

If you’re okay with taking the risk (the job not launching at all, nothing like losing data), feel free to try that asset.

rbkn · June 30, 2020, 11:09am

Thanks for this!
I’m in the early stages of testing configurations and planning a deployment, so I will definitely try this out when I have the chance.

anton · July 12, 2020, 2:17pm

Just FYI, @rbkn, we recently released 1.3.0-rc16, which has been carefully tested and fixes this and many other bugs. It will become a final release early next week, so feel free to use it.

rbkn · July 13, 2020, 7:29am

Thanks for the update.
The binaries beyond rc3 are not yet available on Bintray, possibly because GitHub is having a fit currently. Will check later in the day.

UPDATE: just checked. You’re updating the repo location for these, yes?
Is there anywhere I can get my hands on the later versions as binaries to fit the current setup, or would I need to compile them manually?
Looked here: https://bintray.com/snowplow/snowplow-generic

anton · July 14, 2020, 4:25pm

Hi @rbkn!

Sorry, we’ve recently migrated enrich to a new home and I just realised we disabled fatjar publishing. We have used Docker images for a long time and I didn’t expect somebody is still using fatjars, but I’ll try to bring it back in the next release.

rbkn · July 14, 2020, 5:07pm

I didn’t expect somebody is still using fatjars

TBH it’s just easier unless there is a k8s pod available. For GCP, I had this triggered on a container-optimized VM, but then you need to manage service account keys etc. (since gcloud/gsutil doesn’t come with the images that support Docker, and therefore auth is not set up by default). If you just use a normal VM, all you need is to grab the jar (and install JDK 8, I guess) and use the configured default credentials, without passing around encrypted keys to containers.

Alexander_Simonow · December 6, 2020, 8:17pm

Hi @anton, I think the fatjars are no longer published. Is there a description somewhere of how to start beam-enrich containers with Google Compute Engine?

anton · December 7, 2020, 1:25pm

Hi @Alexander_Simonow, the only guide we have is quite platform-agnostic. We recommend using Docker images as described there, but if you desperately need fatjars, you can assemble them yourself:

1. Clone the repo
2. Add addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0") to project/plugins.sbt
3. Install SBT
4. Run sbt assembly
