Hi @BenB
We are running the command below for the enricher:
sudo docker run -d -v /snowplow/config:/snowplow/config snowplow-docker-registry.bintray.io/snowplow/stream-enrich-kinesis:0.21.0 --config /snowplow/config/enrich.hocon --resolver file:/snowplow/config/resolver.json --enrichments file:/snowplow/config/enrichments/ --force-cached-files-download
The latest enrich version is 1.3.1. It’s available directly on Docker Hub, and we recommend using it. Please note that it comes with a new format for the bad rows emitted by enrich. You can read about it in this blog post.
The upgrade guides up to this version can be found here.
I tried 1.3.1, and with it I am getting both the CPU utilisation issue and a SQL enrichment failure:
{"schemaKey":"iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-0","identifier":"sql-query"},"message":{"error":"The placeholder map error. The map: Some(IntMap()), where count is: 1"}}]}
Thanks for the SqlEnrichment Fix… 1.3.2 is working fine for Sql Enrichments now.
But the CPU utilisation issue is still there. I also tried 1.3.2 with the SQL enrichment removed, but CPU utilisation did not improve; it keeps going over 100%.
Regarding the CPU going over 100%, it’s hard to guess what could be wrong. We managed to troubleshoot the same issue in one of our pipelines using profiling. To do that, you need to add -Dcom.sun.management.jmxremote.port=5555 -Dcom.sun.management.jmxremote.rmi.port=5555 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.rmi.server.hostname= to the Java options when running enrich (5555 being the port that you want to use). You can then inspect the JVM that runs enrich with a tool such as VisualVM.
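For what it's worth, here is a minimal sketch of assembling those flags in a shell script so the port and hostname are set in one place. The helper name `jmx_opts` and the address 192.0.2.10 are placeholders for illustration only. How the options reach the JVM depends on how you launch enrich; the commented-out `JAVA_OPTS` line is an assumption about the container image, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: prints the JMX remote flags quoted above.
#   $1 = port used for both the JMX and RMI connections
#   $2 = hostname or IP that the profiler will connect to
jmx_opts() {
  port="$1"
  host="$2"
  echo "-Dcom.sun.management.jmxremote.port=${port}" \
       "-Dcom.sun.management.jmxremote.rmi.port=${port}" \
       "-Dcom.sun.management.jmxremote.ssl=false" \
       "-Dcom.sun.management.jmxremote.authenticate=false" \
       "-Djava.rmi.server.hostname=${host}"
}

# Placeholder values: replace 192.0.2.10 with the address of the enrich host.
ENRICH_JMX_OPTS="$(jmx_opts 5555 192.0.2.10)"

# Assumption: the image passes JAVA_OPTS through to the JVM. If it does,
# something along these lines would expose the port to the profiler:
#   sudo docker run -d -p 5555:5555 -e JAVA_OPTS="$ENRICH_JMX_OPTS" ...
```

Note that these flags disable SSL and authentication on the JMX port, so it is best to keep that port reachable only from the machine you profile from.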
In our case we discovered that enrich was constantly doing garbage collection, and we found out that this was due to a memory leak (fixed by 48e4ce8be913).
Using profiling you’ll be able to determine how the CPU is used and to find the culprit.
I tested the fix and the SQL enrichment is working as expected. My only concern is that the enricher is lagging behind the collector: the collector pushes more records than the enricher processes. Because of this, a large amount of lag builds up in the pipeline.