KCL InitializeTask exception on Enrich 3.2.2


I’ve been trying to update our enrichment module from 2.0.5, but I keep getting the following errors:

Our Dockerfile to run Enrich is the following:

FROM snowplow/stream-enrich-kinesis:3.2.2
COPY ./src/config.hocon /snowplow/config.hocon

ARG ENRICH_STREAMS_OUT_BAD="our-enrichment-bad-stream-$STAGE"
ARG ENRICH_STREAMS_IN_RAW="our-collector-good-stream-$STAGE"
ARG ENRICH_STREAMS_OUT_ENRICHED="our-enrichment-good-stream-$STAGE"



COPY src/ .
USER root
RUN sh modify_resolver.sh

ENTRYPOINT ["/usr/bin/env"]

CMD /home/snowplow/bin/snowplow-stream-enrich-kinesis --config /snowplow/config.hocon \
    --resolver file:resolver.json \
    --enrichments file:./enrichments
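One thing worth noting about the Dockerfile above (an observation, not something from the original thread): the `ARG` default values reference `$STAGE`, which is not declared earlier in the file, so unless `STAGE` is declared as a preceding `ARG` and supplied at build time (e.g. `docker build --build-arg STAGE=prod ...`), it expands to an empty string. A minimal shell sketch of that expansion behaviour, using the stream name from the Dockerfile:

```shell
# If STAGE is undefined, the default value expands with an empty suffix,
# leaving a trailing dash in the stream name:
unset STAGE
ENRICH_STREAMS_OUT_BAD="our-enrichment-bad-stream-$STAGE"
echo "$ENRICH_STREAMS_OUT_BAD"   # -> "our-enrichment-bad-stream-"
```

If the stream names printed in your logs end in a bare `-`, this is likely the cause.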

Do we need to recreate the associated old DynamoDB tables or update them? I tried dropping the DynamoDB table, and while it got recreated, it did not get populated with items, leading to the error “LeaseTable is empty”.

Could you please help me understand what I am missing here?

Hi @atordai ,

Too bad that the error message is not more helpful.

You should not need to do that; thanks to the KCL, upgrading should be smooth. We’ve done that successfully for all our pipelines.

This is weird. Is your input stream with collector payloads still populated?

Also, please note that stream-enrich is to be deprecated in favor of enrich-kinesis. We encourage you to roll out the latter. The instructions can be found on our docs website.


Switching to enrich-kinesis indeed went way smoother, and the new DynamoDB table works fine too. However, my data is failing to pass through Enrich without any error message.

This is my log:

The scheduler just goes to sleep, even though I am sending data and my collector works fine:
[cats-effect-blocker-1] INFO software.amazon.kinesis.coordinator.Scheduler - Sleeping ...

In the bad stream, the reason for failure is adapter_failures in all cases, with the following:

Can you help with this?

I have a weird new error related to PrefetchRecordsPublisher:

Are you sending data to actuator/env on the collector? This isn’t a valid path, which is likely why you are seeing bad rows for what is being sent.


Hey, I figured out that the reason data did not arrive in the loader buckets was that Enrich was configured to read the “good” stream from the collector, while our (otherwise correct) data ends up in the bad stream due to its >1 MB size.

On the configuration page I did not see any specific property with which I could increase this limit of 1 MB.

Is it possible some other way, or should we compress the data that we send? For now I have set the bad stream as the input for Enrich, so the data flows through and ends up in our data warehouse, but this does not feel right.
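For reference (an illustration, not part of the original post): a quick way to confirm that a serialized event exceeds the 1 MB Kinesis record limit before sending it. `payload.json` is a hypothetical file name, and the file is generated here just to have something oversized to measure:

```shell
# Kinesis rejects single records larger than 1 MB (1,048,576 bytes),
# which is why oversized events end up in the bad stream.
head -c 2000000 /dev/zero | tr '\0' 'x' > payload.json   # fake ~2 MB event
limit=1048576
size=$(wc -c < payload.json)
if [ "$size" -gt "$limit" ]; then
  echo "payload exceeds Kinesis record limit: ${size} bytes"
fi
```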

This limit is non-configurable, as it’s a built-in protection to prevent any single event from exceeding 1 MB, which is the limit of what Kinesis allows for a single message - so this is a Kinesis limit rather than a Snowplow one.

This is possible to overcome by moving to a different messaging service (Kafka / PubSub), but 1 MB is really large (assuming ~1 byte per character, that’s ~1 million characters per event). I’m curious what your use case is, as there may be a better solution depending on what you are trying to achieve (it’s possible to compress events, but that's not ideal as you’ll need something to decompress them again).
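To illustrate the compression route mentioned above (a sketch only - the file names are made up, the fake payload is highly repetitive so the ratio here is far better than you would see on real events, and something downstream would still have to gunzip the data before enrichment):

```shell
# Gzip a large fake event and compare sizes. Real-world savings depend
# entirely on the event content.
head -c 2000000 /dev/zero | tr '\0' 'x' > event.json    # fake ~2 MB event
gzip -c event.json > event.json.gz
orig=$(wc -c < event.json)
comp=$(wc -c < event.json.gz)
echo "original: ${orig} bytes, compressed: ${comp} bytes"
```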