KCL InitializeTask exception on Enrich 3.2.2


I’ve been trying to update our enrichment module from 2.0.5, but I keep getting the following errors:

Our Dockerfile to run Enrich is the following:

FROM snowplow/stream-enrich-kinesis:3.2.2
COPY ./src/config.hocon /snowplow/config.hocon

ARG ENRICH_STREAMS_OUT_BAD="our-enrichment-bad-stream-$STAGE"
ARG ENRICH_STREAMS_IN_RAW="our-collector-good-stream-$STAGE"
ARG ENRICH_STREAMS_OUT_ENRICHED="our-enrichment-good-stream-$STAGE"



COPY src/ .
USER root
RUN sh modify_resolver.sh

ENTRYPOINT ["/usr/bin/env"]

CMD /home/snowplow/bin/snowplow-stream-enrich-kinesis --config /snowplow/config.hocon \
    --resolver file:resolver.json \
    --enrichments file:./enrichments
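One thing worth noting about the Dockerfile above (an observation, not something from the original thread): the `ARG` default values reference `$STAGE`, which is not declared earlier in the file, so unless `STAGE` is declared as a preceding `ARG` and supplied at build time (e.g. `docker build --build-arg STAGE=prod ...`), it expands to an empty string. A minimal shell sketch of that expansion behaviour, using the stream name from the Dockerfile:

```shell
# If STAGE is undefined, the default value expands with an empty suffix,
# leaving a trailing dash in the stream name:
unset STAGE
ENRICH_STREAMS_OUT_BAD="our-enrichment-bad-stream-$STAGE"
echo "$ENRICH_STREAMS_OUT_BAD"   # -> "our-enrichment-bad-stream-"
```

If the stream names printed in your logs end in a bare `-`, this is likely the cause.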

Do we need to recreate the associated old DynamoDB tables or update them? I tried dropping the DynamoDB table, and while it got recreated, it did not get populated with items, leading to the error “LeaseTable is empty”.

Could you please help me understand what I am missing here?

Hi @atordai ,

Too bad that the error message is not more helpful.

You should not need to do that; thanks to the KCL, upgrading should be smooth. We’ve done that successfully for all our pipelines.

This is weird. Is your input stream with collector payloads still populated?

Also, please note that stream-enrich is to be deprecated in favor of enrich-kinesis. We encourage you to roll out the latter. The instructions can be found on our docs website.


Switching to enrich-kinesis indeed went way smoother, and the new DynamoDB table works fine too. However, my data is failing to pass through Enrich without any error message.

This is my log:

The scheduler just goes to sleep, even though I am sending data and my collector works fine:
[cats-effect-blocker-1] INFO software.amazon.kinesis.coordinator.Scheduler - Sleeping ...

In the bad stream, the reason for failure is adapter_failures in all cases, with the following:

Can you help with this?

I have a weird new error related to PrefetchRecordsPublisher:

Are you sending data to actuator/env on the collector? This isn’t a valid path, which is likely why you are seeing bad rows for what is being sent.


Hey, I figured out that the reason data did not arrive in the loader buckets was that Enrich was configured to read the “good” stream from the collector, while our (otherwise correct) data ends up in the bad stream due to its >1 MB size.

On the configuration page I did not see any specific property with which I could increase this limit of 1 MB.

Is it possible some other way, or should we compress the data that we send? For now I have set the bad stream as the input for Enrich, so the data flows through and ends up in our data warehouse, but this does not feel right.
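For reference (an illustration, not part of the original post): a quick way to confirm that a serialized event exceeds the 1 MB Kinesis record limit before sending it. `payload.json` is a hypothetical file name, and the file is generated here just to have something oversized to measure:

```shell
# Kinesis rejects single records larger than 1 MB (1,048,576 bytes),
# which is why oversized events end up in the bad stream.
head -c 2000000 /dev/zero | tr '\0' 'x' > payload.json   # fake ~2 MB event
limit=1048576
size=$(wc -c < payload.json)
if [ "$size" -gt "$limit" ]; then
  echo "payload exceeds Kinesis record limit: ${size} bytes"
fi
```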

This limit is non-configurable, as it’s a built-in protection to prevent any single event from exceeding 1 MB, which is the limit of what Kinesis allows for a single message - so this is a Kinesis limit rather than a Snowplow one.

This is possible to overcome by moving to a different messaging service (Kafka / PubSub), but 1 MB is really large (assuming ~1 byte per character, that’s ~1 million characters per event). I’m curious what your use case is, as there may be a better solution depending on what you are trying to achieve (it’s possible to compress events, but that's not ideal as you’ll need something to decompress them again).
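To illustrate the compression route mentioned above (a sketch only - the file names are made up, the fake payload is highly repetitive so the ratio here is far better than you would see on real events, and something downstream would still have to gunzip the data before enrichment):

```shell
# Gzip a large fake event and compare sizes. Real-world savings depend
# entirely on the event content.
head -c 2000000 /dev/zero | tr '\0' 'x' > event.json    # fake ~2 MB event
gzip -c event.json > event.json.gz
orig=$(wc -c < event.json)
comp=$(wc -c < event.json.gz)
echo "original: ${orig} bytes, compressed: ${comp} bytes"
```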