I’ve launched a new environment for our Snowplow pipeline and the Kinesis S3 Sink doesn’t appear to be correctly reading from the Kinesis stream.
The Scala Stream Collector is logging that it is publishing events to the stream, and I see in the AWS console monitoring that the stream is indeed receiving records.
The S3 sink runs, but never outputs any information and I never see files in S3. When the app starts it outputs the KinesisConnectorConfiguration which shows the correct input stream name, S3 bucket, etc.
That configuration output is the last thing it shows.
I do see some monitoring events (heartbeat) coming into the collector from the Kinesis S3 Sink app.
I’ve previously set up the pipeline with what I believe is the exact same configuration (with just different stream/bucket names) and it’s working properly, but this environment isn’t.
I tried deleting the DynamoDB table created by the sink and restarting the sink. That didn’t seem to help.
Any idea what could cause this?
Oh just kidding, I think deleting the DynamoDB table did solve the problem. Still not sure what was really going on though; if anybody knows I’d love to understand!
Did you re-use the same application name (and thus the same DynamoDB table) between your different environments? This plays havoc with consumers like Kinesis S3.
@alex The application names were different between environments.
However I did create this environment with a given name, then tear down and rebuild the infrastructure with the same name (including deleting/recreating the Kinesis streams).
I understand the DynamoDB table gets created when the Kinesis S3 Sink is started for the first time. Do you think the fact that I recreated the streams but didn’t manually delete the DynamoDB table before starting the sink in the rebuilt environment could cause similar havoc?
Thanks for your reply!
Yep, I suspect that your rebuild left the DynamoDB table holding checkpoints from the deleted streams, i.e. Kinesis sequence numbers that are invalid for the recreated streams, so the Kinesis S3 app could not operate.
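
If you want to check for this before deleting the table next time, here is a rough sketch in Python. It rests on several assumptions that are not confirmed in this thread: that the lease table is named after the sink's application name, that it uses the usual KCL attributes `leaseKey` (shard ID) and `checkpoint` (sequence number or a sentinel such as `TRIM_HORIZON`), and that a checkpoint lower than the live shard's starting sequence number cannot belong to the current stream. The function names `stale_leases` and `fetch_and_check` are made up for illustration, and `boto3` is a third-party dependency.

```python
# Sentinel checkpoint values the KCL may store instead of a numeric sequence number.
SENTINELS = {"TRIM_HORIZON", "LATEST", "SHARD_END", "AT_TIMESTAMP"}


def stale_leases(leases, shard_start_seq):
    """Return shard IDs whose stored checkpoint looks invalid for the live stream.

    leases:          dict of shard ID -> checkpoint string from the lease table
    shard_start_seq: dict of shard ID -> StartingSequenceNumber of the live shard
    """
    stale = []
    for shard_id, checkpoint in sorted(leases.items()):
        if shard_id not in shard_start_seq:
            # Lease points at a shard the stream no longer reports.
            stale.append(shard_id)
        elif checkpoint not in SENTINELS and int(checkpoint) < int(shard_start_seq[shard_id]):
            # Checkpoint predates the shard, so it likely came from a deleted stream.
            stale.append(shard_id)
    return stale


def fetch_and_check(app_name, stream_name, region):
    """Read the lease table and the stream's shards, then compare them.

    Assumes the lease table is named app_name. Pagination is ignored, so this
    only covers small tables and streams.
    """
    import boto3  # third-party; imported here so stale_leases works without it

    dynamodb = boto3.client("dynamodb", region_name=region)
    kinesis = boto3.client("kinesis", region_name=region)

    items = dynamodb.scan(TableName=app_name)["Items"]
    leases = {item["leaseKey"]["S"]: item["checkpoint"]["S"] for item in items}

    shards = kinesis.list_shards(StreamName=stream_name)["Shards"]
    starts = {
        shard["ShardId"]: shard["SequenceNumberRange"]["StartingSequenceNumber"]
        for shard in shards
    }
    return stale_leases(leases, starts)
```

If `fetch_and_check` returns any shard IDs, stopping the sink and deleting the table (as you did) so that it gets recreated on the next start is probably the simplest fix.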
This makes sense! Thanks again!