Collector Server failing health check

Hi, I had a Snowplow pipeline running fine on AWS until I destroyed and re-deployed the Terraform modules today (there were AMI updates from Snowplow). Since then the collector has been failing health checks. Everything is set up, from the NAT gateway to the route tables to Redshift access for the RDB loader, but none of the servers (collector, enrich, RDB loader, etc.) are getting internet access, and there are no changes in the VPC whatsoever.
Any help? This is my collector server app version:

[Screenshot 2024-01-15 at 6.54.28 PM]

Are there any breaking changes in the collector update?

Hi @Jahanzaib_Younis, it’s a known issue, and we have a PR in review to resolve it. TL;DR: the default AMI was updated by Amazon, and it broke compatibility with the version of Docker we were using.

We will release a fix very soon. In the meantime, you can work around it by editing this file in your local copy of the repo to add the following line to the collector_kinesis and enrich_kinesis modules, and the same for any EC2-based modules in any target you’re using:

Edit: this is the one for eu-central-1; I initially forgot that the AMI ID is different for each region. I have described how to get the AMI ID for your region in this comment below.

amazon_linux_2_ami_id = "ami-0090963cc60d485c3"

^^ This pins the AMI to the previous version. Long-term this workaround isn’t a good idea, as you’ll want to keep getting security updates etc., so once we have the fix out I would recommend going back to using what’s on master.
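For illustration, the pinned variable sits alongside the module’s existing inputs. This is a hypothetical sketch only: the module names and source path here are assumptions, and the exact blocks depend on your copy of the quickstart-examples repo.

```hcl
# Hypothetical sketch: pin the AMI in each EC2-based module of your target.
# Module label and source are placeholders; keep your existing variables as-is.
module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  # ... your existing variables, unchanged ...

  # Pin to the previous Amazon Linux 2 AMI. NB: this ID is for eu-central-1;
  # look up the equivalent ID for your own region.
  amazon_linux_2_ami_id = "ami-0090963cc60d485c3"
}
```

Repeat the same line in enrich_kinesis and any other EC2-based modules you deploy.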



We are having the same problem as OP.
I’m getting an error from Terraform after adding ami-0090963cc60d485c3:

Error: updating Auto Scaling Group (sp-collector-server): ValidationError: You must use a valid fully-formed launch template. The image id '[ami-0090963cc60d485c3]' does not exist

I cannot find this AMI in the registry.

:man_facepalming: silly me, I forgot that the AMI ID you need depends on your region.

You can get the AMI ID to supply using the AWS CLI as follows:

aws ec2 describe-images --owners amazon \
  --filters "Name=name,Values=amzn2-ami-hvm-2.0.20231218.0-x86_64-ebs" \
            "Name=root-device-type,Values=ebs" \
            "Name=virtualization-type,Values=hvm" \
            "Name=architecture,Values=x86_64" \
            "Name=description,Values=Amazon Linux 2 AMI 2.0.20231218.0 x86_64 HVM ebs" \
  --region eu-central-1

(Note: all the filters go after a single --filters flag; if you pass --filters several times, only the last one takes effect.)

(Replacing the region with the one you’re using)

I believe this will return just a single result, which will contain the AMI ID you need to use. I haven’t tested it in every region, however, so if you run into issues please report back.
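If you prefer doing the lookup via the SDK, here is a rough sketch of the same query with boto3. This is untested against AWS: the helper names are my own, and the sample response in the test is hypothetical, not a real AMI.

```python
# Sketch: build the describe-images filters as a single list (so none get
# dropped), then pull the AMI ID out of the response.

def build_filters(ami_name: str) -> list[dict]:
    """Filters for EC2 describe_images, combined into one list."""
    return [
        {"Name": "name", "Values": [ami_name]},
        {"Name": "root-device-type", "Values": ["ebs"]},
        {"Name": "virtualization-type", "Values": ["hvm"]},
        {"Name": "architecture", "Values": ["x86_64"]},
    ]


def extract_ami_id(response: dict) -> str:
    """Return the ImageId of the single matching image, or fail loudly."""
    images = response.get("Images", [])
    if len(images) != 1:
        raise ValueError(f"expected exactly one image, got {len(images)}")
    return images[0]["ImageId"]


# With boto3 installed, the actual call would look roughly like:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-central-1")
#   resp = ec2.describe_images(
#       Owners=["amazon"],
#       Filters=build_filters("amzn2-ami-hvm-2.0.20231218.0-x86_64-ebs"),
#   )
#   ami_id = extract_ami_id(resp)
```

Point `region_name` at your own region; the filters are region-independent but the returned ImageId is not.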

Apologies for that oversight, hope this resolves it for you!


Hi @Colm, thank you for this update; it was helpful in understanding the issue. I will postpone my deployment for now. Will there be an update on this issue, as in a topic?

For reference, this is the pull request @Colm talks about:

I chose to use the suggested change and it fixed my problem.
I think we have to wait until this PR gets merged upstream, then remove the hard-coded changes.


Correct, that’s the one - thanks for digging it up and posting. :slight_smile:

Once it’s released we’ll need to update the downstream repos too, to use the new version. I’ll post in here when it’s all done.


Thank you @ZoliVeres for the reference :+1:
and yes @Colm, that would be great and thank you :raised_hands:

We faced a similar issue, so we locked the AMI version for the region we are using, following @Colm’s reply.

Thanks @Jahanzaib_Younis for posting it.


Hi all! Just sharing that this issue has now been patched and validated in the latest quickstart-examples release here: Release 23.10 (Patch.1) · snowplow/quickstart-examples · GitHub

As long as you are using the module versions noted in that release everything should work again.


Problem appeared again

Currently the target groups for iglu server and the collector do not pass health checks. I have tried with 23.1-patch.1 and 24.02-patch.1

default modules, public subnets

Hi @Youssef_Egla, can you share any details from the server logs? The ones I need should be in the following file: /var/log/user-data.log