Collector Server failing health check

Hi, I had a Snowplow pipeline running fine on AWS until I destroyed and re-deployed the Terraform modules today (there were AMI updates from Snowplow). Since then the collector has been failing health checks. Everything is set up, from the NAT gateway to the route tables to Redshift access for the RDB loader, but none of the servers (collector, enrich, RDB loader, etc.) are getting internet access, and there are no changes in the VPC whatsoever.
Any help? This is my collector server app version:

[Screenshot 2024-01-15 at 6.54.28 PM]

Are there any breaking changes in the collector update?

Hi @Jahanzaib_Younis, it’s a known issue, and we have a PR in review to resolve it. TL;DR: the default AMI was updated by Amazon, and it broke compatibility with the version of Docker we were using.

We will release a fix very soon. In the meantime, you can work around it by editing this file in your local copy of the repo to add the following line to the collector_kinesis and enrich_kinesis modules, and the same for any EC2-based modules in any target you’re using:

Edit: this is the one for eu-central-1; I initially forgot that the AMI ID is different for each region. I have described how to get the AMI ID for your region in this comment below.

amazon_linux_2_ami_id = "ami-0090963cc60d485c3"

^^ This pins the AMI to the previous version. Long-term this workaround isn’t a good idea, as you’ll want to keep getting security updates etc., so once we have the fix out I would recommend going back to using what’s on master.
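For illustration, the pinned variable sits alongside the module’s existing inputs. This is a hypothetical sketch only: the module names and source path here are assumptions, and the exact blocks depend on your copy of the quickstart-examples repo.

```hcl
# Hypothetical sketch: pin the AMI in each EC2-based module of your target.
# Module label and source are placeholders; keep your existing variables as-is.
module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  # ... your existing variables, unchanged ...

  # Pin to the previous Amazon Linux 2 AMI. NB: this ID is for eu-central-1;
  # look up the equivalent ID for your own region.
  amazon_linux_2_ami_id = "ami-0090963cc60d485c3"
}
```

Repeat the same line in enrich_kinesis and any other EC2-based modules you deploy.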



We are having the same problem as OP.
I’m getting an error from Terraform after adding ami-0090963cc60d485c3:

Error: updating Auto Scaling Group (sp-collector-server): ValidationError: You must use a valid fully-formed launch template. The image id '[ami-0090963cc60d485c3]' does not exist

I cannot find this AMI in the registry.

:man_facepalming: silly me, I forgot that the AMI ID you need depends on your region.

You can get the AMI ID to supply using the AWS CLI as follows:

aws ec2 describe-images --owners amazon \
  --filters "Name=name,Values=amzn2-ami-hvm-2.0.20231218.0-x86_64-ebs" \
            "Name=root-device-type,Values=ebs" \
            "Name=virtualization-type,Values=hvm" \
            "Name=architecture,Values=x86_64" \
            "Name=description,Values=Amazon Linux 2 AMI 2.0.20231218.0 x86_64 HVM ebs" \
  --region eu-central-1

(Note: all the filters go after a single --filters flag; if you pass --filters several times, only the last one takes effect.)

(Replacing the region with the one you’re using)

I believe this will return just a single result, which will contain the AMI ID you need to use. I haven’t tested it in every region, however, so if you run into issues please report back.
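If you prefer doing the lookup via the SDK, here is a rough sketch of the same query with boto3. This is untested against AWS: the helper names are my own, and the sample response in the test is hypothetical, not a real AMI.

```python
# Sketch: build the describe-images filters as a single list (so none get
# dropped), then pull the AMI ID out of the response.

def build_filters(ami_name: str) -> list[dict]:
    """Filters for EC2 describe_images, combined into one list."""
    return [
        {"Name": "name", "Values": [ami_name]},
        {"Name": "root-device-type", "Values": ["ebs"]},
        {"Name": "virtualization-type", "Values": ["hvm"]},
        {"Name": "architecture", "Values": ["x86_64"]},
    ]


def extract_ami_id(response: dict) -> str:
    """Return the ImageId of the single matching image, or fail loudly."""
    images = response.get("Images", [])
    if len(images) != 1:
        raise ValueError(f"expected exactly one image, got {len(images)}")
    return images[0]["ImageId"]


# With boto3 installed, the actual call would look roughly like:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-central-1")
#   resp = ec2.describe_images(
#       Owners=["amazon"],
#       Filters=build_filters("amzn2-ami-hvm-2.0.20231218.0-x86_64-ebs"),
#   )
#   ami_id = extract_ami_id(resp)
```

Point `region_name` at your own region; the filters are region-independent but the returned ImageId is not.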

Apologies for that oversight, hope this resolves it for you!


Hi @Colm, thank you for this update; it was helpful in understanding the issue. I will postpone my deployment for now. Will there be an update on this issue, as in a topic?

For reference, this is the pull request @Colm talks about:

I chose to use the suggested change and it fixed my problem.
I think we have to wait until this PR gets merged upstream, then remove the hard-coded changes.


Correct, that’s the one - thanks for digging it up and posting. :slight_smile:

Once it’s released we’ll need to update the downstream repos too, to use the new version. I’ll post in here when it’s all done.


Thank you @ZoliVeres for the reference :+1:
and yes @Colm, that would be great and thank you :raised_hands:

We faced a similar issue, so we locked the AMI version for the region we are using, following @Colm’s reply.

Thanks @Jahanzaib_Younis for posting it.


Hi all! Just sharing that this issue has now been patched and validated in the latest quickstart-examples release here: Release 23.10 (Patch.1) · snowplow/quickstart-examples · GitHub

As long as you are using the module versions noted in that release everything should work again.


Problem appeared again

Currently the target groups for iglu server and the collector do not pass health checks. I have tried with 23.1-patch.1 and 24.02-patch.1

default modules, public subnets

Hi @Youssef_Egla, can you share any details from the server logs? The ones I need should be in the following file: /var/log/user-data.log