GCP Iglu Server Health Checks Failing

Ryan_Jansen · August 6, 2024, 11:51pm

I am going through the GCP Quick Start guide and having an issue where the Iglu server instance is failing health checks and keeps restarting. The terraform deploy will time out after 20 minutes waiting for the health checks to pass.
module.iglu_server.module.service.google_compute_region_instance_group_manager.grp: Still creating…

I can SSH into the instance with the GCP UI.
I have activated the 5 appropriate APIs in the quick start guide (Compute Engine API, Cloud Resource Manager API, Identity and Access Management (IAM) API, Cloud Pub/Sub API, Cloud SQL Admin API)
I set up a Cloud NAT and confirmed that I receive a response when I curl example.com while SSH’d into the instance
The logs for the instance don’t appear to give any errors, only system message notices about it booting and being recreated

There are several topics with similar issues but none of them solved it. I am on Windows and even tried converting the .tf files with dos2unix

I am not sure how to verify that the server is running correctly. When I SSH in there is no /opt/snowplow folder that I see mentioned in the startup-script of the iglu_server module.

Appreciate any help, thanks!

Here is my config, stripped of credentials

# Please accept the terms of the Snowplow Limited Use License Agreement to proceed. (https://docs.snowplow.io/limited-use-license-1.0/)
accept_limited_use_license = true

# Will be prefixed to all resource names
# Use this to easily identify the resources created and provide entropy for subsequent environments
prefix = "sp"

# The project to deploy the infrastructure into
project_id = "project-4358349857394"

# Where to deploy the infrastructure
region = "us-central1"

# --- Network
# NOTE: The network & sub-network configured must be configured with a Cloud NAT to allow the deployed Compute Engine instances to
#       connect to the internet to download the required assets
network    = "default"
subnetwork = ""

# --- SSH
# Update this to the internal IP of your Bastion Host
ssh_ip_allowlist = ["XX.XX.XX.XX/32"]
# Generate a new SSH key locally with `ssh-keygen`
# ssh-keygen -t rsa -b 4096 
# ssh_key_pairs = []
ssh_key_pairs = [
  {
    user_name  = "snowplow"
    public_key = "MY_PUBLIC_KEY"
  }
]

# --- Snowplow Iglu Server
iglu_db_name     = "iglu"
iglu_db_username = "iglu"
# Change and keep this secret!
iglu_db_password = "MY_PASSWORD"

# Used for API actions on the Iglu Server
# Change this to a new UUID and keep it secret!
iglu_super_api_key = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# NOTE: To push schemas to your Iglu Server, you can use igluctl
# igluctl: https://docs.snowplowanalytics.com/docs/pipeline-components-and-applications/iglu/igluctl
# igluctl static push --public schemas/ http://CHANGE-TO-MY-IGLU-IP 00000000-0000-0000-0000-000000000000

# See for more information: https://github.com/snowplow-devops/terraform-google-iglu-server-ce#telemetry
# Telemetry principles: https://docs.snowplowanalytics.com/docs/open-source-quick-start/what-is-the-quick-start-for-open-source/telemetry-principles/
user_provided_id  = ""
telemetry_enabled = false

# --- SSL Configuration (optional)
ssl_information = {
  certificate_id = ""
  enabled        = false
}

# --- Extra Labels to append to created resources (optional)
labels = {}

Edit - I noticed HTTP traffic is off for the instances, not sure if that matters

jethron · August 7, 2024, 2:03am

If there’s no /opt/snowplow/config dir then something may have failed earlier in the startup script.

Is Docker successfully installed on the instance when you SSH into it? Do you get any error output from the startup script?

Ryan_Jansen · August 7, 2024, 2:48am

Appreciate the help!
Docker is not installed, and no errors are coming from the startup script.

jethron · August 7, 2024, 4:44am

Hm. There’s been some churn in the Debian packages for Docker recently: the docker.io package has been split into docker.io and docker-cli, but I’m not sure if/how/why that would have impacted the Ubuntu image used here.

Maybe try re-running the script manually with sudo google_metadata_script_runner startup (per this) to see if/where it continues to fail.

Ryan_Jansen · August 7, 2024, 5:08am

Looks like there is an error returned when I run that

Starting startup scripts (version 20231004.02-0ubuntu1~20.04.4).
Found startup-script in metadata.
startup-script: /bin/bash: /tmp/metadata-scripts583314085/startup-script: /bin/bash^M: bad interpreter: No such file or directory
startup-script exit status 126
Finished running startup scripts.

Funny enough, maybe it is related to line endings? I was going to see if I could edit the line endings, but it’s in /tmp and changes each time: newline - Bash script – "/bin/bash^M: bad interpreter: No such file or directory" - Stack Overflow

Edit - Ok maybe not, I ran that command on /usr/bin/google_metadata_script_runner and tried again and got a Segmentation Fault
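
For anyone hitting the same error: the ^M after /bin/bash is a carriage return, meaning the script was saved with Windows (CRLF) line endings. A minimal sketch for checking whether a given script file contains carriage returns is below. The helper name has_crlf is hypothetical and not part of any Snowplow or GCP tooling.

```shell
# has_crlf FILE
# Hypothetical helper: exits 0 if FILE contains at least one
# carriage return (\r) character, non-zero otherwise.
has_crlf() {
  # printf '\r' emits a literal carriage return; command substitution
  # only strips trailing newlines, so the CR is preserved.
  grep -q "$(printf '\r')" "$1"
}

# Example usage (path is a placeholder):
#   if has_crlf ./startup-script.sh; then echo "CRLF line endings found"; fi
```

Because the copy under /tmp is regenerated on every run, it is probably more useful to run a check like this against the source file the script is built from.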

jethron · August 7, 2024, 5:54am

Ah, right. OK, so the line endings of the .tf file itself shouldn’t matter, but those of the template file that it builds from probably do.

So you probably want at least:

dos2unix .terraform/modules/iglu_server.service/templates/startup-script.sh.tmpl
dos2unix .terraform/modules/iglu_server/templates/startup-script.sh.tmpl
dos2unix .terraform/modules/iglu_server.telemetry/templates/user-data.sh.tmpl
dos2unix .terraform/modules/iglu_server/templates/config.hocon.tmpl
dos2unix .terraform/modules/iglu_server.telemetry/templates/gcp_ubuntu_20_04.sh.tmpl

after you have done terraform init, and then do another terraform apply to update the userdata and try again.
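
To confirm the conversion took effect before re-applying, something like the following could list any template files that still contain carriage returns. This is only a sketch: the function name list_crlf_templates is made up for illustration, and it assumes the templates all end in .tmpl as in the paths above.

```shell
# list_crlf_templates DIR
# Hypothetical helper: prints the path of every *.tmpl file under DIR
# that still contains a carriage return. Empty output means none remain.
# Note: the exit status may be non-zero when nothing matches, so rely on
# the printed output rather than the return code.
list_crlf_templates() {
  find "$1" -type f -name '*.tmpl' -exec grep -l "$(printf '\r')" {} +
}

# Example usage, run from the directory where terraform init was done:
#   list_crlf_templates .terraform/modules
```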

Ryan_Jansen · August 7, 2024, 6:32am

You are amazing. That solved my issue and allowed the startup script to start working. The logs also showed I had to activate the Cloud Logging API and now it is all working.

Thank you so much

Running this in git bash was a quick way to change all the .tmpl files up to 4 levels deep

find . -maxdepth 4 -type f -name "*.tmpl" -exec dos2unix -v {} +
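
If dos2unix is not available, a rough equivalent can be put together with tr. This is a sketch under the assumption that the files are plain text templates; strip_cr is a hypothetical name, and unlike dos2unix it removes every carriage return in the file, not only those at line ends.

```shell
# strip_cr FILE
# Hypothetical helper: removes all carriage return characters from FILE
# in place, going through a temporary copy next to the original.
strip_cr() {
  tmp="$1.nocr.$$"
  # Write the cleaned content to a temp file, then copy it back over the
  # original so the original file's permissions are kept.
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}

# Example usage (path is a placeholder):
#   strip_cr path/to/startup-script.sh.tmpl
```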
