Timeout running terraform

Hi, I’m getting a timeout running this step of the terraform process…


│ Error: timeout while waiting for state to become 'created' (last state: 'creating', timeout: 20m0s)
│   with module.iglu_server.google_compute_region_instance_group_manager.grp,
│   on .terraform\modules\iglu_server\main.tf line 228, in resource "google_compute_region_instance_group_manager" "grp":
│  228: resource "google_compute_region_instance_group_manager" "grp" {

Any ideas?

I got some help from our SRE.
It looks like the startup script for iglu server isn’t working correctly.

The script below isn’t installing Docker when I run it manually in a terminal on the server.

set -e -x

# -----------------------------------------------------------------------------
# -----------------------------------------------------------------------------

readonly CONFIG_DIR=/opt/snowplow/config

# Install the basic download and archive tools.
function install_base_packages() {
  sudo apt install wget curl unzip -y
}

# Install Docker from the distribution package and start it on boot.
function install_docker_ce() {
  sudo apt install docker.io -y
  sudo systemctl enable --now docker
}

sudo apt update -y

# Run the install steps defined above.
install_base_packages
install_docker_ce

sudo mkdir -p "${CONFIG_DIR}"

What’s failing as part of this script? Does your instance have an outbound connection to the internet to install these packages?

I was able to SSH into it and install Docker with the line below, so I would say it does have a working connection.

sudo apt install docker.io -y

I’m not sure what was failing. I think the heartbeat wasn’t accessible from outside, hence the timeout.

I ran the entire script in the shell and it looked like docker wasn’t installed.
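
A quick way to confirm what actually got installed is to check each tool the script is meant to provide. This is only a rough sketch: the helper names `check_cmd` and `report_install_state` are made up for illustration, and the tool list is taken from the script above.

```shell
# Report whether a command is on PATH: prints "<name>: found" or "<name>: missing".
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Check every tool the startup script is expected to install and count the gaps.
report_install_state() {
  missing=0
  for cmd in wget curl unzip docker; do
    check_cmd "$cmd" || missing=$((missing + 1))
  done
  echo "missing: ${missing}"
}
```

Run `report_install_state` on the instance after the script finishes. On a standard GCE Linux image the startup script output should also be visible with `sudo journalctl -u google-startup-scripts.service`, which may show the step that failed.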

I assume that the terraform process normally works?

Yeah - it would be unusual for it to be able to install the base packages without any issues but then fail on docker (unless it was an intermittent connection issue).

I must have done something wrong then.
Our SRE got the AWS version installed.
I’m trying to send some data to it.

I’m still getting a timeout trying to run the terraform process.
It’s when attempting to spin this up:


I have increased the timeout from 20 minutes to 60 minutes and I’m still getting the issue.

As stated above, our SRE did get the AWS quick start installed, but I can’t get access to that, so I’m trying again with GCP.

Could something be failing to start?
How can I diagnose the issue?

If it’s still failing after 60 minutes, I’d have a look at any health checks that have been provisioned and check whether they are failing or succeeding. I’d also double check that you aren’t hitting any GCP quotas (though a quota problem would usually fail immediately).
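
If it helps, those checks can be run from the command line with `gcloud`. This is a sketch only: `MIG_NAME` and `REGION` are placeholders (the defaults below are made-up values, not the names the module creates), and `mig_diagnostics` is just a wrapper name for illustration.

```shell
# Placeholder values; replace with the group name and region from your deployment.
MIG_NAME="${MIG_NAME:-my-iglu-server-grp}"
REGION="${REGION:-europe-west2}"

mig_diagnostics() {
  # Per-instance status and current action (e.g. CREATING, VERIFYING) in the group.
  gcloud compute instance-groups managed list-instances "$MIG_NAME" --region "$REGION"
  # Recent errors recorded against the managed instance group.
  gcloud compute instance-groups managed list-errors "$MIG_NAME" --region "$REGION"
  # Health checks defined in the project.
  gcloud compute health-checks list
  # Quota limits and usage for the region.
  gcloud compute regions describe "$REGION" --format="yaml(quotas)"
}
```

Calling `mig_diagnostics` with the real group name and region exported should show whether instances are stuck waiting on the health check or failing earlier.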

I just have the one health check

This is for the server:

And clicking through to the group shows:

The red is because of the timeout.
The Errors tab just shows timeouts.

The server listed in the error logs is running and I can ping it from my local machine.
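
Note that a successful ping only shows ICMP gets through; the health check is an HTTP request, so it may be worth probing the service port directly. The sketch below assumes the Iglu Server defaults of port 8080 and the `/api/meta/health` path, so adjust both if your health check is configured differently. `health_url` and `probe_health` are illustrative helper names.

```shell
# Build the health endpoint URL; port and path default to assumed Iglu Server values.
health_url() {
  printf 'http://%s:%s%s\n' "$1" "${2:-8080}" "${3:-/api/meta/health}"
}

# Print the HTTP status code from the endpoint (curl prints 000 if it cannot connect).
probe_health() {
  curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$(health_url "$@")" || true
}
```

For example, `probe_health <instance-ip>` from your machine and `probe_health localhost` from an SSH session on the instance. If it only works locally, the firewall is a likely suspect: GCP health check probes come from 35.191.0.0/16 and 130.211.0.0/22, and those ranges need to be allowed to reach the port.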