Unstable Iglu server response times

medicinal-matt · October 11, 2021, 11:46am

Hi!

We tried to set up the Iglu server based on the secure quick start example. However, we have noticed that the response time of API calls can be extremely slow from time to time, and we suspect this might be behind the issue we’re seeing with the RDB Loader.

The health call says everything is OK, but about 50% of the API calls take forever to complete.

curl -kso /dev/null iglu-lb-<ACCOUNT>.<REGION>.elb.amazonaws.com/api/meta/health -w "==============\n\n
| dnslookup: %{time_namelookup}\n
| connect: %{time_connect}\n
| appconnect: %{time_appconnect}\n
| pretransfer: %{time_pretransfer}\n
| starttransfer: %{time_starttransfer}\n
| total: %{time_total}\n
| size: %{size_download}\n
| HTTPCode=%{http_code}\n\n"

| dnslookup: 0.001543

| connect: 75.232440

| appconnect: 0.000000

| pretransfer: 75.232495

| starttransfer: 75.275799

| total: 75.275937

| size: 2

| HTTPCode=200
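
For what it's worth, curl reports these timers as cumulative offsets from the start of the request, so the per-phase cost has to be derived by subtraction. A minimal sketch in Python that does this for the output above; the function names and the phase labels (`tcp_connect`, `server_wait`) are only illustrative, and the parser assumes the exact `| name: value` layout produced by the `-w` string used here:

```python
# Sample copied from the curl run above.
SAMPLE = """
| dnslookup: 0.001543
| connect: 75.232440
| appconnect: 0.000000
| pretransfer: 75.232495
| starttransfer: 75.275799
| total: 75.275937
| size: 2
| HTTPCode=200
"""


def parse_timings(text):
    """Parse '| name: value' lines from the curl -w output into floats."""
    timings = {}
    for line in text.splitlines():
        line = line.strip().lstrip("|").strip()
        if ":" not in line:
            continue  # e.g. the separator line or 'HTTPCode=200'
        name, value = line.split(":", 1)
        try:
            timings[name.strip()] = float(value)
        except ValueError:
            continue  # skip anything that is not a number
    return timings


def phase_durations(t):
    """Turn curl's cumulative timers into per-phase durations in seconds."""
    return {
        "dns": t["dnslookup"],
        "tcp_connect": t["connect"] - t["dnslookup"],
        "server_wait": t["starttransfer"] - t["pretransfer"],
        "transfer": t["total"] - t["starttransfer"],
    }


if __name__ == "__main__":
    for phase, seconds in phase_durations(parse_timings(SAMPLE)).items():
        print(f"{phase:12s} {seconds:.6f}")
```

For this run, roughly 75.23 s of the total is spent establishing the TCP connection, while the wait for the first byte after the request is sent is only about 0.04 s.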

time curl iglu-lb-<ACCOUNT>.<REGION>.elb.amazonaws.com/api/schemas/com.snowplowanalytics.snowplow/ua_parser_context/jsonschema/1-0-0 -X GET -H "apikey: <READ KEY>"

0.01s user 0.01s system 0% cpu 1:15.38 total

Moments before, the connect time was only 0.039454 s. Is this expected?

We have the server and database in two private subnets, and the load balancer in two public subnets of the same VPC. We tried setting the EC2 instance to t3.medium and the RDS instance to db.t3.medium, but with no success.

mike · October 12, 2021, 12:27am

This is an unusually long response time.

Is this for all endpoints or just a subset of endpoints?

I’d be tempted to work backwards from the connection (to the load balancer, then to the EC2 instance, and then to the RDS instance) to determine where the latency is being introduced, and to look through the CloudWatch metrics for these services.
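
As a rough sketch of the first step (the connection to the load balancer), the TCP connect can be timed separately against every address the load balancer hostname resolves to. This is written in Python with only the standard library; the function names, the port 80 default and the 10-second timeout are assumptions for illustration, and the hostname placeholder is the one from the curl commands above. If only some addresses are slow or unreachable, that would hint at the network path to a particular subnet rather than at the Iglu server or the database, but that is something to verify, not a diagnosis.

```python
import socket
import sys
import time


def resolve_ipv4(host, port):
    """Return the distinct IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def time_connect(ip, port, timeout=10.0):
    """Return seconds to open a TCP connection to one address, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return None


def probe_each_address(host, port=80, timeout=10.0):
    """Map every resolved address to its connect time (None = no connection)."""
    return {ip: time_connect(ip, port, timeout) for ip in resolve_ipv4(host, port)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python probe.py iglu-lb-<ACCOUNT>.<REGION>.elb.amazonaws.com
    for ip, seconds in probe_each_address(sys.argv[1]).items():
        status = "no connection" if seconds is None else f"{seconds:.3f}s"
        print(f"{ip:15s} {status}")
```

Running it a few times in a row is worthwhile, since the set of addresses returned for the hostname can change between lookups.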

medicinal-matt · October 12, 2021, 2:49pm

It is all the endpoints I’ve tried. What is so strange is that it works every now and then; it doesn’t consistently take long.

I now tried adding one subnet per availability zone, but there is no obvious improvement.

The CloudWatch metrics on the Monitoring tab for the load balancer, Iglu server and Iglu database look calm.

Can you tell me a VPC setup that is verified to work? How many public and private subnets? What about availability zones?

medicinal-matt · October 13, 2021, 7:11am

Or maybe it is the security groups?

iglu-server security group
Inbound

type: SSH, protocol: TCP, port: 22, source: 0.0.0.0/0
type: Custom TCP, protocol: TCP, port: 8080, source: sg-08ad553c562946650 / iglu-lb

Outbound

type: HTTPS, protocol: TCP, port: 443, destination: 0.0.0.0/0
type: HTTP, protocol: TCP, port: 80, destination: 0.0.0.0/0
type: PostgreSQL, protocol: TCP, port: 5432, destination: sg-0f0cc0f43ab50c185 / iglu-rds
type: Custom UDP, protocol: UDP, port: 123, destination: 0.0.0.0/0

iglu-lb security group
Inbound

type: HTTPS, protocol: TCP, port: 443, source: 0.0.0.0/0
type: HTTP, protocol: TCP, port: 80, source: 0.0.0.0/0

Outbound

type: Custom TCP, protocol: TCP, port: 8080, destination: sg-008224009d024b58a / iglu-server

mike · October 13, 2021, 7:31am

Public / private subnets should be fine, as should any availability zones. I’d dig further into CloudWatch, as there should be some indicator of where the slow response is coming from, provided those logs are being written out. The security groups shouldn’t have a material impact on response latency.

medicinal-matt · December 17, 2021, 3:33pm

I actually got a reply from AWS customer support on how to debug this, but it turns out the problem simply disappeared when we switched from the new VPC set up for this purpose to an older one that someone else had set up.

I’m not entirely sure how the two VPCs and their surrounding settings differ, but at least this solves the issue for us.

mike · December 20, 2021, 12:03am

Yeah - that’s odd. Thanks for the update!
