Scala Collector + AWS ELB + SSL

Tully · May 30, 2018, 3:52pm

I've hit the end of the road with Google/forum searches and internal knowledge, so time to post…

Step 1 - install collector

Snowplow Scala collector installed on an AWS EC2 instance
Collector configured to use the private EC2 IP and port 8000
TEST: JavaScript tracker pointing to 10…167:8000 - OK, and data is available in the Kinesis stream.
TEST: Loading http://10…167:8000/com.snowplowanalytics.snowplow/tp2 in a browser shows the pixel - OK

Step 2 - ELB + HTTP

Set up a load balancer (ELB) on AWS
Configured a listener on port 80 (HTTP)
Set up a target on port 8000
Created a CNAME for a new subdomain pointing at the ELB
TEST: JavaScript tracker pointing to the new subdomain over HTTP - OK
TEST: GET request via browser to http://subdomain.example.com/com.snowplowanalytics.snowplow/tp2 - OK

Step 3 - ELB + HTTPS

Configured a new listener on port 443 (HTTPS) using an SSL certificate
Set up a target on port 8000
Added security group rules to allow traffic on 443
TEST: JavaScript tracker using https://subdomain - 502 Bad Gateway
TEST: GET request via browser to https://subdomain.example.com/com.snowplowanalytics.snowplow/tp2 - 502 Bad Gateway

Troubleshooting so far

Triple-checked the security groups (OK): allowing traffic from 0.0.0.0/0 in both the EC2 and ELB security groups has no effect, and there are no network ACL issues either
Certificate is valid
Enabling/disabling HTTP/2 on the ELB gives the same 502 result

Checked the ELB access logs: they show no response from the target host for requests coming in over HTTPS
- The ELB makes the same request to the target host on port 8000 for both HTTP and HTTPS. The HTTP request gets a 200 back from the target, but the HTTPS request gets nothing, so the load balancer returns a 502.

The collector's stdout/stderr logs show nothing when an HTTPS request is made, but they do for HTTP (e.g. INFO com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - Successfully wrote 1 out of 1 records)
Setting forceSecureTracker in the JavaScript tracker doesn't change the results - still 502s

Changed the collector config to interface = "0.0.0.0"
- INFO com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - REST interface bound to /0:0:0:0:0:0:0:0:8000
- Same results: HTTP OK, HTTPS 502
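The ELB access-log check described above can be scripted. Below is a minimal Python sketch. It assumes an Application Load Balancer, since the setup uses target groups, and AWS's documented space-separated ALB log format. In that format fields 9 and 10 are `elb_status_code` and `target_status_code`. Classic ELB logs use a different layout. The sketch picks out requests where the load balancer returned 502 and the target sent back no status at all, which the log records as `-`. The function and field names are illustrative only.

```python
import shlex
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    conn_type: str      # "http", "https" or "h2"
    target: str         # "ip:port" of the target, or "-"
    elb_status: str     # status the load balancer returned to the client
    target_status: str  # status the target returned, "-" if none
    request: str        # e.g. "GET https://host:443/health HTTP/1.1"


def parse_line(line: str) -> Entry:
    # shlex.split keeps quoted fields (request, user agent) as single tokens.
    f = shlex.split(line)
    # Assumed ALB field positions: 0 type, 4 target:port,
    # 8 elb_status_code, 9 target_status_code, 12 request.
    return Entry(conn_type=f[0], target=f[4], elb_status=f[8],
                 target_status=f[9], request=f[12])


def unanswered_502s(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries where the ELB answered 502 and the target sent no status."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = parse_line(line)
        if entry.elb_status == "502" and entry.target_status == "-":
            yield entry
```

Feed it the lines of a downloaded (and gunzipped) log file. If every hit is an `https` entry, that matches the pattern described in the post.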

At this point we're pretty much stumped, because everything appears to be set up correctly. Our working theory is that the ELB is sending an encrypted request to the collector rather than terminating SSL at the ELB, but we're not sure how to prove or disprove this. Are there additional logs somewhere where we can see the incoming request to the collector?
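One way to test that theory without extra logs is to attempt a TLS handshake directly against the collector's port, from a machine inside the VPC. If the collector only speaks plain HTTP on 8000, the handshake should fail. That is roughly what the ELB would run into if it were forwarding HTTPS to the target. Below is a minimal standard-library Python sketch. The function name is hypothetical.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host:port completes, else False."""
    ctx = ssl.create_default_context()
    # Only the handshake matters here, so skip hostname and certificate checks.
    # (check_hostname must be disabled before verify_mode can be CERT_NONE.)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake immediately on a connected socket
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError, timeouts and refused connections are all OSError subclasses
        return False
```

Call it with the collector's private IP and 8000. A `False` result means that port does not complete a TLS handshake, so anything sending it HTTPS will get no usable response.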

Any input/questions/comments are greatly appreciated.

Thanks,

-WT.

fingerco · May 30, 2018, 5:26pm

Interesting. I might be wrong, but your description of what's going on (HTTP succeeding and HTTPS failing with a 502) would make me check the following:

Is your health check, on the ELB, the same configuration for both? Is it using HTTP for both? Is your Instance Port the same for both?

My initial guess would be that your ELB’s HTTPS health check is trying to use HTTPS (443 instead of 80) against the Scala stream collector and failing. Thus it thinks there are no healthy instances and can’t direct your request.

Tully · May 30, 2018, 5:57pm

Great suggestion, so I dug into the health checks… Currently both the HTTP and HTTPS health checks on the ELB targets are failing, but HTTP requests to the collector are succeeding… My understanding is that when there is no healthy target, requests just get routed to all targets:
None of these Availability Zones contains a healthy target. Requests are being routed to all targets

The health checks are set to use the traffic port, but overriding it to 80 or 8000 seems to have no effect (if the collector is set up to bind to port 8000, should that be the port used?). Also, I'm not sure exactly what to use for the path… Currently using: /com.snowplowanalytics.snowplow/tp2

Tully · May 30, 2018, 6:08pm

Did some digging and saw that there is a /health endpoint for health checks!

Updated the target health checks to use that endpoint on port 8000, but they still report unhealthy. I can hit /health by direct IP:port (bypassing the ELB) and by subdomain over HTTP. Hitting it over HTTPS still 502s.
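The three checks described here can be scripted so they are easy to rerun after each config change: direct IP:port, HTTP via the subdomain, and HTTPS via the subdomain. Below is a minimal standard-library Python sketch. The function names are hypothetical. It reports the HTTP status for each URL, or the name of the error when no HTTP response came back.

```python
import urllib.error
import urllib.request
from typing import Dict, Iterable


def probe(url: str, timeout: float = 5.0) -> str:
    """GET url and return the HTTP status code, or the name of the error raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as err:
        # Non-2xx responses, e.g. a 502 generated by the load balancer
        return str(err.code)
    except OSError as err:
        # URLError (refused connection, DNS failure) and timeouts
        return type(err).__name__


def probe_all(urls: Iterable[str]) -> Dict[str, str]:
    """Probe each URL and map it to its result."""
    return {url: probe(url) for url in urls}
```

For example, pass `probe_all` the URLs `http://<collector-private-ip>:8000/health`, `http://subdomain.example.com/health` and `https://subdomain.example.com/health`. The first is a placeholder to fill in.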

fingerco · May 30, 2018, 6:54pm

Interesting…

So if my understanding is correct, this should be working:

Health Check Tab:

HTTP:8000/health

Listeners Tab:

Load Balancer Protocol: HTTP
Load Balancer Port: 80
Instance Protocol: HTTP
Instance Port: 8000
Cipher: N/A
SSL Certificate: N/A

Load Balancer Protocol: HTTPS
Load Balancer Port: 443
Instance Protocol: HTTP
Instance Port: 8000
Cipher: …
SSL Certificate: …

It’s odd that none of the health checks are passing…

Maybe the path is one of:
/health
/com.snowplowanalytics.snowplow/tp2/health

fingerco · May 30, 2018, 6:56pm

If all that fails, make sure the server's Availability Zone is listed under the Availability Zones in the "Instances" tab of the ELB

Tully · May 30, 2018, 7:39pm

Whoops, looks like there's a big delay between health checks - all are set to ping /health on 8000 and all are now coming back healthy!

Unfortunately, HTTPS is still coming back 502, with the ELB not receiving a response from the collector. Going direct to the collector via IP, and HTTP via the subdomain through the ELB, still work. This is odd…

Do you know if there are any collector web server logs or something that can be looked at to see the inbound request (if any)?

fingerco · May 30, 2018, 7:55pm

That’s great to hear! One potential problem down!

Did you double check the listener settings?

Load Balancer Protocol: HTTPS
Load Balancer Port: 443
Instance Protocol: HTTP
Instance Port: 8000
Cipher: …
SSL Certificate: …

Specifically, Instance Protocol = “HTTP” and Instance Port = 8000
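In this setup the collector only speaks plain HTTP on port 8000. So every listener, whatever protocol it accepts from clients, should forward with Instance Protocol HTTP and Instance Port 8000. SSL terminates at the load balancer. Below is a small Python sketch that checks a listener list for that rule. The dict keys simply mirror the columns of the Listeners tab. They are a hypothetical shape, not an AWS API response.

```python
from typing import Dict, List

Listener = Dict[str, object]

# The two listeners described above (Cipher/SSL Certificate omitted).
LISTENERS: List[Listener] = [
    {"lb_protocol": "HTTP", "lb_port": 80, "instance_protocol": "HTTP", "instance_port": 8000},
    {"lb_protocol": "HTTPS", "lb_port": 443, "instance_protocol": "HTTP", "instance_port": 8000},
]


def misrouted(listeners: List[Listener],
              backend_protocol: str = "HTTP",
              backend_port: int = 8000) -> List[Listener]:
    """Return listeners that do not forward to the collector's plain-HTTP port."""
    return [
        listener for listener in listeners
        if listener["instance_protocol"] != backend_protocol
        or listener["instance_port"] != backend_port
    ]
```

`misrouted(LISTENERS)` returns an empty list. A 443 listener with `"instance_protocol": "HTTPS"` would be flagged.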

I'm not sure about those logs… It's possible that the collector would output to STDOUT, which you could redirect to a file for viewing.

That’s all I can think of at the moment! Hope you’re able to figure this out!

Tully · May 30, 2018, 7:56pm

I knew it was going to be something stupid. Three of us were looking at it and missed the fact that we set up a target group using HTTP and 8000, then set up another one using HTTPS and 8000… We then created a listener on 80 that points to HTTP:8000, and a listener on 443 that points to HTTPS:8000…

Resolution: Create one target group that uses HTTP and port 8000, and point both listeners at that same target group so that the ELB communicates with the target via HTTP on port 8000.

It was staring back at us the entire time. Everything looks legit, but specifying a target group that uses HTTPS means the requests the ELB sends to the collector are encrypted, and the collector on port 8000 only speaks plain HTTP.
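For anyone scripting this rather than clicking through the console, here is a rough sketch of the resolution using boto3's `elbv2` client (`create_target_group`, `register_targets`, `create_listener`). It assumes an Application Load Balancer and certificate that already exist. The helper name, target group name, VPC ID and ARNs are placeholders. The client is passed in, so it can be swapped for a stub when testing.

```python
from typing import Any, Dict, Sequence


def configure_collector_alb(elbv2: Any, vpc_id: str, lb_arn: str, cert_arn: str,
                            instance_ids: Sequence[str] = (),
                            port: int = 8000) -> Dict[str, str]:
    """Create ONE plain-HTTP target group and point both listeners at it.

    `elbv2` is expected to be a boto3 client, e.g. boto3.client("elbv2").
    """
    tg_arn = elbv2.create_target_group(
        Name="snowplow-collector-http",  # hypothetical name
        Protocol="HTTP",                 # ELB -> collector traffic stays unencrypted
        Port=port,
        VpcId=vpc_id,
        TargetType="instance",
        HealthCheckProtocol="HTTP",
        HealthCheckPort="traffic-port",  # i.e. the target port (8000)
        HealthCheckPath="/health",
    )["TargetGroups"][0]["TargetGroupArn"]

    if instance_ids:
        # Targets inherit the target group's port when none is given.
        elbv2.register_targets(TargetGroupArn=tg_arn,
                               Targets=[{"Id": i} for i in instance_ids])

    forward = [{"Type": "forward", "TargetGroupArn": tg_arn}]
    http_arn = elbv2.create_listener(
        LoadBalancerArn=lb_arn, Protocol="HTTP", Port=80, DefaultActions=forward,
    )["Listeners"][0]["ListenerArn"]
    https_arn = elbv2.create_listener(
        LoadBalancerArn=lb_arn, Protocol="HTTPS", Port=443,
        Certificates=[{"CertificateArn": cert_arn}],  # SSL terminates at the ELB
        DefaultActions=forward,                       # same HTTP target group
    )["Listeners"][0]["ListenerArn"]
    return {"target_group": tg_arn, "http_listener": http_arn, "https_listener": https_arn}
```

The key design point is that both `create_listener` calls share the same `forward` action. That means there is never an HTTPS target group in the picture. Security groups and the DNS CNAME from the original steps are not covered here.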

Really appreciate the help and eyes on this one @fingerco and apologies for wasting your time!!

fingerco · May 30, 2018, 8:40pm

No problem! Happy to help!

I've run into that before, and I'm sure that other people who run into this exact same thing will find this and be able to figure it out

David_D · September 24, 2020, 11:23pm

Hi guys,

I have set up the networking in AWS just as you explained above, except using port 8080 instead of 8000. However, when I use:

curl https://mydomain.com:8080/health

I still get the error:

curl: (7) Failed to connect to mydomain.com port 8080: Connection refused

Was there something in the ssl section of the app.config file that you had to alter?

Any help would be much appreciated.

Marco_Mai · May 27, 2021, 1:35pm

Hi David,

How did you resolve it? I face the same issue. Thanks in advance.

kfitzpatrick · May 27, 2021, 7:12pm

Hey @Marco_Mai

Probably best to start your own topic and post your logs/config there; the guys will definitely help once you've outlined the specific issue you face.

Kyle
