I am getting the error below very frequently while using the Docker image snowplow/scala-stream-collector-kinesis:3.1.0, but it does not happen with snowplow/scala-stream-collector-kinesis:2.10.0.
Issue:
[io-compute-0] ERROR org.http4s.server.service-errors - Error servicing request: POST /com.snowplowanalytics.snowplow/tp2 from 10.x.x.x
org.http4s.InvalidBodyException: Received premature EOF.
As per the release notes, I understand there has been a transition from Akka HTTP to http4s as the HTTP framework.
We run the collector application in an ECS Fargate container behind an ALB for incoming traffic. I cannot retrieve the input payload because the events are not sent to the Snowplow collector bad events Kinesis stream. Can someone help me resolve this issue?
Hey @Vishal_Periyasamy, thank you for reaching out. The collector 3.x series is a major update with slightly different performance characteristics than its predecessor. We tend to configure and tune it carefully to match our customers' usage patterns. However, we haven't seen the error you're hitting.
To be able to provide any useful suggestions, it’d be good to understand your runtime environment.
What does your ALB configuration look like (idle timeout settings, active connections)?
Have you set any overrides to the networking configuration? Previously this was available through the akka.networking section, but it has moved since 3.0.0.
What resources are available to the collector container? Are you overriding any JVM options?
The error you are seeing occurs when the incoming POST connection is shut down before the full request body is read by the collector, so anything related to network settings and behaviour is of the essence here.
You should also have similar settings on the AWS side so connections don't get terminated there, as that is what seems to be happening.
Also, have you observed any relationship between the failure and load?
Another approach that could help with investigating this would be enabling trace logging in the collector to see what behaviour and input cause the requests to fail.
This can be done by setting the -Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss.SSSZ -Dorg.slf4j.simpleLogger.log.org.http4s.blaze=TRACE -Dorg.slf4j.simpleLogger.levelInBrackets=true flags for the collector container.
We’ve been able to reproduce the kind of errors you’re reporting with POST requests whose Content-Length header is longer than the actual body.
An example that will reliably cause this kind of error is a request with an inflated Content-Length, as sketched below.
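Here is a minimal Scala sketch of such a request; the host, port, path, and body are placeholder assumptions (a collector listening on localhost:8080 with the default /com.snowplowanalytics.snowplow/tp2 endpoint):

```scala
import java.net.Socket

// Send a POST whose Content-Length header promises more bytes than the body
// actually contains, then close the socket before the body completes.
object PrematureEofRepro extends App {
  val socket = new Socket("localhost", 8080) // assumed collector host and port
  val out = socket.getOutputStream
  val request =
    "POST /com.snowplowanalytics.snowplow/tp2 HTTP/1.1\r\n" +
      "Host: localhost\r\n" +
      "Content-Type: application/json\r\n" +
      "Content-Length: 100\r\n" + // header claims 100 bytes
      "\r\n" +
      "{}" // but only 2 bytes are actually sent
  out.write(request.getBytes("UTF-8"))
  out.flush()
  socket.close() // closing here leaves the body incomplete
}
```

Closing the connection before the declared body length has arrived is exactly the condition described above: the collector sees a premature EOF while it is still waiting for the rest of the body.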
Hi @peel, the suggested configuration changes didn’t resolve the issue. I tried setting the logger to TRACE level to identify any common patterns of failure. However, I’m having trouble differentiating the requests. Is there a way to modify the logging format to include a request_id or another unique ID to distinguish the log messages for each request?
Have you tried the most recent version - 3.2.0?
We believe the unusual behaviour is due to long-standing idle connections that block the server when a Content-Length longer than the actual body is received. The server will wait for the connection to complete the body for the idleTimeout period. Tuning the values in this section should prevent the issue from happening.
Also, idleTimeout is usually set to a high value in GCP, where the load balancer uses idle connections for pool management. Keeping a long idleTimeout is not encouraged in deployments where it is not strictly necessary.
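If it helps to confirm whether idle connections are the culprit, here is a rough Scala probe (an illustration only, again assuming a collector on localhost:8080): it sends the same truncated POST but keeps the socket open and measures how long the server holds the connection before giving up, which should roughly match the configured idleTimeout.

```scala
import java.net.Socket
import scala.io.Source

// Send a truncated POST (Content-Length larger than the body), keep the
// socket open, and measure how long the collector waits before closing
// the connection.
object IdleTimeoutProbe extends App {
  val socket = new Socket("localhost", 8080) // assumed collector host and port
  val out = socket.getOutputStream
  out.write(
    ("POST /com.snowplowanalytics.snowplow/tp2 HTTP/1.1\r\n" +
      "Host: localhost\r\n" +
      "Content-Type: application/json\r\n" +
      "Content-Length: 100\r\n" +
      "\r\n" +
      "{}").getBytes("UTF-8")
  )
  out.flush()
  val start = System.nanoTime()
  // Block until the server times out the half-finished request and closes the socket.
  val response = Source.fromInputStream(socket.getInputStream).mkString
  val waitedSeconds = (System.nanoTime() - start) / 1e9
  println(f"Connection closed by the server after $waitedSeconds%.1f s")
  if (response.nonEmpty) println(response)
  socket.close()
}
```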