Enricher Kinesis InternalFailure - Internal Service Error

Daniel_Baron · July 28, 2023, 1:17pm

Hey there

We have deployed the Snowplow Stream Enricher JAR to an ECS task that reads from one Kinesis stream and writes to another. I see the following error very often and would like to know what causes it and how it can be resolved.

[ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.

Our configuration

ECS

mem: 4096
cpu: 2048
count: 2
Base Docker Image: amazoncorretto:11

Kinesis Sink

Retention Period: 72 Hours
Provisioning Mode: ON_DEMAND

Enricher Configuration

Version: 3.2.3
configuration (redacted)

{
  "input": {
    "type": "Kinesis",
    "streamName": -,
    "appName": -,
    "initialPosition": { 
      "type": "TRIM_HORIZON",
    },
    "checkpointBackoff": {
      "minBackoff": 100 milliseconds
      "maxBackoff": 10 seconds
      "maxRetries": 10
    }
  },
  "output": {
    "good": {
      "type": "Kinesis",
      "streamName": -,
      "backoffPolicy": {
        "minBackoff": 100 milliseconds
        "maxBackoff": 10 seconds
        "maxRetries": 10
      },
    },
    "bad": {
      "type": "Kinesis",
      "streamName": -,
      "backoffPolicy": {
        "minBackoff": 100 milliseconds
        "maxBackoff": 10 seconds
        "maxRetries": 10
      }
    }
  },
  "monitoring": {
    "metrics": {
      "stdout": {
        "period": "1 minute"
      }
      "cloudwatch": true
    }
  }
}
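
For reference, the three `backoffPolicy` values above bound how long a failed write can keep being retried. The sketch below assumes a plain doubling backoff with no jitter, which may not match what Enrich 3.2.3 actually does; `backoff_schedule` is a hypothetical helper written for this illustration, not part of Snowplow.

```python
from datetime import timedelta


def backoff_schedule(min_backoff, max_backoff, max_retries):
    """Return the delay before each retry.

    Assumption: the delay doubles after every attempt, starting at
    min_backoff and never exceeding max_backoff. The real policy used by
    Enrich may differ (for example by adding jitter).
    """
    delays = []
    delay = min_backoff
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff))
        delay = delay * 2
    return delays


# Values taken from the configuration above.
schedule = backoff_schedule(
    min_backoff=timedelta(milliseconds=100),
    max_backoff=timedelta(seconds=10),
    max_retries=10,
)

# Worst-case time spent waiting before the retries are exhausted.
total_wait = sum(schedule, timedelta())

for attempt, delay in enumerate(schedule, start=1):
    print(f"retry {attempt}: wait {delay.total_seconds():.1f}s")
print(f"total wait: {total_wait.total_seconds():.1f}s")
```

Under that doubling assumption the cap of 10 seconds is reached on the eighth retry, so most of the waiting time comes from the last three attempts.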

Logs

Redacted logs providing context:

12:24:35.0082+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0095+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0095+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Current stream shard assignments: shardId-000000000036	<redacted>
12:24:35.0095+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Sleeping ...	<redacted>
12:24:35.0152+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0233+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0278+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 20 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:36.0246+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Current stream shard assignments: shardId-000000000037	<redacted>
12:24:36.0246+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Sleeping ...	<redacted>
12:24:38.0568+0000 [pool-15-thread-1] [INFO] software.amazon.kinesis.coordinator.DeterministicShuffleShardSyncLeaderDecider - Elected leaders: <worker-id>	<redacted>
12:24:44.0108+0000 [pool-15-thread-1] [INFO] software.amazon.kinesis.coordinator.DeterministicShuffleShardSyncLeaderDecider - Elected leaders: <worker-id>	<redacted>
12:24:49.0097+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=3, maximumPoolSize=2147483647)	<redacted>
12:24:50.0254+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=2, maximumPoolSize=2147483647)	<redacted>
12:25:02.0943+0000 [pool-17-thread-1] [INFO] software.amazon.kinesis.leases.LeaseCleanupManager - Number of pending leases to clean before the scan : 0	<redacted>
12:25:05.0954+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0014+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0265+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0715+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0752+0000 [pool-17-thread-1] [INFO] software.amazon.kinesis.leases.LeaseCleanupManager - Number of pending leases to clean before the scan : 0	<redacted>
12:25:07.0499+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:19.0100+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=3, maximumPoolSize=2147483647)	<redacted>
12:25:20.0258+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=2, maximumPoolSize=2147483647)	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.raw = 2880	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.good = 4766	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.bad = 0	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.invalid_enriched = 0	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.latency = 7112	<redacted>
12:25:27.0148+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.raw = 2947	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.good = 4955	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.bad = 0	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.invalid_enriched = 0	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.latency = 6844	<redacted>
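
To put a number on "very often", the failure lines can be tallied per second and per error code. This is a minimal sketch that assumes the log layout shown above (time of day, thread, level, logger, message); the regular expression and the `failures_per_second` helper are illustrative and would need adjusting for a different log format. Summed record counts may include the same records more than once if a failed batch is retried and logged again.

```python
import re
from collections import Counter

# Matches lines such as:
# 12:24:35.0082+0000 [pool-1-thread-2] [ERROR] ...Sink - 23 records failed with error code InternalFailure. ...
FAILURE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2})\.\d+\+\d{4} .*?"
    r"(?P<count>\d+) records failed with error code (?P<code>\w+)\."
)


def failures_per_second(lines):
    """Tally failure log lines and failed-record counts per (second, error code)."""
    line_counts = Counter()
    record_counts = Counter()
    for line in lines:
        match = FAILURE.match(line)
        if match is None:
            continue  # not a Sink failure line
        key = (match.group("ts"), match.group("code"))
        line_counts[key] += 1
        record_counts[key] += int(match.group("count"))
    return line_counts, record_counts


# A few lines copied from the logs above.
sample = [
    "12:24:35.0082+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.",
    "12:24:35.0095+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Sleeping ...",
    "12:24:35.0278+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 20 records failed with error code InternalFailure. Example error message: Internal service failure.",
    "12:25:05.0954+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.",
]

line_counts, record_counts = failures_per_second(sample)
for key in sorted(line_counts):
    print(key, line_counts[key], "lines,", record_counts[key], "records")
```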

We have considered upgrading, but team resources are stretched and there are higher-priority tasks. Mainly I wondered whether anything here looks obviously wrong, and whether there is a specific reason for the internal service failure from Kinesis?

Thanks for any help you can provide!

istreeter · July 28, 2023, 4:55pm

Hi @Daniel_Baron, sorry, I don’t remember ever seeing so many internal service failures. The error type is mentioned on this AWS docs page, but the description is not very helpful.

In Enrich version 3.4.1 we made some improvements to how the app handles errors when writing to Kinesis, so if you upgrade you might see fewer errors of this type in the logs. (I do understand your point, though, about needing team resources to do this carefully.)
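
For context on the error itself: a Kinesis PutRecords call can succeed for some records in a batch and fail for others, and each failed entry in the response carries an `ErrorCode` such as `InternalFailure`. A common way to handle this is to resend only the failed entries after a delay. The sketch below illustrates that general pattern and is not the actual Enrich implementation; `put_with_retry`, `make_flaky_sink`, and the `put_records` callable are hypothetical names, and the fake sink stands in for a real Kinesis client.

```python
import time


def put_with_retry(put_records, records, min_backoff=0.1, max_backoff=10.0,
                   max_retries=10, sleep=time.sleep):
    """Send records, resending only the entries that came back with an ErrorCode.

    put_records is a hypothetical callable: it takes a list of records and
    returns one result dict per record, in the same order, shaped like the
    entries of a PutRecords response. Returns the records that still failed
    after all retries.
    """
    pending = list(records)
    delay = min_backoff
    for attempt in range(max_retries + 1):
        results = put_records(pending)
        # Keep only the records whose result entry reports an error.
        pending = [rec for rec, res in zip(pending, results) if "ErrorCode" in res]
        if not pending or attempt == max_retries:
            break
        sleep(delay)
        delay = min(delay * 2, max_backoff)  # assumed doubling backoff, capped
    return pending


def make_flaky_sink(fail_times):
    """Fake sink for illustration: each record fails fail_times[record] times."""
    remaining = dict(fail_times)
    calls = []

    def put_records(batch):
        calls.append(list(batch))
        results = []
        for rec in batch:
            if remaining.get(rec, 0) > 0:
                remaining[rec] -= 1
                results.append({"ErrorCode": "InternalFailure",
                                "ErrorMessage": "Internal service failure."})
            else:
                results.append({"SequenceNumber": "0",
                                "ShardId": "shardId-000000000000"})
        return results

    return put_records, calls


sink, calls = make_flaky_sink({"b": 1, "c": 2})
slept = []  # record the delays instead of actually sleeping
left_over = put_with_retry(sink, ["a", "b", "c"], sleep=slept.append)
print(calls, slept, left_over)
```

With this pattern, a batch that hits a transient `InternalFailure` shrinks on each attempt, so records that were already accepted are not written twice.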
