Enricher Kinesis InternalFailure - Internal Service Error

Hey there :wave:

We have deployed the Snowplow Stream Enricher JAR as an ECS task that reads from one Kinesis stream and writes to another. The following error appears very often in the logs, and I would like to understand what causes it and how to resolve it.

[ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.

Our configuration

ECS

  • mem: 4096
  • cpu: 2048
  • count: 2
  • Base Docker Image: amazoncorretto:11

Kinesis Sink

  • Retention Period: 72 Hours
  • Provisioning Mode: ON_DEMAND

Enricher Configuration

  • Version: 3.2.3
  • configuration (redacted)
{
  "input": {
    "type": "Kinesis",
    "streamName": -,
    "appName": -,
    "initialPosition": { 
      "type": "TRIM_HORIZON",
    },
    "checkpointBackoff": {
      "minBackoff": 100 milliseconds
      "maxBackoff": 10 seconds
      "maxRetries": 10
    }
  },
  "output": {
    "good": {
      "type": "Kinesis",
      "streamName": -,
      "backoffPolicy": {
        "minBackoff": 100 milliseconds
        "maxBackoff": 10 seconds
        "maxRetries": 10
      },
    },
    "bad": {
      "type": "Kinesis",
      "streamName": -,
      "backoffPolicy": {
        "minBackoff": 100 milliseconds
        "maxBackoff": 10 seconds
        "maxRetries": 10
      }
    }
  },
  "monitoring": {
    "metrics": {
      "stdout": {
        "period": "1 minute"
      }
      "cloudwatch": true
    }
  }
}
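To put the `backoffPolicy` values above in perspective, here is a minimal sketch of the retry delays they would produce. It assumes plain exponential doubling from `minBackoff`, capped at `maxBackoff`. That is my assumption, not something confirmed from the Enrich source, and the real schedule may differ (for example, by adding jitter). The function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(min_backoff_ms, max_backoff_ms, max_retries):
    """Return the delay in ms before each retry.

    Assumption: delays double on every attempt and are capped at max_backoff_ms.
    """
    delays = []
    delay = min_backoff_ms
    for _ in range(max_retries):
        delays.append(min(delay, max_backoff_ms))
        delay *= 2
    return delays


# Values from the config above: 100 milliseconds, 10 seconds, 10 retries.
schedule = backoff_schedule(100, 10_000, 10)
total_ms = sum(schedule)
print(schedule)
print(total_ms / 1000, "seconds of backoff in total")
```

Under that assumption, a batch that keeps failing would spend a little under 43 seconds in backoff before the retries run out.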

Logs

Redacted logs providing context:

12:24:35.0082+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0095+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0095+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Current stream shard assignments: shardId-000000000036	<redacted>
12:24:35.0095+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Sleeping ...	<redacted>
12:24:35.0152+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0233+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:35.0278+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 20 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:24:36.0246+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Current stream shard assignments: shardId-000000000037	<redacted>
12:24:36.0246+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Sleeping ...	<redacted>
12:24:38.0568+0000 [pool-15-thread-1] [INFO] software.amazon.kinesis.coordinator.DeterministicShuffleShardSyncLeaderDecider - Elected leaders: <worker-id>	<redacted>
12:24:44.0108+0000 [pool-15-thread-1] [INFO] software.amazon.kinesis.coordinator.DeterministicShuffleShardSyncLeaderDecider - Elected leaders: <worker-id>	<redacted>
12:24:49.0097+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=3, maximumPoolSize=2147483647)	<redacted>
12:24:50.0254+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=2, maximumPoolSize=2147483647)	<redacted>
12:25:02.0943+0000 [pool-17-thread-1] [INFO] software.amazon.kinesis.leases.LeaseCleanupManager - Number of pending leases to clean before the scan : 0	<redacted>
12:25:05.0954+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0014+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0265+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0715+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:06.0752+0000 [pool-17-thread-1] [INFO] software.amazon.kinesis.leases.LeaseCleanupManager - Number of pending leases to clean before the scan : 0	<redacted>
12:25:07.0499+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 37 records failed with error code InternalFailure. Example error message: Internal service failure.	<redacted>
12:25:19.0100+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=3, maximumPoolSize=2147483647)	<redacted>
12:25:20.0258+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.DiagnosticEventLogger - Current thread pool executor state: ExecutorStateEvent(executorName=SchedulerThreadPoolExecutor, currentQueueSize=0, activeThreads=0, coreThreads=0, leasesOwned=1, largestPoolSize=2, maximumPoolSize=2147483647)	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.raw = 2880	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.good = 4766	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.bad = 0	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.invalid_enriched = 0	<redacted>
12:25:21.0857+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.latency = 7112	<redacted>
12:25:27.0148+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.raw = 2947	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.good = 4955	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.bad = 0	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.invalid_enriched = 0	<redacted>
12:25:27.0149+0000 [pool-1-thread-2] [INFO] enrich.metrics - snowplow.enrich.latency = 6844	<redacted>
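
To quantify how often this happens, the failed-record counts can be tallied per error code straight from the log lines. This is a rough sketch that relies only on the message format shown above; `tally_failures` is a hypothetical helper name. It counts log mentions, so a batch that is retried several times is counted on every attempt.

```python
import re
from collections import Counter

# Matches the Sink error lines shown in the logs above.
FAILED = re.compile(r"(\d+) records failed with error code (\w+)")


def tally_failures(lines):
    """Sum the reported failed-record counts per error code."""
    totals = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


sample = [
    "12:24:35.0233+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 23 records failed with error code InternalFailure. Example error message: Internal service failure.",
    "12:24:35.0278+0000 [pool-1-thread-2] [ERROR] com.snowplowanalytics.snowplow.enrich.kinesis.Sink - 20 records failed with error code InternalFailure. Example error message: Internal service failure.",
    "12:24:36.0246+0000 [cats-effect-blocker-2] [INFO] software.amazon.kinesis.coordinator.Scheduler - Sleeping ...",
]
totals = tally_failures(sample)
print(dict(totals))
```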

We have considered upgrading, but we are short on team resources and there are higher-priority tasks. Mainly, I wondered whether there is anything obviously wrong in our setup, and whether there is a specific reason for the internal service failure from Kinesis.

Thanks for any help you can provide!

Hi @Daniel_Baron, sorry, I don’t remember ever seeing so many internal service failures. The error type is mentioned on this AWS docs page, but the description is not very helpful.

In Enrich version 3.4.1 we made some improvements to how the app handles errors when writing to Kinesis, so if you upgrade you might see fewer of these errors in the logs. (I do understand your point, though, about needing team resources to do this carefully.)
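
For anyone unfamiliar with this kind of error: a Kinesis PutRecords call can succeed as a whole while individual records in the batch fail, each with its own `ErrorCode` such as `InternalFailure`. A common way to handle that is to resend only the failed entries, with a backoff between attempts. The sketch below illustrates the general idea and is not the actual Enrich code. `put_with_retries` and the `put_records` callable are hypothetical names, and the doubling backoff is an assumption. The response fields (`FailedRecordCount`, `Records`, `ErrorCode`) follow the PutRecords response shape.

```python
import time


def put_with_retries(put_records, records, min_backoff=0.1, max_backoff=10.0,
                     max_retries=10, sleep=time.sleep):
    """Send records, resending only the entries reported as failed.

    put_records is a hypothetical callable that takes a list of records and
    returns a dict shaped like the PutRecords response.
    Returns the records that are still failing once retries are exhausted.
    """
    pending = list(records)
    delay = min_backoff
    for attempt in range(max_retries + 1):
        response = put_records(pending)
        if response.get("FailedRecordCount", 0) == 0:
            return []
        # Result entries come back in request order; failed ones carry an ErrorCode.
        pending = [rec for rec, result in zip(pending, response["Records"])
                   if "ErrorCode" in result]
        if attempt < max_retries:
            sleep(min(delay, max_backoff))
            delay *= 2  # assumed exponential backoff, capped at max_backoff
    return pending


# Demo with a stub in place of a real Kinesis client: the second record
# fails once with InternalFailure, then succeeds on the retry.
calls = []


def fake_put_records(batch):
    calls.append(list(batch))
    if len(calls) == 1:
        return {"FailedRecordCount": 1, "Records": [
            {"SequenceNumber": "1", "ShardId": "shardId-000000000000"},
            {"ErrorCode": "InternalFailure", "ErrorMessage": "Internal service failure."},
        ]}
    return {"FailedRecordCount": 0, "Records": [
        {"SequenceNumber": "2", "ShardId": "shardId-000000000000"} for _ in batch
    ]}


leftover = put_with_retries(fake_put_records, ["event-a", "event-b"], sleep=lambda _: None)
print(leftover, calls)
```

With a real client, `put_records` would wrap the SDK call for your stream. The stub here only exists to show that the second attempt carries the failed record alone.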
