I figured out what I was doing wrong at this point: `accessKey` and `secretKey` were not right; I should simply set both fields to the value `iam`. I did that, and also added the `CloudWatchFullAccess` policy to the role.
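For reference, the credentials block in my enrich config now looks roughly like this (just a sketch showing only the `aws` block; where exactly it sits in the file depends on the Stream Enrich version):

```
# Both fields set to "iam" so the instance's IAM role is used
# instead of hard-coded credentials.
aws {
  accessKey = iam
  secretKey = iam
}
```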
The debugging goes on, just another normal day. Now I'm getting the following error log (I'm going to post the whole thing, just in case, but I suspect only the last lines are relevant):
[main] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Using workerId: 15e9e888a733:0f7b9a88-74fb-4b63-af5f-5fc1dfd7d2f5
[main] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Running: snowplow-enrich-staging.
[main] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Processing raw input stream: snowplow-events
[main] INFO com.amazonaws.services.kinesis.leases.impl.LeaseCoordinator - With failover time 10000 ms and epsilon 25 ms, LeaseCoordinator will renew leases every 3308 ms, takeleases every 20050 ms, process maximum of 2147483647 leases and steal 1 lease(s) at a time.
[main] WARN com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Received configuration for both region name as us-east-1, and Amazon Kinesis endpoint as https://kinesis.us-east-1.amazonaws.com. Amazon Kinesis endpoint will overwrite region name.
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Initialization attempt 1
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Initializing LeaseCoordinator
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator - Created new lease table for coordinator with initial read capacity of 10 and write capacity of 10.
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Syncing Kinesis shard info
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Starting LeaseCoordinator
[LeaseCoordinator-0000] INFO com.amazonaws.services.kinesis.leases.impl.LeaseTaker - Worker 15e9e888a733:0f7b9a88-74fb-4b63-af5f-5fc1dfd7d2f5 saw 2 total leases, 2 available leases, 1 workers. Target is 2 leases, I have 0 leases, I will take 2 leases
[LeaseCoordinator-0000] INFO com.amazonaws.services.kinesis.leases.impl.LeaseTaker - Worker 15e9e888a733:0f7b9a88-74fb-4b63-af5f-5fc1dfd7d2f5 successfully took 2 leases: shardId-000000000001, shardId-000000000000
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Initialization complete. Starting worker loop.
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Created new shardConsumer for : ShardInfo [shardId=shardId-000000000001, concurrencyToken=544560f3-8b10-43fa-808a-c76ea393797f, parentShardIds=[], checkpoint={SequenceNumber: TRIM_HORIZON,SubsequenceNumber: 0}]
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Created new shardConsumer for : ShardInfo [shardId=shardId-000000000000, concurrencyToken=b087ea0f-5c3b-4e4e-b091-a4d92eee5e99, parentShardIds=[], checkpoint={SequenceNumber: TRIM_HORIZON,SubsequenceNumber: 0}]
[RecordProcessor-0000] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.BlockOnParentShardTask - No need to block on parents [] of shard shardId-000000000001
[RecordProcessor-0001] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.BlockOnParentShardTask - No need to block on parents [] of shard shardId-000000000000
[cw-metrics-publisher] WARN com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable - Could not publish 20 datums to CloudWatch
[cw-metrics-publisher] WARN com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable - Could not publish 3 datums to CloudWatch
[RecordProcessor-0000] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisDataFetcher - Initializing shard shardId-000000000000 with TRIM_HORIZON
[RecordProcessor-0001] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisDataFetcher - Initializing shard shardId-000000000001 with TRIM_HORIZON
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Initializing record processor for shard: shardId-000000000000
[RecordProcessor-0001] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Initializing record processor for shard: shardId-000000000001
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Processing 139 records from shardId-000000000000
[RecordProcessor-0001] INFO com.snowplowanalytics.snowplow.enrich.stream.sources.KinesisSource - Processing 152 records from shardId-000000000001
[RecordProcessor-0001] INFO com.snowplowanalytics.snowplow.enrich.stream.sinks.KinesisSink - Writing 6 records to Kinesis stream snowplow-enrich-good
[RecordProcessor-0000] INFO com.snowplowanalytics.snowplow.enrich.stream.sinks.KinesisSink - Writing 3 records to Kinesis stream snowplow-enrich-good
[RecordProcessor-0001] ERROR com.snowplowanalytics.snowplow.enrich.stream.sinks.KinesisSink - Writing failed.
com.amazonaws.services.kinesis.model.AmazonKinesisException: 6 validation errors detected: Value null at 'records.1.member.partitionKey' failed to satisfy constraint: Member must not be null; Value null at 'records.2.member.partitionKey' failed to satisfy constraint: Member must not be null; Value null at 'records.3.member.partitionKey' failed to satisfy constraint: Member must not be null; Value null at 'records.4.member.partitionKey' failed to satisfy constraint: Member must not be null; Value null at 'records.5.member.partitionKey' failed to satisfy constraint: Member must not be null; Value null at 'records.6.member.partitionKey' failed to satisfy constraint: Member must not be null (Service: AmazonKinesis; Status Code: 400; Error Code: ValidationException; Request ID: cfabdbe4-6f13-cc7a-9b8f-95ecaa7cc3f2)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1630)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1302)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:2388)
at com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:2364)
at com.amazonaws.services.kinesis.AmazonKinesisClient.executePutRecords(AmazonKinesisClient.java:1859)
at com.amazonaws.services.kinesis.AmazonKinesisClient.putRecords(AmazonKinesisClient.java:1834)
at com.snowplowanalytics.snowplow.enrich.stream.sinks.KinesisSink$$anonfun$multiPut$1.apply(KinesisSink.scala:289)
at com.snowplowanalytics.snowplow.enrich.stream.sinks.KinesisSink$$anonfun$multiPut$1.apply(KinesisSink.scala:275)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[RecordProcessor-0001] ERROR com.snowplowanalytics.snowplow.enrich.stream.sinks.KinesisSink - + Retrying in 8541 milliseconds...
(Spoiler: it keeps rapidly retrying without success.)
From what I gather from this log, I guess the events coming from my SSC (Scala Stream Collector, spelling it out for future generations) don't have a valid format. I don't know whether this is related to the error, but my SSC config file has the property `collector.streams.useIpAddressAsPartitionKey` set to `false`, which is the default in the `config.hocon.sample` file.
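The relevant part of my collector config looks roughly like this (a sketch; the nesting just follows the dotted property path, and I'm omitting the rest of the `streams` block):

```
collector {
  streams {
    # Default from config.hocon.sample: don't use the client's IP address
    # as the Kinesis partition key.
    useIpAddressAsPartitionKey = false
  }
}
```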
Any suggestions? Do I have to set this property to `true`? Grouping user events solely by IP isn't that interesting for us, which is why I left it as `false`, but I guess I don't fully understand its purpose.
Thanks for the help.