Hello,
I’m using the EmrEtlRunner tool to load events into Redshift via nightly scheduled runs. Enrichment and shredding have been working fine, but lately the RDB loading step has begun failing with what looks like a Hadoop exception, which leaves the EMR cluster hanging.
The exception happens after shredding (so the shredded/good directory contains data), specifically on the “[rdb_load] Load AWS Redshift enriched events storage Storage Target” step. The logs from this step show that the RDB loader successfully completes the consistency check, finishes loading the detected shredded/good data, skips the VACUUM queries, and executes the ANALYZE transaction, and then we hit the exception (intermittently; sometimes the job succeeds). Stack trace:
Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [5 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.ready(package.scala:169)
at com.snowplowanalytics.snowplow.scalatracker.emitters.id.RequestProcessor.sendSync(RequestProcessor.scala:162)
at com.snowplowanalytics.snowplow.scalatracker.emitters.id.SyncEmitter.send(SyncEmitter.scala:39)
at com.snowplowanalytics.snowplow.scalatracker.emitters.id.SyncEmitter.send(SyncEmitter.scala:31)
at com.snowplowanalytics.snowplow.scalatracker.Tracker$$anonfun$send$1.apply(Tracker.scala:64)
at com.snowplowanalytics.snowplow.scalatracker.Tracker$$anonfun$send$1.apply(Tracker.scala:64)
at cats.data.NonEmptyList.traverse(NonEmptyList.scala:231)
at com.snowplowanalytics.snowplow.scalatracker.Tracker.send(Tracker.scala:64)
at com.snowplowanalytics.snowplow.scalatracker.Tracker.com$snowplowanalytics$snowplow$scalatracker$Tracker$$track(Tracker.scala:54)
at com.snowplowanalytics.snowplow.scalatracker.Tracker$$anonfun$trackSelfDescribingEvent$1.apply(Tracker.scala:137)
at com.snowplowanalytics.snowplow.scalatracker.Tracker$$anonfun$trackSelfDescribingEvent$1.apply(Tracker.scala:137)
at cats.package$$anon$1.flatMap(package.scala:41)
at cats.FlatMap$Ops$class.flatMap(FlatMap.scala:21)
at cats.FlatMap$ToFlatMapOps$$anon$2.flatMap(FlatMap.scala:21)
at com.snowplowanalytics.snowplow.scalatracker.Tracker.trackSelfDescribingEvent(Tracker.scala:137)
at com.snowplowanalytics.snowplow.rdbloader.interpreters.implementations.TrackerInterpreter$.trackSuccess(TrackerInterpreter.scala:109)
at com.snowplowanalytics.snowplow.rdbloader.interpreters.RealWorldInterpreter$$anon$1.apply(RealWorldInterpreter.scala:197)
at com.snowplowanalytics.snowplow.rdbloader.interpreters.RealWorldInterpreter$$anon$1.apply(RealWorldInterpreter.scala:116)
at cats.free.Free$$anonfun$foldMap$1.apply(Free.scala:155)
at cats.free.Free$$anonfun$foldMap$1.apply(Free.scala:153)
at cats.package$$anon$1.tailRecM(package.scala:43)
at cats.free.Free.foldMap(Free.scala:153)
at cats.free.Free$$anonfun$foldMap$1.apply(Free.scala:156)
at cats.free.Free$$anonfun$foldMap$1.apply(Free.scala:153)
at cats.package$$anon$1.tailRecM(package.scala:43)
at cats.free.Free.foldMap(Free.scala:153)
at com.snowplowanalytics.snowplow.rdbloader.Main$.run(Main.scala:69)
at com.snowplowanalytics.snowplow.rdbloader.Main$.main(Main.scala:36)
at com.snowplowanalytics.snowplow.rdbloader.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I naively copied this error into a search and, as suggested on Stack Overflow, increased the spark.sql.broadcastTimeout configuration setting to 300, but this hasn’t fixed the issue.
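For reference, here is roughly where I added it, as a minimal sketch of the relevant part of my EmrEtlRunner config.yml (I’m assuming the emr: configuration: block is the right place to pass Spark settings through to the cluster; the keys around it are illustrative, not my full config):

aws:
  emr:
    ...
    configuration:
      spark-defaults:
        # assumed placement; broadcastTimeout is interpreted in seconds
        spark.sql.broadcastTimeout: "300"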
Totally appreciate any tips/insights, thank you!