Our Snowflake Transformer has started failing with the following error in the cluster log:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/10/__spark_libs__6496353209541393289.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/07/07 14:40:50 INFO SignalUtils: Registered signal handler for TERM
20/07/07 14:40:50 INFO SignalUtils: Registered signal handler for HUP
20/07/07 14:40:50 INFO SignalUtils: Registered signal handler for INT
20/07/07 14:40:51 INFO ApplicationMaster: Preparing Local resources
20/07/07 14:40:53 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1594132618622_0001_000001
20/07/07 14:40:53 INFO SecurityManager: Changing view acls to: yarn,hadoop
20/07/07 14:40:53 INFO SecurityManager: Changing modify acls to: yarn,hadoop
20/07/07 14:40:53 INFO SecurityManager: Changing view acls groups to:
20/07/07 14:40:53 INFO SecurityManager: Changing modify acls groups to:
20/07/07 14:40:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
20/07/07 14:40:53 INFO ApplicationMaster: Starting the user application in a separate Thread
20/07/07 14:40:53 INFO ApplicationMaster: Waiting for spark context initialization...
20/07/07 14:40:55 WARN JsonMetaSchema: Unknown keyword exclusiveMinimum - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
20/07/07 14:42:33 ERROR ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:401)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
20/07/07 14:42:33 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
20/07/07 14:42:33 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
20/07/07 14:42:33 INFO ApplicationMaster: Deleting staging directory hdfs://ip-10-80-20-232.eu-west-1.compute.internal:8020/user/hadoop/.sparkStaging/application_1594132618622_0001
20/07/07 14:42:33 INFO ShutdownHookManager: Shutdown hook called
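For what it's worth, the `100000 milliseconds` in the `TimeoutException` looks like it matches Spark's default `spark.yarn.am.waitTime` (100s), which is how long the YARN ApplicationMaster waits for the SparkContext to initialise in cluster mode. I assume (untested) it could be raised via the cluster's Spark configuration, e.g. in `spark-defaults.conf`:

```
# Hypothetical workaround, not verified: give the driver longer to start
spark.yarn.am.waitTime  300s
```

though presumably that would only mask whatever is making context initialisation slow in the first place.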
I’ve tried increasing the instance size to m2.xlarge (it was previously a medium), to no avail.
The only change we’ve made recently is altering the path of the streaming enrich output folders from YYYY-MM-DD-HH to YYYY-MM-DD-HH-mm, to work around the loader missing data when it loaded incomplete buckets. However, the Snowflake Loader has been running successfully for a couple of days since that change. Overall event volume is fairly low (under 5k events per day).
Advice appreciated!