I’m running the Snowflake transformer on a large backlog of data, so the job had been running for 6+ hours. It has just failed with the following message:
Failure Message
18/07/02 04:30:09 INFO Client: Application report for application_1530484047344_0001 (state: FINISHED)
18/07/02 04:30:09 INFO Client:
client token: N/A
diagnostics: User class threw exception: shadeaws.services.dynamodbv2.model.AmazonDynamoDBException: The security token included in the request is expired (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ExpiredTokenException; Request ID: 68RCJVDDVAOET7N9VGO6GJPRMFVV4KQNSO5AEMVJF66Q9ASUAAJG)
ApplicationMaster host: 172.31.40.159
ApplicationMaster RPC port: 0
queue: default
start time: 1530484225027
final status: FAILED
tracking URL: http://ip-172-31-43-69.eu-west-1.compute.internal:20888/proxy/application_1530484047344_0001/
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1530484047344_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1104)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1150)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/07/02 04:30:09 INFO ShutdownHookManager: Shutdown hook called
18/07/02 04:30:09 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-18b2a698-11d7-4a93-a965-0d5c38c68f3f
Command exiting with ret '1'
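As a rough check on the "6+ hours" figure, the `start time` in the application report can be compared with the timestamp on the failing log lines. This is only a sketch: it assumes the log timestamps are in UTC (I have not confirmed the cluster's timezone), and the function and constant names are made up for illustration.

```python
from datetime import datetime, timezone

# Values copied from the application report above.
START_TIME_MS = 1530484225027            # "start time" field, epoch milliseconds
FAILURE_LOG_STAMP = "18/07/02 04:30:09"  # timestamp on the failing log lines


def elapsed_seconds(start_ms: int, log_stamp: str) -> float:
    """Seconds between the YARN start time and a log timestamp."""
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    # Assumption: the log prints yy/MM/dd HH:mm:ss in UTC.
    end = datetime.strptime(log_stamp, "%y/%m/%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return (end - start).total_seconds()


secs = elapsed_seconds(START_TIME_MS, FAILURE_LOG_STAMP)
print(f"Job ran for about {secs / 3600:.2f} hours before failing")
```

If the UTC assumption holds, the application failed just under six hours after it started.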
I’m assuming I can just re-run the job to carry on where it left off? Is there anything I can do to avoid this error in future?
Thanks!
Iain