EMR Job - Failing with SSL handshake?

Hey there, over the last few days I've started getting the error below when trying to run the EMR job:

java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

openssl version
OpenSSL 1.0.2k-fips  26 Jan 2017

./snowplow-emr-etl-runner run -c config.yml -n ./enrichment/ -r ./iglu_resolver.json -t ./targets/ --debug 
D, [2018-01-29T16:45:34.207000 #6972] DEBUG -- : Initializing EMR jobflow
F, [2018-01-29T16:57:26.585000 #6972] FATAL -- : 

Excon::Error::Socket (Unsupported record version Unknown-0.0 (OpenSSL::SSL::SSLError)):
    org/jruby/ext/openssl/SSLSocket.java:222:in `connect_nonblock'
    uri:classloader:/gems/excon-0.52.0/lib/excon/ssl_socket.rb:121:in `initialize'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:403:in `socket'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:100:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/mock.rb:48:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/instrumentor.rb:26:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:16:in `request_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:249:in `request'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/idempotent.rb:27:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:272:in `request'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/idempotent.rb:27:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:272:in `request'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/idempotent.rb:27:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/middlewares/base.rb:11:in `error_call'
    uri:classloader:/gems/excon-0.52.0/lib/excon/connection.rb:272:in `request'
    uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/sax_parser_connection.rb:35:in `request'
    uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/sax_parser_connection.rb:-1:in `request'
    uri:classloader:/gems/fog-xml-0.1.2/lib/fog/xml/connection.rb:7:in `request'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/storage.rb:612:in `_request'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/storage.rb:-1:in `_request'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/storage.rb:607:in `request'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/requests/storage/get_bucket.rb:43:in `get_bucket'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/models/storage/directories.rb:21:in `get'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/models/storage/files.rb:30:in `all'
    uri:classloader:/gems/fog-aws-1.4.0/lib/fog/aws/models/storage/files.rb:51:in `each'
    uri:classloader:/gems/sluice-0.4.0/lib/sluice/storage/s3/s3.rb:69:in `list_files'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:135:in `block in initialize'
    org/jruby/RubyArray.java:2564:in `select'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:133:in `initialize'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:100:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
    org/jruby/RubyKernel.java:979:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:961:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1:in `<main>'

Hey! I’m having the same issue.

Did you figure out a solution?

Hey, it seems like this issue is related to the number of files in the S3 bucket. With about 100k files it runs fine; with more than that, it usually fails with this SSL error message.

Interesting, you mean the number of log files generated by the collector? I don’t have that many files to process :o
It’s weird because it happens before the S3 copy (the first step of the EMR job flow; I’m running R97).

I’ll try running the EMR job more often, just in case.

Does anyone have another theory?

Also, is there a way to get more detailed logs about this initialization phase?
Thanks!
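To test the file-count theory above (fine at about 100k files, failing above that), one option is to record how many objects are waiting in the bucket before each run and compare that with which runs fail. A minimal sketch, assuming the AWS CLI is installed and configured on the machine that runs EmrEtlRunner; the function name and the bucket path are illustrative only, not from this thread. For what it's worth, the stack trace does fail inside a bucket listing (sluice `list_files` calling fog's `get_bucket`), and S3 listings are paginated, so more files plausibly means more HTTPS requests per run; that is a guess, not a confirmed cause.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many objects sit under an S3 prefix,
# parsed from the "Total Objects: N" footer that --summarize adds.
count_s3_objects() {
  aws s3 ls "$1" --recursive --summarize | awk '/Total Objects:/ { print $3 }'
}

# Example (bucket and prefix are placeholders):
# count_s3_objects s3://my-snowplow-raw/in/
```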

Sorry, but I need to bump this!

This error keeps coming back. I’ve tried a bunch of different EC2 instance types.
Sometimes it runs correctly a few times, but the error always returns, and runs that complete without it are becoming very rare.

When I run it from my local machine, it works, so I guess it’s not a configuration problem.

What kind of EC2 instances do you usually run your EmrEtlRunner process from? Do they need a certain amount of processing power/memory?

Is there a way to log more detail about what’s happening when the process tries to create the cluster?

Thanks to anybody who can help with this!

@cmartins @Timmycarbone

Did you find the root cause of and a solution for this issue? We have been hitting it consistently for the past few days.

It would be great if you could help us with this.

Thanks

Those errors are likely to be related to the Java environment of the machine on which you run EmrEtlRunner. You might need to tune the JVM.

Something like this might help:

java_args="-server -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -Xmn128m -Xms512m -Xmx512m"
export java_args
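Assuming the EmrEtlRunner launcher passes `$java_args` straight to the JVM (which is what the suggestion above relies on), the same variable could also carry JSSE's standard `javax.net.debug` system property. That may partly answer the earlier questions about getting more detail on the failing initialization step, since the error comes from the TLS layer. A sketch only; the log file naming is an arbitrary example and the run line is left commented out.

```shell
#!/usr/bin/env bash
# JVM tuning from the suggestion above: fixed 512 MB heap, CMS collector.
java_args="-server -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -Xmn128m -Xms512m -Xmx512m"
# Optional: ask the JVM's TLS stack (JSSE) to trace handshakes to stdout.
# Very verbose; remove it once a failing run has been captured.
java_args="${java_args} -Djavax.net.debug=ssl:handshake"
export java_args

# Keep a timestamped copy of everything the run prints (example file name).
log_file="emr-etl-runner-$(date +%Y%m%dT%H%M%S).log"
# ./snowplow-emr-etl-runner run -c config.yml -n ./enrichment/ \
#   -r ./iglu_resolver.json -t ./targets/ --debug 2>&1 | tee "$log_file"
```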

@ihor Thanks for the reply. I’m not able to understand how tuning the JVM is related to an SSL error, though?

We are still facing this issue, and it has now started happening every day. Retrying multiple times works for us. Is there a way to find the root cause and fix this?

We are running the EmrEtlRunner script on a t2.medium instance.
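Since retrying works, the retries could at least be automated while the root cause is investigated. Below is a minimal sketch of a hypothetical wrapper (`retry_cmd` is not part of EmrEtlRunner) that reruns a command with a growing pause between attempts. It assumes failures happen early, as in the trace above, where the error is raised while listing the bucket during job initialization; a run that fails later, after files have already been moved, may need different handling than a blind retry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to max_attempts times,
# waiting delay_secs * attempt seconds after each failure.
retry_cmd() {
  local max_attempts="$1"; shift
  local delay_secs="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    local wait_secs=$((delay_secs * attempt))
    echo "attempt ${attempt} failed, retrying in ${wait_secs}s" >&2
    sleep "$wait_secs"
    attempt=$((attempt + 1))
  done
  return 0
}

# Example: the invocation from the original post, tried up to 3 times.
# retry_cmd 3 60 ./snowplow-emr-etl-runner run -c config.yml -n ./enrichment/ \
#   -r ./iglu_resolver.json -t ./targets/
```

The linear backoff is an arbitrary choice; the point is only to avoid hammering the same endpoint immediately after a failed handshake.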

@rahul, have you considered upgrading your pipeline, as well as using a newer EC2 instance type for the server you launch EmrEtlRunner from? The latest pipeline version is R117, released today.

@ihor As suggested, we have upgraded the instance type to t2.large for now and will monitor whether that works for us. If it doesn’t solve the issue, we will consider upgrading the Snowplow stack.