Transient rdbloader error: [Amazon](60000) Error setting/closing connection

bsweger · April 23, 2018, 6:19pm

Yesterday, the rdbloader step of our batch pipeline (R92) failed after data discovery and logged the following error message:

ERROR: Data loading error [Amazon](60000) Error setting/closing connection: Connection reset by peer.
Following steps completed: [Discover]

I was able to load the shredded data by re-running with the --resume_from rdbloader option, but I want to know whether there's something I should be doing to prevent this from happening again.

We're currently running R92 (i.e., rdb_shredder 0.12.0 and rdb_loader 0.13.0). I've seen a few references (linked below) to the 60000 error, but I'm not sure whether they describe what caused the error above.

We upgraded to R92 several weeks ago, and things have been working well. In addition, yesterday's failure happened on a small weekend batch, compared with what we normally process on a regular business day.

Is there a way I can definitively tell whether or not our error is related to one of the following?

1. RDB Loader: make SSL configuration compatible with native JDBC settings (snowplow/snowplow-rdb-loader issue #73 on GitHub)
2. Common: add NAT Gateway fix script (snowplow/snowplow-rdb-loader issue #85 on GitHub)

Thanks for your help!

anton · April 24, 2018, 3:57pm

Hello @bsweger,

We did indeed have an issue with RDB Loader's connection that resulted in a similar error. That issue was caused by the NAT Gateway and the fact that it makes RST and FIN packets misbehave. It was solved by this EMR bootstrap script. However, you mentioned that the volume was quite small in your case, while the problem I've described only happened when loading took more than ~10 minutes, so I'm surprised you didn't encounter it before. Do you remember any changes you've made to your infrastructure recently?

bsweger · April 25, 2018, 7:10pm

Hi @anton,

TL;DR More poking around confirms that you're right about the NAT Gateway. Thank you! I can't quite follow the GitHub threads, though: is there a specific release we should upgrade to for this fix?

More info for anyone else who might find this thread…

We’ve had no recent infrastructure changes.

I was skeptical that we had hit the NAT Gateway issue on a low-volume day after running R92 successfully for several weeks, so I checked our Redshift logs (STL_QUERY):

- On normal-volume runs, the total elapsed time of all the COPY statements is between 5 and 8 minutes.
- On the day we got the RDB Loader error, the atomic.events load alone took ~11 minutes (not sure why, since it took only 40 seconds on the re-run, but that's a different issue).
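For anyone wanting to repeat this check, the timings can be pulled from STL_QUERY and summarized against the ~10-minute window mentioned above. This is a minimal sketch, not Snowplow tooling: the SQL filter and one-day window, the helper name summarize_copies, and the 10-minute threshold (taken from this thread, not from AWS documentation) are all assumptions. It reports both the total COPY time and the longest single COPY, because the figures above compare both.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Illustrative query against Redshift's STL_QUERY system table; the one-day
# window is an assumption, so widen or narrow it to cover the failed run.
COPY_TIMINGS_SQL = """
SELECT starttime, endtime, TRIM(querytxt)
FROM stl_query
WHERE LOWER(TRIM(querytxt)) LIKE 'copy%'
  AND starttime >= DATEADD(day, -1, GETDATE())
ORDER BY starttime;
"""

# Hypothetical threshold: the ~10 minutes comes from this thread, not from AWS docs.
SUSPECT_THRESHOLD = timedelta(minutes=10)

Row = Tuple[datetime, datetime, str]


def summarize_copies(
    rows: Iterable[Row], threshold: timedelta = SUSPECT_THRESHOLD
) -> Tuple[timedelta, timedelta, List[Tuple[str, timedelta]]]:
    """Return (total COPY time, longest single COPY, COPYs longer than threshold)."""
    total = timedelta(0)
    longest = timedelta(0)
    slow: List[Tuple[str, timedelta]] = []
    for start, end, text in rows:
        statement = text.strip()
        if not statement.lower().startswith("copy"):
            continue  # skip anything that is not a COPY statement
        elapsed = end - start
        total += elapsed
        longest = max(longest, elapsed)
        if elapsed > threshold:
            # Keep a short prefix of the statement so the slow table is identifiable.
            slow.append((statement[:60], elapsed))
    return total, longest, slow


if __name__ == "__main__":
    # Made-up rows standing in for the output of COPY_TIMINGS_SQL.
    t0 = datetime(2018, 1, 1, 6, 0)
    rows = [
        (t0, t0 + timedelta(minutes=11), "COPY atomic.events FROM 's3://bucket/run/'"),
        (t0 + timedelta(minutes=11), t0 + timedelta(minutes=12), "COPY atomic.ctx FROM 's3://bucket/run/'"),
    ]
    total, longest, slow = summarize_copies(rows)
    print("total:", total, "longest:", longest)
    for statement, elapsed in slow:
        print("over threshold:", elapsed, statement)
```

The rows can come from any Redshift client that runs COPY_TIMINGS_SQL; keeping the database call out of the function makes the arithmetic easy to check on its own.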

Thanks again for confirming this.

anton · April 25, 2018, 9:44pm

Hey @bsweger,

I’m glad we’ve figured out what happened!

Sorry for the mess in the GH threads; most of the conversation happened internally. The quickest workaround for this problem is to add s3://snowplow-hosted-assets/common/emr/snowplow-ami5-bootstrap-0.1.0.sh to aws.emr.bootstrap in your config.yml (bear in mind that this is an array, not a string). Alternatively, upgrade EmrEtlRunner to R102, which runs this script by default.
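The detail to get right in that workaround is that aws.emr.bootstrap holds a list of script URLs, not a single string. The sketch below assumes config.yml has already been parsed into a Python dict (for example with a YAML library) and that its nesting of aws, emr and bootstrap mirrors the dotted path above; add_bootstrap_action is a hypothetical helper for illustration, not part of EmrEtlRunner.

```python
from typing import Any, Dict, List

# Script URL quoted from the reply above.
NAT_FIX_SCRIPT = "s3://snowplow-hosted-assets/common/emr/snowplow-ami5-bootstrap-0.1.0.sh"


def add_bootstrap_action(config: Dict[str, Any], script: str = NAT_FIX_SCRIPT) -> Dict[str, Any]:
    """Append `script` to config["aws"]["emr"]["bootstrap"], keeping that key a list.

    Hypothetical helper: `config` is assumed to be the already-parsed config.yml.
    The dict is modified in place and also returned for convenience.
    """
    emr = config.setdefault("aws", {}).setdefault("emr", {})
    current = emr.get("bootstrap")
    if current is None:
        actions: List[str] = []
    elif isinstance(current, str):
        # A bare string is the mistake to avoid; wrap it so the key becomes a list.
        actions = [current] if current else []
    else:
        actions = list(current)
    if script not in actions:
        actions.append(script)  # avoid adding the same bootstrap action twice
    emr["bootstrap"] = actions
    return config
```

In the YAML file itself, this means the script URL goes in as a list item under bootstrap rather than as a plain string value.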

Hope this helps.
