I’m trying to send bad rows to my Elasticsearch cluster, and I found in the EMR logs (containers/application_*) that the EMR cluster is trying to balance requests across all my ES data nodes:
ERROR [main] org.elasticsearch.hadoop.rest.NetworkClient: Node [10.10.10.14:9200] failed (Connection timed out); selected next node [10.10.10.13:9200]
Is there any way to suppress this behaviour so that it connects only to the host I’m supplying in my runner config? I want only one proxy in front of the cluster.
Thanks for getting back to me.
I’m using 2.4.1. Oh shoot, I think I got it. I left all the es_nodes settings at their defaults in my runner config. es.nodes.client.only is false by default, so I need to set it to ‘true’ in order to stop querying my data nodes, right? Thanks so much for pointing me in the right direction.
EDIT: as per https://github.com/snowplow/snowplow/blob/master/3-enrich/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb#L449, I see that es.nodes.wan.only is the only es-hadoop setting that the runner config allows me to change. I will try enabling that one.
EDIT2: yes! Enabling es_nodes_wan_only helped. Thank you for the support.
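
For anyone landing here later, this is a rough sketch of the es-hadoop properties that seem to matter. As far as I understand the es-hadoop docs, with es.nodes.wan.only set to true the connector disables node discovery and only connects through the hosts declared in es.nodes. The helper name and the host value below are made up for illustration; only the property names come from es-hadoop.

```python
# Illustrative only: builds the es-hadoop properties discussed above.
# es_hadoop_settings and the example host are hypothetical placeholders;
# es.nodes, es.port and es.nodes.wan.only are es-hadoop setting names.
def es_hadoop_settings(proxy_host, port=9200, wan_only=True):
    return {
        "es.nodes": proxy_host,                       # the single host to connect through
        "es.port": str(port),
        "es.nodes.wan.only": str(wan_only).lower(),   # "true" turns off node discovery
    }

if __name__ == "__main__":
    for key, value in sorted(es_hadoop_settings("my-es-proxy.example.com").items()):
        print(f"{key}={value}")
```

In the runner config this corresponds to setting es_nodes_wan_only to true for the Elasticsearch target and pointing the host at the proxy.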