Howdy everyone. I’m currently in the process of setting up a real-time component for a snowplow pipeline. Presently, the elasticsearch cluster is provided by AWS ElasticSearch Service, rather than an independent ES stack in EC2 (or elsewhere). Stream enriching, and sinking to this cluster, work with my barebones/ad-hoc deployment (single stream enrich instance, single elasticsearch-sink instance) after whitelisting the IP of the sink instance in the AWS ES Service’s access policy.
However, future plans are of course to have apps in appropriate ASGs, with automated deployments. I’ve thought of a couple different approaches for handling automated access to the cluster, such as:
Make a static proxy instance with an EIP, whitelist this instance in the AWS ES Service access policy, and have every elasticsearch sink app proxy its traffic through this instance
Utilize a NAT gateway with an EIP, whitelist that EIP, and ensure sink instances utilize that NAT gateway
Bite the bullet and build out our own ES cluster
None of these are optimal, as each adds maintenance overhead. It’d be much more straightforward if the sink apps were able to sign requests to the AWS ES Service endpoint via an iam role (like how reading/writing the kinesis streams is set up already). With that being said…
Is there any capability for the elasticsearch sink app to utilize request signing when sinking to AWS ElasticSearch Service endpoints? Is there a setting I’ve just missed? Sniffing the traffic just shows raw POSTs to the endpoint with no signature headers.
Is there perhaps some other approach that I haven’t thought of?
Currently, no: there is no support for request signing in the Elasticsearch Sink. It would be a much cleaner approach to sinking data to the service, given that we cannot put it into a VPC.
I did find an approach that appears to work with our current Elasticsearch Client library (Jest) with signing on this thread:
I have created a new ticket to track this as I think it would be a great feature to have included! Especially if AWS eventually adds something like the VPC -> S3 endpoint so we can stop traversing the public network to sink events.
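For anyone wondering what "signing" would involve on the wire, here is a minimal sketch of AWS Signature Version 4 header generation using only the Python standard library. It is not code from the Elasticsearch Sink or from Jest; the function names, the example endpoint and the credentials are all hypothetical, and it assumes a simple path and a query string that is either empty or already sorted and encoded. With temporary credentials from an IAM role, the session token also has to be sent and signed, which the optional `session_token` parameter covers.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    # SigV4 key derivation: a chain of HMACs over date, region, service, terminator.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_request(method, url, body, access_key, secret_key, region, now,
                 service="es", session_token=None):
    """Return the headers to add to the request (hypothetical helper)."""
    parts = urlsplit(url)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # Headers to sign, lowercase names; sorted by name below.
    headers = {"host": parts.netloc, "x-amz-date": amz_date}
    if session_token:
        headers["x-amz-security-token"] = session_token
    names = sorted(headers)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    signed_headers = ";".join(names)

    canonical_request = "\n".join([
        method,
        quote(parts.path or "/", safe="/-_.~"),
        parts.query,  # assumption: empty, or already sorted and encoded
        canonical_headers,
        signed_headers,
        hashlib.sha256(body).hexdigest(),
    ])
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    out = {
        "Host": parts.netloc,
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
    if session_token:
        out["X-Amz-Security-Token"] = session_token
    return out


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    demo = sign_request(
        "POST", "https://search-example.us-east-1.es.amazonaws.com/_bulk",
        b"{}", "AKIDEXAMPLE", "not-a-real-secret", "us-east-1",
        datetime.datetime(2015, 8, 30, 12, 36, 0),
    )
    for name, value in demo.items():
        print(f"{name}: {value}")
```

The returned headers would be attached to the outgoing POST. In a real deployment the credentials would come from the instance role rather than being passed in by hand, and an AWS SDK signer would be a safer choice than hand-rolled code.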
Our current approach is the one you mentioned: route traffic through NAT gateways and whitelist their EIPs. This has worked quite well for us at high volumes.
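As a rough illustration of the whitelisting side, an IP-based access policy for the domain could be generated along these lines. This is only a sketch: the helper name, the domain ARN, the account ID and the IP address are placeholders, and the real policy should be checked against the AWS documentation before being applied.

```python
import json


def ip_whitelist_policy(domain_arn, source_ips):
    """Build an IP-based access policy document (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                # Allow HTTP calls against the domain...
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                # ...but only from the whitelisted addresses (e.g. the NAT gateway EIP).
                "Condition": {"IpAddress": {"aws:SourceIp": list(source_ips)}},
            }
        ],
    }


if __name__ == "__main__":
    policy = ip_whitelist_policy(
        "arn:aws:es:us-east-1:123456789012:domain/example-domain",  # placeholder ARN
        ["203.0.113.10"],  # placeholder NAT gateway EIP
    )
    print(json.dumps(policy, indent=2))
```

Because every sink instance in the private subnets leaves through the same NAT gateway, only that one EIP needs to appear in the policy, no matter how the ASG scales.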
I guess from a security standpoint we have always looked to have our micro-services nested in private subnets and thus hidden from the public internet as much as possible - meaning that we would need to traverse the NAT irrespective of request signing to actually get data to the Elasticsearch Cluster.
Hi, we are also looking for a way to sign Elasticsearch requests (for AWS ElasticSearch). I can see from the GitHub tracker that the item is still open. Has there been any progress, or has a milestone been decided?
Also, as mentioned,
Can you please describe this alternative approach in detail, so that I can follow it in the meantime?