But during a debug session to identify where I was "losing data", I realized that I could send the output/stdout of [Step 2] directly to the enrichment process on the same instance, cutting out [Step 2], one Kinesis stream with 6 shards at [Step 3], and 3 instances for [Step 4].
The output of the enrichment process is sent to Kinesis only because I can't send data directly to Elasticsearch when my input comes from stdin.
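In other words, something like the following sketch, where one process's stdout feeds the next process's stdin on the same box (the jar names here are placeholders, not the real Snowplow artifacts):

```python
import subprocess

# Placeholder invocation standing in for [Step 2].
step2 = subprocess.Popen(
    ["java", "-jar", "step2-collector.jar"],
    stdout=subprocess.PIPE,
)

# Placeholder invocation standing in for the enrichment process;
# its stdin is wired directly to the stdout of [Step 2].
enrich = subprocess.Popen(
    ["java", "-jar", "stream-enrich.jar"],
    stdin=step2.stdout,
)

# Close our copy of the pipe so [Step 2] gets SIGPIPE if enrichment exits.
step2.stdout.close()
enrich.wait()
```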
Does this make sense? What are the cons of this decision?
There is nothing wrong with this approach if you are expecting fairly low volumes of data coming through your pipeline. This setup is exactly what we have done for Snowplow Mini, where we also pipe from Stream Enrich directly to the Elasticsearch Sink (this ability was added in r78). No Kinesis streams are used at all, and everything is contained on a single instance.
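For reference, consuming stdin and writing to Elasticsearch is itself straightforward; a minimal Python sketch of that step might look like the one below. The endpoint and index name are assumptions, and the real Elasticsearch Sink is a more robust application with batching and retry logic:

```python
import json
import sys

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Assumed local Elasticsearch endpoint; adjust for your deployment.
es = Elasticsearch(["http://localhost:9200"])

def actions():
    # Each stdin line is assumed to be one enriched event serialized as JSON.
    for line in sys.stdin:
        line = line.strip()
        if line:
            yield {"_index": "enriched-events", "_source": json.loads(line)}

# Bulk-index everything read from stdin; collect failures instead of raising.
ok, errors = bulk(es, actions(), raise_on_error=False)
print(f"indexed={ok} errors={len(errors)}", file=sys.stderr)
```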
However, in a situation where you have sudden spikes of events, or just generally large volumes of events, the ability to distribute and scale the distinct applications in the pipeline becomes quite important. Your setup does make sense at very low volumes, but in any other situation you run the risk of your stack failing due to back pressure on the other apps. The Kinesis stream allows you to queue vast amounts of events without any worry about back pressure.
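To make the back pressure point concrete, here is a toy sketch where a small in-memory buffer stands in for the stream. Bursts are absorbed up to the buffer's capacity, but once it fills, the producer stalls; with a plain pipe that buffer is tiny, whereas Kinesis makes it effectively enormous:

```python
import queue
import threading
import time

# Small bounded buffer standing in for the stream between two pipeline apps.
buf = queue.Queue(maxsize=1000)

def consumer():
    # Simulates a slower downstream sink (e.g. Elasticsearch under load).
    while True:
        buf.get()
        time.sleep(0.001)  # slow processing
        buf.task_done()

threading.Thread(target=consumer, daemon=True).start()

# Simulates a traffic spike: events arrive faster than they are consumed.
# Once the buffer is full, put() blocks -- that stall is back pressure.
for i in range(5000):
    buf.put(f"event-{i}")

buf.join()  # wait for the consumer to drain the buffer
```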
If you would like to retain this ability to scale, I would suggest using something like Snowplow Mini for your testing/debugging and going back to your original setup for a production environment.