My question is: do I need to run both the Scala Stream Collector and Stream Enrich simultaneously from my CLI in order to push events from the Kinesis stream (Kinesis S3) to the S3 bucket?
I ask because when I run the command to move events from Kinesis S3 to the S3 bucket, it sometimes takes a long time to complete and sometimes finishes within minutes.
Please let us know, in detail, how the overall structure is designed.
We are failing to run the process most of the time.
Do you use Kinesis or stdin/stdout/stderr? If you are using Kinesis, do you use EC2 for Kinesis S3 storage? If not, maybe the transfer takes too long? Maybe a different region would help?
The real-time components are always-on components, i.e. they should always be running, so that when enriched events land in the enriched stream they are consumed directly by the S3 loader and pushed to S3.
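To illustrate what "always-on" means here, below is a minimal sketch in Python. It is not the S3 loader's code: an in-memory queue stands in for the enriched stream, a list stands in for the S3 bucket, and the names `run_loader` and `demo` are hypothetical. The point is only that the consumer is already running before events arrive, so each event is picked up as soon as it lands.

```python
import queue
import threading


def run_loader(stream, sink, stop):
    """Drain `stream` into `sink` until `stop` is set and the stream is empty."""
    while not (stop.is_set() and stream.empty()):
        try:
            event = stream.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing has landed yet; keep polling
        sink.append(event)


def demo():
    stream = queue.Queue()  # stand-in for the enriched stream (assumption)
    sink = []               # stand-in for the S3 bucket (assumption)
    stop = threading.Event()

    # The loader is started first and keeps running in the background.
    loader = threading.Thread(target=run_loader, args=(stream, sink, stop))
    loader.start()

    # Events arrive later and are consumed as they land.
    for i in range(5):
        stream.put(f"event-{i}")

    stop.set()     # shut down only once all events have been produced
    loader.join()
    return sink
```

If the loader were started only occasionally, events would simply accumulate in the stream between runs, which would be consistent with run times that vary from minutes to much longer.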
Using different regions increases transmission delays.
I have never thought about turning parts of the stack on and off. In general, if you are sure you won't lose anything, you may do so. But note: Kinesis has limited throughput and EC2 has limited network bandwidth. Connecting the two with a back-off policy may lead to slow processing and, in the worst case, data loss.
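A rough way to see why this matters is the toy model below, written in Python. It is only a sketch under stated assumptions: all numbers are made up, a "tick" is an arbitrary time unit, the bounded buffer is a simplification of whatever limit applies in a real deployment, and the function names are hypothetical. It shows how retry waits add up under exponential back-off and how a backlog grows, and eventually overflows, when events arrive faster than they can be forwarded.

```python
def backoff_delays_ms(base_ms, factor, max_retries):
    """Waiting time before each retry under exponential back-off."""
    return [base_ms * factor ** attempt for attempt in range(max_retries)]


def simulate(incoming_per_tick, capacity_per_tick, buffer_size, ticks):
    """Return (delivered, dropped, backlog) after `ticks` time steps.

    Each tick, new events join the backlog, at most `capacity_per_tick`
    are forwarded, and anything beyond `buffer_size` is counted as lost.
    """
    backlog = delivered = dropped = 0
    for _ in range(ticks):
        backlog += incoming_per_tick
        sent = min(backlog, capacity_per_tick)
        delivered += sent
        backlog -= sent
        if backlog > buffer_size:
            dropped += backlog - buffer_size
            backlog = buffer_size
    return delivered, dropped, backlog


# Four retries starting at 100 ms and doubling each time.
delays = backoff_delays_ms(100, 2, 4)   # [100, 200, 400, 800]
total_wait_ms = sum(delays)             # 1500

# 10 events arrive per tick, only 6 can be forwarded, buffer holds 20.
delivered, dropped, backlog = simulate(10, 6, 20, 10)  # (60, 20, 20)
```

In this model the backlog fills after five ticks, and from then on four events per tick are lost. With capacity at or above the incoming rate, nothing is dropped.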
The real-time pipeline was not designed to run in batches. If you want batches, change the approach. If you want to use POST for requests, use Kinesis collector -> Kinesis -> raw S3 storage. But still: both need