I am a Snowplow Insights customer, so the pipeline infrastructure exists in an AWS Sub-Account. I’d like to set up a consumer for the Kinesis enriched stream using a subscribed Lambda, an approach documented here:
The problem is that a Lambda can only be subscribed to a Kinesis stream in the same account, so I'm wondering what the best practice would be for making the stream consumable from our main account.
Our primary use-case is to trigger processes in our application in reaction to events in the stream.
My current approach is to have a Lambda in the sub-account subscribed to the stream, with permissions to assume a role in the main account and invoke a second Lambda there, passing in the payload, as suggested below. I have this working as a proof of concept, but I'd appreciate advice on whether there is a better way to do it.
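For anyone curious what that proof of concept looks like, here is a minimal sketch of the sub-account Lambda in Python with boto3. The role and function ARNs are hypothetical placeholders, and it assumes the main-account role's trust policy names this Lambda's execution role as a principal and grants `lambda:InvokeFunction` on the target:

```python
import base64
import json

import boto3

# Hypothetical ARNs -- substitute your own.
ROLE_ARN = "arn:aws:iam::111111111111:role/snowplow-event-forwarder"
TARGET_LAMBDA = "arn:aws:lambda:eu-west-1:111111111111:function:handle-snowplow-event"


def handler(event, context):
    """Triggered by the Kinesis enriched stream in the sub-account;
    assumes a role in the main account and forwards each record."""
    # Assume the cross-account role for the duration of this batch.
    creds = boto3.client("sts").assume_role(
        RoleArn=ROLE_ARN,
        RoleSessionName="snowplow-forwarder",
    )["Credentials"]

    lam = boto3.client(
        "lambda",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded; enriched
        # events are TSV strings, so wrap them in a JSON envelope.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        lam.invoke(
            FunctionName=TARGET_LAMBDA,
            InvocationType="Event",  # async, fire-and-forget
            Payload=json.dumps({"enriched_event": payload}),
        )
```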
As a Snowplow Insights customer, this kind of question is best directed through our dedicated support and customer success teams. Could you pop this into an email to support@snowplowanalytics.com?
This will make it easier for our Engineering and Tech Ops team to collaborate to get it sorted for you.
For the benefit of anyone stumbling across this, it boiled down to two serverless options:
- Sub-account Lambda pushes to a replica Kinesis Stream in the main account
- Sub-account Lambda publishes events to an SQS Queue in the main account
SQS was the clear choice for us, as we wanted our solution to auto-scale. AWS-managed auto-scaling isn't available for Kinesis Data Streams, and implementing our own shard-scaling solution would have added too much complexity to our project.
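For reference, the SQS variant looks roughly like the sketch below. The queue URL is a hypothetical placeholder, and it assumes the main-account queue has a resource policy granting `sqs:SendMessage` to the sub-account Lambda's execution role:

```python
import base64

import boto3

# Hypothetical queue URL -- the main-account queue needs a resource
# policy granting sqs:SendMessage to this Lambda's execution role.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/111111111111/snowplow-enriched"

sqs = boto3.client("sqs")


def handler(event, context):
    """Triggered by the Kinesis enriched stream in the sub-account;
    fans the records out to an SQS queue owned by the main account."""
    entries = [
        {
            "Id": str(i),  # batch-entry id, unique within the request
            "MessageBody": base64.b64decode(r["kinesis"]["data"]).decode("utf-8"),
        }
        for i, r in enumerate(event["Records"])
    ]

    # SendMessageBatch accepts at most 10 entries per call.
    for start in range(0, len(entries), 10):
        sqs.send_message_batch(
            QueueUrl=QUEUE_URL,
            Entries=entries[start:start + 10],
        )
```

Because SQS supports cross-account access via the queue policy alone, this version needs no role assumption, and the queue absorbs bursts without any shard management on our side.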
On auto-scaling: it depends on what you want to do. Kinesis Data Firehose and Kinesis Data Analytics applications (Apache Flink) do scale automatically. We have had good results with a Flink application that is auto-scaled by AWS; it does real-time analytics of newspaper articles (around a million read events per day).