For one of our customers we would like to test Snowplow for tracking events in a video player, in order to add custom reports to a video backend.
The customer is from the financial sector and has very strict privacy policies. Therefore we will not be allowed to use a cloud service.
Is there a way to use Snowplow without Amazon AWS? Which components would have to be replaced?
If AWS is absolutely necessary, is there an alternative for implementing event tracking on our own?
I don’t think it’s possible to run Snowplow easily without using AWS. I suspect that if you were 100% set on self-hosting, you’d have to cobble together components from the open-source ecosystem (OpenStack etc.) that have AWS-compatible APIs in order to get everything working, or make major changes in the codebase to facilitate this.
That said, I’m not sure what requirement prevents this from being hosted in AWS. AWS has all the compliance and regulatory certifications needed to run almost any financial (and some government) applications in the cloud, including the latest PCI DSS certification.
This should not be a problem. Initially, deploy a proof of concept on Snowplow Mini. Then take the streaming route with Kafka and use the Postgres loader. If there is too much traffic for Postgres, get EMC Greenplum on premises. You could also try Actian Matrix instead of Redshift, but make sure they will support it for at least three years. For S3 on premises, you can go with Cloudian.
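To make the Kafka-to-Postgres leg a bit more concrete, here is a minimal sketch of how an on-premises consumer could read enriched events from Kafka and write them into Postgres. It is not the actual Snowplow Postgres loader; the topic name, table layout, credentials and field positions are assumptions for illustration.

```python
# A rough sketch (not the actual Snowplow Postgres loader): read enriched
# events from a Kafka topic as tab-separated lines and insert a few fields
# into Postgres. Topic, table and credentials are illustrative assumptions.
import psycopg2
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "enriched-good",                      # assumed name of the enriched-events topic
    bootstrap_servers="localhost:9092",
    group_id="postgres-sink",
    auto_offset_reset="earliest",
)

conn = psycopg2.connect("dbname=snowplow user=loader password=secret host=localhost")
conn.autocommit = True

with conn.cursor() as cur:
    # Hypothetical table holding a subset of the enriched-event fields
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS atomic_events (
            event_id         text PRIMARY KEY,
            collector_tstamp text,
            event_name       text,
            raw_line         text
        )
        """
    )
    for message in consumer:
        line = message.value.decode("utf-8")
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # skip anything that does not look like an enriched event
        # Field positions follow the enriched-event TSV order (collector_tstamp,
        # event, event_id); worth double-checking against the Snowplow docs.
        cur.execute(
            "INSERT INTO atomic_events (event_id, collector_tstamp, event_name, raw_line) "
            "VALUES (%s, %s, %s, %s) ON CONFLICT (event_id) DO NOTHING",
            (fields[6], fields[3], fields[5], line),
        )
```

Once volume grows, batching the inserts (or switching to COPY) would be the obvious next step.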
Yes - as Dashirov says, start with Snowplow Mini, and then migrate to our beta Kafka release. We have other banks using the Kafka components already.
You will need to make a decision about how to surface the event stream data to your analysts. You could use Spark, or invest in an on-prem columnar database like Greenplum, Vertica, ClickHouse or EventQL.
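If Spark is the route you take, a small sketch like the following could be a starting point for a daily report for analysts; it reads enriched events as TSV from a local or HDFS path (no S3 needed) and counts custom (self-describing) events per day. The path, the field positions and the filter on the event column are assumptions to adapt.

```python
# Minimal sketch: use Spark to turn enriched events (TSV files landed by the
# on-prem pipeline) into a simple daily report. The input path, field
# positions and event-name filter are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("video-events-report").getOrCreate()

# Enriched events are tab-separated; _c3 = collector_tstamp, _c5 = event,
# per the enriched-event field order (worth double-checking against the docs).
events = spark.read.csv("/data/snowplow/enriched/*.tsv", sep="\t")

daily_report = (
    events
    .select(
        F.substring("_c3", 1, 10).alias("event_date"),  # yyyy-MM-dd part of collector_tstamp
        F.col("_c5").alias("event_name"),
    )
    .where(F.col("event_name") == "unstruct")  # custom (self-describing) events, e.g. video plays
    .groupBy("event_date")
    .count()
    .orderBy("event_date")
)

daily_report.show()
```

The same DataFrame could just as easily be written out via JDBC to whichever on-prem columnar store you choose.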
Let us know how you get on, and do contribute back to our on-premise Snowplow pipeline components!
Thanks for the quick support. As suggested, my colleagues and I will start with the evaluation of Snowplow Mini. We will come back when we have some results.