By default, everything in Snowplow from trackers to collectors seems to be set to use HTTP instead of HTTPS. My guess is that this is for legacy reasons and because it is easier to set up. However, it seems like this would enable anyone who listens to the traffic to read the events in plain text.
Since HTTP is also the default mode for the trackers, and HTTPS is the default for most other things, it would be an easy mistake to accidentally send events over HTTP. Could you add an option to disable HTTP calls so that you get an error message instead?
For example, by not adding an HTTP listener in the alb module?
By default everything in Snowplow from trackers to collectors seems to be set to use HTTP instead of HTTPS
This isn’t the case. Looking at the three most popular trackers, the JavaScript tracker uses the protocol of the page it’s on if none is specified, and both mobile trackers default to HTTPS. There might well be some less recently updated trackers that default to HTTP (if you’ve found some, please raise an issue on their repos to change this), but generally the more recently updated ones should default to HTTPS.
I can’t speak to the suggestion on adding something to the quickstart terraform modules, since I don’t know much about it/don’t have any responsibility over that area.
HTTP is also the default for the JavaScript Browser tracker if you are unlucky, that is, if the page it runs on is itself served over HTTP:
Setting the event request protocol
Normally the protocol (http or https) used by the Tracker to send events to a collector is the same as the protocol of the current page. You can force the tracker to use https by prefixing the collector endpoint with the protocol. For example:
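To make the rule above concrete, here is a minimal Python sketch of the behaviour as described: an endpoint with an explicit protocol keeps it, and a bare host inherits the page's protocol. This is not the tracker's actual code; `resolve_collector_url`, `require_https`, and the host `collector.example.com` are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit


def resolve_collector_url(endpoint: str, page_protocol: str) -> str:
    """Hypothetical helper mirroring the rule quoted above.

    If the endpoint already carries a protocol prefix, keep it;
    otherwise fall back to the protocol of the current page.
    """
    if "://" in endpoint:
        return endpoint
    return f"{page_protocol}://{endpoint}"


def require_https(url: str) -> str:
    """Hypothetical guard: fail loudly instead of sending events in plain text."""
    if urlsplit(url).scheme != "https":
        raise ValueError(f"refusing to send events over a non-HTTPS URL: {url}")
    return url


# A bare host inherits the page protocol, which is plain HTTP on an HTTP page.
inherited = resolve_collector_url("collector.example.com", "http")

# Prefixing the endpoint with https:// forces HTTPS regardless of the page.
forced = resolve_collector_url("https://collector.example.com", "http")
```

A guard such as `require_https` could be applied in your own setup code before the endpoint is handed to a tracker, so that a missing prefix fails at startup rather than silently falling back to HTTP.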
I agree that HTTPS everywhere is a good idea, but unfortunately it’s really tricky to change defaults without a lot of care. For both the Python and JavaScript trackers there will inevitably be pipelines (both old and new) that track successfully over HTTP, and changing the default to HTTPS may mean that the tracker can no longer flush events to the collector. In addition, certain older devices (think IoT devices) may only support HTTP.
For applications that pin their dependencies, a breaking change can be made and communicated, so this is likely to be less of an issue. Applications that don't pin and simply pull down the latest tracker version would pick up the new default without warning, which increases the risk of data loss for that pipeline.
Seems reasonable, but a good compromise would then be to at least allow responsible users to stop silently accepting HTTP events on the collector side. HTTP could still be on by default, but with an option to turn it off. Right now we can only hack that option in (e.g. by replacing the official load balancer Terraform module with one that disables HTTP listeners).
Further, I think that risk must be weighed against continuing to leak data accidentally. Fools such as myself might not realise the Python tracker is defaulting to HTTP. It would be a breaking change, so in version notation it would be a major version despite being a small change. Making this change would teach developers the standard software practice of locking versions and might inform others that they are leaking data.
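To illustrate the kind of option I mean, here is a small Python sketch of the decision logic only. It is not the Terraform module and not how the official module is implemented; `Listener`, `plan_listeners`, and the `enable_http` flag are hypothetical names, and the ports are the conventional 80 and 443.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Listener:
    port: int
    protocol: str


def plan_listeners(enable_http: bool = True) -> List[Listener]:
    """Hypothetical sketch: which load balancer listeners to create.

    HTTP stays on by default so existing pipelines keep working,
    but it can be switched off explicitly.
    """
    listeners = [Listener(port=443, protocol="HTTPS")]
    if enable_http:
        listeners.append(Listener(port=80, protocol="HTTP"))
    return listeners


# Default behaviour is unchanged: both listeners are planned.
default_plan = plan_listeners()

# Opting out leaves HTTPS only, so plain-HTTP requests are never accepted.
https_only_plan = plan_listeners(enable_http=False)
```

Keeping the flag on by default would avoid breaking anyone, while letting those who want HTTPS only opt out with a single setting.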