I have a couple of questions regarding the BigQuery loader and mutator.
I have a pipeline in production that I provision entirely with Terraform. It has one VM group for the collector and another VM group that runs Beam Enrich and the BigQuery Loader & Mutator.
Whenever I update my Snowplow BigQuery Loader, the VM group restarts and re-executes the mutator's create step, which creates the table in BigQuery.
First question: if my table already exists, will the mutator overwrite it?
Secondly, I would like to use the Snowplow deployment to collect, enrich and store the data of multiple websites. Is it possible to separate the data into multiple tables while still using the same pipeline?
No, the mutator shouldn't overwrite your existing table. Restarting it will clear its internal cache of which columns it has created, but that cache is refreshed on initialisation.
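For intuition, the create step behaves like an idempotent create-if-not-exists rather than a destructive create. The mutator itself is a Scala app, so the snippet below is only a hedged Python analogy using the google-cloud-bigquery client; the project, dataset and schema are made-up placeholders, not your pipeline's real names.

```python
# Hedged analogy only: the real mutator is a Scala app, and this is not its
# code. It just illustrates "create if not exists" semantics with the
# google-cloud-bigquery client. Project, dataset and schema are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")   # assumed project id
table_ref = "my-project.snowplow.events"         # assumed table path

schema = [
    bigquery.SchemaField("app_id", "STRING"),
    bigquery.SchemaField("collector_tstamp", "TIMESTAMP"),
]

table = bigquery.Table(table_ref, schema=schema)
# exists_ok=True turns the call into a no-op when the table already exists,
# mirroring the "mutator does not overwrite" behaviour described above.
client.create_table(table, exists_ok=True)
```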
If you are running a single pipeline there are a few different options:
Run multiple collectors / pipelines (this might be desirable for first-party cookie setting / ITP)
Split the data out at enrich time, e.g. with one BQ loader per app_id (see the first sketch after this list)
Split the data once it has been sunk into BigQuery, creating an incremental table or view per app_id (you probably want a materialised view) that refreshes on a frequent basis (see the second sketch after this list)
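A hedged sketch of the enrich-time split: a small consumer that fans enriched events out to one Pub/Sub topic per app_id, so each topic can feed its own BQ loader. The project, topic and subscription names are assumptions; the only Snowplow-specific fact relied on is that app_id is the first tab-separated field of an enriched event.

```python
# Hedged sketch, not Snowplow code: fan enriched events out to one topic per
# app_id so each topic can feed its own BQ loader. Project, topic and
# subscription names are assumptions. The one Snowplow-specific fact used is
# that app_id is the first tab-separated field of an enriched event.
from google.cloud import pubsub_v1

PROJECT = "my-project"                                        # assumed
SUBSCRIPTION = f"projects/{PROJECT}/subscriptions/enriched"   # assumed

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

def route(message: pubsub_v1.subscriber.message.Message) -> None:
    # app_id is the first field of the enriched-event TSV.
    app_id = message.data.decode("utf-8").split("\t", 1)[0] or "unknown"
    topic = publisher.topic_path(PROJECT, f"enriched-{app_id}")  # assumed naming
    publisher.publish(topic, message.data)  # at-least-once; ack after handing off
    message.ack()

future = subscriber.subscribe(SUBSCRIPTION, callback=route)
future.result()  # blocks; interrupt to stop
```

Each `enriched-<app_id>` topic would then get its own loader (and its own table), which answers the "multiple tables from one pipeline" question directly.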
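And a hedged sketch of the in-warehouse split: one filtered materialised view per app_id on top of the loaded events table, created here via the Python BigQuery client. All names are placeholders. Note that BigQuery restricts the SQL allowed in materialised views (no SELECT *, among other limits); if a non-aggregated materialised view isn't available to you, a scheduled query writing into an incremental table per app_id achieves the same split.

```python
# Hedged sketch, with made-up names: one filtered materialised view per
# app_id over the loaded events table. BigQuery restricts materialised-view
# SQL (no SELECT *, among other limits), so explicit columns are listed; if
# a non-aggregated materialised view isn't available in your project, a
# scheduled query writing into an incremental table per app_id is the
# equivalent fallback.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project id
APP_IDS = ["site-a", "site-b"]                  # assumed app_ids

for app_id in APP_IDS:
    ddl = f"""
    CREATE MATERIALIZED VIEW IF NOT EXISTS
      `my-project.snowplow.events_{app_id.replace('-', '_')}`
    AS
    SELECT app_id, event_id, collector_tstamp, domain_userid, network_userid
    FROM `my-project.snowplow.events`
    WHERE app_id = '{app_id}'
    """
    client.query(ddl).result()  # DDL runs as an ordinary query job
```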
Having given this point a second thought: if I have multiple tags in different GTM containers sending to the same collector (same IP & domain), aren't those cookies no longer first-party?
These can still be first-party cookies (for domain_userid), but the question becomes whether you want to stitch network_userids across these sites.
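To make "stitching" concrete, here's a hedged sketch of the kind of query that links users across sites: it pairs domain_userids from different app_ids that share a network_userid (the identifier from the collector's own cookie). Column names follow the standard Snowplow events schema; the project and dataset are assumptions.

```python
# Hedged sketch of cross-site stitching: pair domain_userids from different
# app_ids that share a network_userid (the ID from the collector's own
# cookie). Column names are from the standard Snowplow events table; the
# project and dataset are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project id

STITCH_SQL = """
SELECT
  a.network_userid,
  a.app_id        AS app_id_a,
  a.domain_userid AS domain_userid_a,
  b.app_id        AS app_id_b,
  b.domain_userid AS domain_userid_b
FROM `my-project.snowplow.events` a
JOIN `my-project.snowplow.events` b
  ON  a.network_userid = b.network_userid
  AND a.app_id < b.app_id          -- distinct sites, deduped pairs
GROUP BY 1, 2, 3, 4, 5
"""

for row in client.query(STITCH_SQL).result():
    print(row.network_userid, row.app_id_a, row.app_id_b)
```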
Re: multiple tags on different domains - the more domains you run through a single collector, the greater the risk that ITP / WebKit will flag the collector domain as engaged in cross-site tracking.