Snowplow BQ mutator and repeater

Hanumanth · August 25, 2021, 4:58am

Hi,

I have a couple of questions about the BQ mutator and repeater. I have configured the mutator (listen) and the repeater, and both run continuously on a VM. I went through the official Snowplow documentation looking for answers to the questions below, but couldn't find them.

Questions:

1. Does the BQ repeater depend on the BQ mutator?
2. Does the mutator pass any signal to the repeater when it completes its job?
3. What happens if the mutator fails to do its job? Does the repeater stop processing events?
4. Is there a default timeout for the mutator? What if the mutator takes longer than usual?
5. Is there any communication between the mutator and the repeater when they run sequentially?
6. How do the mutator and the repeater communicate with each other?

It would be great if you could elaborate on the answers to the above questions with some insights.

Thanks

anton · August 25, 2021, 2:54pm

Hey @Hanumanth!

These are great questions, although the answer is always the same: they don't communicate, and they don't know about each other.

1. Not directly. They can certainly run one without the other, but if nothing mutates the table (it can even be a human), the repeater's records will always fail.
2. Nope, it's completely timeout-based.
3. Also no. As they're independent, the repeater will never find out. The good news is that the mutator is very unlikely to fail. Even in the rare case of a connection issue, the mutator will receive another batch of types and make another attempt to mutate the table.
4. No, but it never takes long: less than a second from the moment it receives a batch with types, and usually on the order of minutes from the moment the Loader received the first event with a new type.
5. Also no. But it's important that they're not designed to run sequentially; they're designed to run in parallel. They're both fairly lightweight applications (especially the mutator), and it shouldn't be an issue to run them in parallel.
6. They don't.
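
To make the "no direct communication" point concrete, here is a toy Python model. It is not Snowplow code: `ToyTable`, `mutate`, `retry_insert`, and the column name are hypothetical names used only for illustration. The only thing the two functions share is the table itself, so a retry keeps failing until something (the mutator or a human) has added the missing column.

```python
class ToyTable:
    """Stand-in for a BigQuery table: it only tracks which columns exist."""

    def __init__(self, columns):
        self.columns = set(columns)

    def insert(self, row):
        # An insert succeeds only if every field in the row has a column.
        return set(row) <= self.columns


def mutate(table, new_columns):
    """Roughly what a mutator (or a human) does: add the missing columns."""
    table.columns |= set(new_columns)


def retry_insert(table, row):
    """Roughly what a repeater does: retry a failed row, unaware of mutate()."""
    return table.insert(row)


table = ToyTable({"event_id"})
row = {"event_id": "e1", "contexts_com_acme_thing_1": "{}"}

before = retry_insert(table, row)  # the column is still missing
mutate(table, ["contexts_com_acme_thing_1"])
after = retry_insert(table, row)   # the column now exists
print(before, after)
```

Neither function calls or signals the other; the retry succeeds only because the table has changed in the meantime.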

One question you've missed (or maybe you found the answer in the documentation) is about the repeater's timeout, i.e. how long it waits before deciding to abandon a record. It's configured via the --backoffPeriod CLI option (it will be a config option in 0.7.x). It's also important to note that a record's age is derived from the etl_tstamp property, i.e. it is the time that has passed since enrich processed the event.
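
As a rough sketch of that age check (not the repeater's actual implementation), the comparison could look like this in Python. The function names, the use of seconds as the unit, the `>=` comparison, and the sample values are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def record_age(etl_tstamp, now):
    """Age of a record: time elapsed since enrich processed the event."""
    return now - etl_tstamp


def backoff_elapsed(etl_tstamp, now, backoff_seconds):
    """True once the record is at least as old as the backoff period.

    Assumption: the period is expressed in seconds and the boundary is
    inclusive; the real repeater may differ on both points.
    """
    return record_age(etl_tstamp, now) >= timedelta(seconds=backoff_seconds)


# Hypothetical values: an event enriched five minutes before "now".
etl_tstamp = datetime(2021, 8, 25, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2021, 8, 25, 12, 5, 0, tzinfo=timezone.utc)

print(backoff_elapsed(etl_tstamp, now, backoff_seconds=900))  # 5 min < 15 min
print(backoff_elapsed(etl_tstamp, now, backoff_seconds=60))   # 5 min >= 1 min
```

Because the age is measured from etl_tstamp rather than from the moment the insert failed, time a record spends upstream of the repeater also counts towards the period.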

Hanumanth · August 26, 2021, 8:07am

Hi @anton
Thank you for your response. Can you please elaborate on the second answer? Is there a document where I can find detailed information about mutator and repeater?

Here I have two more questions,

1. What if the mutator runs for longer than the repeater's timeout?
2. What if the mutator is busy doing something else when a request reaches it?

Apart from Snowplow BigQuery Loader - Snowplow Docs
