Updated custom schema but Mutator didn't pick it up


My Snowplow pipeline is deployed on GCP, and my Iglu server is a static repo hosted on Google Cloud Storage. Recently I ran into a problem when I wanted to update one of my custom schemas in the Iglu repo. I added a new property to the schema, then uploaded it to GCS, overwriting the existing file under the same version, "1-0-0". I reused the version because a large amount of data is already stored against it and I didn't want to lose that data. But I found that BigQuery Mutator would not update the schema in BigQuery for me. If I ssh into the VM and run mutator add-column, it also reports an error. My current workaround is to manually update the BigQuery table schema via the bq command line. I am not sure whether this is good practice, or whether it will cause issues down the road. Is there a suggested or better practice for this situation?

Best practice is to never modify a schema once it has been deployed to production and to treat it as immutable. If you need to add new columns you should use schema versioning to increment the version of your schema - and the mutator will then take care of adding a new column for you (it has no capability to alter existing columns).
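To make the versioning concrete, here is a minimal sketch of what the incremented schema file might look like. The vendor, event name, and property names are hypothetical placeholders, not taken from your pipeline; the key points are that the `version` field in the `self` block and the filename both become `1-0-1`, and the new property is added alongside the existing ones:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Example event schema, revision adding one new optional property",
  "self": {
    "vendor": "com.acme",
    "name": "my_event",
    "format": "jsonschema",
    "version": "1-0-1"
  },
  "type": "object",
  "properties": {
    "existing_property": { "type": "string" },
    "new_property": { "type": ["string", "null"] }
  },
  "additionalProperties": false
}
```

This file would be uploaded as a new object (e.g. under `.../schemas/com.acme/my_event/jsonschema/1-0-1`) rather than overwriting `1-0-0`, so both versions remain resolvable and Mutator can create the new column when events with the new version arrive.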

Thanks, @mike! One more question: if I add a new schema version, upgrading from 1-0-0 to 1-0-1, and I want to move the data from the 1-0-0 column into the 1-0-1 column in the BigQuery table and then remove the 1-0-0 column, is there a convenient way to do this? And is that a recommended approach?

The suggested way is to COALESCE these columns (either at query time or during materialisation) rather than migrating data between them, as this preserves the schema version associated with each event.
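As a sketch of the query-time approach: the column names below are illustrative (BigQuery Loader derives them from the schema's vendor, name, and version), and the property names are hypothetical, but the shape of the query should carry over:

```sql
-- Pick the value from whichever versioned column is populated for each event.
-- A property that only exists in 1-0-1 is simply read from that column alone.
SELECT
  event_id,
  COALESCE(
    unstruct_event_com_acme_my_event_1_0_1.existing_property,
    unstruct_event_com_acme_my_event_1_0_0.existing_property
  ) AS existing_property,
  unstruct_event_com_acme_my_event_1_0_1.new_property AS new_property
FROM `my-project.my_dataset.events`
```

Old events keep their 1-0-0 column untouched, new events land in 1-0-1, and downstream consumers see one unified field.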

If you are using a data model you can just pick the first non-null value.