Updated custom schema but Mutator didn't pick it up


My Snowplow pipeline is deployed on GCP, and my Iglu server is a static repo hosted on Google Cloud Storage. Recently I ran into a problem when I wanted to update one of my custom schemas in the Iglu repo. I added a new property to the schema, then uploaded it to GCS, overwriting the existing file under the same version, "1-0-0". I reused the version because a large amount of data is already stored against it and I didn't want to lose that data. But I found that BigQuery Mutator would not update the schema in BigQuery for me. If I ssh into the VM and run mutator add-column, it also reports an error. My current workaround is to manually update the BigQuery table schema via the bq command line. I am not sure whether this is good practice, or whether it will cause issues down the road. Is there a suggested or better practice for this situation?

Best practice is to never modify a schema once it has been deployed to production and to treat it as immutable. If you need to add new columns you should use schema versioning to increment the version of your schema - and the mutator will then take care of adding a new column for you (it has no capability to alter existing columns).
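To make the versioning concrete, here is a minimal sketch of what the incremented schema file might look like. The vendor, event name, and property names are hypothetical placeholders, not taken from your pipeline; the key points are that the `version` field in the `self` block and the filename both become `1-0-1`, and the new property is added alongside the existing ones:

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Example event schema, revision adding one new optional property",
  "self": {
    "vendor": "com.acme",
    "name": "my_event",
    "format": "jsonschema",
    "version": "1-0-1"
  },
  "type": "object",
  "properties": {
    "existing_property": { "type": "string" },
    "new_property": { "type": ["string", "null"] }
  },
  "additionalProperties": false
}
```

This file would be uploaded as a new object (e.g. under `.../schemas/com.acme/my_event/jsonschema/1-0-1`) rather than overwriting `1-0-0`, so both versions remain resolvable and Mutator can create the new column when events with the new version arrive.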

Thanks, @mike! One more question: if I add a new schema version, upgrading from 1-0-0 to 1-0-1, and I want to move the data from the 1-0-0 column into the 1-0-1 column in the BigQuery table and then remove the 1-0-0 column, is there a convenient way to do this? And is that a recommended approach?

The suggested way is to COALESCE these columns (either at query time or during materialisation) rather than migrating data between them, as this preserves the schema version associated with each event.
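As a sketch of the query-time approach: the column names below are illustrative (BigQuery Loader derives them from the schema's vendor, name, and version), and the property names are hypothetical, but the shape of the query should carry over:

```sql
-- Pick the value from whichever versioned column is populated for each event.
-- A property that only exists in 1-0-1 is simply read from that column alone.
SELECT
  event_id,
  COALESCE(
    unstruct_event_com_acme_my_event_1_0_1.existing_property,
    unstruct_event_com_acme_my_event_1_0_0.existing_property
  ) AS existing_property,
  unstruct_event_com_acme_my_event_1_0_1.new_property AS new_property
FROM `my-project.my_dataset.events`
```

Old events keep their 1-0-0 column untouched, new events land in 1-0-1, and downstream consumers see one unified field.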

If you are using a data model you can just pick the first non-null value.