BigQuery Mutator: Schema array field mapped to string

Hi!

I’m not yet sure, what happened exactly and if it’s an error on my side or the mutator:

I have created a new schema that contains a nested array. However in BigQuery the field is now of type string and the data I’m sending as an array is converted to a string representation of the array.

This is the relevant part of my schema. Linting the schema works fine.

    "anonymous_appstore_purchase": {
      "type": [
        "object",
        "null"
      ],
        "properties": {
          "permissions": {
            "type": [
              "array",
              "null"
            ],
            "description": "Permissions granted to the user"
          },
          "product_id": {
            "type": [
              "string",
              "null"
            ],
            "description": "Product id of the app store access right",
            "maxLength": 255
          }
        },
      "description": "App store access rights"
    }

The permissions array is manifested as a string in BigQuery however:

image

Is this intended behaviour of the mutator or did I maybe setup my schema wrong? Are there logs I could check on what the mutator did when integrating the schema?

Thanks a lot for your help!

Hi @volderette , I believe that is the expected behaviour as you can see from how schema types translate to datatypes here: How schemas translate to database types | Snowplow Documentation

Hi Ryan,
thanks for the link! If I’m going one level up in my schema, I have arrays defined, that are translated to repeated records:

image

    "products": {
      "type": [
        "array",
        "null"
      ],
      "items": {
        "type": [
          "object",
          "null"
        ],
        "properties": {
          "access_right_unique_id": {
            "type": [
              "string",
              "null"
            ],
            "description": "The access right a user has",
            "maxLength": 255
          },
          "internal_title": {
            "type": [
              "string",
              "null"
            ],
            "description": "The internal title of a product a user has access to",
            "maxLength": 255
          },
          "product_id": {
            "type": [
              "string",
              "null"
            ],
            "description": "The product id of a product a user has access to",
            "maxLength": 255
          }
        },
        "description": "Product properties",
        "additionalProperties": false
      },
      "description": "Products the user is subscribed to"
    },

I couldn’t find the array definition in your linked docs however…

Ok, I’m wondering if it’s even possible within BigQuery to have the structure I’m trying to create…

Hey @volderette,

I think what you want is a STRING REPEATED record type as permissions, but for that you’ll need to specify further in your schema. I think something like this might work to get the desired result:

"anonymous_appstore_purchase": {
      "type": [
        "object",
        "null"
      ],
        "properties": {
          "permissions": {
            "type": [
              "array",
              "null"
            ],
            "items": {
              "type" : "string",
              "description": "Add description for permissions",
              "maxLength": 1024
            },
            "minItems": 0,
            "description": "Permissions granted to the user"
          },
          "product_id": {
            "type": [
              "string",
              "null"
            ],
            "description": "Product id of the app store access right",
            "maxLength": 255
          }
        },
      "description": "App store access rights"
    }

I’m not sure about the minItems, whether that should be 0 or 1 but just give it a try and see what happens. If you have limited permissions then you can change the type of permissions['items'] to "enum" if you want, but either way in BQ the expected data type is STRING.

Feel free to test this out and let us know how it goes!

Thank you both! I will try that!

array without items would fall back to STRING. We should improve our docs.

FWIW we did update the docs to call out this behavior: How schemas translate to database types | Snowplow Documentation (thanks @Pavel_Voropaev!)