Load into a BigQuery table partitioned by date with local time zone

Hello all,

most of our analytical queries refer to data of a whole day in local Central European time zone, i.e. CET or UTC+01:00 or CEST or UTC+02:00. However, since the field “derrived_tstamp” is reported in UTC, basically two partitions are queried for a day in local time zone.
To improve the efficiency of our queries and only query one partition per day in the future, I would like to have a field “derrived_tstamp_local” in the table to use for partitioning instead. I am not sure how and at which point of the process the local time should ideally be created and added. I think it should be possible to create a schema and a new context with a JavaScript enrichment, which enriches each event with the local time to derrived_tstamp. However, I am not sure if it is possible to partition on nested fields of a custom context in BigQuery and if this approach is not too complicated.
So what do you guys think is the right way to create such a partitioning field “derrived_tstamp_local”?

Thanks in advance
Richard

Hi @riwi ,

That’s for sure possible.

From BigQuery documentation:

The partitioning column must be a top-level field. You cannot use a leaf field from a RECORD (STRUCT) as the partitioning column.

So unfortunately you can’t do that.

If you don’t care about the original derived_timestamp, you could use JS enrichment to mutate it in-place to the timezone that you want. You’d need something like that in the script (not tested):

function process(event) {
    var oldTimestamp = event.getDerived_timestamp();

    val newTimestamp = ...
  
    event.setDerived_timestamp(newTimestamp);

    return [];
}

But please be aware that in some future we’ll remove the possibility to mutate the events in-please like this.

Another possibility would be to partition the table hourly so that you query only the partitions that you’re interested in in your analytical queries.

3 Likes

I’d echo this approach - ideally have a data model (dbt or otherwise) that creates and partitions by your converted column rather than having to go back to the events table as much as possible to avoid rescanning those partitions.

1 Like