Simplifying wide row column names for event and contexts

Jayant_Kumar · October 31, 2023, 7:48am

Hi Folks,

Currently, I am testing the gcp-lake-loader. There are a few things I want to understand and check.

Currently, the event and context columns are named like this: contexts_com_cloudorbit_op_desk_user_1.

Is there a way to simplify these names? Are there any built-in transformers for this?
Can I get rid of canonical null columns to shorten the width of the rows?

Thank you!!

evaldas · November 1, 2023, 10:30am

@Jayant_Kumar I don’t see the need to simplify this. In columnar storage it doesn’t really matter how many columns you have, as the column count has minimal effect on data size (null columns/fields don’t allocate any extra space).

Jayant_Kumar · November 1, 2023, 6:01pm

It just looks a little messy, with a bunch of attributes that mean nothing to many people.

Arguably, it would be better to have some sort of filter support to disable some of them by default.

mike · November 1, 2023, 10:37pm

I’m not sure how you could do this. The idea behind creating the columns is that they should (at some stage) contain non-null values, so you can’t really hide them at all. Ideally nobody should be looking at the raw table: in downstream modelling you can select and filter down to just the things you are interested in.
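To illustrate the downstream-modelling approach described above, here is a minimal sketch that generates a view query with shorter aliases for the wide-row columns. The helper names (`short_alias`, `build_view_sql`), the table name `events`, and the assumption that the vendor part of the column is `com.cloudorbit` are all hypothetical; nothing here is part of the loader or its configuration.

```python
import re

# Hypothetical helpers for a downstream view; not part of any loader or SDK.
_PREFIX = re.compile(r"^(contexts|unstruct_event)_")


def short_alias(column: str, vendor: str) -> str:
    """Strip the 'contexts_' / 'unstruct_event_' prefix and the vendor part."""
    body = _PREFIX.sub("", column, count=1)
    vendor_prefix = vendor.replace(".", "_") + "_"
    if body.startswith(vendor_prefix):
        body = body[len(vendor_prefix):]
    return body


def build_view_sql(table: str, columns: list[str], vendor: str) -> str:
    """Build a SELECT that exposes each wide column under its short alias."""
    select_list = ",\n  ".join(
        f"{col} AS {short_alias(col, vendor)}" for col in columns
    )
    return f"SELECT\n  {select_list}\nFROM {table}"


if __name__ == "__main__":
    # The vendor split is an assumption; adjust it to your own schemas.
    print(
        build_view_sql(
            "events",
            ["contexts_com_cloudorbit_op_desk_user_1"],
            "com.cloudorbit",
        )
    )
```

The raw table keeps its full column names, so schema evolution in the loader is unaffected; only the view (or the catalog entry built on it) shows the shorter names.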

Jayant_Kumar · November 2, 2023, 3:43am

@mike If you think about creating a data catalog over the raw data, it will look like a mess.

I was thinking of dropping events using transformers after the enrich stage, but I am not sure that would work, because the loaders deserialise events using Event from the analytics SDK. So it may not help.

If there were a way to drop null fields using a config or something, that would be great. The loaders support schema evolution anyway, so when those columns have values, they can start dumping them.
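The thread does not confirm any loader option for dropping null fields, so the following is only a sketch of doing it downstream: scan a sample of rows and keep the columns that hold at least one non-null value, for example to decide what to expose in a catalog or view. The function name `non_null_columns` and the sample rows are made up for illustration.

```python
from typing import Any, Iterable


def non_null_columns(rows: Iterable[dict[str, Any]]) -> list[str]:
    """Return column names with at least one non-null value, in first-seen order."""
    seen: list[str] = []
    for row in rows:
        for name, value in row.items():
            if value is not None and name not in seen:
                seen.append(name)
    return seen


if __name__ == "__main__":
    # Made-up sample rows; real rows would come from a query over the raw table.
    sample = [
        {"app_id": "web", "br_name": None,
         "contexts_com_cloudorbit_op_desk_user_1": None},
        {"app_id": "web", "br_name": None,
         "contexts_com_cloudorbit_op_desk_user_1": [{"id": 1}]},
    ]
    print(non_null_columns(sample))
```

Because the result depends on the sample, a column that is null today can reappear once it starts receiving values, which matches the schema-evolution behaviour mentioned above.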
