Writing Custom Spark Lake Loader for Iceberg

Hi @istreeter

I have managed to create an Iceberg BigLake table using the lake loader by following the steps mentioned above. As you said, it's very tricky to get the linking right during BigLake table creation itself; if it isn't done at that point, the table never gets linked to BigQuery.

A workaround would be to link the table to BigQuery explicitly by pointing it at the metadata file in the warehouse. I haven't tried this yet, but it seems worth exploring.
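For anyone who wants to try that manual linking route, here is a rough sketch of how it might look. It builds the BigQuery DDL for an external Iceberg table that points at a specific metadata file; all dataset, table, and bucket names below are hypothetical placeholders, and the generated statement would need to be run via the BigQuery console or `bq query`:

```python
# Sketch of the manual linking workaround: register the Iceberg table in
# BigQuery as an external table pointing at an Iceberg metadata JSON file
# in the warehouse bucket. All names/paths below are hypothetical.

def build_link_ddl(dataset: str, table: str, metadata_uri: str) -> str:
    """Return BigQuery DDL that creates an external Iceberg table."""
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'ICEBERG',\n"
        f"  uris = ['{metadata_uri}']\n"
        ")"
    )

ddl = build_link_ddl(
    "snowplow_dataset",          # hypothetical dataset
    "events_iceberg",            # hypothetical table name
    "gs://my-warehouse/events/metadata/v2.metadata.json",  # hypothetical path
)
print(ddl)
```

One caveat to keep in mind: an external table created this way is pinned to the specific metadata file in `uris`, so it would need re-pointing after new Iceberg commits, which is why getting the BigLake link right at creation time is the preferable path.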

I have yet to test a few features, such as schema evolution and access control.

I have a couple of requests/suggestions.

@Simon_Rumble @istreeter

  • We should release the BigLake loader image to Docker Hub so the community can use an official image.
  • Secondly, modify the Spark caster and transformers so they can be used more broadly with Apache Spark (batch and streaming).