Could not upload data because column already exists

I try to upload history data in Snowflake DW. But get such issue:

2021-03-04T02:13:25.636Z ERROR: Error during unstruct_event_com_shipment_forwarded_1 column creation, while loading enriched/archive/run=2021-03-03-02-05-27/. SQL compilation error:
column 'UNSTRUCT_EVENT_COM_SHIPMENT_FORWARDED_1' already exists
No new columns were added, safe to rerun. Trying to rollback and exit

Message sound really srange: column exists, so it is ok to load data, but it was rollbacked. Could anyone clarify what’s going behind the issue and why it could not load data?

EDIT: 2021-03-04 16:05 UTC – Initially I suggested that Snowflake Loader inspects the atomic.events table to determine what columns need to be added, which is incorrect and changes the conclusion as well.

Hi @sphinks , the error message suggests that Snowflake Loader tried to create the column UNSTRUCT_EVENT_COM_SHIPMENT_FORWARDED_1 but the ALTER TABLE SQL statement failed because the column already existed in the atomic.events table.

The way Snowflake Loader works (simplified to fit the current discussion) is:

  • check state of processing manifest
  • determine what data needs to be loaded and if any new columns need to be created (because the manifest does not indicate they have been created)
  • create those columns
  • load the data.

In this case, the column could not be created.

It is safe to rerun in the sense that no data was loaded, so there will be no duplication.

How did it fail in the first place? Hard to say without more information, but you mentioned that this is historical data. Perhaps your manifest does not contain the proper information required by the loader to determine that the column already exists.

If that is the case, when re-running the job it should go through the same process as described above; and it will probably fail again because the manifest would still be telling it that the column needs to be created.

Could you please check and let us know some more details about what you are trying to do: timestamps of runs you want to load (min and max), timestamps of data in your processing manifest table and anything else that seems relevant. We can then try to advise on a way forward.

@dilyan I confused a little bit with your note on edit, so I’m not sure if should answer the questions you have posted below or not.

I totally understand that alter table try to add column, but fails. However, I’m not sure why it can not be done in safe manner: try to add column, in case it is already exists - go ahead.

Nevertheless, how can I obtain additional info (turn on some debug logging)?

@sphinks, If the column already exist that fact has to be reflected in the DynamoDB manifest table. That is the status of manifest has to be synced with the actual Snowflake DB status. There are two ways to attend to this matter.

  1. If the columns exist in Snowflake DB and it contains no data, you can drop it and restart the job again
  2. If there is some data already, then you need to have the manifest updated to reflect the existence of that column

In the latter case, it is sufficient to add the shredded type corresponding to the existing column name to some earlier DynamoDB record in column ShredTypes. The structure of the manifest table is described here.

1 Like