Adding optional field to existing schema, RDBLoader failure

mgloel · August 31, 2021, 1:29pm

Hi,

I am facing the following issue.

I want to add an additional optional field to an existing schema.

In the schema file 1-0-0:
 "isEnrolled": {
            "type": ["boolean","null"],
            "description": "true if user is enrolled"
        },

In the SQL file:
"is_enrolled" BOOLEAN       ENCODE RUNLENGTH,

The data should end up in the same table as usual. Some of the data (everything up to next week’s release) will not contain the field.

I added the column to the existing table by running:

ALTER TABLE atomic.com_myapp_my_session_context_1
ADD COLUMN is_enrolled boolean ENCODE RUNLENGTH NULL;

The column is added. However the RDBLoader throws the following error message:


INFO 2021-08-31 11:50:40.294: COPY atomic.com_myapp_my_session_context_1 FROM 's3://sp-shredded-alt/good/run=2021-08-31-11-23-01/vendor=com.myapp/name=my_session_context/format=json/model=1'
INFO 2021-08-31 11:50:40.294: COPY atomic.com_myapp_my_session_context_1 FROM 's3://sp-shredded-altgood/run=2021-08-31-11-23-01/vendor=com.myapp/name=my_session_context/format=json/model=1'

2021-08-31T13:50:40.784+02:00

Copy
RDB Loader unknown error in executeUpdate
RDB Loader unknown error in executeUpdate

2021-08-31T13:50:40.784+02:00

Copy
java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'com_oneapp_weconnect_session_context_1' failed.  Check 'stl_load_errors' system table for details.;

The stl_load_errors table says that the boolean format is unknown?

err_code 1210 Unknown boolean format

It seems to me that the RDBLoader has problems writing the data that does not contain that boolean field is_enrolled into the redshift database…

Is it at all possible to add to the same table? Or should we update the version number after such a change and start writing into a completely new table based on schema 1-0-1?

anton · August 31, 2021, 2:56pm

Hello @mgloel,

Do you use TSV or JSON format? For TSV, RDB Loader is supposed to handle table migrations all on its own.

If that’s JSON, you need to make sure your corresponding JSONPath file has a proper column order.

mike · August 31, 2021, 9:55pm

If you are adding a column then yes - ideally you should be bumping this to 1-0-1. Depending on what version of the pipeline you are using as @anton has mentioned above then RDB loader will handle the schema migration for you - there’s no requirement to manually modify the table.

mgloel · September 1, 2021, 7:17am

Hi,

thanks a lot for your quick replies. Actually, we assumed that the table migrations will happen automatically but then we did not see the corresponding column in the database table and added it manually. Furthermore the RDBloader was complaining that there are only 18 columns but 19 in the jsonpath file.

We are using JSON format. Does that mean that we do have to create the tables manually and check the order in the jsonpath file or does the RDBLoader handle the migration for both TSV and JSON.

Btw, we use the following rdbloader version snowplow/snowplow-rdb-loader:0.19.0

anton · September 1, 2021, 8:45am

Hi @mgloel,

With JSON format - yes, unfortunately you still have to manage tables by yourself, including creation and migration.

We recommend to use TSV to let Loader managing the tables, but then you need to deploy an Iglu Server and also it requires a bit extra versioning circumspection, i.e. it seems you’ve patched your schema by adding the column, but we generally strictly discouraging from doing so - it’s a very classic minor version bump, so should be plain 1-0-1 as @mike has mentioned.

I think it’s generally fine to leave it as is if you already did so (technically all historical data is still valid against the patched version), but it can cause some problems next time.

Anyways, in order to fix it now - you just need to make sure that your table and JSONPath fully match, we have a tutorial for it:

Topic		Replies	Views
RDB Loader Fails --> Amazon](500310) Invalid operation: ALTER TABLE ADD COLUMN defined as NOT NULL must have a non-null default expression; AWS real-time pipeline	4	870	August 15, 2022
RS loader failing to load into redshift Troubleshooting	5	80	November 1, 2024
New version of JSON schema for custom context causing errors with Storage Loader Storage targets	10	2372	December 20, 2016
RDB Loader: SQL `NULL` read at column 1 (JDBC type Char) Troubleshooting	3	1170	July 17, 2023
RDB Loader Failing with Varying Schemas	7	1078	October 25, 2022

Adding optional field to existing schema, RDBLoader failure

Related topics