Adding optional field to existing schema, RDBLoader failure

Hi,

I am facing the following issue.

I want to add an additional optional field to an existing schema.

In the schema file 1-0-0:
 "isEnrolled": {
            "type": ["boolean","null"],
            "description": "true if user is enrolled"
        },

In the SQL file:
"is_enrolled" BOOLEAN       ENCODE RUNLENGTH,

The data should end up in the same table as usual. Some of the data (everything up to next week’s release) will not contain the field.

I added the column to the existing table by running:

ALTER TABLE atomic.com_myapp_my_session_context_1
ADD COLUMN is_enrolled boolean ENCODE RUNLENGTH NULL;

The column is added. However the RDBLoader throws the following error message:


INFO 2021-08-31 11:50:40.294: COPY atomic.com_myapp_my_session_context_1 FROM 's3://sp-shredded-alt/good/run=2021-08-31-11-23-01/vendor=com.myapp/name=my_session_context/format=json/model=1'
INFO 2021-08-31 11:50:40.294: COPY atomic.com_myapp_my_session_context_1 FROM 's3://sp-shredded-altgood/run=2021-08-31-11-23-01/vendor=com.myapp/name=my_session_context/format=json/model=1'

2021-08-31T13:50:40.784+02:00

Copy
RDB Loader unknown error in executeUpdate
RDB Loader unknown error in executeUpdate

2021-08-31T13:50:40.784+02:00

Copy
java.sql.SQLException: [Amazon](500310) Invalid operation: Load into table 'com_oneapp_weconnect_session_context_1' failed.  Check 'stl_load_errors' system table for details.;

The stl_load_errors table says that the boolean format is unknown?

err_code 1210 Unknown boolean format

It seems to me that the RDBLoader has problems writing the data that does not contain that boolean field is_enrolled into the redshift database…

Is it at all possible to add to the same table? Or should we update the version number after such a change and start writing into a completely new table based on schema 1-0-1?

Hello @mgloel,

Do you use TSV or JSON format? For TSV, RDB Loader is supposed to handle table migrations all on its own.

If that’s JSON, you need to make sure your corresponding JSONPath file has a proper column order.

2 Likes

If you are adding a column then yes - ideally you should be bumping this to 1-0-1. Depending on what version of the pipeline you are using as @anton has mentioned above then RDB loader will handle the schema migration for you - there’s no requirement to manually modify the table.

Hi,

thanks a lot for your quick replies. Actually, we assumed that the table migrations will happen automatically but then we did not see the corresponding column in the database table and added it manually. Furthermore the RDBloader was complaining that there are only 18 columns but 19 in the jsonpath file.

We are using JSON format. Does that mean that we do have to create the tables manually and check the order in the jsonpath file or does the RDBLoader handle the migration for both TSV and JSON.

Btw, we use the following rdbloader version snowplow/snowplow-rdb-loader:0.19.0

Hi @mgloel,

With JSON format - yes, unfortunately you still have to manage tables by yourself, including creation and migration.

We recommend to use TSV to let Loader managing the tables, but then you need to deploy an Iglu Server and also it requires a bit extra versioning circumspection, i.e. it seems you’ve patched your schema by adding the column, but we generally strictly discouraging from doing so - it’s a very classic minor version bump, so should be plain 1-0-1 as @mike has mentioned.

I think it’s generally fine to leave it as is if you already did so (technically all historical data is still valid against the patched version), but it can cause some problems next time.

Anyways, in order to fix it now - you just need to make sure that your table and JSONPath fully match, we have a tutorial for it:

2 Likes