RDB Loader 1.0.0 Docker config error [RESOLVED]

Hey, so I’m trying to finally automate the last step of the Snowplow pipeline: pushing the shredded data into my Redshift DWH with RDB Loader 1.0.0. So far I’ve always run the same Docker command locally and it has always worked:

docker run snowplow/snowplow-rdb-loader:1.0.0 --iglu-config $(cat ./resolver.json | base64) --config $(cat ./config.hocon | base64)

Now I’m trying to make this work inside an Airflow task, which means I first need to retrieve the codebase from GitLab and substitute the environment variables in config.hocon. I’ve essentially tried to build a custom image derived from snowplow/snowplow-rdb-loader:1.0.0:
Dockerfile

FROM snowplow/snowplow-rdb-loader:1.0.0
# root is needed to install packages
USER root
RUN apt-get update \
    && apt-get install -y gettext git
COPY snowplow_loader.sh .
# drop back to the image's unprivileged user
USER snowplow
# override the loader entrypoint so the container runs the script instead
ENTRYPOINT ["/bin/sh"]

and then run the function after I’ve replaced the vars:
snowplow_loader.sh

#!/bin/sh
set -e

# get the snowplow code from GitLab
git clone https://$GITLAB_USERNAME:$GITLAB_ACCESS_TOKEN@gitlab.com/etc/etc.git

# replace env vars; envsubst substitutes every $VAR it finds (here only
# $SNOWPLOW_DWH_PASSWORD) and can't write to the file it reads, hence the copy
envsubst < ./snowplow/sink_redshift/loader/config.hocon > ./snowplow/sink_redshift/loader/config_prod.hocon
cp -f ./snowplow/sink_redshift/loader/config_prod.hocon ./snowplow/sink_redshift/loader/config.hocon

# run the loader with the base64-encoded configs
/home/snowplow/bin/snowplow-rdb-loader --config $(cat ./snowplow/sink_redshift/loader/config.hocon | base64) --iglu-config $(cat ./snowplow/sink_redshift/loader/resolver.json | base64)
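
For completeness, I build and run the image roughly like this (my-rdb-loader is just a local tag; the variables the script needs are passed through from the worker's environment):

docker build -t my-rdb-loader .
# the /bin/sh entrypoint runs the script, which sits in the image's working directory
docker run -e GITLAB_USERNAME -e GITLAB_ACCESS_TOKEN -e SNOWPLOW_DWH_PASSWORD my-rdb-loader snowplow_loader.sh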

to which I get an error:

For reference these are my config files:
config.hocon

{
  # Human-readable identifier, can be random
  "name": "Snowplow Redshift Loader",
  # Machine-readable unique identifier, must be a UUID
  "id": "fake",

  # Data Lake (S3) region
  "region": "fake",
  # SQS topic name used by Shredder and Loader to communicate
  "messageQueue": "fake",

  # Shredder-specific configs
  "shredder": {
    "type": "batch",
    # Path to enriched archive (must be populated separately with run=YYYY-MM-DD-hh-mm-ss directories)
    "input": "fake",
    # Path to shredded output
    "output": {
      "path": "fake",
      # Shredder output compression, GZIP or NONE
      "compression": "GZIP"
    }
  },

  # Schema-specific format settings (recommended to leave all three groups empty and use TSV as default)
  "formats": {
    # Format used by default (TSV or JSON)
    "default": "TSV",
    # Schemas to be shredded as JSONs, corresponding JSONPath files must be present. Automigrations will be disabled
    "json": [ ],
    # Schemas to be shredded as TSVs, presence of the schema on Iglu Server is necessary. Automigrations enabled
    "tsv": [ ],
    # Schemas that won't be loaded
    "skip": [ ]
  },

  # Warehouse connection details
  "storage" = {
    # Database, redshift is the only acceptable option
    "type": "redshift",
    # Redshift hostname
    "host": "fake",
    # Database name
    "database": "dev",
    # Database port
    "port": 5439,
    # AWS Role ARN allowing Redshift to load data from S3
    "roleArn": "fake",
    # DB schema name
    "schema": "atomic",
    # DB user with permissions to load data
    "username": "fake",
    # DB password
    "password": "$SNOWPLOW_DWH_PASSWORD",
    # Custom JDBC configuration
    "jdbc": {"ssl": false},
    # MAXERROR, amount of acceptable loading errors
    "maxError": 10
  },

  # Additional steps. analyze, vacuum and transit_load are valid values
  "steps": ["analyze"],

  # Observability and logging options
  "monitoring": {
    # Snowplow tracking (optional)
    "snowplow": null,
    # Sentry (optional)
    "sentry": null
  }
}

resolver.json

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": {
          "http": {
            "uri": "https://myserver.com/api/"
          }
        }
      }
    ]
  }
}

Also, if I invert the order of the arguments, I get a similar error but related to resolver.json instead, so I suspect this is either something related to the base64 encoding (somehow different from my local base64 encoding) or the way I’m running the function, but I wasn’t able to figure out the issue.
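
One quick way to check the encoding theory would be to compare how many lines the encoded output spans in each environment, e.g.:

# more than one line means base64 is wrapping its output, and the
# newlines make the shell split $(...) into several arguments
cat ./config.hocon | base64 | wc -l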

EDIT: after trying some things I found scattered around the web, I’ve managed to get it running by adding -w 0 to the base64 command. I’m waiting for my Shredder to finish running so I can test it fully, but in theory this should be resolved. I wasn’t sure why locally I don’t need this extra flag, but my best guess is that the GNU coreutils base64 in the container wraps its output at 76 characters by default, while my local base64 doesn’t wrap at all; -w 0 disables the wrapping so the encoded config is passed as a single argument.
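
For reference, the loader line in snowplow_loader.sh now looks like this (the only change is the -w 0 flag, which disables line wrapping):

/home/snowplow/bin/snowplow-rdb-loader --config $(cat ./snowplow/sink_redshift/loader/config.hocon | base64 -w 0) --iglu-config $(cat ./snowplow/sink_redshift/loader/resolver.json | base64 -w 0)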


Hey @Joao_Miguel_Santos,

I’m glad you’ve managed to resolve this! But please consider upgrading your apps, either to 1.2.3, which is the latest in the 1.x series and doesn’t require any configuration changes, or to 2.1.0, which does require config changes but also brings massive stability improvements.
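
For the 1.2.3 route, that should just mean bumping the image tag in your Dockerfile, e.g.:

# same Dockerfile as above, only the base image tag changes
FROM snowplow/snowplow-rdb-loader:1.2.3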