Shredder config parse error

pramod.niralakeri · January 12, 2022, 12:20pm

I’m getting this error:

- (String: 8) Unable to parse the configuration: Could not resolve substitution to a value: ${SHRED_BUCKET}.

And my config file is:

{

  "input": s3://${SHRED_BUCKET}/archive/enriched/,

  # Path to shredded archive
  "output": {

    "path": s3://${SHRED_BUCKET}/shredded/good/,

    "compression": "GZIP",

    "region": ${AWS_REGION}
  }

  # Queue used to communicate with Loader
  "queue": {
    "type": "sqs",

    "queueName": "snowplow-shredder-loader.fifo",

    "region": "${AWS_REGION}"
  }

  # Schema-specific format settings (recommended to leave all three groups empty and use TSV as default)
  "formats": {
    "default": "TSV",

    "json": [
      "iglu:com.acme/json-event/jsonschema/1-0-0",
      "iglu:com.acme/json-event/jsonschema/2-*-*"
    ],

    "tsv": [ ],

    "skip": [
      "iglu:com.acme/skip-event/jsonschema/1-*-*"
    ]
  }
}

I have tried

"input": s3://${SHRED_BUCKET}/archive/enriched/,

and

"input": "s3://${SHRED_BUCKET}/archive/enriched/",

and

"input": "s3://"${SHRED_BUCKET}"/archive/enriched/",

The SHRED_BUCKET environment variable is set and available.

I’m not sure what’s wrong.

istreeter · January 12, 2022, 12:35pm

Hi @pramod.niralakeri, I agree this should work if the SHRED_BUCKET environment variable really is set properly. So my first guess is that something has gone wrong when setting the environment variable.

Are you running the shredder on EMR? If yes, then it is the EMR cluster that needs to have the environment variable set, not dataflow runner. Please can you share how you are setting the environment variable?

BenB · January 12, 2022, 1:43pm

I wonder whether ${SHRED_BUCKET} can be in the middle of a string like this. I would try using a single variable for the whole value, e.g. "input": ${SHRED_INPUT}

I’m also not sure that "region": "${AWS_REGION}" will work with the surrounding quotes; it might need to be "region": ${AWS_REGION}.

pramod.niralakeri · January 12, 2022, 3:09pm

That makes sense, I guess. I’m setting the environment variable in the Dockerfile:

ENV SHRED_BUCKET=<bucket name>

How do I set an environment variable in EMR? And yes, I’m running the shredder on EMR.

pramod.niralakeri · January 12, 2022, 3:09pm

I tried this but had no luck; I get the same error.

istreeter · January 13, 2022, 8:04pm

Hi @pramod.niralakeri, I’m afraid I don’t know how to set an environment variable in EMR. I had a quick search, but I couldn’t find an answer.

The expected way to deploy the shredder is not to rely on environment variables in the HOCON. Instead, you can template the variables into your config file before running dataflow-runner. Dataflow-runner can then read the templated configuration file (including the bucket name) and submit the steps to EMR with a valid base64-encoded shredder config.
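
As a rough illustration of the templating approach, here is a minimal Python sketch. It is only one possible way to do it, not an official Snowplow tool. The template text, the function names `render_config` and `encode_config`, and the example values are all made up for illustration, and the "formats" section is left out for brevity. The idea is that the rendered file contains literal values in ordinary quoted strings, so nothing is left for the HOCON parser to resolve on the EMR side.

```python
import base64
from string import Template

# Hypothetical template: ordinary quoted strings with ${...} placeholders.
# After rendering, the file contains literal values and no substitutions.
CONFIG_TEMPLATE = Template("""\
{
  "input": "s3://${SHRED_BUCKET}/archive/enriched/",
  "output": {
    "path": "s3://${SHRED_BUCKET}/shredded/good/",
    "compression": "GZIP",
    "region": "${AWS_REGION}"
  },
  "queue": {
    "type": "sqs",
    "queueName": "snowplow-shredder-loader.fifo",
    "region": "${AWS_REGION}"
  }
}
""")


def render_config(values):
    """Fill in the placeholders; raises KeyError if a variable is missing."""
    return CONFIG_TEMPLATE.substitute(values)


def encode_config(rendered):
    """Encode the rendered config as standard Base64 text."""
    return base64.b64encode(rendered.encode("utf-8")).decode("ascii")


# Example with made-up values; in practice you could pass os.environ instead.
example = render_config({"SHRED_BUCKET": "my-bucket", "AWS_REGION": "eu-central-1"})
print(encode_config(example))
```

Because `substitute` fails loudly on a missing variable, a typo or an unset value shows up on the machine that builds the config rather than as a parse error inside the EMR step. Whether your playbook expects the encoded string or a path to the rendered file depends on your setup, so treat the encoding step as an assumption to check.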
