Shredder config parse error

I’m getting this error:

- (String: 8) Unable to parse the configuration: Could not resolve substitution to a value: ${SHRED_BUCKET}.

And my config file is:

{

  "input": s3://${SHRED_BUCKET}/archive/enriched/,

  # Path to shredded archive
  "output": {

    "path": s3://${SHRED_BUCKET}/shredded/good/,

    "compression": "GZIP",

    "region": ${AWS_REGION}
  }

  # Queue used to communicate with Loader
  "queue": {
    "type": "sqs",

    "queueName": "snowplow-shredder-loader.fifo",

    "region": "${AWS_REGION}"
  }

  # Schema-specific format settings (recommended to leave all three groups empty and use TSV as default)
  "formats": {
    "default": "TSV",

    "json": [
      "iglu:com.acme/json-event/jsonschema/1-0-0",
      "iglu:com.acme/json-event/jsonschema/2-*-*"
    ],

    "tsv": [ ],

    "skip": [
      "iglu:com.acme/skip-event/jsonschema/1-*-*"
    ]
  }
}

I have tried

"input": s3://${SHRED_BUCKET}/archive/enriched/,

and

"input": "s3://${SHRED_BUCKET}/archive/enriched/",

and

"input": "s3://"${SHRED_BUCKET}"/archive/enriched/",

The SHRED_BUCKET environment variable is set and available.

Not sure what’s wrong.

Hi @pramod.niralakeri, I agree this should work if the SHRED_BUCKET environment variable really is set properly. So my first guess is that something has gone wrong when setting the environment variable.

Are you running the shredder on EMR? If yes, then it is the EMR cluster that needs to have the environment variable set, not dataflow-runner. Could you please share how you are setting the environment variable?

I wonder if ${SHRED_BUCKET} can be in the middle of a string like this. I would try passing the whole value in a single variable, e.g. "input": ${SHRED_INPUT}.

I’m also not sure that "region": "${AWS_REGION}" will work with the surrounding quotes, because HOCON treats a quoted string as a literal and does not resolve substitutions inside it. It might need to be "region": ${AWS_REGION}.

That makes sense, I guess. I’m setting the ENV in the Dockerfile:

ENV SHRED_BUCKET=<bucket name>

How do I set an ENV in EMR? And yes, I’m running the shredder on EMR.

I tried this but no luck, same error.

Hi @pramod.niralakeri, I’m afraid I don’t know how to set an environment variable in EMR. I had a quick search, but I couldn’t find an answer.

The expected way to deploy the shredder is not to rely on environment variables in the HOCON. Instead, you can template the variables into your config file before running dataflow-runner. Dataflow-runner can then read the templated configuration file (including the bucket name) and submit the steps to EMR using a valid base64-encoded shredder config.
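To make the templating idea concrete, here is a minimal sketch in Python of filling in the placeholders on the machine that runs dataflow-runner and then base64-encoding the result. It is only an illustration: the function names `render_config` and `encode_config` and the example bucket and region values are made up, and how the encoded string ends up in your playbook depends on your own setup.

```python
import base64
from string import Template


def render_config(template_text, variables):
    """Replace ${NAME} placeholders with concrete values.

    Template.substitute raises KeyError when a placeholder has no value,
    so a missing variable fails here instead of inside the EMR step.
    """
    return Template(template_text).substitute(variables)


def encode_config(config_text):
    """Return the rendered config as a base64 string."""
    return base64.b64encode(config_text.encode("utf-8")).decode("ascii")


# Trimmed-down config template. The placeholders sit inside quoted strings,
# which is fine here because they are replaced before HOCON ever parses the file.
CONFIG_TEMPLATE = """{
  "input": "s3://${SHRED_BUCKET}/archive/enriched/",
  "output": {
    "path": "s3://${SHRED_BUCKET}/shredded/good/",
    "compression": "GZIP",
    "region": "${AWS_REGION}"
  }
}"""

# Hypothetical values; in practice these could come from os.environ.
example_values = {"SHRED_BUCKET": "example-bucket", "AWS_REGION": "eu-central-1"}

rendered = render_config(CONFIG_TEMPLATE, example_values)
encoded = encode_config(rendered)
print(rendered)
print(encoded)
```

Because the substitution happens before the config reaches EMR, the cluster never needs SHRED_BUCKET or AWS_REGION in its own environment.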