Hi there,
config.yml takes an additional_info JSON field, but I’m not sure how I could configure the EMR cluster to include Ganglia.
Can you point me in the right direction?
Thanks,
Gabor
Hi @rgabo,
I don’t think you can use the additional_info field for this purpose. To add Ganglia to an EMR cluster, you would need to use the --applications parameter, as described in http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-ganglia.html.
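For reference, here is a minimal sketch of what --applications corresponds to when a cluster is created from Python with boto3's `run_job_flow`: Ganglia is requested through the `Applications` list. The release label, instance type, instance count and default IAM role names below are placeholder assumptions, not values from this thread, and the helper names are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence


def build_job_flow_request(
    name: str,
    applications: Sequence[str] = ("Hadoop", "Ganglia"),
    release_label: str = "emr-5.5.0",  # placeholder: any EMR release that ships Ganglia
    instance_type: str = "m4.large",   # placeholder instance type
    instance_count: int = 3,
    log_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The Applications list is the API counterpart of the CLI's --applications flag.
    """
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": release_label,
        # Each application is passed as {"Name": ...}; "Ganglia" installs Ganglia on the cluster.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster alive after start-up so steps can be submitted later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }
    if log_uri is not None:
        request["LogUri"] = log_uri
    return request


def launch_cluster(request: Dict[str, Any], region: str) -> str:
    """Start the cluster and return its id. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so building the request has no AWS dependency

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

Usage would be something like `launch_cluster(build_job_flow_request("my-cluster"), region="us-east-1")`. Building the request separately keeps the Ganglia setting easy to inspect before anything is launched.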
This can be achieved with Dataflow Runner. More info on the latest release is here: https://snowplowanalytics.com/blog/2017/03/31/dataflow-runner-0.2.0-released/.
To be more specific, the configuration file would need to include the value Ganglia on this line: https://github.com/snowplow/dataflow-runner/blob/master/config/cluster.json.sample#L84
Thanks, @ihor, I’ll keep an eye on Dataflow Runner development.
I had a specific requirement that led me to write this script to spin up the cluster and submit steps, but you can use this or a similar script to start the cluster and then pass the cluster ID to Dataflow Runner to submit the steps.
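The hand-off described above could look roughly like this: wait until the cluster started by your own script is ready, then pass its ID to Dataflow Runner. This is a sketch under assumptions. The `run`, `--emr-playbook` and `--emr-cluster` arguments are from my recollection of the Dataflow Runner README and should be checked against the release you use. `playbook.json` and the helper names are hypothetical.

```python
import subprocess
from typing import List


def dataflow_runner_command(playbook_path: str, cluster_id: str) -> List[str]:
    """Build a Dataflow Runner invocation that submits a playbook's steps to an existing cluster."""
    # Assumption: flag names as documented for Dataflow Runner's `run` command.
    return [
        "dataflow-runner", "run",
        "--emr-playbook", playbook_path,
        "--emr-cluster", cluster_id,
    ]


def submit_steps(cluster_id: str, playbook_path: str, region: str) -> None:
    """Wait for the cluster to be ready, then let Dataflow Runner submit the steps."""
    import boto3  # requires boto3 and AWS credentials at call time

    emr = boto3.client("emr", region_name=region)
    # Blocks until the cluster is RUNNING or WAITING; raises if it terminates instead.
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    # check=True raises CalledProcessError if Dataflow Runner exits non-zero.
    subprocess.run(dataflow_runner_command(playbook_path, cluster_id), check=True)
```

Waiting for the cluster first avoids handing Dataflow Runner a cluster that is still bootstrapping.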