Hello
When I try to parse logs from my bucket, I get the error below:
root@ip-172-31-32-36:/home/cloudfront/snowplow# ./snowplow-emr-etl-runner run -c config.yaml -r resolver.json -d
D, [2018-09-05T04:43:27.209000 #1807] DEBUG -- : Initializing EMR jobflow
ParamContractError: Contract violation for argument 2 of 5:
Expected: String,
Actual: nil
Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_archive_step
With Contract: String, String, String, String, String => Elasticity::S3DistCpStep
At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:806
block in Contract at uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:48
failure_callback at uri:classloader:/gems/contracts-0.11.0/lib/contracts.rb:154
block in redefine_method at uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:143
<main> at uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41
load at org/jruby/RubyKernel.java:979
<main> at uri:classloader:/META-INF/main.rb:1
require at org/jruby/RubyKernel.java:961
(root) at uri:classloader:/META-INF/main.rb:1
<main> at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:1
ERROR: org.jruby.embed.EvalFailedException: (ParamContractError) Contract violation for argument 2 of 5:
Expected: String,
Actual: nil
Value guarded in: Snowplow::EmrEtlRunner::EmrJob::get_archive_step
With Contract: String, String, String, String, String => Elasticity::S3DistCpStep
At: uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:806
I have searched Google but found nothing. Has anyone run into an issue like this? Here is a sample of the log file in my bucket:
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields
2018-09-05 04:34:02 NRT12-C1 565 112.197.14.10 GET d1g8ya19nxyjyz.cloudfront.net / 301 - Mozilla/5.0%2520(X11;%2520Ubuntu;%2520Linux%2520x86_64;%2520rv:61.0)%2520Gecko/20100101%2520Firefox/61.0 - - Redirect xLCndSc8mClqXjUZIAtllkGogaGkY4Jk-fRVbU2-0AiZYb9sSr_V-A== click-tracking.rcapp.co http 554 0.001 - - - Redirect HTTP/1.1 - -
2018-09-05 04:34:05 NRT12-C1 341 112.197.14.10 GET d1g8ya19nxyjyz.cloudfront.net / 200 - Mozilla/5.0%2520(X11;%2520Ubuntu;%2520Linux%2520x86_64;%2520rv:61.0)%2520Gecko/20100101%2520Firefox/61.0 - - Error Tv9QZusbNVt6K1kDMp27Rix24SWU6TRm5qPO4rLD3V_q5Sznil7QNQ== click-tracking.rcapp.co https 359 2.080 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Error HTTP/2.0 - -
2018-09-05 04:34:08 NRT12-C1 348 112.197.14.10 GET d1g8ya19nxyjyz.cloudfront.net /abc 200 - Mozilla/5.0%2520(X11;%2520Ubuntu;%2520Linux%2520x86_64;%2520rv:61.0)%2520Gecko/20100101%2520Firefox/61.0 - - Error DI55WryQ7hAUjhlewtCw3SfSxW_Q3227Yq_T_N6rSsXjxFmXm6v3Xw== click-tracking.rcapp.co https 24 0.342 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Error HTTP/2.0 -
And here is my config.yaml:
access_key_id: xxx
secret_access_key: RHk/xxxxxx/xxx
s3:
  region: eu-central-1
  buckets:
    assets: "s3://rcsnowplow-hosted-assets"
    log: "s3n://rcmy-snowplow-etl/logs/"
    raw:
      in:
        - "s3://rc-eu-central-1-snowplow/tokyo_logs/"
      processing: "s3n://rcmy-snowplow-etl/processing/"
      archive: "s3://rcmy-archive-bucket/raw"
    enriched:
      good: "s3://rcmy-data-bucket/enriched/good"
      bad: "s3://rcmy-data-bucket/enriched/bad"
      errors: "s3://rcmy-data-bucket/enriched/errors"
      archive: "s3://rcmy-data-bucket/enriched/archive"
    shredded:
      good: "s3://rcmy-data-bucket/shredded/good"
      bad: s3://rcmy-data-bucket/shredded/bad
      errors: "s3://rcmy-data-bucket/shredded/errors"
emr:
  ami_version: 5.9.0
  region: eu-central-1
  jobflow_role: EMR_EC2_DefaultRole
  service_role: EMR_DefaultRole
  placement:
  ec2_subnet_id: subnet-1edcfa54
  ec2_key_name: xxxx
  bootstrap: []
  software:
    hbase:
    lingual:
  jobflow:
    job_name: Snowplow ETL
    master_instance_type: m1.medium
    core_instance_count: 2
    core_instance_type: m1.medium
    core_instance_ebs:
      volume_size: 100
      volume_type: "gp2"
      volume_iops: 400
      ebs_optimized: false
    task_instance_count: 0
    task_instance_type: m1.medium
    task_instance_bid: 0.015
  bootstrap_failure_tries: 3
  configuration:
    yarn-site:
      yarn.resourcemanager.am.max-attempts: "1"
    spark:
      maximizeResourceAllocation: "true"
  additional_info:
collectors:
  format: cloudfront
enrich:
  versions:
    spark_enrich: 1.12.0
  continue_on_unexpected_error: false
  output_compression: NONE
storage:
  download:
    folder:
  versions:
    rdb_loader: 0.14.0
    rdb_shredder: 0.13.0
    hadoop_elasticsearch: 0.1.0
  targets:
    name: "My PostgreSQL database"
    type: postgres
    host: [192.168.10.153] # Hostname of database server
    database: postgress # Name of database
    port: 5432 # Default Postgres port
    ssl_mode: disable # One of disable (default), require, verify-ca or verify-full
    table: atomic.events
    username: [ptma-log]
    password: [xxxxx]
    maxerror: # Not required for Postgres
    comprows: # Not required for Postgres
monitoring:
  tags: {}
  logging:
    level: DEBUG
Thanks