Collector Not Sending Events to Stream

lkstrategy · February 15, 2021, 8:20pm

Cannot figure out what I am doing wrong. Figure I will give it one last shot. My Collector, Enricher, and S3 loader are all running on separate EC2 servers, and when I run the docker image, they start with no errors. Unfortunately, I am not seeing any get records on the Kinesis stream, and no activity from the the Enricher.

In my Javascript tracker, if I have the forceUnsecureTracker set to false, my collector gives me the following error:

[scala-stream-collector-akka.actor.default-dispatcher-85] WARN akka.actor.ActorSystemImpl - Illegal request, responding with status '400 Bad Request': Unsupported HTTP method: The HTTP method started with 0x16 rather than any known HTTP method. Perhaps this was an HTTPS request sent to an HTTP endpoint?

When it is set to true, I do not see any error in the console.

Here is the Javascript tracker:

<script type="text/javascript">
;(function(p,l,o,w,i,n,g){if(!p[i]){p.GlobalSnowplowNamespace=p.GlobalSnowplowNamespace||[];
p.GlobalSnowplowNamespace.push(i);p[i]=function(){(p[i].q=p[i].q||[]).push(arguments)
};p[i].q=p[i].q||[];n=l.createElement(o);g=l.getElementsByTagName(o)[0];n.async=1;
n.src=w;g.parentNode.insertBefore(n,g)}}(window,document,"script","//cdn.jsdelivr.net/gh/snowplow/sp-js-assets@2.10.2/sp.js","snowplow"));
window.snowplow('newTracker', 'sp', 'ec2-3-236-209-79.compute-1.amazonaws.com:8080', { // Initialise a tracker
      appId: "union",
      platform: "web",
      cookieDomain: "mywebsite.com",
      forceUnsecureTracker: true,
      bufferSize: 1,
      contexts: {
        webPage: true,
        performanceTiming: true,
        
      }
    });

    window.snowplow('trackPageView');
    </script>

Here is my collector config:

# Copyright (c) 2013-2021 Snowplow Analytics Ltd. All rights reserved.
#
# This program is licensed to you under the Apache License Version 2.0, and
# you may not use this file except in compliance with the Apache License
# Version 2.0.  You may obtain a copy of the Apache License Version 2.0 at
# http://www.apache.org/licenses/LICENSE-2.0.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the Apache License Version 2.0 is distributed on an "AS
# IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.  See the Apache License Version 2.0 for the specific language
# governing permissions and limitations there under.

# This file (config.hocon.sample) contains a template with
# configuration options for the Scala Stream Collector.
#
# To use, copy this to 'application.conf' and modify the configuration options.

# 'collector' contains configuration options for the main Scala collector.
collector {
  # The collector runs as a web service specified on the following interface and port.
  interface = "0.0.0.0"
 
  port = 8088


  # optional SSL/TLS configuration
  ssl {
    enable = false

    # whether to redirect HTTP to HTTPS
    redirect = false

    port = 9543

  }

  # The collector responds with a cookie to requests with a path that matches the 'vendor/version' protocol.
  # The expected values are:
  # - com.snowplowanalytics.snowplow/tp2 for Tracker Protocol 2
  # - r/tp2 for redirects
  # - com.snowplowanalytics.iglu/v1 for the Iglu Webhook
  # Any path that matches the 'vendor/version' protocol will result in a cookie response, for use by custom webhooks
  # downstream of the collector.
  # But you can also map any valid (i.e. two-segment) path to one of the three defaults.
  # Your custom path must be the key and the value must be one of the corresponding default paths. Both must be full
  # valid paths starting with a leading slash.
  # Pass in an empty map to avoid mapping.
  paths {
    # "/com.acme/track" = "/com.snowplowanalytics.snowplow/tp2"
    # "/com.acme/redirect" = "/r/tp2"
    # "/com.acme/iglu" = "/com.snowplowanalytics.iglu/v1"
  }

  # Configure the P3P policy header.
  p3p {
    policyRef = "/w3c/p3p.xml"
    CP = "NOI DSP COR NID PSA OUR IND COM NAV STA"
  }

  # Cross domain policy configuration.
  # If "enabled" is set to "false", the collector will respond with a 404 to the /crossdomain.xml
  # route.
  crossDomain {
    enabled = false
    # Domains that are granted access, *.acme.com will match http://acme.com and http://sub.acme.com
   
    domains = [ "*" ]

    # Whether to only grant access to HTTPS or both HTTPS and HTTP sources
    secure = false

  }

  # The collector returns a cookie to clients for user identification
  # with the following domain and expiration.
  cookie {
    enabled = true

    expiration = "365 days" # e.g. "365 days"

    # Network cookie name
    name = "test"

    # The domain is optional and will make the cookie accessible to other
    # applications on the domain. Comment out these lines to tie cookies to
    # the collector's full domain.

     domains = [
        "unionresolute.com" # e.g. "domain.com" -> any origin domain ending with this will be matched and domain.com will be returned
        # ... more domains
    ]
  
    # ... more domains
    # If specified, the fallback domain will be used if none of the Origin header hosts matches the list of
    # cookie domains configured above. (For example, if there is no Origin header.)
    fallbackDomain = "{{fallbackDomain}}"
    fallbackDomain = ${?FALLBACK_DOMAIN}
    secure = false
  
    httpOnly = false

    # The sameSite is optional. You can choose to not specify the attribute, or you can use `Strict`,
    # `Lax` or `None` to limit the cookie sent context.
    #   Strict: the cookie will only be sent along with "same-site" requests.
    #   Lax: the cookie will be sent with same-site requests, and with cross-site top-level navigation.
    #   None: the cookie will be sent with same-site and cross-site requests.
    #sameSite = "{{cookieSameSite}}"
 
  }

  # If you have a do not track cookie in place, the Scala Stream Collector can respect it by
  # completely bypassing the processing of an incoming request carrying this cookie, the collector
  # will simply reply by a 200 saying "do not track".
  # The cookie name and value must match the configuration below, where the names of the cookies must
  # match entirely and the value could be a regular expression.
  doNotTrackCookie {
    enabled = false
    name = "dnt"
    value = "value"

  }

  # When enabled and the cookie specified above is missing, performs a redirect to itself to check
  # if third-party cookies are blocked using the specified name. If they are indeed blocked,
  # fallbackNetworkId is used instead of generating a new random one.
  cookieBounce {
    enabled = false
    
    # The name of the request parameter which will be used on redirects checking that third-party
    # cookies work.
    name = "n3pc"
  
    # Network user id to fallback to when third-party cookies are blocked.
    fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000"
    
    # Optionally, specify the name of the header containing the originating protocol for use in the
    # bounce redirect location. Use this if behind a load balancer that performs SSL termination.
    # The value of this header must be http or https. Example, if behind an AWS Classic ELB.
    forwardedProtocolHeader = "X-Forwarded-Proto"
  }

  # When enabled, redirect prefix `r/` will be enabled and its query parameters resolved.
  # Otherwise the request prefixed with `r/` will be dropped with `404 Not Found`
  # Custom redirects configured in `paths` can still be used.
  enableDefaultRedirect = false


  # When enabled, the redirect url passed via the `u` query parameter is scanned for a placeholder
  # token. All instances of that token are replaced withe the network ID. If the placeholder isn't
  # specified, the default value is `${SP_NUID}`.
  redirectMacro {
    enabled = false

    # Optional custom placeholder token (defaults to the literal `${SP_NUID}`)
    placeholder = "[TOKEN]"
  
  }

  # Customize response handling for requests for the root path ("/").
  # Useful if you need to redirect to web content or privacy policies regarding the use of this collector.
  rootResponse {
    enabled = false

    statusCode = 302
   
    # Optional, defaults to empty map
    headers = {
      Location = "https://127.0.0.1/",
    
      X-Custom = "something"
    }
    # Optional, defaults to empty string
    body = "302, redirecting"
  }

  # Configuration related to CORS preflight requests
  cors {
    # The Access-Control-Max-Age response header indicates how long the results of a preflight
    # request can be cached. -1 seconds disables the cache. Chromium max is 10m, Firefox is 24h.
    accessControlMaxAge = 5 seconds
    
  }

  # Configuration of prometheus http metrics
  prometheusMetrics {
    # If metrics are enabled then all requests will be logged as prometheus metrics
    # and '/metrics' endpoint will return the report about the requests
    enabled = false
    # Custom buckets for http_request_duration_seconds_bucket duration metric
    #durationBucketsInSeconds = [0.1, 3, 10]
  }

  streams {
    # Events which have successfully been collected will be stored in the good stream/topic
    good = snowplow-1
   

    # Events that are too big (w.r.t Kinesis 1MB limit) will be stored in the bad stream/topic
    bad = snowplow-2
 

    # Whether to use the incoming event's ip as the partition key for the good stream/topic
    # Note: Nsq does not make use of partition key.
    useIpAddressAsPartitionKey = false
    useIpAddressAsPartitionKey = ${?COLLECTOR_STREAMS_USE_IP_ADDRESS_AS_PARTITION_KEY}

    # Enable the chosen sink by uncommenting the appropriate configuration
    sink {
      # Choose between kinesis, google-pub-sub, kafka, nsq, or stdout.
      # To use stdout, comment or remove everything in the "collector.streams.sink" section except
      # "enabled" which should be set to "stdout".
      enabled = kinesis
    

      # Region where the streams are located
      region = us-east-1

      ## Optional endpoint url configuration to override aws kinesis endpoints,
      ## this can be used to specify local endpoints when using localstack
      # customEndpoint = {{kinesisEndpoint}}
      # customEndpoint = ${?COLLECTOR_STREAMS_SINK_CUSTOM_ENDPOINT}

      # Thread pool size for Kinesis API requests
      threadPoolSize = 10

      # Optional SQS buffer for good and bad events (respectively).
      # When messages can't be sent to Kinesis, they will be sent to SQS.
      # If not configured, sending to Kinesis will be retried. 
      #sqsGoodBuffer = {{sqsGoodBuffer}}
      #sqsGoodBuffer = ${?COLLECTOR_STREAMS_SINK_SQS_GOOD_BUFFER}

      #sqsBadBuffer = {{sqsBadBuffer}}
      #sqsBadBuffer = ${?COLLECTOR_STREAMS_SINK_SQS_BAD_BUFFER}

      # The following are used to authenticate for the Amazon Kinesis sink.
      # If both are set to 'default', the default provider chain is used
      # (see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html)
      # If both are set to 'iam', use AWS IAM Roles to provision credentials.
      # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and WS_ACCESS_KEY_ID
      aws {
        accessKey = env
        secretKey = env
      }

      # Minimum and maximum backoff periods, in milliseconds
      backoffPolicy {
        minBackoff = 10000

        maxBackoff = 100000000

      }

      # Or Google Pubsub
      #googleProjectId = ID
      ## Minimum, maximum and total backoff periods, in milliseconds
      ## and multiplier between two backoff
      #backoffPolicy {
      #  minBackoff = {{minBackoffMillis}}
      #  maxBackoff = {{maxBackoffMillis}}
      #  totalBackoff = {{totalBackoffMillis}} # must be >= 10000
      #  multiplier = {{backoffMultiplier}}
      #}

      # Or Kafka
      #brokers = "{{kafkaBrokers}}"
      ## Number of retries to perform before giving up on sending a record
      #retries = 0
      # The kafka producer has a variety of possible configuration options defined at
      # https://kafka.apache.org/documentation/#producerconfigs
      # Some values are set to other values from this config by default:
      # "bootstrap.servers" = brokers
      # "buffer.memory"     = buffer.byteLimit
      # "linger.ms"         = buffer.timeLimit
      #producerConf {
      #  acks = all
      #  "key.serializer"     = "org.apache.kafka.common.serialization.StringSerializer"
      #  "value.serializer"   = "org.apache.kafka.common.serialization.StringSerializer"
      #}

      # Or NSQ
      ## Host name for nsqd
      #host = "{{nsqHost}}"
      ## TCP port for nsqd, 4150 by default
      #port = {{nsqdPort}}
      buffer {
      byteLimit = 1000000

      recordLimit = 500  # Not supported by Kafka; will be ignored

      timeLimit = 1000000

    }
  }

}

# Akka has a variety of possible configuration options defined at
# http://doc.akka.io/docs/akka/current/scala/general/configuration.html
akka {
  loglevel = DEBUG # 'OFF' for no logging, 'DEBUG' for all logging.
  loglevel = ${?AKKA_LOGLEVEL}
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loggers = [${?AKKA_LOGGERS}]

  # akka-http is the server the Stream collector uses and has configurable options defined at
  # http://doc.akka.io/docs/akka-http/current/scala/http/configuration.html
  http.server {
    # To obtain the hostname in the collector, the 'remote-address' header
    # should be set. By default, this is disabled, and enabling it
    # adds the 'Remote-Address' header to every request automatically.
    remote-address-header = on
    remote-address-header = ${?AKKA_HTTP_SERVER_REMOTE_ADDRESS_HEADER}

    raw-request-uri-header = on
    raw-request-uri-header = ${?AKKA_HTTP_SERVER_RAW_REQUEST_URI_HEADER}

    # Define the maximum request length (the default is 2048)
    parsing {
      max-uri-length = 32768
      #max-uri-length = ${?AKKA_HTTP_SERVER_PARSING_MAX_URI_LENGTH}
       uri-parsing-mode = relaxed
      # uri-parsing-mode = ${?AKKA_HTTP_SERVER_PARSING_URI_PARSING_MODE}
    }
  }

  # By default setting `collector.ssl` relies on JSSE (Java Secure Socket
  # Extension) to enable secure communication.
  # To override the default settings set the following section as per
  # https://lightbend.github.io/ssl-config/ExampleSSLConfig.html
  # ssl-config {
  #   debug = {
  #     ssl = true
  #   }
  #   keyManager = {
  #     stores = [
  #       {type = "PKCS12", classpath = false, path = "/etc/ssl/mycert.p12", password = "mypassword" }
  #     ]
  #   }
  #   loose {
  #     disableHostnameVerification = false
  #   }
  # }

If it helps, here is my Enrich config:

enrich {

  streams {

    in {
      raw = "snowplow-1"
    }

    out {
      enriched = "good-enrich-stream"
      bad = "bad-enrich-stream"
      pii = "pii-enrich-stream"
      partitionKey = "event_id"
    }

    sourceSink {
      enabled =  "kinesis"
      region = "us-east-1"

      aws {
	accessKey = env
        secretKey = env
      }

      maxRecords = 10000
      initialPosition = "TRIM_HORIZON"

      backoffPolicy {
        minBackoff = 3000
        maxBackoff = 600000
      }
}
    buffer {
      byteLimit = 4000000
      recordLimit = 500 # Not supported by Kafka; will be ignored
      timeLimit = 5000
    }

    appName = "snowplow-enrich-staging"
  }
}

ihor · February 15, 2021, 8:40pm

@lkstrategy, you send your events on port 8080 but the collector seems to be configured to receive them on 8088.

lkstrategy · February 15, 2021, 8:45pm

@ihor sorry I realized that with the code on the server and did not update it in my docs. That has been fixed for a bit and still not getting anywhere.

That said, I do get the error mentioned when I don’t have the forceUnsecureTracker set to true, so I think it is going to the right place.

mike · February 15, 2021, 9:15pm

Is the request to the collector going over HTTPS or not? Typically Akka responds with this if a HTTPS request is being sent to a HTTP only endpoint.

lkstrategy · February 15, 2021, 9:20pm

I believe I set it to HTTP when I set the forceUnsecureTracker to true, unless I am mistaken. I do not get that error if it is set to true.

mike · February 15, 2021, 9:57pm

That makes sense.

If you want to use HTTPS - which ideally you should - you’ll want to setup the SSL config and make sure that this port is accessible on the collector / load balancer.

lkstrategy · February 15, 2021, 10:37pm

Okay cool. If I use the forceUnsecureTracker when I initialize the tracker it should still work over HTTP right? My main problem is nothing is making it to the Kinesis stream. Do I need HTTPS to make this work?

mike · February 16, 2021, 2:13am

Yes - it should still work over HTTP without any errors.

lkstrategy · February 16, 2021, 3:22pm

Okay so I am still not able to get events into the enrich streams. Do you mind taking a look at the config files and let me know if there is anything obvious I am getting wrong here?

mike · February 16, 2021, 11:35pm

Sure - if you are getting events into the raw stream chances are that part of collection is correct. If you aren’t getting events in the enriched stream then that suggests an issue either with the enrich config or if you are getting events emitted (into the bad stream) potentially an error in the events themselves.

Topic		Replies	Views
Collector Issue // Unsupported HTTP method Collectors	6	2370	February 15, 2021
Problem Sinking from Scala Stream Collector to Kinesis Collectors	3	1436	August 23, 2016
Collector is sending empty raw events Collectors	3	2286	August 22, 2018
JS tracker and scala stream collector For engineers	5	1394	December 15, 2017
Issue while sending data to kinesis data stream AWS real-time pipeline	7	1432	June 24, 2020

Collector Not Sending Events to Stream

Related topics