Issues with Open Source Kinesis Collector

I am completely new to Snowplow and am setting up the open source Scala Stream Collector for Kinesis. I keep running into what appears to be a unique issue, and I have searched all of the previous topics to no avail.

When I attempt to run this:

java -jar -Dcom.amazonaws.sdk.disableCbor snowplow-stream-collector-kinesis-1.0.0.jar --config poc.config

I continue to receive this message immediately following:

Exception in thread "main" com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), com.amazonaws.auth.profile.ProfileCredentialsProvider@73386d72: profile file cannot be null, com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@2516fc68: Unable to load credentials from service endpoint]
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(
at com.amazonaws.http.AmazonHttpClient.execute(
at com.amazonaws.http.AmazonHttpClient.execute(
at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.streamExists(KinesisSink.scala:125)
at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.$anonfun$createAndInitialize$2(KinesisSink.scala:52)
at scala.util.Either.flatMap(Either.scala:341)
at com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink$.createAndInitialize(KinesisSink.scala:50)
at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.$anonfun$main$2(KinesisCollector.scala:38)
at scala.util.Either.flatMap(Either.scala:341)
at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$.main(KinesisCollector.scala:30)
at com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector.main(KinesisCollector.scala)

Here is how my config is currently set up:

# Copyright (c) 2013-2020 Snowplow Analytics Ltd. All rights reserved.
# This program is licensed to you under the Apache License Version 2.0, and
# you may not use this file except in compliance with the Apache License
# Version 2.0.  You may obtain a copy of the Apache License Version 2.0 at
# Unless required by applicable law or agreed to in writing, software
# distributed under the Apache License Version 2.0 is distributed on an "AS
# implied.  See the Apache License Version 2.0 for the specific language
# governing permissions and limitations there under.

# This file (config.hocon.sample) contains a template with
# configuration options for the Scala Stream Collector.
# To use, copy this to 'application.conf' and modify the configuration options.

# 'collector' contains configuration options for the main Scala collector.
collector {
  # The collector runs as a web service specified on the following interface and port.
  interface = ""
  interface = ${?COLLECTOR_INTERFACE}
  port = 9092
  port = ${?COLLECTOR_PORT}

  # optional SSL/TLS configuration
  ssl {
    enable = false
    enable = ${?COLLECTOR_SSL}
    # whether to redirect HTTP to HTTPS
    redirect = false
    redirect = ${?COLLECTOR_SSL_REDIRECT}
    port = 9543
    port = ${?COLLECTOR_SSL_PORT}
  }

  # The collector responds with a cookie to requests with a path that matches the 'vendor/version' protocol.
  # The expected values are:
  # - com.snowplowanalytics.snowplow/tp2 for Tracker Protocol 2
  # - r/tp2 for redirects
  # - com.snowplowanalytics.iglu/v1 for the Iglu Webhook
  # Any path that matches the 'vendor/version' protocol will result in a cookie response, for use by custom webhooks
  # downstream of the collector.
  # But you can also map any valid (i.e. two-segment) path to one of the three defaults.
  # Your custom path must be the key and the value must be one of the corresponding default paths. Both must be full
  # valid paths starting with a leading slash.
  # Pass in an empty map to avoid mapping.
  paths {
    # "/com.acme/track" = "/com.snowplowanalytics.snowplow/tp2"
    # "/com.acme/redirect" = "/r/tp2"
    # "/com.acme/iglu" = "/com.snowplowanalytics.iglu/v1"
  }

  # Configure the P3P policy header.
  p3p {
    policyRef = "/w3c/p3p.xml"
  }

  # Cross domain policy configuration.
  # If "enabled" is set to "false", the collector will respond with a 404 to the /crossdomain.xml
  # route.
  crossDomain {
    enabled = false
    # Domains that are granted access, * will match and
    domains = [ "*" ]
    # Whether to only grant access to HTTPS or both HTTPS and HTTP sources
    secure = true
  }

  # The collector returns a cookie to clients for user identification
  # with the following domain and expiration.
  cookie {
    enabled = true
    expiration = "365 days" # e.g. "365 days"
    # Network cookie name
    name = "userEvents"
    # The domain is optional and will make the cookie accessible to other
    # applications on the domain. Comment out these lines to tie cookies to
    # the collector's full domain.
    # The domain is determined by matching the domains from the Origin header of the request
    # to the list below. The first match is used. If no matches are found, the fallback domain will be used,
    # if configured.
    # If you specify a main domain, all subdomains on it will be matched.
    # If you specify a subdomain, only that subdomain will be matched.
    # Examples:
    # will match, and
    # will match but not or
    domains = [
        "" # e.g. "" -any origin domain ending with this will be matched and will be returned
        "" # e.g. "" -any origin domain ending with this will be matched and will be returned
        # ... more domains
    ]
    domains += ${?COLLECTOR_COOKIE_DOMAIN_1}
    domains += ${?COLLECTOR_COOKIE_DOMAIN_2}
    # ... more domains
    # If specified, the fallback domain will be used if none of the Origin header hosts matches the list of
    # cookie domains configured above. (For example, if there is no Origin header.)
    fallbackDomain = ""
    fallbackDomain = ${?FALLBACK_DOMAIN}
    secure = false
    httpOnly = false
    # The sameSite is optional. You can choose to not specify the attribute, or you can use `Strict`,
    # `Lax` or `None` to limit the cookie sent context.
    #   Strict: the cookie will only be sent along with "same-site" requests.
    #   Lax: the cookie will be sent with same-site requests, and with cross-site top-level navigation.
    #   None: the cookie will be sent with same-site and cross-site requests.
    #sameSite = "{{cookieSameSite}}"
  }

  # If you have a do not track cookie in place, the Scala Stream Collector can respect it by
  # completely bypassing the processing of an incoming request carrying this cookie, the collector
  # will simply reply by a 200 saying "do not track".
  # The cookie name and value must match the configuration below, where the names of the cookies must
  # match entirely and the value could be a regular expression.
  doNotTrackCookie {
    enabled = false
    name = "doNotTrackCookieName"
    value = "doNotTrackCookieName"
  }

  # When enabled and the cookie specified above is missing, performs a redirect to itself to check
  # if third-party cookies are blocked using the specified name. If they are indeed blocked,
  # fallbackNetworkId is used instead of generating a new random one.
  cookieBounce {
    enabled = false
    # The name of the request parameter which will be used on redirects checking that third-party
    # cookies work.
    name = "n3pc"
    # Network user id to fallback to when third-party cookies are blocked.
    fallbackNetworkUserId = "00000000-0000-4000-A000-000000000000"
    # Optionally, specify the name of the header containing the originating protocol for use in the
    # bounce redirect location. Use this if behind a load balancer that performs SSL termination.
    # The value of this header must be http or https. Example, if behind an AWS Classic ELB.
    forwardedProtocolHeader = "X-Forwarded-Proto"
  }

  # When enabled, redirect prefix `r/` will be enabled and its query parameters resolved.
  # Otherwise the request prefixed with `r/` will be dropped with `404 Not Found`
  # Custom redirects configured in `paths` can still be used.
  enableDefaultRedirect = true
  enableDefaultRedirect = ${?COLLECTOR_ALLOW_REDIRECTS}

  # When enabled, the redirect url passed via the `u` query parameter is scanned for a placeholder
  # token. All instances of that token are replaced with the network ID. If the placeholder isn't
  # specified, the default value is `${SP_NUID}`.
  redirectMacro {
    enabled = false
    # Optional custom placeholder token (defaults to the literal `${SP_NUID}`)
    placeholder = "[TOKEN]"
  }

  # Customize response handling for requests for the root path ("/").
  # Useful if you need to redirect to web content or privacy policies regarding the use of this collector.
  rootResponse {
    enabled = false
    statusCode = 302
    # Optional, defaults to empty map
    headers = {
      Location = "",
      X-Custom = "something"
    }
    # Optional, defaults to empty string
    body = "302, redirecting"
  }

  # Configuration related to CORS preflight requests
  cors {
    # The Access-Control-Max-Age response header indicates how long the results of a preflight
    # request can be cached. -1 seconds disables the cache. Chromium max is 10m, Firefox is 24h.
    accessControlMaxAge = 5 seconds
  }

  # Configuration of prometheus http metrics
  prometheusMetrics {
    # If metrics are enabled then all requests will be logged as prometheus metrics
    # and '/metrics' endpoint will return the report about the requests
    enabled = false
    # Custom buckets for http_request_duration_seconds_bucket duration metric
    #durationBucketsInSeconds = [0.1, 3, 10]
  }

  streams {
    # Events which have successfully been collected will be stored in the good stream/topic
    good = "good_stream"

    # Events that are too big (w.r.t Kinesis 1MB limit) will be stored in the bad stream/topic
    bad = "bad_stream"

    # Whether to use the incoming event's ip as the partition key for the good stream/topic
    # Note: Nsq does not make use of partition key.
    useIpAddressAsPartitionKey = false

    # Enable the chosen sink by uncommenting the appropriate configuration
    sink {
      # Choose between kinesis, google-pub-sub, kafka, nsq, or stdout.
      # To use stdout, comment or remove everything in the "collector.streams.sink" section except
      # "enabled" which should be set to "stdout".
      enabled = kinesis

      # Region where the streams are located
      region = "us-east-2"

      ## Optional endpoint url configuration to override aws kinesis endpoints,
      ## this can be used to specify local endpoints when using localstack
      # customEndpoint = {{kinesisEndpoint}}

      # Thread pool size for Kinesis API requests
      threadPoolSize = 10

      # The following are used to authenticate for the Amazon Kinesis sink.
      # If both are set to 'default', the default provider chain is used
      # (see
      # If both are set to 'iam', use AWS IAM Roles to provision credentials.
      # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
      aws {
        accessKey = default
        secretKey = default
      }

      # Minimum and maximum backoff periods, in milliseconds
      backoffPolicy {
        minBackoff = 10
        maxBackoff = 1000
      }

      # Or Google Pubsub
      #googleProjectId = ID
      ## Minimum, maximum and total backoff periods, in milliseconds
      ## and multiplier between two backoff
      #backoffPolicy {
      #  minBackoff = {{minBackoffMillis}}
      #  maxBackoff = {{maxBackoffMillis}}
      #  totalBackoff = {{totalBackoffMillis}} # must be >= 10000
      #  multiplier = {{backoffMultiplier}}
      #}

      # Or Kafka
      #brokers = "{{kafkaBrokers}}"
      ## Number of retries to perform before giving up on sending a record
      #retries = 0
      # The kafka producer has a variety of possible configuration options defined at
      # Some values are set to other values from this config by default:
      # "bootstrap.servers" = brokers
      # "buffer.memory"     = buffer.byteLimit
      # ""         = buffer.timeLimit
      #producerConf {
      #  acks = all
      #  "key.serializer"     = "org.apache.kafka.common.serialization.StringSerializer"
      #  "value.serializer"   = "org.apache.kafka.common.serialization.StringSerializer"
      #}

      # Or NSQ
      ## Host name for nsqd
      #host = "{{nsqHost}}"
      ## TCP port for nsqd, 4150 by default
      #port = {{nsqdPort}}
    }

    # Incoming events are stored in a buffer before being sent to Kinesis/Kafka.
    # Note: Buffering is not supported by NSQ.
    # The buffer is emptied whenever:
    # - the number of stored records reaches record-limit or
    # - the combined size of the stored records reaches byte-limit or
    # - the time in milliseconds since the buffer was last emptied reaches time-limit
    buffer {
      byteLimit = 10485760  # 10MB
      recordLimit = 1024 # Not supported by Kafka; will be ignored
      timeLimit = 5000 # 5 seconds
    }
  }
}


# Akka has a variety of possible configuration options defined at
akka {
  loglevel = DEBUG # 'OFF' for no logging, 'DEBUG' for all logging.
  loglevel = ${?AKKA_LOGLEVEL}
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loggers = [${?AKKA_LOGGERS}]

  # akka-http is the server the Stream collector uses and has configurable options defined at
  http.server {
    # To obtain the hostname in the collector, the 'remote-address' header
    # should be set. By default, this is disabled, and enabling it
    # adds the 'Remote-Address' header to every request automatically.
    remote-address-header = on
    remote-address-header = ${?AKKA_HTTP_SERVER_REMOTE_ADDRESS_HEADER}

    raw-request-uri-header = on
    raw-request-uri-header = ${?AKKA_HTTP_SERVER_RAW_REQUEST_URI_HEADER}

    # Define the maximum request length (the default is 2048)
    parsing {
      max-uri-length = 32768
      max-uri-length = ${?AKKA_HTTP_SERVER_PARSING_MAX_URI_LENGTH}
      uri-parsing-mode = relaxed
      uri-parsing-mode = ${?AKKA_HTTP_SERVER_PARSING_URI_PARSING_MODE}
    }
  }

  # By default setting `collector.ssl` relies on JSSE (Java Secure Socket
  # Extension) to enable secure communication.
  # To override the default settings set the following section as per
  # ssl-config {
  #   debug = {
  #     ssl = true
  #   }
  #   keyManager = {
  #     stores = [
  #       {type = "PKCS12", classpath = false, path = "/etc/ssl/mycert.p12", password = "mypassword" }
  #     ]
  #   }
  #   loose {
  #     disableHostnameVerification = false
  #   }
  # }
}

The collector is looking for access credentials to connect to Kinesis but can’t find any in the provider chain. Depending on how you are running the collector you will need to pass these in using one of the options in the provider chain.

Does anyone have any examples/samples of these in use with the Snowplow Kinesis connector? I simply want to make sure I am doing it correctly.

Hi @vbaker! How are you currently wanting to configure the AWS credentials?

      # The following are used to authenticate for the Amazon Kinesis sink.
      # If both are set to 'default', the default provider chain is used
      # (see
      # If both are set to 'iam', use AWS IAM Roles to provision credentials.
      # If both are set to 'env', use environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
      aws {
        accessKey = default
        secretKey = default
      }

The relevant section of the configuration is shown above.

If you are using an IAM instance profile, set them both to “iam”. If you are using environment variables, set them both to “env” and then configure the environment variables on the server, and so on.
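
For the “env” option, a rough preflight check can confirm that the variables are visible in the shell that will launch the collector. The sketch below only looks for the variable names listed in the SdkClientException earlier in this thread; it does not call the AWS SDK, and the helper name missing_aws_env is made up for illustration.

```python
import os

# Variable names taken from the EnvironmentVariableCredentialsProvider message
# in the stack trace earlier in the thread; either name in a pair is accepted.
ACCESS_KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_ACCESS_KEY")
SECRET_KEY_VARS = ("AWS_SECRET_KEY", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(env=None):
    """Return a list describing which credential variables are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = []
    for group in (ACCESS_KEY_VARS, SECRET_KEY_VARS):
        # A pair counts as present if at least one of its names has a value.
        if not any(env.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


if __name__ == "__main__":
    problems = missing_aws_env()
    if problems:
        print("Not set:", "; ".join(problems))
    else:
        print("Both credential variables are set in this shell.")
```

Run it from the same shell session as the java command, since variables exported in a different session or user profile will not be visible to the collector process.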

I ended up getting the environment variables figured out…

Now, when I run

java -jar snowplow-stream-collector-kinesis-1.0.1.jar --config=poc.collector.config

I receive the following within the terminal window

[main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - Creating thread pool of size 10
[main] INFO com.snowplowanalytics.snowplow.collectors.scalastream.sinks.KinesisSink - Creating thread pool of size 10
[DEBUG] [07/21/2020 13:58:18.101] [main] [EventStream(akka://scala-stream-collector)] logger log1-Logging$DefaultLogger started
[DEBUG] [07/21/2020 13:58:18.106] [main] [EventStream(akka://scala-stream-collector)] Default Loggers started
[DEBUG] [07/21/2020 13:58:18.255] [main] [AkkaSSLConfig(akka://scala-stream-collector)] Initializing AkkaSSLConfig extension…
[DEBUG] [07/21/2020 13:58:18.257] [main] [AkkaSSLConfig(akka://scala-stream-collector)] buildHostnameVerifier: created hostname verifier: com.typesafe.sslconfig.ssl.DefaultHostnameVerifier@410ae5ac
[DEBUG] [07/21/2020 13:58:18.621] [] [akka://scala-stream-collector/system/IO-TCP/selectors/$a/0] Successfully bound to /0:0:0:0:0:0:0:0:9092
[] INFO com.snowplowanalytics.snowplow.collectors.scalastream.KinesisCollector$ - REST interface bound to /0:0:0:0:0:0:0:0:9092

Should the interface be that of the server I am looking to run this on?

Hi @vbaker,

interface = "" means that the collector (which is basically an HTTP server) is accepting requests on all the network interfaces of the machine running it.

REST interface bound to /0:0:0:0:0:0:0:0:9092 means that it is listening on the IPv6 wildcard address, that is, on all interfaces, on port 9092.
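
To confirm from another terminal that the bound port is actually answering, something like the sketch below can be used. It assumes the collector exposes a /health route that returns HTTP 200 and that it is reachable at localhost on port 9092 (the port from the config above); the host and the function names are placeholders.

```python
from urllib.error import URLError
from urllib.request import urlopen


def health_url(host="localhost", port=9092):
    """URL of the collector's health route (host is a placeholder)."""
    return f"http://{host}:{port}/health"


def collector_is_up(host="localhost", port=9092, timeout=3):
    """Return True if the health route answers with HTTP 200, else False."""
    try:
        with urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer all count as "not up".
        return False


if __name__ == "__main__":
    print("collector reachable:", collector_is_up())
```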

So how do I verify that it is working with Kinesis? When I get it up and running and look at either of the Kinesis streams, they appear empty, as though no data is being sent over.

Are you sending events in to the collector? You should see some logs in the collector if things are succeeding / failing to Kinesis.
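
One way to test the collector-to-Kinesis path without involving the website is to send a single hand-built page view and then watch the collector logs. This is a minimal sketch, assuming the collector is reachable at localhost on port 9092 (the port in the config above) and accepts Tracker Protocol query parameters on its /i pixel endpoint. The host, app id, page values, and function names are all placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_event_url(host="localhost", port=9092):
    """Build a GET URL for one page view on the collector's /i pixel endpoint."""
    params = {
        "e": "pv",                       # event type: page view
        "p": "web",                      # platform
        "tv": "manual-test-0.0.1",       # tracker version label (made up for this test)
        "aid": "poc",                    # application id (placeholder)
        "url": "http://example.com/",    # page URL (placeholder)
        "page": "Collector smoke test",  # page title (placeholder)
    }
    return f"http://{host}:{port}/i?{urlencode(params)}"


def send_test_event(host="localhost", port=9092, timeout=5):
    """Send the page view and return the HTTP status code from the collector."""
    with urlopen(build_event_url(host, port), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    print(send_test_event())
```

With the buffer settings in the config above (timeLimit = 5000), a single event may sit in the buffer for up to about five seconds before it is flushed to the good stream, so allow for that before checking Kinesis.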

I thought that I had a JavaScript Tracker set up properly on my staging website. I followed these instructions and changed the cookie to be snowplow.js, instead of sp.js.

Here is the Page Tracker I have placed into GTM:

Does the MYURL in the window.snowplow block need to be something different? Possibly the location of where I want the event to flow to? If so, where do I find that in Amazon Kinesis?

Here is the Event Tracker I have placed into GTM:

That mostly looks correct, but the third argument when initialising the tracker should be the endpoint of your collector rather than the path to the Snowplow JavaScript tracker.


window.snowplow('newTracker', 'trackername', '', { ...

To add, because it’s not clear whether or not it’s understood: in (window,document,"script","//MYURL/snowplow.js","snowplow"), //MYURL/snowplow.js needs to be the path to the Snowplow tracker script itself, which you’ll need to host somewhere (normally just in your CDN). So if you haven’t already, you’ll need to grab the sp.js file from here, and have this path reference it.