Node Tracker Invisible String Limit

Hiya,
We’re using Snowplow to track events, and some of our strings are long (~17K characters). We’ve set the limit in the schema to 30K, but any event longer than ~3K appears to be dropped. The documentation on how the node tracker handles long strings is dismal, with most links going to 404 and only the most basic examples. Has anyone run into this?
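
For reference, the relevant property in our schema looks something like this (field name changed for illustration):

"description": {
  "type": "string",
  "maxLength": 30000
}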

The events don’t show up in either the good or bad index, and we don’t see any error message. Snowplow support takes ~2 weeks to respond. Does anyone know what’s going on, or how to fix it? We’ve restarted the Elasticsearch service after every schema change, and have tried varying the text (removing all special characters, Unicode encode/decode, etc.).

Thanks!

By dropping do you mean the event is never being sent at all from the tracker itself?

Elasticsearch unfortunately has its own trimming limits (ignore_above), so even a successfully loaded event isn’t guaranteed to appear there.
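
For example, a keyword mapping along these lines (the field name is illustrative) silently skips indexing any value longer than the cutoff, so the document never matches a search on that field:

"long_field": {
  "type": "keyword",
  "ignore_above": 3000
}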

How are you initialising the Node tracker (specifically the emitter) at the moment? I suspect that if you are using GET you may be running into a maximum payload limit within Node; using POST, I seem to be able to send payloads in excess of 30k without issues.
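
Forcing POST is a one-line change to the emitter, something like this (COLLECTOR_ENDPOINT is a placeholder):

const e = gotEmitter(COLLECTOR_ENDPOINT, HttpProtocol.HTTPS, 443, HttpMethod.POST);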

Thanks for the response!

By dropping do you mean the event is never being sent at all from the tracker itself?

I think so – my only visibility into what’s going on is through the Mini UI, but I’m not getting any error messages in my code when I, for instance, try/catch. If you have any suggestions for other ways to confirm that it’s sending, I’m all ears. We’re forcing flush while we’re testing.

How are you initialising the Node tracker

Here’s how I’m doing it currently. I believe we tried forcing POST, but it’s good to know about Node’s GET limits – I didn’t know that. I’ll test it again now with a deeper understanding.

// Assuming the v3 Node tracker; these are the named exports used below
const { gotEmitter, tracker, buildSelfDescribingEvent } = require('@snowplow/node-tracker');

const e = gotEmitter(SNOWPLOW_ENDPOINT);
const t = tracker(e, SNOWPLOW_NAMESPACE, SNOWPLOW_APPID, SNOWPLOW_BASE64);
...

...
  try {
    t.track(
      buildSelfDescribingEvent({
        event: {
          schema: "<custom iglu schema repo here>",
          data: {
            ...eventData,
          },
        },
      })
    );
  } catch (error) {
    console.error("Failed to send event:", error);
  }
  e.flush();

I would double-check with Micro (the Docker image is straightforward to spin up), which takes any OpenSearch truncation issues out of the equation.
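
Spinning it up is a one-liner, e.g.:

docker run -p 9090:9090 snowplow/snowplow-micro:2.1.0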

It is possible (if you are using GET) that you are also hitting a 4xx error, i.e. the event is sent successfully but the collector rejects it (the status may differ depending on your collector version: a 400 for 3.x or a 414 for 2.x).

The Node tracker’s emitter takes a callback parameter that is invoked on success / failure, which you can use here (my example below is for Micro).

const e = gotEmitter(
    'localhost', // Collector endpoint
    HttpProtocol.HTTP, // Optionally specify a protocol - HTTPS is the default
    9090, // Optionally specify a port
    HttpMethod.GET, // Method - defaults to GET
    1, // Buffer size - 1 flushes after every event
    1, // Number of retries
    cookieJar, // Optional cookie jar
    function (err, response) { // Callback invoked on success / failure
      if (err) {
        console.log('An error has occurred', err);
      } else {
        console.log('event success');
      }
    }
  );

I’d expect that if the event isn’t arriving at the collector (or is being rejected with a non-retryable status) then you’d see an error logged.


Thanks! Micro is super cool and I tried to use it before. It works great for standard events, but I ran into auth issues with Iglu, and I didn’t want to stand up my own custom Iglu server just to test. Any suggestions for testing buildSelfDescribingEvent() with a custom Iglu schema in Micro?

If you are only testing a couple of schemas, you can bind a volume to Micro and it will pick them up (using an embedded Iglu registry instead of a remote one), e.g.,

docker run -p 9090:9090 --mount type=bind,source=$(pwd)/schemas,destination=/config/iglu-client-embedded/schemas snowplow/snowplow-micro:2.1.0

where your local schemas folder has a layout of

schemas
└── com.example
    └── my-schema
        └── jsonschema
            └── 1-0-0
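
where 1-0-0 is the schema file itself. A minimal sketch of its contents (the longField property is illustrative; vendor/name match the folder layout above):

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Example schema for testing long strings",
  "self": {
    "vendor": "com.example",
    "name": "my-schema",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "longField": {
      "type": "string",
      "maxLength": 30000
    }
  },
  "additionalProperties": false
}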

In case you have not seen it, there is a doc page on this topic 🙂 Adding custom schemas | Snowplow Documentation


In case you have not seen it,

I had NOT seen this – thank you!


So we finally got it working. Every response here, along with some help from Snowplow support, added a piece of the puzzle. Configuring the emitter to use POST, using SSL on a secure port, and NOT including an extra https:// at the front (that’s specific to the Node tracker; the Python tracker is the opposite), plus being able to troubleshoot with the emitter callback, a local Micro instance, and Mini UI restarts, all helped us put it together. We ended up configuring the emitter like this:

const e = gotEmitter(SNOWPLOW_ENDPOINT, HttpProtocol.HTTPS, 443, HttpMethod.POST);

After restarting the Mini service, it worked without truncating the extra-long string. Thanks!


No worries - fwiw I believe I was on that call with your team.

Going forward, the next major release (v4) of the Node tracker will use POST by default, which removes the need to pass those extra parameters (and to explicitly choose POST over GET), so these failures should be easier to debug.
