Custom events are not showing up in the db

I’m using the Scala Stream Collector, Stream Enrich, the S3 Loader and EmrEtlRunner with Redshift. This morning I began setting up a simple custom event to capture large chunks of page content in order to build part of a training data set. So far so good-ish.

Everything seems to be working, except that I don’t see any records in the new custom event table after the EmrEtlRunner job successfully completes.

By the way, I do see the basic struct and page_view events in the events table. That part is still working.

I followed this guide and read through the docs quite thoroughly… still, maybe I missed something.

I’ve worked through all the various errors that I’ve bumped into, and I now see my custom events in the S3 enriched output bucket and in the archives for both enriched and shredded data. There are no errors during EmrEtlRunner’s job, and no errors in the Redshift console.

I suspect that it’s something to do with the JSONPaths file. By the way, I generated it and the DDL using schema-guru and validated the actual schema using igluctl.
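For reference, the JSONPaths file schema-guru generates for a shredded type lists the standard schema and hierarchy fields first, followed by the event’s own properties, in the same column order as the generated DDL. Mine looks roughly like this (the two data fields are placeholders rather than my real property names):

{
    "jsonpaths": [
        "$.schema.vendor",
        "$.schema.name",
        "$.schema.format",
        "$.schema.version",
        "$.hierarchy.rootId",
        "$.hierarchy.rootTstamp",
        "$.hierarchy.refRoot",
        "$.hierarchy.refTree",
        "$.hierarchy.refParent",
        "$.data.pageUrl",
        "$.data.content"
    ]
}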

Where would I see JSONPaths-related errors, if there are any?

[UPDATE]

I looked into the EMR rdb_loader step’s stderr output and saw this exception:

Exception in thread "Thread-2" java.lang.ExceptionInInitializerError
	at com.snowplowanalytics.snowplow.scalatracker.emitters.AsyncEmitter$$anon$1.run(AsyncEmitter.scala:55)
Caused by: java.lang.IllegalStateException: Shutdown in progress
	at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
	at java.lang.Runtime.addShutdownHook(Runtime.java:211)
	at com.snowplowanalytics.snowplow.scalatracker.emitters.RequestUtils$.<init>(RequestUtils.scala:78)
	at com.snowplowanalytics.snowplow.scalatracker.emitters.RequestUtils$.<clinit>(RequestUtils.scala)
	... 1 more

…I’m not sure what to make of that.

There should be some more logs associated with rdb_loader on the failure itself. If you are seeing shredded data on S3, that’s a good sign - but it sounds like there may be an issue loading that data into Redshift. It could be a connection timeout error or something else - the logs should have a bit more detail.

Mike, thank you. I took some time to read over the Redshift logging docs and enabled audit logging, then ran the EmrEtlRunner job again. The only audit log entries (I cat’d all the logs into a single file) were for successful operations. There were no errors.

Next I looked more closely at the Redshift console, searching for all operations that involve ‘atomic.’. There were only 4 COPY commands to Redshift involving atomic during the run.

It seems no COPY command is being issued. By the way, the new table is atomic.com_companyname_webpage_content_1.
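To double-check from Redshift itself rather than scanning the console, a couple of queries against the standard system tables should show whether any COPY touched the shredded table and whether any load errors were recorded (this is just a sketch; the table name is the one above):

-- recent queries mentioning the shredded table
SELECT starttime, querytxt
FROM stl_query
WHERE querytxt ILIKE '%com_companyname_webpage_content_1%'
ORDER BY starttime DESC
LIMIT 20;

-- recent COPY load errors, if any
SELECT starttime, filename, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;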

[UPDATE]
I reread https://github.com/snowplow/snowplow/wiki/4-Loading-shredded-types#5-configuring-loading

I was using the wrong directory structure for the JSONPaths in S3…
/assets/jsonpaths/webpage_content_1.json

Obviously, I had left out the vendor directory.

I changed it to…
/jsonpaths/com.companyname/webpage_content_1.json

…and updated EmrEtlRunner’s yml to use
jsonpath_assets: s3://iglu.companyname.com/jsonpaths

…then reran the job. Unfortunately, there is still no COPY command to atomic.com_companyname_webpage_content_1.
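For reference, jsonpath_assets lives under the S3 buckets section of the EmrEtlRunner config; roughly, the relevant part of the yml looks like this (the region and the assets bucket shown here are the stock defaults, not necessarily ours):

aws:
  s3:
    region: us-east-1
    buckets:
      assets: s3://snowplow-hosted-assets
      jsonpath_assets: s3://iglu.companyname.com/jsonpaths
      # log, raw, enriched and shredded buckets omitted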

[UPDATE]

Last problem and the solution…

Our EmrEtlRunner run script (run-etl.sh) was configured to use a version of the resolver file that wasn’t coming from the repo :man_facepalming: …so my addition of our Iglu server was never seen. I made a mistake last night thinking that the shredded archive had the records …chalk it up to being tired.
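For anyone else tripping over this: the resolver file that EmrEtlRunner actually loads has to list your own registry alongside Iglu Central, roughly like this (the repository name, vendor prefix and URI below are just our setup, adjust to yours):

{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [ "com.snowplowanalytics" ],
        "connection": { "http": { "uri": "http://iglucentral.com" } }
      },
      {
        "name": "Company Iglu Server",
        "priority": 1,
        "vendorPrefixes": [ "com.companyname" ],
        "connection": { "http": { "uri": "http://iglu.companyname.com" } }
      }
    ]
  }
}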

Anyhow, now it’s working!
