We use Snowplow for data analytics, Postgres for storing the data, and Metabase for visualization. This setup worked well until queries started to slow down because of the large amount of data.
As a result, we decided to test Elasticsearch. However, we ran into a problem: Elasticsearch holds only about a fifth of the data that Postgres does, and we do not understand why. Both receive data from the same stream, yet the difference is massive.
There is also a warning from the ELK stream loader: WARN com.snowplowanalytics.stream.loader.clients.ElasticsearchBulkSender - Returning 56 records as failed. We could not find an explanation of what it means.
It also throws an error that starts with: [scala-execution-context-global-18] ERROR com.snowplowanalytics.stream.loader.clients.ElasticsearchBulkSender - Record
and ends with: failed with message failed to parse
If anyone has faced a similar problem or works with Elasticsearch, could you explain how to deal with this difference in data? I understand that Postgres and Elasticsearch are different storage services, but they consume the same stream of data. Could there be a problem with the schemas?
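One common cause of "failed to parse" errors (an assumption here; the logs above do not confirm it) is a mapping conflict. With dynamic mapping, Elasticsearch fixes a field's type from the first document that contains it, and later documents whose value cannot be indexed as that type are rejected. The Python sketch below is a rough heuristic, not Elasticsearch's actual mapping logic: it records the coarse JSON kind of every dotted field path on first sight and reports later events where the kind changes. Not every flagged change is rejected by Elasticsearch (for example, a number sent to a text field is stored as a string, and numeric strings are coerced into number fields by default), so treat the output as candidates to inspect. The field names and events are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_fields(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) pairs, descending into objects and into objects inside arrays."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        yield path, value
        if isinstance(value, dict):
            yield from iter_fields(value, path + ".")
        elif isinstance(value, list):
            # Objects inside arrays are mapped under the same dotted paths.
            for item in value:
                if isinstance(item, dict):
                    yield from iter_fields(item, path + ".")


def kind(value: Any) -> str:
    """Coarse JSON kind of a value (bool is checked before int because bool subclasses int)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array is indexed as several values of its element type,
        # so classify it by its first non-null element.
        for item in value:
            if item is not None:
                return kind(item)
        return "null"
    return type(value).__name__


def find_kind_changes(events: List[Dict[str, Any]]) -> List[Tuple[int, str, str, str]]:
    """Return (event_index, field, first_kind, new_kind) for every kind change; nulls are skipped."""
    first_seen: Dict[str, str] = {}
    changes: List[Tuple[int, str, str, str]] = []
    for i, event in enumerate(events):
        for field, value in iter_fields(event):
            k = kind(value)
            if k == "null":
                continue
            if field not in first_seen:
                first_seen[field] = k
            elif first_seen[field] != k:
                changes.append((i, field, first_seen[field], k))
    return changes


if __name__ == "__main__":
    # Hypothetical events: the second one sends a non-numeric string
    # where the first one established a number.
    events = [
        {"event_id": "a1", "contexts_com_acme_product_1": [{"price": 10}]},
        {"event_id": "a2", "contexts_com_acme_product_1": [{"price": "10.50 EUR"}]},
    ]
    print(find_kind_changes(events))
```

Running it on the two sample events reports a single candidate: (1, 'contexts_com_acme_product_1.price', 'number', 'string'). Running the same check over a sample of the events that are present in Postgres but missing from Elasticsearch could show whether such conflicts explain the gap.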
There are some warnings and errors in enriched events loader’s logs.
The first block is:
[RecordProcessor-0000] INFO com.snowplowanalytics.stream.loader.clients.ElasticsearchBulkSender - Emitted 97 records to Elasticseacrch
[RecordProcessor-0000] WARN com.snowplowanalytics.stream.loader.clients.ElasticsearchBulkSender - Returning 55 records as failed
[scala-execution-context-global-19] WARN com.snowplowanalytics.stream.loader.clients.ElasticsearchBulkSender - Cluster health is yellow
The second is a JSON block (or group of blocks) that starts with:
[scala-execution-context-global-19] ERROR com.snowplowanalytics.stream.loader.clients.ElasticsearchBulkSender - Record
and ends with:
failed with message failed to parse
with information about the event in between.
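A plausible reading of "Returning N records as failed" (an assumption based on how the Elasticsearch _bulk API reports errors, not on the loader's source code) is that it counts the items Elasticsearch rejected in a bulk request. In a _bulk response, the top-level "errors" flag is true if any item failed, and each entry of "items" is keyed by its action and carries its own "status"; rejected entries include an "error" object with a "type" and a "reason". The detailed reason usually names the offending field, which is more useful than the short "failed to parse". The Python sketch below summarizes such a response; the sample response is hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple


def failed_items(bulk_response: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (doc_id, error_type, error_reason) for every rejected item of a _bulk response."""
    failures: List[Tuple[str, str, str]] = []
    if not bulk_response.get("errors"):
        return failures  # "errors" is false when every item succeeded.
    for item in bulk_response.get("items", []):
        # Each item has exactly one key: its action ("index", "create", "update" or "delete").
        (result,) = item.values()
        error = result.get("error")
        if error:
            failures.append(
                (str(result.get("_id")), error.get("type", "?"), error.get("reason", ""))
            )
    return failures


def summarize(bulk_response: Dict[str, Any]) -> Counter:
    """Count rejected items per error type."""
    return Counter(err_type for _, err_type, _ in failed_items(bulk_response))


if __name__ == "__main__":
    # Hypothetical response: one document indexed, one rejected by the mapping.
    sample = json.loads("""
    {
      "took": 12,
      "errors": true,
      "items": [
        {"index": {"_index": "good", "_id": "1", "status": 201, "result": "created"}},
        {"index": {"_index": "good", "_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse field [price] of type [long]"}}}
      ]
    }
    """)
    for doc_id, err_type, reason in failed_items(sample):
        print(doc_id, err_type, reason)
    print(summarize(sample))
```

Sending one of the failing events from the loader log directly to the index and reading the full "reason" in the response is a quick way to see which field is being rejected.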
I also forgot to mention that we deployed the Snowplow components with Terraform, that the loader version turned out to be 1.0.0, and that we use Elasticsearch 7.13. Could there be an incompatibility between these versions? Which combination of versions is the most stable to use?