Hi,
I’m currently investigating an issue with the Elasticsearch Loader and would appreciate any advice from you folks.
We set up Snowplow using Terraform. We are receiving some bad rows of type “schema_violations”, but we cannot find them in Elasticsearch. Bad rows of type “adapter_failures” do show up in Elasticsearch, so I’m confident that Enrich Server and Elasticsearch are connected.
We also load bad rows into S3, and the “schema_violations” bad rows are present in the data on S3.
Our setup is as follows:
Collector - Kinesis - Enrich - Kinesis for Good - ES Loader - ES
                                                - S3 Loader - S3
Collector - Kinesis - Enrich - Kinesis for Bad - ES Loader - ES
                                               - S3 Loader - S3
Our Terraform config for the ES Loader that reads Enrich’s bad stream is as follows:
module "es_loader_for_enricher_output_bad_stream_staging" {
  source  = "snowplow-devops/elasticsearch-loader-kinesis-ec2/aws"
  version = "0.1.1"

  name         = "snowplow-es-loader-for-enricher-output-bad-stream-staging"
  vpc_id       = local.snowplow_conf_staging.vpc_id
  subnet_ids   = local.snowplow_conf_staging_subnets.private_subnet_ids
  ssh_key_name = local.snowplow_conf_staging.ssh_key_name

  in_stream_type  = "bad"
  in_stream_name  = module.enricher_output_bad_stream_staging.name
  bad_stream_name = module.es_loader_shared_output_bad_stream_staging.name

  es_cluster_endpoint      = aws_elasticsearch_domain.snowplow_elasticsearch_staging.endpoint
  es_cluster_port          = 443
  es_cluster_name          = aws_elasticsearch_domain.snowplow_elasticsearch_staging.domain_name
  es_cluster_index         = "snowplow-bad-enriched-index"
  es_cluster_document_type = "bad"
  aws_es_domain_name       = aws_elasticsearch_domain.snowplow_elasticsearch_staging.domain_name

  telemetry_enabled = false
}
On S3, we can find bad rows like the following:
{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0","data":{"processor":{"artifact":"streamCommon","version":"2.0.5"},"failure":{"timestamp":"2022-03-02T11:12:29.542037Z","messages":[{"schemaKey":"iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1","error":{"error":"ValidationError","dataReports":[{"message":"$.targetUrl: is missing but it is required","path":"$","keyword":"required","targets":["targetUrl"]}]}}]},"payload": ...TRUNCATED
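To compare what reached S3 with what reached Elasticsearch, it may help to tally the S3 bad rows by type and list their validation messages. Below is a minimal Python sketch, assuming the S3 files have been downloaded locally, are uncompressed, and hold one self-describing bad row per line. The function names and the script invocation are hypothetical, chosen for illustration only.

```python
import json
import sys
from collections import Counter
from typing import Iterable, List


def bad_row_type(line: str) -> str:
    """Return the bad-row type (e.g. 'schema_violations') of one self-describing bad row."""
    schema = json.loads(line)["schema"]
    # "iglu:com.snowplowanalytics.snowplow.badrows/schema_violations/jsonschema/2-0-0"
    # splits into vendor / name / format / version; the name is the second segment.
    return schema.split("/")[1]


def count_bad_row_types(lines: Iterable[str]) -> Counter:
    """Tally bad rows by type, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[bad_row_type(line)] += 1
    return counts


def violation_messages(line: str) -> List[str]:
    """Collect 'schemaKey: message' strings from a schema_violations bad row."""
    failure = json.loads(line)["data"]["failure"]
    messages = []
    for entry in failure.get("messages", []):
        for report in entry.get("error", {}).get("dataReports", []):
            messages.append(f'{entry.get("schemaKey", "?")}: {report["message"]}')
    return messages


if __name__ == "__main__":
    # Hypothetical usage: python count_bad_rows.py part-0001 part-0002 ...
    # Assumes local, uncompressed, newline-delimited JSON files.
    total: Counter = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            total += count_bad_row_types(f)
    for row_type, n in total.most_common():
        print(f"{row_type}: {n}")
```

If the ES Loader’s own output stream (`bad_stream_name` in the config) also carries self-describing bad-row JSON, which is an assumption here, the same parsing could be applied to its records to see why documents were not indexed.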
I have checked the CloudWatch logs for the ES Loader but couldn’t find any helpful output.