Hi,
I am trying to set up a stream transformer and a Databricks loader to consume the enriched events and load them into Databricks.
To deploy the stream transformer, I am using the snowplow-devops/transformer-kinesis-ec2 Terraform module with infra version 0.2.1 and app version 5.2.0. The application works as expected when the output format is set to json, but throws the following error when it is set to parquet:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
I believe this error was supposed to be resolved with release 5.1.1.
Is it still an outstanding bug or is there something I have missed in my setup/config?
My Terraform module config:
module "transformer_enriched" {
  source  = "snowplow-devops/transformer-kinesis-ec2/aws"
  version = "0.2.1"

  name             = "${var.prefix}-transformer-kinesis-enriched-server"
  vpc_id           = var.vpc_id
  subnet_ids       = var.private_subnet_ids
  ssh_key_name     = aws_key_pair.pipeline.key_name
  ssh_ip_allowlist = var.ssh_ip_allowlist

  stream_name             = module.enriched_stream.name
  s3_bucket_name          = var.s3_bucket_name
  s3_bucket_object_prefix = "${var.s3_bucket_object_prefix}transformed/good"
  window_period_min       = var.transformer_window_period_min
  sqs_queue_name          = aws_sqs_queue.message_queue[0].name

  transformation_type = "widerow"
  widerow_file_format = "parquet"

  custom_iglu_resolvers    = local.custom_iglu_resolvers
  kcl_write_max_capacity   = var.pipeline_kcl_write_max_capacity
  iam_permissions_boundary = var.iam_permissions_boundary

  telemetry_enabled = var.telemetry_enabled
  user_provided_id  = var.user_provided_id

  tags = var.tags

  cloudwatch_logs_enabled        = var.cloudwatch_logs_enabled
  cloudwatch_logs_retention_days = var.cloudwatch_logs_retention_days
}
Thanks!