I have a Dockerized version of Snowplow Stream Transformer Kinesis running in EKS. I am going to store the data in Azure Databricks, so I'm trying to write it as wide row in Parquet format. Note that it works fine with wide row JSON, but with Parquet I get the following error:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at com.github.mjakubowski84.parquet4s.parquet.io$.$anonfun$validateWritePath$1(io.scala:35)
at fromSync @ com.snowplowanalytics.snowplow.rdbloader.transformer.stream.kinesis.Main$.run(Main.scala:28)
And here is the config.hocon:
{
  "input": {
    "streamName": "{{.Values.config.streams.good}}"
  },
  "output": {
    "path": "s3://{{.Values.config.storage.bucket}}/transformed/"
  },
  "windowing": "1 minutes",
  "queue": {
    "type": "sqs",
    "queueName": "{{.Values.config.sqs}}"
  },
  "formats": {
    "transformationType": "widerow",
    "fileFormat": "parquet"
  }
}