On a recent batch process run, in my shredded / good folder, I see these records:
{"schema":{"vendor":"com.managedbyq","name":"oa_globals","format":"jsonschema","version":"1-0-0"},"data":{"version_content":"2.20.3","version_app":"1.7.1","extended_data":"","location_authorized":false,"location_auth_status":"0","location_device_enabled":true,"location_status":"0","wifi_enabled":true},"hierarchy":{"rootId":"c6468566-b1f8-4885-83d3-809c6569cfa5","rootTstamp":"2016-04-19 15:20:19.000","refRoot":"events","refTree":["events","oa_globals"],"refParent":"events"}}
{"schema":{"vendor":"com.managedbyq","name":"oa_globals","format":"jsonschema","version":"1-0-0"},"data":{"version_content":"2.20.3","version_app":"1.7.1","extended_data":"","location_authorized":false,"location_auth_status":"0","location_device_enabled":true,"location_status":"0","wifi_enabled":true},"hierarchy":{"rootId":"cabf4718-5cd0-4816-a976-7725733a1772","rootTstamp":"2016-04-19 15:20:19.000","refRoot":"events","refTree":["events","oa_globals"],"refParent":"events"}}```
Inexplicably, these are the records in my redshift table:<img src="//cdck-file-uploads-us1.s3.dualstack.us-west-2.amazonaws.com/flex016/uploads/snowplowanalytics/original/1X/75e2185d0eebce94173489c04f4467a4ffcd2c31.png" width="690" height="23">
1. Why is extended_data coming through as "f", when it was a blank string in records?
2. Why is location_status coming through as "t", when it was a "0" in the records?
Here's the table - I created it using the schema guru:
<img src="//cdck-file-uploads-us1.s3.dualstack.us-west-2.amazonaws.com/flex016/uploads/snowplowanalytics/original/1X/8c1fd5e703d2d93143459416852c6f3ef853c838.png" width="690" height="281">
Thanks,
Dan
From the “table” above you can see that extended_data is lined up with the location_authorized which is a boolean value. The same goes to location_status which is lined up with the booleanwifi_enabled element. Thus the boolean values of wifi_enabled (true) and location_authorized (false) were converted to “string representation” on data load to Redshift table as those values were loaded into location_status and extended_data respectively (according to the mapping which JSONPaths represent).
You mentioned that the SQL code to create the table was generated by Schema Guru. What about JSONPaths file? Was it generated too or it’s an older version of manually created file?
Again, the two files (the SQL code to create the table and the elements in JSONPaths file) should match.
Thanks Ihor - I had no idea that the two had to line up, and I had created the JSONPaths manually. I will fix that JSONPaths and I’m sure that will solve the problem.
However I see your post of mapping version_app to $.data.version_content. May I know which should be the correct way? Are you eventually got your data imported?
The key name in JSONPath should match the name in JSON data file. JSONPath is like XPath for JSON (rather than XML document). You can validate your JSON with JSONPath expression using this tool, for example.
On the other side of the mapping, the order of the keys in JSONPath should follow the corresponding order of the fields in the Redshift table.
In the Amazon Redshift COPY syntax, a JSONPath expression specifies the explicit path to a single name element in a JSON hierarchical data structure.
Each JSONPath expression in the jsonpaths array corresponds to one column in the Amazon Redshift target table. The order of the jsonpaths array elements must match the order of the columns in the target table or the column list, if a column list is used.