Hi,
I’m working with the Snowplow Python Analytics SDK, trying to read data from Google Cloud Storage (written by Storage Loader).
However, event parsing fails with:
return jsonify_good_event(line.split('\t'), known_fields, add_geolocation_data)
AttributeError: 'dict' object has no attribute 'split'
I’m not quite sure how to handle this or what the expected input format is, since I’m passing in each TSV row as a string. Do I have to apply any transformations beforehand?
This is my code:
def snowplowTsv2Json(content):
    """Use the Snowplow SDK to generate a JSON object from TSV files
    Args:
        content: File retrieved from GCS
    """
    snowplowEvents = []
    print(type(content))
    with open(content, encoding='utf-8') as tsvfile:
        reader = [line.rstrip('\n') for line in tsvfile]
        #reader = csv.DictReader(tsvfile, dialect='excel-tab')
        print('First row of reader: {}, Type of reader var {}'.format(reader[0], type(reader)))
        for row in reader:
            print('Iterator row: {}'.format(str(row)))
            try:
                jsonRow = snowplow_analytics_sdk.event_transformer.transform(
                    str(row))
                print('Json row: {}'.format(jsonRow))
                snowplowEvents.append(
                    snowplow_analytics_sdk.event_transformer.transform(jsonRow))
            except snowplow_analytics_sdk.snowplow_event_transformation_exception.SnowplowEventTransformationException as e:
                for error_message in e.error_messages:
                    print('Error in snowplowTsv2Json: {}'.format(error_message))
    return snowplowEvents
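To narrow this down, here is a minimal sketch that reproduces the same AttributeError without the SDK. It assumes, based only on the traceback, that transform splits its argument on tabs and returns a dict. fake_transform, convert_rows, convert_rows_twice and the sample row are hypothetical stand-ins for illustration, not the SDK's real implementation or real enriched event data.

```python
def fake_transform(line):
    """Hypothetical stand-in for event_transformer.transform (assumption):
    splits a TSV string on tabs and returns a dict."""
    fields = line.split('\t')
    return {'field_{}'.format(i): value for i, value in enumerate(fields)}


def convert_rows(rows):
    """Call the transformer exactly once per TSV line."""
    return [fake_transform(row) for row in rows]


def convert_rows_twice(rows):
    """Mirror the snippet above: feed the first result back into the transformer."""
    events = []
    for row in rows:
        json_row = fake_transform(row)  # str in, dict out
        # Second call receives a dict, which has no .split method
        events.append(fake_transform(json_row))
    return events


sample_rows = ['app\tweb\t2019-01-01']  # made-up row, not a real enriched event

print(convert_rows(sample_rows))

try:
    convert_rows_twice(sample_rows)
except AttributeError as e:
    print('Reproduced: {}'.format(e))
```

If this is what is happening in my function, appending jsonRow directly instead of passing it to transform a second time might be enough, though I have not confirmed that against the SDK.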
Do you have any ideas on how to proceed? I’ve tried various ways of feeding in the TSV rows, but none worked…