My EmrEtlRunner completed its task (shredding and saving to Postgres). However, I then truncated the events table in Postgres. Now that I want to re-process the same raw data, I get this message: “No Snowplow logs to process since last run”.
What should I do to process the old data again? I already deleted the processing, archived, enriched, and shredded folders, but I still get the same message.
Need someone to save my day again…
I think I've got it now, after some serious thinking about the answer I got in the other thread. Too bad I can’t delete my own post.
Anyway, for anyone else with the same question:
It happens because the raw data is moved from the raw-in folder to the raw-process folder during an EmrEtlRunner step, and later moved on to the raw-archived folder. What confused me was that there were still many other log files inside the raw-in folder, which made me think the raw data had been left there unmoved (but actually, those files were totally unrelated to the raw data).
So, to reprocess the old data, I need to copy it from raw-archived back to raw-in and then run EmrEtlRunner again.
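
In case it helps, here is a rough sketch of that copy step in Python. It is not anything from Snowplow itself. It assumes the folders live in a single S3 bucket and that you pass in a boto3 S3 client; the bucket name and the two prefixes are hypothetical placeholders, so replace them with the values from your own config. It also assumes you want to keep each file's path relative to the archive prefix, which may or may not suit your setup. It defaults to a dry run so you can check the plan before anything is copied.

```python
def plan_copies(keys, archive_prefix, in_prefix):
    """Map each archived key to the key it would get under the raw-in prefix."""
    plan = []
    for key in keys:
        # Skip keys outside the archive prefix and "folder" placeholder keys.
        if not key.startswith(archive_prefix) or key.endswith("/"):
            continue
        relative = key[len(archive_prefix):]
        plan.append((key, in_prefix + relative))
    return plan


def copy_archive_to_in(s3, bucket, archive_prefix, in_prefix, dry_run=True):
    """Copy every object under archive_prefix to in_prefix in the same bucket.

    `s3` is expected to behave like a boto3 S3 client. Returns the list of
    (source_key, destination_key) pairs; nothing is copied when dry_run is True.
    """
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=archive_prefix):
        # "Contents" is absent from a page when no objects match.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))

    plan = plan_copies(keys, archive_prefix, in_prefix)
    if not dry_run:
        for src, dst in plan:
            s3.copy_object(
                Bucket=bucket,
                Key=dst,
                CopySource={"Bucket": bucket, "Key": src},
            )
    return plan


# Hypothetical usage (bucket and prefixes are placeholders, not real names):
#   import boto3
#   plan = copy_archive_to_in(boto3.client("s3"), "my-snowplow-bucket",
#                             "raw-archived/", "raw-in/", dry_run=True)
#   for src, dst in plan:
#       print(src, "->", dst)
```

Run it once with dry_run=True, check that the printed pairs look right, and only then switch to dry_run=False. Since this copies rather than moves, the archive stays intact.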