Context: We’ve been seeing steadily increasing memory usage on the workers that are solely responsible for firing these Snowplow events. We’re attempting to rule out misuse of the library as the culprit (something akin to a memory leak, perhaps).
1. Using the Python Tracker, should Tracker be a singleton, or recreated for each new event?
Suppose the `Emitter` is initialized once as a singleton. For each event fired, a new `Tracker` is created, a `Subject` is set on it, and `tracker.track_self_describing_event()` is called with the appropriate data. Is this an acceptable way to use `Tracker`?
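To make the pattern concrete, here is a minimal sketch of that lifecycle. The classes below are hypothetical stand-ins that mimic the Emitter/Tracker/Subject relationship, not the real `snowplow_tracker` API (whose signatures vary between versions); the point is just that the Emitter is long-lived while each Tracker is throwaway:

```python
class Emitter:
    """Stand-in for the real Emitter; holds the long-lived sending machinery."""
    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.sent = []

    def input(self, payload):
        self.sent.append(payload)


class Subject:
    """Stand-in Subject: per-user context attached to events."""
    def __init__(self, user_id):
        self.user_id = user_id


class Tracker:
    """Stand-in Tracker: cheap to construct, only references the Emitter."""
    def __init__(self, emitter):
        self.emitter = emitter
        self.subject = None

    def set_subject(self, subject):
        self.subject = subject

    def track_self_describing_event(self, data):
        self.emitter.input({**data, "user_id": self.subject.user_id})


emitter = Emitter("collector.example.com")  # created once, e.g. at module level


def fire_event(user_id, data):
    # A fresh Tracker per event: no Subject state survives between calls.
    tracker = Tracker(emitter)
    tracker.set_subject(Subject(user_id))
    tracker.track_self_describing_event(data)


fire_event("user-a", {"event": "purchase"})
fire_event("user-b", {"event": "signup"})
assert [e["user_id"] for e in emitter.sent] == ["user-a", "user-b"]
```

Since the Tracker here holds only a reference to the shared Emitter, constructing one per event should be cheap and should leave nothing behind for the garbage collector to miss, assuming nothing else retains a reference to it.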
The alternative is to keep the `Tracker` as a long-lived instance and swap the `Subject` set on it as needed (as the documentation appears to demonstrate). The concern is the mutable state this introduces to the `Tracker` instance: without proper cleanup by any calling code that uses the `Tracker`, a stale `Subject` could be attached to events it was never meant for.
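The stale-state concern can be demonstrated with the same kind of hypothetical stand-in classes (again, not the real `snowplow_tracker` API): with a shared, long-lived Tracker, any caller that forgets to set or clear the Subject inherits whatever the previous caller left behind:

```python
class Subject:
    """Stand-in Subject carrying per-user context."""
    def __init__(self, user_id=None):
        self.user_id = user_id


class Tracker:
    """Stand-in long-lived Tracker holding a mutable Subject reference."""
    def __init__(self):
        self.subject = None

    def set_subject(self, subject):
        self.subject = subject

    def track(self, payload):
        # Attaches whatever Subject is currently set -- stale or not.
        user_id = self.subject.user_id if self.subject else None
        return {**payload, "user_id": user_id}


tracker = Tracker()  # shared singleton

# Request A sets its Subject and tracks.
tracker.set_subject(Subject(user_id="user-a"))
event_a = tracker.track({"event": "purchase"})

# Request B forgets to set (or clear) the Subject before tracking...
event_b = tracker.track({"event": "signup"})

# ...so user A's identity silently leaks into user B's event.
assert event_a["user_id"] == "user-a"
assert event_b["user_id"] == "user-a"  # stale Subject carried over
```

This is a correctness risk rather than a memory one: a single retained Subject is small, so by itself it would not explain steadily growing memory.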
2. Are there any known memory leaks aside from the above?
The other question is whether there are any known memory leaks in the Python Tracker library in general. Some of our events may have started carrying significantly larger payloads than before, potentially around the time the steady increase in memory usage began. We suspect the memory increase may come from something holding on to that data, and that we are only observing it now because of how much data is being passed along.
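One way to test the "something is retaining the payloads" hypothesis without guessing is to snapshot allocations in the worker with the standard-library `tracemalloc` module and diff them; the simulated payload list below is only a placeholder for whatever the worker actually retains:

```python
import tracemalloc

tracemalloc.start()

snapshot_before = tracemalloc.take_snapshot()

# Placeholder for work the real worker does; if payloads are being retained,
# the allocating line will dominate the diff below.
payloads = [{"data": "x" * 10_000} for _ in range(100)]

snapshot_after = tracemalloc.take_snapshot()

# Statistics are grouped by source line and sorted by allocation growth.
top = snapshot_after.compare_to(snapshot_before, "lineno")
for stat in top[:3]:
    print(stat)
```

Run periodically in the real worker (e.g. every N events), a steadily growing line inside the tracker library would point at a leak there, while growth attributed to application code would point back at the callers.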