When running with cross-run duplicate event removal, we’ve empirically observed that only write capacity is consumed. The snowplow-r88 release notes on this feature also discuss only the write throughput capacity.
We’ve written our own script to manage this throughput. Is it sufficient to turn up only the write capacity if we wish to avoid throttling?
Yes, you’re right: deduplication in the shred job uses only write throughput, so tuning the write capacity alone should be enough. Read capacity can stay at a very low value, such as 5 units or so.
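For a throughput-management script like the one mentioned in the question, a minimal sketch of raising only the write capacity might look like this. It assumes the table uses provisioned capacity mode and that boto3 is installed and configured; the table name and the helper names are hypothetical, not anything Snowplow ships.

```python
def build_throughput_update(table_name, write_units, read_units=5):
    """Build the arguments for DynamoDB's UpdateTable call.

    Read capacity stays at a low fixed value (5 here, as suggested above);
    only the write capacity is meant to vary between runs.
    """
    if write_units < 1 or read_units < 1:
        raise ValueError("capacity units must be at least 1")
    return {
        "TableName": table_name,
        # UpdateTable expects both values, even if only one is changing.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def apply_throughput_update(params):
    """Send the update to DynamoDB (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("dynamodb")
    return client.update_table(**params)


if __name__ == "__main__":
    # "my-duplicates-table" is a placeholder, not a real Snowplow default.
    params = build_throughput_update("my-duplicates-table", write_units=500)
    print(params)
    # apply_throughput_update(params)  # uncomment to actually call AWS
```

The script would raise the write capacity before the shred job starts and lower it again afterwards, leaving the read capacity untouched.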
It’s an interesting feature of DynamoDB conditional writes that they count only against write throughput (not read), whether or not the condition is met (i.e. regardless of whether the operation ends up actually writing the item or only checking the condition).
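As a rough illustration of what that means for sizing, here is a small sketch for estimating the write capacity to provision. It assumes the standard DynamoDB rule that one write capacity unit covers one write per second of an item up to 1 KB, and it counts every attempted conditional write, duplicate or not. The function name and the example numbers are made up for illustration.

```python
import math


def required_write_units(events_per_second, item_size_bytes=1024):
    """Estimate the write capacity units needed for a given event rate.

    Every attempted conditional write is counted, whether or not the
    condition passes, since failed conditions still consume write capacity.
    """
    # Each write costs one unit per started KB of item size (minimum 1).
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return math.ceil(events_per_second * units_per_write)


if __name__ == "__main__":
    # Hypothetical figures: 200 events per second, about 300 bytes per item.
    print(required_write_units(200, item_size_bytes=300))
```

Read capacity does not appear in the calculation at all, which matches the observation in the question.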