I’ve got a couple of materialized tables that serve several dashboards and would like to exclude several internal IP addresses.
I could add a WHERE statement but I’m looking for alternatives. I thought about creating a SQL runner playbook that would move the events with the listed IP’s to another table.
Normally we just create alternate views of our events table with WHERE NOT
user_ipaddress IN (…). I haven’t spent any time testing the performance
of it because our views are still so fast. Avoid using Regex though. We’re
on dense compute BTW.
For any custom reports or dashboards’ tables, we run through SQL runner and
generate them off the filtered views. Saves tons of time filtering traffic.
Creating a new view using a WHERE clause should be reasonably fast but that’s going to depend on a few factors (# rows, IPs filtered on).
In the past we’ve created derived tables using an exclusion table which gives you the flexibility of changing the IPs you want to filter without having to change the underlying SQL.
atomic.ip_exclusions contains a single ip_address column with the IP addresses you wish to exclude.
Create your new table off this
CREATE TABLE atomic.example AS SELECT * FROM atomic.events WHERE ip_address NOT IN (SELECT ip_address FROM atomic.ip_exclusions)
Redshift recently added (October 27) support for automatically adding compression when using the CTAS syntax so you’ll benefit from automatic encoding as well.
I would suggest to add
DISTRIBUTION ALL to the IP exclusion table - the table would be copied to all the nodes, therefore
JOINs would work blazing fast.
That is nice! I had been hoping for something like this.
I love the above ideas but wonder if either are efficient for IP ranges. I have at least 17 ranges of internal IPs that I want to filter out from the third octet (so 10.2.0.0 to 10.2.255.255, for example). I could insert 8704 rows of IPs into my DB (256x2x17) but that’s a lot of filtering on every report. Has anyone come across a better way, either in ETL or in modeling or reporting? I don’t mind physically removing records from those IPs from the DB.