One Man’s SPAM is…

…is another man’s ham, as the saying goes, or something like that. I’m migrating the city from product OLD to product NEW, and I’ve been looking forward to getting off OLD for some time. There’s been a hitch, though: NEW has a different philosophy on marketing/newsletter/bulk email. When we made the final switchover for our test domain (250 users), we started getting complaints, along with samples of all the HTML-heavy email coming in from Macy’s, EWeek, Hotwire, management seminars, foreclosure auctions, etc. The reps for NEW told me their company has never officially considered that kind of mail SPAM. I understand that; some of the samples were clearly legitimate opt-in emails. But more than a few sat in a grey area: opt-out at best, and more likely the address had been bought or used without permission.

OLD does create many false positives (FPs), but it’s what our customers are used to (9 years running OLD), so no matter where the fault lies, it’s our job to keep providing that level of service. I’d also like to give them the ability to see their own marketing mail and release or delete it as they see fit (End User Quarantines). That way, if some guy in Iran tries to charge 2 tons of yellow cake from Nukes-R-Us to your credit card, you don’t miss the notification email because it had too much HTML in it.

NEW is coming out with a feature that will identify those types of email so they can be tagged as SPAM. Great, but it’s not available yet, so I had to come up with a stopgap, and I wanted to share it here in case anyone googles how to stop HTML-laden emails at the gateway. I created a regex, used in a filter, that trips after 5 HTML links to pictures or .asp files in an email. The filter then adds a header so that you can route the message wherever you want, including the bit bucket.

I invite anyone to leave a comment on how to improve the regex, as I’m no guru at writing them. I was thinking it might be more effective (i.e. less expensive) to allow only the characters that can appear in a URL instead of the lazy .* so feel free to leave a comment (it’d be my first).

http://.*?\.(jpg|gif|bmp|jpeg|tiff|png|asp)

It is working pretty well: very few false positives on ham, though it obviously isn’t catching everything. I’ve quarantined 1211 emails in 24 hours for the test domain of 250 users, while seeing fewer than 5 FPs.
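For anyone who wants to play with the idea outside the gateway, here is a rough Python sketch of the filter described above. It is not the gateway's own filter language. It assumes the pattern was meant to have escaped dots and grouped extensions, that "trips after 5" means 5 or more links, and that counting over the decoded text parts is good enough. The header name X-Bulk-HTML and all function names are made up for illustration. TIGHT_LINK is one guess at what "only the possible URL characters" could look like.

```python
import re
from email import message_from_string

EXTENSIONS = r"\.(?:jpg|jpeg|gif|bmp|tiff|png|asp)"

# The post's pattern, assuming escaped dots and grouped extensions.
LAZY_LINK = re.compile(r"http://.*?" + EXTENSIONS)

# Tighter variant (an assumption, not the author's pattern): the URL part
# may not run across whitespace, quotes, or angle brackets.
TIGHT_LINK = re.compile(r"http://[^\s\"'<>]*?" + EXTENSIONS)

THRESHOLD = 5                # assumption: "trips after 5" means 5 or more
TAG_HEADER = "X-Bulk-HTML"   # hypothetical header name


def count_links(text, pattern=TIGHT_LINK):
    """Count non-overlapping picture/.asp links in text."""
    return sum(1 for _ in pattern.finditer(text))


def body_text(msg):
    """Join the decoded text parts of a parsed message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload is not None:
                # URLs are ASCII, so a lossy decode is fine for counting.
                chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


def tag_if_bulk(msg, threshold=THRESHOLD):
    """Add the tag header when the link count reaches the threshold."""
    if count_links(body_text(msg)) >= threshold:
        msg[TAG_HEADER] = "yes"
        return True
    return False
```

The tighter pattern is probably cheaper because a failed match gives up at the first space or quote instead of scanning to the end of the line, but that has not been measured here.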

UPDATE 10-29-08: In addition to the above filter, I also discovered it’s a good idea to look for the header List-UnSubscribe only when List-Subscribe doesn’t exist. This is because legitimate mailing lists (bugtraq, dshield) most often have both, whereas bulk/marketing emails probably have only the first one. So something like this…

if (header('List-UnSubscribe') AND NOT header('List-Subscribe'))

{ action }
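The same check can be sketched in Python with the standard email module, for testing against saved samples. This is only an illustration, not the gateway's syntax; the function name is made up. Header lookups in the email module are case-insensitive, so the capitalization of the header names does not matter.

```python
from email import message_from_string


def is_probable_bulk(msg):
    """True when List-Unsubscribe is present and List-Subscribe is absent."""
    has_unsubscribe = msg["List-Unsubscribe"] is not None
    has_subscribe = msg["List-Subscribe"] is not None
    return has_unsubscribe and not has_subscribe


# Usage with two made-up sample messages
marketing = message_from_string(
    "Subject: Big sale\n"
    "List-Unsubscribe: <mailto:leave@example.com>\n"
    "\n"
    "Buy now\n"
)
mailing_list = message_from_string(
    "Subject: Advisory\n"
    "List-Unsubscribe: <mailto:leave@example.org>\n"
    "List-Subscribe: <mailto:join@example.org>\n"
    "\n"
    "Details\n"
)
print(is_probable_bulk(marketing))
print(is_probable_bulk(mailing_list))
```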

UPDATE 11-7-08: Those marketers are still peddling their wares, so take this, opt-out suckers:

http://.*?[a-z0-9]+=\S+

That catches dynamic URLs (the kind with key=value query strings) once they show up x or more times in a message. It can produce a lot of FPs, but it kills the bulk email big time.
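A rough Python sketch of that counting step, for trying the pattern on saved samples. It assumes the tail of the pattern is \S+ (non-whitespace), and it leaves the threshold as a required argument because the post does not say what x is. The function names are made up.

```python
import re

# Assumes the pattern ends in \S+ (one or more non-whitespace characters).
DYNAMIC_URL = re.compile(r"http://.*?[a-z0-9]+=\S+")


def count_dynamic_urls(text):
    """Count non-overlapping URLs that carry a key=value pair."""
    return len(DYNAMIC_URL.findall(text))


def trips_filter(text, threshold):
    """True once the number of dynamic URLs reaches the chosen threshold."""
    return count_dynamic_urls(text) >= threshold


sample = "Click http://example.com/track?id=123 or http://example.com/u?user=abc now"
print(count_dynamic_urls(sample))
```

Because .*? will happily run across spaces to find a later key=value on the same line, a plain URL followed by unrelated text containing "=" also matches, which is one likely source of the FPs.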
