Tracking lots of spam for fun and profit

It dawned on me today that I haven't been logging the recipient addresses found in the spam messages I'm cataloging and reporting on. I think it'd be a good idea to expand my data set sideways and start adding that info, because spot-checking the data has been quite insightful. I've found, for example, that spammers are dumb enough to harvest from Google Groups: I have a fair number of recipient addresses with “...” in them, which means they are truncated versions of real addresses I used when posting to newsgroups years ago. Then there's lots of spam sent directly to those newsgroup-harvested addresses, spam to addresses obviously harvested from the web, spam hitting abused co-reg addresses, and god knows what else sent to once-valid but long-dead user addresses.
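To make the recipient logging idea concrete, here is a minimal Python sketch of pulling a recipient out of a message and flagging the truncated Google Groups style addresses. It is not my actual tooling: the function names are made up for illustration, and it assumes the receiving server stamps a `Delivered-To` header on each message, which not every setup does.

```python
from collections import Counter
from email import message_from_string


def recipient_of(raw_message: str) -> str:
    """Return the lowercased recipient, or '' if the header is missing.

    Assumption: the delivering MTA adds a Delivered-To header.
    """
    msg = message_from_string(raw_message)
    return (msg.get("Delivered-To") or "").strip().lower()


def looks_truncated(address: str) -> bool:
    """True for addresses containing an ellipsis, the mark left when an
    archive shortens an address (for example al...@example.com)."""
    return "..." in address or "\u2026" in address


def tally_recipients(raw_messages) -> Counter:
    """Count how many messages each recipient address received."""
    counts = Counter()
    for raw in raw_messages:
        recipient = recipient_of(raw)
        if recipient:
            counts[recipient] += 1
    return counts
```

Sorting the resulting counter with `most_common()` is a quick way to see which alias is taking the heaviest fire.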

There's one alias that is getting just a metric ton of spam, and the construction of the username portion makes it clear to me that it was an alias I gave to somebody and they misused it, or somehow leaked it to some real bad dudes. I wish I could remember who I gave the address to – but that info is stored on a drive pulled from my old unix server when I moved to Chicago. I'm dying to know which random bad actor is responsible for that bit o' feed, because the mail it's getting is so far from CAN-SPAM compliant that it's not even funny.

Even though I'm getting more than six thousand spams a day, I've only been tracking an average of about 2,200 a day for the past forty-one days. At first I had to do a lot of manual review of the spam to ensure that it wasn't accidental ham; there was a fair amount of that to be weeded out. It was easy to weed out, and I put rules in place to help keep it out, but doing so took time, and I couldn't run the whole spamtrap feed through the measuring stick until I had reviewed it all.

Now that this is out of the way, the only things holding me back here and there are software bugs and/or server issues. Occasionally the drive on the server handling this mail fills up, so I had to do a lot of fancy coding around that, to make stuff sit and pause and wait for the disk usage to come back down. That's no fun. But now that I'm able to work around it, I should start consistently logging data about at least five thousand spams each day.
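For the curious, the "sit and pause and wait" logic can be sketched in a few lines of Python. This is an illustration, not the code running on my server: the function name, the 90% threshold, and the sixty-second polling interval are all assumptions. The `usage` and `sleep` parameters exist only so the loop can be exercised without a real full disk.

```python
import shutil
import time


def wait_for_disk(path="/", max_used_fraction=0.90, poll_seconds=60,
                  usage=shutil.disk_usage, sleep=time.sleep):
    """Block until the filesystem holding `path` is at or below the
    allowed usage level. Returns how many times it had to wait."""
    waits = 0
    while True:
        # shutil.disk_usage returns (total, used, free) in bytes.
        total, used, _free = usage(path)
        if used / total <= max_used_fraction:
            return waits
        waits += 1
        sleep(poll_seconds)
```

Calling `wait_for_disk("/var/spool")` before processing each batch would hold the job until space frees up; the path here is only an example.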

Here are some random statistics for you. I recently added Gmail bulk foldering to my spam results, and so far I'm seeing that Gmail is only 88.8% effective against my spam feed. Meaning, 11.2% of the spam I receive is not going to the spam folder in Gmail. The 92,730 spam messages I've tracked so far, over the past forty-one days, have come my way from 68,516 unique IP addresses and 58,022 unique /24 blocks.
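Counting unique IP addresses and unique /24 blocks is simple to reproduce. The sketch below uses Python's standard `ipaddress` module; the helper names are hypothetical, and it assumes the sending IPs are IPv4 addresses already extracted as strings.

```python
import ipaddress


def slash24(ip: str) -> str:
    """Return the /24 block containing an IPv4 address,
    e.g. 192.0.2.77 -> 192.0.2.0/24."""
    # strict=False lets us pass a host address and have it masked down.
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def summarize(ips):
    """Return (unique IP count, unique /24 block count)."""
    unique_ips = set(ips)
    unique_blocks = {slash24(ip) for ip in unique_ips}
    return len(unique_ips), len(unique_blocks)
```

When the two counts are close together, as with 68,516 IPs across 58,022 blocks, most of the /24s are contributing only one or two sending addresses each.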

Just yesterday it dawned on me that I should start tracking domains used in spam. I decided to focus on From lines, and log unique From domains that actually exist. Just since I turned it on, I've tracked over 5,500 unique domains. I have a few ideas of neat things I can do with this data, after I compile enough of it, but I'm not sharing any of those secrets quite yet.
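Here is a rough Python sketch of what logging unique From domains could look like. The function names are invented for this example, and "actually exist" is interpreted here as "resolves to an address in DNS", which is an assumption; checking for MX records would be stricter but needs a DNS library beyond the standard one.

```python
import socket
from email.utils import parseaddr
from typing import Optional


def from_domain(from_header: str) -> Optional[str]:
    """Extract the lowercased domain from a From header value."""
    _name, addr = parseaddr(from_header)
    _local, at, domain = addr.rpartition("@")
    if not at or not domain:
        return None
    return domain.lower().rstrip(".")


def resolves(domain: str) -> bool:
    """Assumed existence test: the name resolves to some address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def unique_existing_domains(from_headers, exists=resolves):
    """Return the set of From domains that pass the existence check.
    Each domain is checked once, however often it appears."""
    checked = {}
    for header in from_headers:
        domain = from_domain(header)
        if domain and domain not in checked:
            checked[domain] = exists(domain)
    return {d for d, ok in checked.items() if ok}
```

The `exists` parameter can be swapped for any other check, which also makes the function easy to try out without touching the network.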

What I will share, though, is information showing which IP addresses and netblocks actually send me the most spam. It'll be interesting to see how it compares to what other people are seeing on their own mail streams. Look for that soon!

Al Iverson

I've lived in Chicago since 2006. I spend a lot of time in hotels. I spend the time catching up on emails, blogging, writing shell scripts, and solving complex email-related equations. The system says I've been a Blogger user since April 2006. Actually, most of my sites are far older -- spamresource.com in particular has been a dumping ground for my thoughts on spam going back to 2001. I've been doing the whole "blog thing" for years, before I knew to call it blogging. I'm a jazz fan and my favorite club in the world is the Artists' Quarter in St. Paul, Minnesota.

Spam Resource: All Things Deliverability
