How do you know it's spam? We just can't know!


The other day, I talked about the spam seemingly sent by a local aldermanic campaign here in Chicago. When I talked about this spam on Facebook, one of the folks pushing back attempted to lead me down an existential rabbit hole based on the theory that we just can't know whether or not a given email message is spam: that it is literally impossible to know with absolute certainty whether or not a single email message is unsolicited. That is yet another one of those (possibly) correct but (definitely) not very useful kinds of responses. Let's break it down.

First, let's get this out of the way. Yes, it's absolutely true that there is no “this message is unsolicited” flag or header in an email message that would let anyone know, at a glance, whether or not a given message is spam.

The recipient has a pretty good idea, though. If the mail is from somebody you don't recognize and you don't recall signing up for it, it's probably unsolicited. And ethically speaking, it's not up to the recipient to prove the negative; the onus is generally on the sender to ensure that mail is solicited, not the other way around.

But that's specific knowledge held by the sender and the recipient alone. And either could lie, or simply be wrong. The sender could misstate that mail is being sent with permission, and the recipient could be mistaken: perhaps they did sign up for the mail but forgot about it. (Or they were forge-subscribed, signed up by a third party as a prank or through a typo'd address submission.)

So how does a third party measure for spam? It turns out that the process of spam filtering, as it has evolved over the years, has involved a lot of thought around how best to determine whether a sender, IP address, domain, or set of messages is spammy or not. Because that's ultimately the goal of spam filters: keeping the unwanted out, and the unsolicited is broadly unwanted. So your mailbox providers (Google's Gmail, Yahoo Mail, Microsoft's Outlook.com and so many others) have worked hard to implement reputation-based mechanisms to identify which messages are wanted versus unwanted, solicited versus unsolicited, blockable versus deserving of inbox placement.

So how does an ISP figure out that you've sent spam, or that you're a spammer? Everybody's got their secret sauce and a million data points, but let's pick a simple starting point by focusing on two very common data points that many mailbox providers use to measure wanted versus unwanted mail.

  1. Engagement: High or low? Unsolicited email has lower engagement rates (opens, clicks) compared to solicited mail. Yes, most mailbox providers can tell which messages you interact with.
  2. Complaints: Low or high? Unsolicited mail has higher complaint rates than solicited emails. Mailbox providers often have a “report spam” button. When you “report spam” on an unwanted email message, the mailbox provider notes that complaint – they know which messages you complain about.

There's a lot more beyond that, but this is where it starts. The point is not that a single complaint is proof of spam, or that a single open is proof of non-spam. Mailbox providers see billions of email messages and billions of these data points and are able to roll them up to the sending domain, the sending IP address, the fingerprint of a given type of email message, etc. And then it's “just” a simple matter of stack ranking senders. Who's generating the most complaints? Who's generating the least engagement? Given enough data, time, and expertise, mailbox providers get a feel, based on the analysis of this data, for which mail streams merit blocking, and which ones merit inbox placement.
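To make the roll-up idea concrete, here is a minimal Python sketch. It is not how any particular mailbox provider actually does it. The `Event` record, the `rank_senders` function, the example domains, and the "complaints first, then engagement" sort order are all assumptions made up for illustration. Real systems weigh many more signals over much larger volumes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """One delivered message and what the recipient did with it (hypothetical record)."""
    sending_domain: str
    opened: bool
    complained: bool


def rank_senders(events):
    """Roll per-message signals up to the sending domain, then rank worst-first."""
    totals = defaultdict(lambda: {"sent": 0, "opens": 0, "complaints": 0})
    for e in events:
        t = totals[e.sending_domain]
        t["sent"] += 1
        t["opens"] += e.opened            # True counts as 1, False as 0
        t["complaints"] += e.complained

    rows = [
        {
            "domain": domain,
            "engagement_rate": t["opens"] / t["sent"],
            "complaint_rate": t["complaints"] / t["sent"],
        }
        for domain, t in totals.items()
    ]
    # Assumed ordering: highest complaint rate first, ties broken by lowest engagement.
    rows.sort(key=lambda r: (-r["complaint_rate"], r["engagement_rate"]))
    return rows


# Made-up sample data: 100 messages from each of two illustrative senders.
events = (
    [Event("newsletter.example", opened=True, complained=False)] * 40
    + [Event("newsletter.example", opened=False, complained=False)] * 59
    + [Event("newsletter.example", opened=False, complained=True)] * 1
    + [Event("blast.example", opened=True, complained=False)] * 2
    + [Event("blast.example", opened=False, complained=False)] * 88
    + [Event("blast.example", opened=False, complained=True)] * 10
)

ranking = rank_senders(events)
for row in ranking:
    print(f'{row["domain"]}: complaints {row["complaint_rate"]:.0%}, '
          f'engagement {row["engagement_rate"]:.0%}')
```

No single event in that sample proves anything about any single message. It's only after the roll-up that one sender stands out with far more complaints and far less engagement than the other.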

That's what “reputation” means in the context of sending email. That's where it all starts.

This whole thing can sometimes be a bit of an arms race. People look to exploit loopholes or edge cases in spam filtering, and occasionally you'll see somebody saying they've cracked the code to bypass a certain filter. But these wins are usually short-lived, because mailbox providers are, for the most part, pretty good at stopping spam.
