SMTP Address Validation: Bad Idea

Every once in a while, somebody asks me to help them with a project to do SMTP validation of a large number of email addresses, or help them build this functionality into some product or website.

SMTP address validation is a really bad idea, for many reasons. Allow me to explain.

Let's start with a bit of history: Mail servers used to have built-in functionality for email address validation. What happened to it?

Once upon a time, most Internet Service Providers (ISPs), running most of the common mail transfer agents (MTAs, i.e., mail servers), supported an SMTP command called "VRFY" (verify) that allowed somebody to ask a receiving mail server whether or not an email address is valid. Like so many other good things in the world, spammers wrecked it for everyone else. After widespread abuse of this functionality, ISP administrators and MTA maintainers disabled it en masse, and today you'll find it enabled just about nowhere. RFC 2505, from more than ten years ago, explains how spammers exploit SMTP VRFY and recommends restricting access to it.

Even though SMTP VRFY has long been disabled almost everywhere, many folks have figured out that there's another way to check a remote mail server to see if an email address is alive: you simulate a connection to the remote mail server, as though you were sending an email message. To understand why SMTP address validation is a bad idea, you should understand how it works. Here's how:

When sending an email message, the sending and receiving mail servers talk to each other in a conversation that goes somewhat like this:
  1. Sender: HELLO! I'm the sending mail server.
  2. Receiver: Hello, I'm the receiving mail server. Go ahead!
  3. Sender: Mail from: Somebody
  4. Receiver: OK, that's good.
  5. Sender: Mail to: Somebody else
  6. Receiver: Okay, that's fine.
  7. Sender: Here's my message body!
  8. Receiver: Thanks for the message, I will deliver it.
  9. Sender: OK, bye!
Step 6 up there, which I paraphrased as "Okay, that's fine," is the receiving mail server telling the sending server that it's okay to send a message to this person: "I'll accept the message." Some people believe that there is a strong correlation between this response and the validity of the recipient's email address, meaning you can (in theory) use this test to try to determine whether or not an email address is valid. If the address doesn't exist, the server will typically respond with "Sorry, I can't accept mail for that user," or "Sorry, that user doesn't exist."
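In actual SMTP (RFC 5321), the paraphrased steps map onto the commands EHLO (or HELO), MAIL FROM, RCPT TO, DATA, and QUIT, and step 6 is the server's numeric reply to RCPT TO. The first digit of that reply carries the meaning: 2xx is accepted, 4xx is a temporary failure, and 5xx is a permanent failure. The minimal Python sketch below runs entirely offline (it opens no connection); it lists the client side of the conversation and sorts an RCPT TO reply code into those classes. The names `CLIENT_COMMANDS`, `RcptResult`, and `classify_rcpt_reply`, and the placeholder addresses, are hypothetical and chosen only for illustration.

```python
from enum import Enum

# Hypothetical illustration only: these names are made up, and nothing here
# opens a network connection.


class RcptResult(Enum):
    """Broad meaning of a reply to RCPT TO, by RFC 5321 reply class."""
    ACCEPTED = "accepted"    # 2xx: the server will take mail for this recipient
    TEMPORARY = "temporary"  # 4xx: try again later (busy, greylisting, ...)
    PERMANENT = "permanent"  # 5xx: rejected, for any of several reasons


# Client side of the paraphrased conversation; addresses are placeholders.
CLIENT_COMMANDS = [
    "EHLO sender.example",                       # step 1
    "MAIL FROM:<somebody@sender.example>",       # step 3
    "RCPT TO:<somebody-else@receiver.example>",  # step 5 (the reply is step 6)
    "DATA",                                      # step 7: server replies 354, then the body follows
    "QUIT",                                      # step 9
]

_REPLY_CLASSES = {
    2: RcptResult.ACCEPTED,
    4: RcptResult.TEMPORARY,
    5: RcptResult.PERMANENT,
}


def classify_rcpt_reply(code: int) -> RcptResult:
    """Map a three-digit RCPT TO reply code to its reply class."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a three-digit SMTP reply code: {code}")
    try:
        return _REPLY_CLASSES[code // 100]
    except KeyError:
        # 1xx and 3xx replies are not valid responses to RCPT TO.
        raise ValueError(f"unexpected reply code for RCPT TO: {code}") from None


for code in (250, 451, 550):
    print(code, classify_rcpt_reply(code).value)
# 250 accepted
# 451 temporary
# 550 permanent
```

Note how little "accepted" actually tells you: a catch-all server answers 250 for every recipient, and RFC 5321 allows a 550 to mean "mailbox not found" or simply "rejected for policy reasons," such as not liking your IP address.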

If you want to do this without actually sending an email message, you simply terminate the connection after step 6. It seems pretty simple, right? I'm sure there are plenty of shareware and foreign applications out there that you can buy to do this. But that doesn't make it a good idea, nor is it a best practice. In fact, it's a very bad idea.

Why? There are numerous risks and flaws in this methodology, which combine to make it unusable in the real world. Here's what I mean:
  1. Some mail servers "catch all" inbound messages. Any SMTP address validation is going to be inaccurate in those cases. Google Apps, a very popular email hosting platform, has an easy-to-enable setting: "If received email does not match any existing address, forward it to X." Some people use this setting so they can easily give a different email address to each site where they sign up for email messages. Others use it to catch and identify spam. And that's just Google Apps; many other platforms do this as well. Some spam filters grab all inbound mail, process it, then forward the non-spam to a site's "real" inbound email server; such a filter often accepts mail for any recipient, so your probe only ever sees the filter's answer. Also, many admins choose to set up "catch all" mail handling on their company or organization mail servers, for various reasons. Google for "Qmail catchall" and you'll see lots of people trying to figure out how to do it in common mail servers like Qmail and Sendmail.

  2. ISPs consider this spammer behavior and will block you. ISPs can easily tell that you're doing this. You're tickling their mail server in a way that results in a lot of noise in their mail server logs, yet you're not sending many (or any) actual email messages. That looks odd to them, and after the history with SMTP VRFY, odd looks spammy. Hotmail, for example, considers this evidence of a "dictionary attack" (also known as a directory harvest attack) in progress, and will put a hard block on all SMTP connections from your IP address. Many other ISPs will do the same. It will be difficult to get your IP address unblocked, because you look like a REALLY bad actor to the ISP; they are not likely to want to work with you.

  3. You will get blacklisted. As you do this to various ISPs and other domains, you're eventually going to get noticed by a blacklist operator or spam filter provider, and you're likely to get blacklisted by them. Some ISPs even share SMTP log data with blacklist operators. Just like with an ISP, resolving this will be difficult, because the blacklist operator already thinks you're a bad guy. It won't be easy to convince them otherwise.

  4. Your own ISP may block or terminate your internet service. Running an "SMTP Validator" application on your desktop internet connection is likely prohibited by your ISP's terms of service. Why? Because, again, it looks spammy. ISPs have long prohibited "Direct-to-MX" mail sending, that is, bypassing the ISP's own mail server to connect directly to remote mail servers. The network chatter this generates looks very much like that. Also, most "dynamic" (dynamically assigned consumer) IP address space is listed on special blacklists to try to prevent this kind of spam, so you'll have a very low success rate trying to connect to remote mail servers this way.
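Points 1 and 2 can be made concrete with a toy, offline model of a receiving server. The sketch below is hypothetical throughout: `SimulatedMailServer`, `looks_valid`, and especially the blocking rule (refuse a client after more than `max_unused_rcpts` RCPT TO commands that never led to a DATA) are assumptions standing in for whatever heuristics real ISPs actually use. It shows a catch-all server answering 250 for an address that does not exist, and a prober getting cut off after a burst of guesses. The IP addresses come from the documentation-only TEST-NET ranges.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical toy model: no real networking, and the blocking rule is made up.


@dataclass
class SimulatedMailServer:
    domain: str
    mailboxes: set[str] = field(default_factory=set)
    catch_all: bool = False
    max_unused_rcpts: int = 20  # hypothetical threshold, not any ISP's real policy
    _unused_rcpts: Counter = field(default_factory=Counter, init=False, repr=False)
    _blocked: set[str] = field(default_factory=set, init=False, repr=False)

    def rcpt_to(self, client_ip: str, address: str) -> int:
        """Return the reply code this server gives to RCPT TO:<address>."""
        if client_ip in self._blocked:
            return 554  # client is refused outright
        # Count recipients that have not (yet) been followed by a real message.
        self._unused_rcpts[client_ip] += 1
        if self._unused_rcpts[client_ip] > self.max_unused_rcpts:
            self._blocked.add(client_ip)  # looks like a directory harvest attack
            return 554
        local, _, domain = address.rpartition("@")
        if domain.lower() != self.domain.lower():
            return 550  # not a domain this server accepts mail for
        if self.catch_all or local.lower() in self.mailboxes:
            return 250  # a catch-all server says yes to every local part
        return 550  # no such mailbox

    def data(self, client_ip: str) -> None:
        """The client actually sent a message, so its recipients were 'used'."""
        self._unused_rcpts[client_ip] = 0


def looks_valid(server: SimulatedMailServer, client_ip: str, address: str) -> bool:
    """What an 'SMTP validator' concludes: any 2xx reply to RCPT TO means 'valid'."""
    return 200 <= server.rcpt_to(client_ip, address) < 300


strict = SimulatedMailServer("example.com", mailboxes={"alice"})
catchall = SimulatedMailServer("example.org", mailboxes={"alice"}, catch_all=True)

print(looks_valid(strict, "192.0.2.1", "alice@example.com"))           # True
print(looks_valid(strict, "192.0.2.1", "no-such-user@example.com"))    # False
print(looks_valid(catchall, "192.0.2.1", "no-such-user@example.org"))  # True, but wrong

# A prober guessing addresses gets cut off after max_unused_rcpts attempts.
for i in range(25):
    looks_valid(strict, "198.51.100.7", f"guess{i}@example.com")
print(strict.rcpt_to("198.51.100.7", "alice@example.com"))  # 554
```

Real servers differ in the details. For example, some accept every recipient at RCPT TO and only bounce undeliverable mail later, which misleads a validator in the same way a catch-all does.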

So that's what you CAN'T do and why you SHOULDN'T do it. Enough with the "no"; let's talk about the "yes." What CAN you do if you want to validate email addresses? Watch for Monday's post, where I'll talk about that in more detail.

1 comment:

adamo said...

Just for completeness, here's my version for "catchall" for sendmail.