Signup Best Practices: Banning Bots and NHI


Today I’m focusing on email signup form best practices, or: how to keep the crap out of your list.
You might think that nothing bad will ever happen to you, but it will. Even on the signup form for my tiny niche Spam Resource newsletter list, even on the signup form for my friend’s now-closed jazz club, people seemed to be using scripts to feed email addresses to them on the regular. Not real signups. Real addresses, maybe. Why? I am not certain. Maybe it’s a free way of email validation. If a signup form uses email validation, maybe they think that by tickling your form, they can freely use the email validation you pay for. But the point is, bad guys will eventually try to game your signup form. It happens to everyone.

It might sound harmless to some folks, but it really isn’t. Every time I forget to re-enable CAPTCHA for my own signup form (I turn it off sometimes, while testing updates), I end up with more of these bot signups. Last time around, I was using Amazon SES to handle the mail, and after just one day, I got a warning from them that I was generating an excessive percentage of spam complaints. (Not a ton, but small complaints against small volumes lead to big percentages.) I have double opt-in enabled, but spam enough people with unwanted signup confirmation emails, and you’re still going to end up with a problem. Leave this unchecked, and it most certainly will lead to domain reputation and deliverability issues.

Even with double opt-in, blocking and complaint issues can still happen if your signup process is aggressive and generates high volumes of confirmation emails. This is something that I dealt with all the way back in my Digital River days, where I designed the original “Name Capture Technology” (NCT) service used by Norton, PKWare, Nuance and many other companies to manage double opt-in registration for free and trial software downloads in the early aughts. (Though long dead, there are lots of links to it still out there in the wild. It was quite popular.)

I think, back then, we didn't make the signup process clear enough. Lots of people would put in garbage initially, meaning that they never received the confirmation email because they didn’t provide a real address, and then had to come back and try again with a valid address before they could download a desired piece of software. Those garbage submissions caused problems for the NCT email service, as the server kept getting listed on Spamcop. (And people were brutally rude about it, too, trying to jerk-splain to me constantly that the only way I could fix it was by implementing double opt-in, which I was already doing.) If I were building it today, there are a LOT of things that I would do differently.

But, for today, here's a mix of best practice tips for email capture that I either implemented back then or more recently implemented for my own signup forms.

Start with: Double opt-in. Require an active response. Why? Because blocklist operators love to hassle people who don’t use double opt-in, and it’s in your best interest to avoid that noise. Many very popular email service providers utilize or offer double opt-in. I can’t even count high enough to tell you how many Mailchimp-hosted email newsletters have required double opt-in confirmation. I maintain that most subscribers are used to this step today, and ones who aren’t able to figure out how to click a confirmation link are quite possibly not that great as subscribers, anyway.

Building double opt-in yourself? Going way back, I’ve blogged my guidance on how best to build a double opt-in signup mechanism. Find that here. This is from 2006, but I think there's still value to be found here. I still see people implementing new signup mechanisms for websites with hackable URLs or 100% image confirmation emails.
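To make the "hackable URL" problem concrete, here is a minimal sketch of a signed confirmation token, assuming an HMAC-SHA256 signature over the address plus an expiry time. It is one way to do it, not the approach from the 2006 guide. The secret key, the one-week lifetime and the function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values: load the key from configuration in a real system.
SECRET_KEY = b"replace-with-a-long-random-secret"
TOKEN_LIFETIME_SECONDS = 7 * 24 * 3600  # assumed one-week validity window


def _sign(payload: bytes) -> str:
    """Return a URL-safe HMAC-SHA256 signature for the payload."""
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_confirmation_token(email: str, now: Optional[float] = None) -> str:
    """Build 'base64(email|expiry).signature' for use in a confirmation link."""
    current = time.time() if now is None else now
    expires = int(current + TOKEN_LIFETIME_SECONDS)
    payload = f"{email.lower()}|{expires}".encode("utf-8")
    body = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return f"{body}.{_sign(payload)}"


def verify_confirmation_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the confirmed address, or None if the token is forged or expired."""
    current = time.time() if now is None else now
    try:
        body, signature = token.split(".", 1)
        padded = body + "=" * (-len(body) % 4)  # restore stripped base64 padding
        payload = base64.urlsafe_b64decode(padded.encode("ascii"))
        # Check the signature before trusting anything inside the payload.
        if not hmac.compare_digest(signature.encode("utf-8"), _sign(payload).encode("utf-8")):
            return None
        email, expires = payload.decode("utf-8").rsplit("|", 1)
        if current > int(expires):
            return None
        return email
    except (ValueError, UnicodeError):
        return None
```

The point of the signature is that nobody can confirm an address just by editing the URL: change the address or the expiry and the token stops verifying.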

Remember that double opt-in is not legally mandated here in the USA. That doesn’t matter. It is extremely valuable for headache prevention. Oh, the stories I could tell.

Add on: CAPTCHA. I use Google reCAPTCHA on my signup form. Some might think it’s overkill, but see above. People were feeding bot garbage to forms I host even when I had double opt-in in place. A CAPTCHA does not stop a determined bad actor from doing a bad thing, but it adds cost and time, and most bad actors are lazy. This is free and it works well for me.

You'll also want to: Reduce NHI (non-human interactions), preventing script signups and non-human clicks on the confirmation link. I block signup attempts from IP addresses listed as Tor exit nodes and IP addresses blocklisted by the Spamhaus XBL (Exploits Block List) as, in my personal experience, these tend to be sources of garbage signups. I require that JavaScript be enabled (this won’t stop all scripted activity, though), and if I were starting over today, I might put it all behind Cloudflare, too. (Their protection, meant for denial-of-service attack protection, probably does pretty well to block some forms of low volume/individual bot attacks and traffic, too.)
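For the XBL check, a DNS blocklist lookup works by reversing the octets of the IPv4 address, appending the list's zone, and doing an ordinary DNS query. Below is a rough sketch under a few assumptions: it queries the combined zen.spamhaus.org zone, it treats the 127.0.0.4 through 127.0.0.7 answers as XBL listings (that is my understanding of the Spamhaus return codes, so check their documentation), and it handles IPv4 only. The function names are made up for illustration. Note also that Spamhaus may refuse queries that arrive through large public resolvers, so a real deployment needs to follow their usage terms.

```python
import ipaddress
import socket
from typing import Callable, List

DNSBL_ZONE = "zen.spamhaus.org"
# Assumed XBL return codes within the zen zone; verify against Spamhaus docs.
XBL_CODES = {"127.0.0.4", "127.0.0.5", "127.0.0.6", "127.0.0.7"}


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Turn 192.0.2.10 into 10.2.0.192.<zone>, the name a DNSBL expects."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def resolve_a_records(name: str) -> List[str]:
    """Return the A records for a name, or an empty list if it does not resolve."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:  # covers NXDOMAIN and other lookup failures
        return []


def is_xbl_listed(ip: str, resolve: Callable[[str], List[str]] = resolve_a_records) -> bool:
    """True if any answer for the address falls in the assumed XBL code range."""
    answers = resolve(dnsbl_query_name(ip))
    return any(answer in XBL_CODES for answer in answers)
```

The resolver is passed in as a parameter so the logic can be exercised without touching the network. A lookup failure is treated as "not listed" here, which fails open; whether that is the right default is a judgment call for your own form.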

Smart Ruby on Rails developer Garrett Dimon recommends looking at "tools like Castle, Sift Science, Cognito, E-Hawk, FraudGuard.io, and Akismet." I agree. I’ve observed folks using Sift and E-Hawk successfully. Akamai has a Bot Manager NHI mitigation service as well. These tools can potentially get spendy, but are especially valuable if you run a platform where you manage signups for multiple clients, like an email service provider or newsletter hosting service.

Garrett recommends “using logging to find bad patterns” and I couldn’t agree more. If you get enough traffic, there will be data just waiting to be interpreted into something useful to fight back against bad actors.
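As a small illustration of what "finding bad patterns" can look like, here is a sketch that counts signup attempts per source IP and per email domain and surfaces anything at or above a threshold. The SignupEvent record and the threshold of 5 are assumptions for the example; real logs will have their own shape, and real patterns are usually subtler than raw counts.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SignupEvent(NamedTuple):
    """One logged signup attempt (hypothetical log record shape)."""
    timestamp: float
    ip: str
    email: str


def noisy_sources(events: Iterable[SignupEvent], threshold: int = 5) -> Dict[str, List[Tuple[str, int]]]:
    """Return IPs and email domains with at least `threshold` signup attempts."""
    events = list(events)  # allow generators; we iterate twice
    by_ip = Counter(event.ip for event in events)
    by_domain = Counter(event.email.rsplit("@", 1)[-1].lower() for event in events)
    return {
        "ips": [(ip, n) for ip, n in by_ip.most_common() if n >= threshold],
        "domains": [(d, n) for d, n in by_domain.most_common() if n >= threshold],
    }
```

Run something like this over an hour or a day of attempts and a single IP or a throwaway domain that dominates the list tends to stand out quickly.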

That's what some of these third party services do, too. You'll benefit from their ability to see the bigger trends in bad traffic.

Bonus: Register your email domain(s) for the Yahoo CFL (Complaint Feedback Loop, aka FBL). I find this so valuable. If a Yahoo user reports a Spam Resource confirmation email as spam, the Yahoo FBL system sends me an email report directly, based on the fact that I’ve registered my sending domains with them. This is a great early warning system. If you’re low volume, do like I do and just have the reports come directly to you. If that’s overwhelming, feed them into automation and process and report on them. Watch for spiking complaints. Did something break, or did somebody find a way past your defenses and start submitting massive amounts of garbage to your forms?

A couple of caveats: FBLs don’t always send a report for every complaint and there’s nothing you can do about it, so if you’re getting reports, do keep in mind that you could be receiving more “this is spam” clicks than you realize. Also, note that lawyers instruct mailbox providers to redact the recipient information from spam reports, so be sure to include an encoded token of some kind, if you want to be able to figure out which email user complained. This can be handy to block further signup attempts from that address.
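One possible shape for that encoded token, sketched under assumptions: put a subscriber ID plus a short HMAC in a custom header of every confirmation email, then look for that header in the copy of the message that comes back inside the complaint report. The header name X-Subscriber-Token, the secret, and the idea that the report preserves your custom header are all assumptions here; some senders put the token in the Message-ID or the return path instead.

```python
import hashlib
import hmac
import re
from typing import Optional

SECRET = b"replace-with-a-long-random-secret"  # hypothetical key
HEADER = "X-Subscriber-Token"                  # hypothetical header name

TOKEN_RE = re.compile(
    rf"^{re.escape(HEADER)}:\s*(\d+)-([0-9a-f]{{16}})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def make_token(subscriber_id: int) -> str:
    """Return '<id>-<16 hex chars of HMAC>' to place in the outgoing header."""
    mac = hmac.new(SECRET, str(subscriber_id).encode("ascii"), hashlib.sha256)
    return f"{subscriber_id}-{mac.hexdigest()[:16]}"


def subscriber_from_report(report_text: str) -> Optional[int]:
    """Find a valid token in a complaint report and return the subscriber ID."""
    match = TOKEN_RE.search(report_text)
    if match is None:
        return None
    subscriber_id = int(match.group(1))
    presented = f"{subscriber_id}-{match.group(2).lower()}"
    # Reject tokens whose signature does not match, so reports cannot be spoofed
    # into suppressing arbitrary subscribers.
    if hmac.compare_digest(make_token(subscriber_id).encode(), presented.encode()):
        return subscriber_id
    return None
```

Because the token carries an ID rather than the address itself, it survives recipient redaction, and you can map it back to the signup on your side.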

I wouldn’t implement a complete ban on proxy IPs and CDNs (Content Delivery Networks). Apple Private Relay and others will use these as proxies for legitimate clicks from real users. Go too crazy with that blocking and weird stuff can happen. (I see some BIMI logos blocked from displaying properly because a domain owner will host the SVG image on a CDN that aggressively attempts to block non-human interactions, as one example of excessive silliness.)

What about an email verification service? I use Alfred, myself, and I know that many other folks also like using email verification. Keep in mind what it actually does: it tells you whether or not the verification service thinks an address is deliverable. It does not confirm permission, and verification services have historically worked very poorly to prevent mail to spamtrap addresses.

What about using hidden links? I’ve tested including hidden links in my confirmation emails on the theory that only bots follow them. They don’t seem to get followed very often. Your mileage may vary.

What about delivering a confirmation code via email? Sending a code via email and then requiring a user to enter that code back into the website certainly checks the box as far as opt-in confirmation goes. Google and others do it, and the manual copy-and-paste step likely helps to eliminate the non-human response factor. I don’t have enough personal experience here to know of potential downsides, but it seems interesting and possibly useful.
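For anyone curious what the code approach involves, here is a minimal sketch, assuming a six-digit numeric code, a ten-minute lifetime, a cap on guesses, and a plain dictionary standing in for whatever storage you actually use. All of the names and limits are hypothetical; this is not a description of how Google or anyone else implements it.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional

CODE_TTL_SECONDS = 600  # assumed ten-minute validity
MAX_ATTEMPTS = 5        # assumed cap on guesses per code


@dataclass
class PendingCode:
    code: str
    expires_at: float
    attempts: int = 0


def issue_code(store: Dict[str, PendingCode], email: str, now: Optional[float] = None) -> str:
    """Create a fresh six-digit code for the address and remember it."""
    current = time.time() if now is None else now
    code = f"{secrets.randbelow(1_000_000):06d}"
    store[email.lower()] = PendingCode(code, current + CODE_TTL_SECONDS)
    return code  # the caller emails this to the address


def check_code(store: Dict[str, PendingCode], email: str, submitted: str,
               now: Optional[float] = None) -> bool:
    """Return True once if the submitted code matches and has not expired."""
    current = time.time() if now is None else now
    key = email.lower()
    pending = store.get(key)
    if pending is None:
        return False
    if current > pending.expires_at or pending.attempts >= MAX_ATTEMPTS:
        del store[key]  # expired or too many guesses: force a new code
        return False
    pending.attempts += 1
    if hmac.compare_digest(pending.code.encode(), submitted.strip().encode()):
        del store[key]  # single use
        return True
    return False
```

The attempt cap matters: with only a million possible codes, an unlimited number of guesses would let a script brute-force the confirmation.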

What did I get wrong? Did I miss anything? What else would you add to this list? I'm sure you'll let me know in the comments below.

Thanks to Mike Auldredge, Shad Taylor and others for their thoughts and feedback that helped to inspire this post.