
Has this ever happened to you? Because it just happened to me! In theory, I should be on top of this sort of thing, because I've been writing scripts and code to generate email messages since the late 1990s. But in practice, I sometimes take coding shortcuts that can have unintended consequences. And here's an example of that causing me to trip over my own two feet.
How Spam Resource Works
As you may or may not know, the Spam Resource newsletter is generated using tools that I built myself. It's all a bunch of shell scripts. They handle the list signup process and track subscriptions and unsubscribes. For the newsletter itself, the automation parses the blog's RSS feed to generate the newsletter body, wraps it up in an HTML email, properly encodes the body, adds the email headers, extracts the current list of valid subscribers, and sends the newsletter to each of them. Links get UTM tags, and opens are tracked the usual way, using those horrible "spy pixels" that I don't actually think are all that evil.
When last week's newsletter launched, I saw bounces starting to rack up. 313 bounces, to be exact, which made for just over a 21% bounce rate. Usually I get no more than 2-3 bounces with every send, normally from folks who have moved on from their marketing jobs and whose B2B addresses are no longer valid. (This is how I know that 550 5.4.1 most often means "user unknown.")
When I dug into why the messages bounced, I found a recurring theme. All were rejected with: "5.6.7: SMTPUTF8 is required, but was not offered by host." Why? What went wrong? Let me share what I found, as it might help folks who run into a similar issue in the future.
What causes SMTPUTF8 encoding-related rejections?
This error pops up when you try to send an email that contains international characters somewhere in the headers: characters outside the basic 7-bit ASCII character set, such as accented letters, non-Latin scripts, or special symbols. They could be in the sender or recipient addresses, the subject line, or other header fields. To handle those properly, both the sending and receiving mail servers need to support the "SMTPUTF8" extension, defined in RFC 6531 (UTF-8 header fields are covered by its companion, RFC 6532).
In this case, my use of an unencoded curly apostrophe (a typographic single quote) in the subject line meant that SMTPUTF8 was required to deliver the message, but a number of receiving servers didn't advertise support for it during the SMTP handshake. As a result, the message couldn't be sent and was instead rejected with the error message "SMTPUTF8 is required, but was not offered by host."
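To make the handshake part concrete: a receiving server that supports the extension lists SMTPUTF8 as one of the lines in its reply to EHLO. Here's a minimal sketch of a hypothetical helper, advertises_smtputf8, that checks an EHLO reply you've already captured (from a manual SMTP session or a debug log, say) and exits 0 only if SMTPUTF8 is offered. It assumes the usual multi-line "250-" / "250 " reply format and doesn't connect to anything itself.

```shell
# advertises_smtputf8: hypothetical helper. Reads a captured EHLO reply on
# stdin; exits 0 if the server advertises SMTPUTF8, 1 if it does not.
# Each capability sits on its own "250-" (more lines follow) or "250 " (last
# line) reply line, and extension keywords are case-insensitive (-i).
# [[:space:]]* tolerates trailing whitespace, including the CR of CRLF.
advertises_smtputf8() {
  grep -Eiq '^250[ -]SMTPUTF8[[:space:]]*$'
}
```

For example, printf '250-mx.example.net\r\n250 SMTPUTF8\r\n' | advertises_smtputf8 succeeds, while the same reply without the SMTPUTF8 line fails. A server that fails this check is exactly the kind that produced my bounces.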
How To Fix It
Don't be like me! Here's what to do to prevent (or fix) an issue like this.
- Check and edit the email headers: If you control the message content, check whether any part of the email headers contains characters outside the basic Latin alphabet (ASCII), such as accented letters, non-Latin scripts, or special symbols. This could be in pretty much any header field, but if your situation is anything like mine, the subject line is likely to be what needs attention. You'll need to remove, replace, or properly encode anything you find there.
- Update your mail server: If you manage the sending server, ensure that SMTPUTF8 support is enabled and that you're running a modern mail transfer agent (MTA), aka mail server, just in case it's a mail server issue and not a header generation issue. Keep in mind that this only fixes your side of the connection; it can't make a receiving server offer SMTPUTF8.
- Fix the software that generates the email headers: In my case, as mentioned above, I'm all about the shell scripts. Adding a little bit of Perl to the script to properly base64-encode the subject line (as an RFC 2047 "encoded-word") solves the issue. (Quoted-Printable-style encoding would be OK, too.)
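For the first and last items, it helps to know exactly which header lines contain non-ASCII bytes before a send goes out. Here's a minimal sketch of a hypothetical pre-flight check, list_non_ascii_headers, that reads a raw message on stdin, stops at the blank line that ends the header block, and prints each offending header line with its line number. It assumes LF line endings (as you'd typically hand to a local sendmail command) and relies on the C locale, where every byte above 0x7F is neither an ASCII control character nor printable ASCII.

```shell
# list_non_ascii_headers: hypothetical pre-flight check for a generated
# message. Reads a raw message on stdin (LF line endings assumed) and prints
# "LINE:header" for each header line containing a non-ASCII byte.
# Exits 0 if any were found, 1 if the headers are pure ASCII.
#
# sed quits at the first empty line (the end of the header block), so the
# body is never scanned. In the C locale, [:cntrl:] plus [:print:] cover
# exactly the 128 ASCII characters, so the negated bracket expression
# matches any byte from 0x80 to 0xFF.
list_non_ascii_headers() {
  sed '/^$/q' | LC_ALL=C grep -n '[^[:cntrl:][:print:]]'
}
```

Running a generated message through this and aborting the send when it exits 0 would have caught my curly apostrophe before it ever left the building.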
Proper Subject Line Encoding
How to do this properly is described in RFC 2047. The short version is that you'll want to encode the otherwise incompatible string using either base64 (RFC 2047's "B" encoding) or a Quoted-Printable variant (the "Q" encoding). It's relatively easy to do, and lots of modules and libraries for different languages and systems support it, like the MIME::Base64 Perl module.
In my case, adding a fix to my shell script was easy. This tiny little bit of code does just what I need:
```shell
# Base64-encode the UTF-8 subject and wrap it in an RFC 2047 encoded-word
printf '%s' "=?UTF-8?B?$(printf '%s' "My fave 2600 game: Yars’ Revenge" | \
  perl -MMIME::Base64 -0777 -ne 'print encode_base64($_, "")')?="
```
That'll turn "My fave 2600 game: Yars’ Revenge" into "=?UTF-8?B?TXkgZmF2ZSAyNjAwIGdhbWU6IFlhcnPigJkgUmV2ZW5nZQ==?=", hiding that curly apostrophe in a blob of encoding. That's just what I need to prevent this issue from happening next time around.
Which Mailbox Providers Block?
If you're wondering which mailbox providers will block mail due to this encoding issue, here are the ten receivers that rejected the most messages on my own list:
- Mimecast
- Proofpoint
- Cisco (Ironport)
- Fastmail
- Yahoo
- Barracuda
- Apple iCloud
- ProtonMail
- Zoho
- AT&T
Many others rejected mail for this reason as well; in total, the 313 SMTP rejections came from 100 different mail servers.
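If you want to build a breakdown like this from your own logs, a short pipeline will do it. This sketch assumes Postfix-style log lines where the rejection text is followed by "host name[IP]", for example "status=bounced (SMTPUTF8 is required, but was not offered by host mx.example.com[192.0.2.10])". Other MTAs word things differently, so treat that pattern as an assumption to adjust; the function name count_smtputf8_rejections is hypothetical.

```shell
# count_smtputf8_rejections: hypothetical helper. Reads mail log lines on
# stdin and prints "COUNT HOST" for each remote host that triggered the
# SMTPUTF8 bounce, most frequent first.
# The sed step keeps just the hostname after "not offered by host "; the
# hostname character class stops at the "[" that starts the IP address.
count_smtputf8_rejections() {
  grep 'SMTPUTF8 is required, but was not offered by host' |
    sed 's/.*not offered by host \([A-Za-z0-9.-]*\).*/\1/' |
    sort | uniq -c | sort -rn
}
```

Piping the output through wc -l gives the number of distinct servers, which is how you'd arrive at a figure like "100 different mail servers." Mapping those hostnames to providers such as Mimecast or Proofpoint is a separate, manual step.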
Things I'm Glossing Over Here
- Folding long subject lines: Failure to fold rarely results in deliverability issues, unless that encoded or raw subject line is 999+ characters long, so I don't really care.
- Domain names and hostnames: These are encoded differently; internationalized domain names should be encoded with punycode, which is a whole other thing that I’m not touching on today.
- The username portion of email addresses: I assume, but do not know for sure, that if you're sending to a recipient whose username (the local part, before the @) contains characters outside of 7-bit ASCII, all servers involved in sending and receiving the message must support SMTPUTF8, and the address should simply be sent as UTF-8. But don't quote me on that.
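On the folding point: RFC 2047 caps each encoded-word at 75 characters and says a multi-byte character must not be split across encoded-words, while whitespace between adjacent encoded-words is ignored when the header is displayed. So a long subject can be emitted as several encoded-words on folded lines. Here's a minimal sketch of that, with a hypothetical helper name (encode_subject_folded) and a byte budget of 42 per word that I picked so each word is 68 characters and "Subject: " plus the first word stays under 78; LF folding is assumed.

```shell
# encode_subject_folded: hypothetical helper. Turns a UTF-8 subject (first
# argument) into one or more RFC 2047 "B" encoded-words, one per line, with
# continuation lines starting with a space (header folding).
# 42 bytes -> 56 base64 chars -> 68 chars with the =?UTF-8?B? ... ?= wrapper,
# under the 75-character encoded-word limit.
encode_subject_folded() {
  perl -MEncode=decode,encode -MMIME::Base64 -e '
    my $max = 42;                          # UTF-8 bytes per encoded-word
    my $text = decode("UTF-8", $ARGV[0]);  # raw bytes -> characters
    my @chunks;
    my $chunk = "";
    for my $ch (split //, $text) {
      # Start a new chunk if this character would push it past the budget,
      # so a multi-byte character is never split across encoded-words.
      if (length(encode("UTF-8", $chunk . $ch)) > $max) {
        push @chunks, $chunk;
        $chunk = "";
      }
      $chunk .= $ch;
    }
    push @chunks, $chunk if length $chunk;
    my @words;
    push @words, "=?UTF-8?B?" . encode_base64(encode("UTF-8", $_), "") . "?="
      for @chunks;
    # Fold: each encoded-word after the first goes on a continuation line.
    print join("\n ", @words);
  ' "$1"
}
```

Usage would be along the lines of printf 'Subject: %s\n' "$(encode_subject_folded "$subject")". A subject short enough for a single word comes out identical to the one-liner earlier in this post.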
You might also look at this and think to yourself, why would I deal with any of this nonsense instead of just using Beehiiv for my email newsletter? And you'd have a valid point. Don't listen to me! I'm crazy.
Comments
Your encoding "may" break because of the "may" in rfc2047#section-2. Cite: "An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters."