Email subject line encoding, decoded


Finally! It clicks for me: I now understand how you encode fancy stuff in the subject line. For years, I'd just paste accented or extended-ASCII text into Gmail and let it create the encoded version for me, but I never really stopped to look at what it was creating or how it works.

A bit of searching later, it turns out it's governed by an RFC. Originally that was RFC 1342, but it was superseded by RFC 1522, so let's link to that one. (RFC 1522 has itself since been obsoleted by RFC 2047, which is the current spec.)

It's really as simple as this: subject lines that use anything other than 7-bit ASCII get wrapped in what the RFC calls an "encoded-word", "=?charset?encoding?encoded-text?=", made up of three bits of stuff:

  1. Charset, meaning the character set, such as UTF-8.
  2. Encoding, meaning how we convert those characters into something that can safely travel in an email header. The RFC defines exactly two, each written as a single letter: "Q", a variant of Quoted-Printable, and "B", Base64.
  3. The encoded text itself, down-converted via that method into something safe to transmit, which gets converted back into something human-readable, assuming the receiving MUA (mail user agent, aka email client) is able to interpret it.

And that's how you turn something like "Fwd: Thîs ís á sübjëçt lîñé! 😊" into "=?UTF-8?B?RndkOiBUaMOucyDDrXMgw6Egc8O8YmrDq8OndCBsw67DscOpISDwn5iK?="!
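To see the three parts in that encoded string, here's a minimal Python sketch that pulls a single encoded-word apart with a regular expression. The split_encoded_word name and the pattern are my own illustration, not something from the RFC or a library, and it only splits the word; it doesn't decode the text.

```python
import re

# One encoded-word: =?charset?encoding?encoded-text?=
# Charset and encoded text can't contain "?" or whitespace; the encoding
# is a single letter, B or Q, in either case.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=")


def split_encoded_word(word: str) -> tuple[str, str, str]:
    """Return (charset, encoding, encoded_text) for one encoded-word."""
    match = ENCODED_WORD.fullmatch(word.strip())
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, encoded_text = match.groups()
    return charset, encoding.upper(), encoded_text


parts = split_encoded_word(
    "=?UTF-8?B?RndkOiBUaMOucyDDrXMgw6Egc8O8YmrDq8OndCBsw67DscOpISDwn5iK?="
)
print(parts)
# ('UTF-8', 'B', 'RndkOiBUaMOucyDDrXMgw6Egc8O8YmrDq8OndCBsw67DscOpISDwn5iK')
```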

TeleMessage provided a code snippet showing how to do this in PHP (but took their page down just before I published this blog post, sigh).
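That PHP snippet is gone, so as a stand-in (not a reconstruction of TeleMessage's code), here's a minimal Python sketch of building an encoded-word yourself. The encode_word and q_encode helpers are hypothetical names of my own. The Q encoding here is deliberately conservative: it escapes everything except letters, digits, and !*+-/, the characters RFC 2047 allows literally in every header context. It also assumes the whole subject fits in one encoded-word; RFC 2047 caps each encoded-word at 75 characters, so longer subjects would need splitting across several.

```python
import base64

# Bytes allowed to appear as themselves in a Q-encoded word in any header
# context; everything else is written as =XX (uppercase hex).
_Q_SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode(raw: bytes) -> str:
    """Q encoding: the header flavour of Quoted-Printable."""
    out = []
    for byte in raw:
        if byte == 0x20:
            out.append("_")  # a space may be written as an underscore
        elif byte in _Q_SAFE:
            out.append(chr(byte))
        else:
            out.append(f"={byte:02X}")
    return "".join(out)


def encode_word(text: str, encoding: str = "B", charset: str = "UTF-8") -> str:
    """Wrap text in a single encoded-word (no splitting for long text)."""
    raw = text.encode(charset)
    encoding = encoding.upper()
    if encoding == "B":
        payload = base64.b64encode(raw).decode("ascii")
    elif encoding == "Q":
        payload = q_encode(raw)
    else:
        raise ValueError(f"unknown encoding: {encoding!r}")
    return f"=?{charset}?{encoding}?{payload}?="


print(encode_word("Fwd: Thîs ís á sübjëçt lîñé! 😊"))
# =?UTF-8?B?RndkOiBUaMOucyDDrXMgw6Egc8O8YmrDq8OndCBsw67DscOpISDwn5iK?=
print(encode_word("sübjëçt", "Q"))
# =?UTF-8?Q?s=C3=BCbj=C3=AB=C3=A7t?=
```

To reproduce the exact string from the example, the input has to start with "Fwd: ", because that's what its Base64 payload decodes to.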

Italian ESP SendBlaster has a neat little widget that will encode the subject line for you.

And Steve Atkins has a decoding widget that will reverse that encoding for you on demand.
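If you'd rather decode in code than in a browser, Python's standard library can do the same job. Here's a minimal sketch using email.header.decode_header and email.header.make_header; decode_subject is just a hypothetical wrapper name of mine.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Turn a raw Subject header value into readable Unicode text."""
    # decode_header() splits the value into (payload, charset) chunks and
    # undoes the B or Q encoding of each encoded-word; make_header()
    # reassembles the chunks, and str() decodes each one with its charset.
    return str(make_header(decode_header(raw_subject)))
```

Fed the encoded example from earlier in this post, decode_subject returns "Fwd: Thîs ís á sübjëçt lîñé! 😊", and "=?iso-8859-1?q?caf=E9?=" comes back as "café". Plain ASCII subjects pass through unchanged.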

This is one of those things that I'm sure other folks figured out many years ago, but I never bothered to look closely at how it all worked. I had just assumed that a wizard did it. And if you're like me and never thought about this before, well, now you know.
