Fun with email hashes

Hashing, if you didn't know, generally refers to converting a bit of text, a key, or a file into a fixed-length value. In the context of email address hashing, we're talking about "one way hashing," meaning that you convert an email address into a hashed value, but once that's done, you can't convert it back to an email address.

Why would you ever need to hash email addresses? Because, if you're ever working with a marketing partner, a list rental, or perhaps even a newsletter sponsorship, you need to make sure that the partner or list owner, when sending that advertisement (or newsletter containing the advertisement), doesn't send it to people who have already unsubscribed from your emails.

If I sell widgets, and I'm sponsoring Bob's newsletter, and he's going to send out an email advertising widgets for me, to comply with the law (and best practices), neither Bob nor I will want that email to be sent to people who have already unsubscribed from my widget-related marketing emails.

Hashing lets you give your unsubscribe list to the vendor/partner safely, so that they can scrub their list of any of your unsubscribes, without giving them the ability to access (or misuse) the email addresses on it.

In this scenario, both parties convert their email address lists (the unsub list from the first party and the email list for the second party) to lists of hashes. Then the vendor/partner can compare those two lists of hashes -- looking for any matches. Any matched hashes are representative of addresses that the second party needs to suppress from their mailing. They can decode which addresses those are by comparing them to their original list (usually keeping a copy of it with both email address and hash as separate fields). But they can't discern any other non-matching email addresses, meaning they wouldn't be able to successfully steal the first party's unsubscribe list.

It's not a question of mistrust; it's a question of data security. Even if the partner would never steal your email list, if they never have the data, there's no way any leak of that data could come from them.

There are tools out there that can help you work with hashed email addresses, but if you're not afraid, your friendly, modern Macintosh computer can pretty well convert and handle hashed email address data on its own.

Here's just a taste to get you started: Open a terminal window (Utilities -> Terminal) and type this and hit return:

echo -n "example@example.com" | openssl dgst -sha256

It'll respond with "31c5543c1734d25c7206f5fd591525d0295bec6fe84ff82f946a34fe970a1e66" which is the SHA256 hash value of "example@example.com". (Depending on your version of openssl, the hash may be preceded by a label such as "(stdin)= "; the hash itself is the same.) If you needed it in MD5 or SHA1 format, you could just change "-sha256" to "-md5" or "-sha1" (in all lower case), and it'll respond with "23463b99b62a72f26ed677cc556c44e8" or "914fec35ce8bfa1a067581032f26b053591ee38a", respectively.
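
If you want to reuse that one-liner, here's a rough sketch of wrapping it in a small shell function. The name hash_email is just something I made up for illustration. It also makes one assumption that isn't part of the command above: it strips whitespace and lowercases the address before hashing, since "Example@Example.com" and "example@example.com" would otherwise produce different hashes. Whether to normalize like this is something both parties would need to agree on ahead of time.

```shell
#!/bin/sh
# hash_email (hypothetical helper): print only the SHA-256 hex digest
# of one email address.
# Assumption: strip whitespace and lowercase the address first, so that
# both parties hash identical input.
hash_email() {
    printf '%s' "$1" |
        tr -d '[:space:]' |
        tr '[:upper:]' '[:lower:]' |
        openssl dgst -sha256 |
        awk '{ print $NF }'   # keep the digest, drop any "(stdin)= " label
}

hash_email "example@example.com"
hash_email "  Example@Example.COM "   # normalizes to the same input as above
```

Both calls should print the same digest, since they hash the same normalized input.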

Starting there, you could pretty easily build a little shell script that converts an email list to a list of hashes. Here's a very simple example script that does that. It'd take your list of email addresses (in emaillist.txt) and convert them to a file called hashes.csv, and in that CSV file, you'll get two fields back -- first, your original email address, and second, the hash for that email address. (Make sure you give only the HASHES to the vendor, not the email addresses.)

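# Read one address per line; write "email,hash" rows, replacing any old hashes.csv.
while IFS= read -r EMAIL || [ -n "$EMAIL" ]
do
  # awk keeps only the digest, dropping any "(stdin)= " label openssl adds.
  HASHED=$(printf '%s' "$EMAIL" | openssl dgst -sha256 | awk '{ print $NF }')
  echo "$EMAIL,$HASHED"
done < emaillist.txt > hashes.csv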
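
The matching step described earlier (comparing the two lists of hashes and figuring out which of your own addresses to suppress) can be sketched in much the same way. This is only an illustration under a few assumptions: the function name suppress_matches is made up, the first party's unsubscribe hashes are assumed to arrive as a plain file with one hash per line, and your own list is assumed to be a CSV of "email,hash" rows like the hashes.csv described above.

```shell
#!/bin/sh
# suppress_matches UNSUB_HASHES HASHES_CSV   (hypothetical helper)
# Prints the email addresses from HASHES_CSV whose hash also appears in
# UNSUB_HASHES. Assumed layouts: UNSUB_HASHES has one hash per line;
# HASHES_CSV has "email,hash" rows.
suppress_matches() {
    awk -F, '
        # Tidy a hash: drop a trailing carriage return, drop any
        # "(stdin)= " style label, and compare in lower case.
        function clean(h) {
            sub(/\r$/, "", h)
            sub(/^.*= */, "", h)
            return tolower(h)
        }
        # First file: remember every unsubscribed hash.
        NR == FNR { unsub[clean($1)] = 1; next }
        # Second file: print the address when its hash was seen above.
        (clean($2) in unsub) { print $1 }
    ' "$1" "$2"
}
```

You might run it as: suppress_matches unsub_hashes.txt hashes.csv > suppress.txt (both input file names are placeholders). The addresses it prints are the ones to leave out of the mailing; everything else in the unsubscribe file stays an unreadable hash.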

There's a lot more you could do with this, but I think this is a good starting point, and it helps to remind folks that dealing with hashed email address data isn't that hard, doesn't necessarily require special tools, and that Macs are secretly very powerful Unix computers underneath the surface, with a whole bunch of utility just waiting for you to tap it. (And you should learn how to write shell scripts; they're very handy.)
