#wellactually: email addresses are case-sensitive, but proceed as if they're not

SanfordWhiteman
Level 10 - Community Moderator

📣 This note from 2018 was moved from another section to be referenced in an upcoming post.

This one's good for getting your know-it-all on.

Despite it being commonplace to “fix up” email addresses by lowercasing them — or, in financial/government contexts, uppercasing them — email addresses are clearly defined as case-sensitive in the only standard that matters.

RFC 5321 is unequivocal:

The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith".

When an IETF RFC uses the keyword “MUST” it means business: you can't connect an SMTP server to the internet and claim it’s standards-compliant (as they all do) if it doesn't treat mailbox local-parts as case-sensitive.[1]

(The local-part is the left-hand side of an address, to the left of the @ symbol. The right-hand side, the domain, is necessarily case-insensitive because that’s how DNS works. So sandy@teknkl.com is the same as sandy@TEKNKL.COM but, through the eyes of the accepted standard, isn't the same as SANDY@teknkl.com.)
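For the curious, here's a minimal Python sketch of the comparison the RFC implies. It's a simplification, not a full parser: it assumes the address splits cleanly at the last @, lowercases the domain with plain `str.lower()`, and ignores quoted local-parts and internationalized domains. The function names `split_address` and `rfc_equal` are made up for illustration.

```python
def split_address(addr: str) -> tuple[str, str]:
    """Split an address into (local_part, domain) at the last '@'.

    Simplification: quoted local-parts and other RFC 5322 syntax are not parsed.
    """
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an address: {addr!r}")
    return local, domain


def rfc_equal(a: str, b: str) -> bool:
    """Compare addresses per RFC 5321: local-part case-sensitive,
    domain case-insensitive (ASCII-style lowercasing only)."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return local_a == local_b and domain_a.lower() == domain_b.lower()


print(rfc_equal("sandy@teknkl.com", "sandy@TEKNKL.COM"))  # True
print(rfc_equal("sandy@teknkl.com", "SANDY@teknkl.com"))  # False
```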

So the standard is clear… and yet, here we are. Nobody in the actually-using-email world can afford to treat addresses as case-sensitive. If we did that in martech, our deduping would go haywire: someone who has the habit of typing ProfessorLonghair@bayou.com couldn't be deduped against professorlonghair@ the same domain. CSVs in all-caps would create 1000s of new leads. And so on.
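To make the deduping problem concrete, here's a small Python sketch that counts distinct leads under two keys: the RFC view, where only the domain is case-folded, and the common practice of lowercasing the whole address. The function names are hypothetical, and it assumes simple unquoted addresses. It uses `str.lower()`; `str.casefold()` is an option if you want more aggressive Unicode folding.

```python
def rfc_key(addr: str) -> str:
    # RFC 5321 view: local-part kept exactly as typed, domain lowercased.
    local, _, domain = addr.rpartition("@")
    return f"{local}@{domain.lower()}"


def folded_key(addr: str) -> str:
    # Common practice: lowercase the entire address.
    return addr.lower()


leads = [
    "ProfessorLonghair@bayou.com",
    "professorlonghair@bayou.com",
    "PROFESSORLONGHAIR@BAYOU.COM",  # e.g. a row from an all-caps CSV
]

print(len({rfc_key(a) for a in leads}))     # 3 (three "different" people)
print(len({folded_key(a) for a in leads}))  # 1 (one person)
```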

Admittedly, ignoring case in an offline database (like Marketo) can't be an RFC violation per se. The RFC sets rules for SMTP servers, and a server can't know what an address looked like before it arrived; it just mustn't change the case on its own. But lowercasing or uppercasing addresses in a database designed to send email surely violates the spirit of the RFC.

Either way, at some point almost everyone started treating addresses as case-insensitive. The fact that popular SQL databases such as MySQL and SQL Server default to case-insensitive collations probably fueled this consensus. Caught a bug in my own software today (it happens) where I had forgotten to do a case-insensitive comparison, i.e. I had accidentally followed the standard! So that's the state of things.

Some say you should treat addresses as case-preserving as opposed to case-sensitive, meaning you don't change IStillUse@AOL.COM to istilluse@aol.com but you still consider it a dupe of iSTilLUSE@aol.com. This doesn't make any sense, though. Once you recognize that the two may represent different addresses, you're arbitrarily choosing the first one in your system as the right one, when the second one is just as right. Just give up at that point and lowercase ’em.[2]
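A short Python sketch of the two policies shows the arbitrariness. A case-preserving store keeps whichever spelling arrives first, so the same two records yield a different "canonical" address depending on order, while lowercasing gives the same answer either way. Both function names are hypothetical, and both treat the whole address, local-part included, as case-insensitive.

```python
def dedupe_case_preserving(addresses: list[str]) -> list[str]:
    """Match case-insensitively, but keep the first-seen spelling."""
    first_seen: dict[str, str] = {}
    for addr in addresses:
        # Lowercased key for matching; original spelling stored as the value.
        first_seen.setdefault(addr.lower(), addr)
    return list(first_seen.values())


def dedupe_lowercase(addresses: list[str]) -> list[str]:
    """Store every address lowercased; arrival order doesn't affect the result."""
    return list(dict.fromkeys(a.lower() for a in addresses))


batch = ["IStillUse@AOL.COM", "iSTilLUSE@aol.com"]

print(dedupe_case_preserving(batch))                  # ['IStillUse@AOL.COM']
print(dedupe_case_preserving(list(reversed(batch))))  # ['iSTilLUSE@aol.com']
print(dedupe_lowercase(batch) == dedupe_lowercase(list(reversed(batch))))  # True
```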

 
NOTES

[1] 5321 does admonish against new technologies being, er, technically correct:

However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.

[2] The sole reason I can think of to preserve the case as entered on forms is that it reassures recipients you got the info from them in the first place, because they always fill out forms that way. Doubt this would have any measurable effect on engagement, though.
