A hackity-hack for “Ends With” and email domains

SanfordWhiteman
Level 10 - Community Moderator
Level 10 - Community Moderator

If you've been around the block with Marketo Smart Lists, you know there's no Ends With operator, only Starts With and Contains.[1]

This puts a damper on a common need: accurately searching for an email domain (@gmail.com, @example.co.uk) or TLD (firmographic clues like .edu, geographic ccTLDs like .cn).[2]

Some have attempted extravagant combos of Contains and Not Contains, which require a whole lot of prep just to determine that... they don't work. (Read my comments on the linked post for some examples of how such approaches are broken.)

There's a much easier way: maintain a custom field, here called Matchable Email, that always holds the value of {{Lead.Email Address}} followed immediately by two quotation marks "":

ss

ss

Then, to do a domain search, search that Matchable Email field for Contains @example.com"", which is equivalent to searching the original Email Address for Ends With @example.com:

ss

Pretty easy, right?

Why two quotation marks (“”)?

The key is to add a sequence of characters to the end of the email address that can never occur in the middle of the email address, so Contains @{{domain}}{{characters}} is functionally equivalent to Ends With @{{domain}}.

Finding those appropriate {{characters}} is a lot harder than it sounds. For awhile I was lazily appending $ to the end, because I like the fact that the dollar sign represents end-of-line in regular expressions, so it was easy to remember. But the email address "$ke$ha​@marketo.com$"​@gmail.com — note the quotation marks around the mailbox part, it wouldn't be valid without those — is a Gmail account, but would match Contains @marketo.com$.

Yes, RFC 5321 is just that generous. There are so many crazy-but-valid email addresses, however inadvisable it would be to use them in the real world, that it's hard to find something that, without exception, can only occur outside of a valid address and so can be used as your anchor point.[3]

I think I've found that something, though. Two quotation marks in a row "" can occur inside an email address, but they can never be preceded by a character that is a valid part of a domain name.

Let me explain.

First of all, as you may already be confused by this part, it's possible to have a quoted mailbox name (called a quoted-string in the standard). That's how you can add spaces on the mailbox side of the @: "sandy spacebot"@teknkl.com is a valid SMTP address.

You can also put quotation marks inside an already quoted mailbox name, but if you do so, you have to escape them with a backslash.  Thus "Clarence "Frogman""@henry.com" is not a valid email address, but if you escape the quotes as "Clarence \"Frogman\""@henry.com it is valid. Even though this address has two quotes in a row "" (see the characters right before the @?) they are by necessity preceded by a \.  And the \ can never be at the end of a domain name. 

Therefore you can accurately search the Matchable Email field for a string that Contains @gmail.com"", knowing that that sequence of characters cannot be found at the start or middle of the value, only at the end.

Enjoy!


Notes

[1] As a sometime database architect, I've never understood the technical reasoning and figure it must be just legacy-code-nobody-wants-to-touch syndrome. When searching strings, Starts With is faster than Ends With unless specific indexing is used; yet Contains and Ends With have equivalent performance — often terrible performance, don't get me wrong, but roughly the same either way. Plus, it's way easier to add indexing to speed up Ends With than it is to optimize Contains (an index on the reversed value in the first case, n-grams in the second case, FWIW). But here we are.

[2] My colleague EU points out that Marketo attempts to optimize a search for a domain-like pattern, one that begins with the character @, and turn it into an SMTP domain search. The problem is that it still doesn't work: The valid address "me​@gmail.com"​@outlook.com will (as we would expect given the concept of contains) match both Contains @gmail.com and Contains @outlook.com so it doesn't successfully emulate Ends With. It will also false negative on Contains @outlook.co, which is just plain wrong.

[3] The way to do this in a technically complete manner is to add an ASCII control character (like ASCII 30 RECORD SEPARATOR, one of my faves) which is never allowed, not even in quotes. But while you can append such a character with a specially hacked Change Data Value, searching for those characters is, unless it's just a one-time thing, effectively impossible. So we'll have to make do with "".

9407
17
17 Comments