A hackity-hack for “Ends With” and email domains

SanfordWhiteman
Level 10 - Community Moderator
Level 10 - Community Moderator

If you've been around the block with Marketo Smart Lists, you know there's no Ends With operator, only Starts With and Contains.[1]

This puts a damper on a common need: accurately searching for an email domain (@gmail.com, @example.co.uk) or TLD (firmographic clues like .edu, geographic ccTLDs like .cn).[2]

Some have attempted extravagant combos of Contains and Not Contains, which require a whole lot of prep just to determine that... they don't work. (Read my comments on the linked post for some examples of how such approaches are broken.)

There's a much easier way: maintain a custom field, here called Matchable Email, that always holds the value of {{Lead.Email Address}} followed immediately by two quotation marks "":

ss

ss

Then, to do a domain search, search that Matchable Email field for Contains @example.com"", which is equivalent to searching the original Email Address for Ends With @example.com:

ss

Pretty easy, right?

Why two quotation marks (“”)?

The key is to add a sequence of characters to the end of the email address that can never occur in the middle of the email address, so Contains @{{domain}}{{characters}} is functionally equivalent to Ends With @{{domain}}.

Finding those appropriate {{characters}} is a lot harder than it sounds. For awhile I was lazily appending $ to the end, because I like the fact that the dollar sign represents end-of-line in regular expressions, so it was easy to remember. But the email address "$ke$ha​@marketo.com$"​@gmail.com — note the quotation marks around the mailbox part, it wouldn't be valid without those — is a Gmail account, but would match Contains @marketo.com$.

Yes, RFC 5321 is just that generous. There are so many crazy-but-valid email addresses, however inadvisable it would be to use them in the real world, that it's hard to find something that, without exception, can only occur outside of a valid address and so can be used as your anchor point.[3]

I think I've found that something, though. Two quotation marks in a row "" can occur inside an email address, but they can never be preceded by a character that is a valid part of a domain name.

Let me explain.

First of all, as you may already be confused by this part, it's possible to have a quoted mailbox name (called a quoted-string in the standard). That's how you can add spaces on the mailbox side of the @: "sandy spacebot"@teknkl.com is a valid SMTP address.

You can also put quotation marks inside an already quoted mailbox name, but if you do so, you have to escape them with a backslash.  Thus "Clarence "Frogman""@henry.com" is not a valid email address, but if you escape the quotes as "Clarence \"Frogman\""@henry.com it is valid. Even though this address has two quotes in a row "" (see the characters right before the @?) they are by necessity preceded by a \.  And the \ can never be at the end of a domain name. 

Therefore you can accurately search the Matchable Email field for a string that Contains @gmail.com"", knowing that that sequence of characters cannot be found at the start or middle of the value, only at the end.

Enjoy!


Notes

[1] As a sometime database architect, I've never understood the technical reasoning and figure it must be just legacy-code-nobody-wants-to-touch syndrome. When searching strings, Starts With is faster than Ends With unless specific indexing is used; yet Contains and Ends With have equivalent performance — often terrible performance, don't get me wrong, but roughly the same either way. Plus, it's way easier to add indexing to speed up Ends With than it is to optimize Contains (an index on the reversed value in the first case, n-grams in the second case, FWIW). But here we are.

[2] My colleague EU points out that Marketo attempts to optimize a search for a domain-like pattern, one that begins with the character @, and turn it into an SMTP domain search. The problem is that it still doesn't work: The valid address "me​@gmail.com"​@outlook.com will (as we would expect given the concept of contains) match both Contains @gmail.com and Contains @outlook.com so it doesn't successfully emulate Ends With. It will also false negative on Contains @outlook.co, which is just plain wrong.

[3] The way to do this in a technically complete manner is to add an ASCII control character (like ASCII 30 RECORD SEPARATOR, one of my faves) which is never allowed, not even in quotes. But while you can append such a character with a specially hacked Change Data Value, searching for those characters is, unless it's just a one-time thing, effectively impossible. So we'll have to make do with "".

9181
17
17 Comments
Malik_Zafar2
Level 4

so if I am just starting out and creating the Matchable Email field, I would first need to run a batch to transfer all {{lead.Email Address}} of all records to that new field, correct? 

With very large databases, is it recommended to do this in waves instead of one massive batch?

SanfordWhiteman
Level 10 - Community Moderator

Yeah, I would do smaller batches, say 100K each.

Malik_Zafar2
Level 4

Sanford, what if I want to look at just the very end of an email address

for example, for some govt domains you have

@us.army.mil

@us.navy.mil

@us.airforce.mil

In this example, I only want to filter for domains that end in ".mil". 

I used the steps above, but that doesn't seem to work in this case. Or am I missing something?

Added screenshot of a smart list I created after I followed the steps above. I converted over the above email domains to the new "Matchable Email" field, and then ran a test search just to see if it would work but it showed no leads.

Screen Shot 2019-11-14 at 12.28.39 AM.png

SanfordWhiteman
Level 10 - Community Moderator

Can you show the value of Matchable Email for a sample lead you expect to match [contains]?

Once the value is populated there isn't really anything special about it, it's like any other match.

Malik_Zafar2
Level 4

So this is what I hope to happen

Matchable Email CONTAINS .mil""

And it would ideally show results like these:

bob.smith@us.army.mil

bugs.bunny@us.navy.mil

mike.jones@us.airforce.mil

(all examples)

however when I use the above logic "Matchable Email CONTAINS .mil""" I don't get any results back. 

If I simply did "contains .mil" for my filter, I'll get back results like

matt.miller@gmail.com

Sarah.milstone@yahoo.com

Which isn't what I want.

SanfordWhiteman
Level 10 - Community Moderator

What you're describing is exactly how it works. But I have to see if your field is being populated correctly. Is there a space between the quotes?

Malik_Zafar2
Level 4

Ahh, it worked. I had populated the field incorrectly.

Screen Shot 2019-11-14 at 9.35.27 AM.png

This is fixed now, but originally I had forgotten to add the quotes to the end of the New Value.

Thanks Sanford! This is amazing