In reference to this post
Arlen Walker 22/12/2016 2:33 PM (in response to Kenny Elkington)
Well, we don't have that particular concern... In our case, someone is reading the form fields and then auto-submitting a few thousand per day with a 13-digit hex number in the name field. It's easy enough to filter that out of a smart list, but I want to keep it from getting into the database in the first place. Marketo just lets it in, with no apparent way to insert a server-side filter that simply drops the record.
I am quite new, and more a marketer/strategist than a technologist, so could you please walk me through how to build those "clean-up" smart lists?
We had a "bot attack," and many leads were created with fake email addresses, many containing Chinese characters and lots of numbers.
I saw in one of your comments (see above) that it was easy to use smart lists to clean up this kind of bad data.
Could you please show me how you do it?
(Unfortunately, I’m nothing if not a technologist; I’ll try to limit the technobabble.)
That URL holds the basic instructions for creating a smart list. The real effort lies in the steps where you create and add filters to the list. There are a bunch of “pre-built” filters available from Marketo, but you can also create your own: you pick the fields you want to filter on and the values you want to filter for. For example, if you want to get rid of all leads that have no first name (or no last name), you create a filter on that field with the “is not empty” operator, and those leads won’t show up in the list.
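To make the filter idea concrete, here is a rough Python model of a smart list: each filter is a yes/no test, and a lead shows up only if it passes all of them. The `Lead` fields and helper names are hypothetical, and the 13-character hex check is just the bot pattern mentioned at the top of this thread; in Marketo itself you would configure the equivalent filters in the UI, not in code.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Lead:
    # Hypothetical lead record; field names are illustrative, not Marketo API names.
    first_name: str
    last_name: str
    email: str


# A filter is just a yes/no test applied to one lead.
Filter = Callable[[Lead], bool]


def is_not_empty(field: str) -> Filter:
    # Mirrors the "is not empty" operator; whitespace-only values count as empty (an assumption).
    return lambda lead: bool(getattr(lead, field).strip())


HEX13 = re.compile(r"[0-9a-fA-F]{13}")


def not_hex_junk(field: str) -> Filter:
    # Fails leads whose field is exactly a 13-digit hex string (the bot pattern above).
    return lambda lead: HEX13.fullmatch(getattr(lead, field).strip()) is None


def smart_list(leads: Iterable[Lead], filters: list[Filter]) -> list[Lead]:
    # Like a smart list using "all filters" logic: a lead must pass every filter.
    return [lead for lead in leads if all(f(lead) for f in filters)]


leads = [
    Lead("Ada", "Lovelace", "ada@example.com"),
    Lead("", "Nobody", "x@example.com"),
    Lead("5f3a9c0b2e1d4", "Bot", "bot@example.com"),
]
good = smart_list(leads, [is_not_empty("first_name"), not_hex_junk("first_name")])
print([lead.first_name for lead in good])  # ['Ada']
```

Adding a newly discovered pattern is just one more predicate in the list, which matches the iterative "find a pattern, add a filter" process described here.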
That’s the hardest part of the process: finding the commonalities among the bad leads or, even better, the commonalities among the good ones. It’s also the part that’s most specific to you. What worked simply for us might not work so simply (or at all) for you, because every customer base is different; use what you know (or can learn) about your own base to guide you while you look for patterns. You might find, for example, that a bunch of bad leads use mail.ru email addresses (a likely sign of fakes if your customers are US-based; but if you sell a lot internationally, remember that mail.ru is the Russian equivalent of, say, Hotmail in the US). Unfortunately, Marketo doesn’t provide an “ends with” operator to filter email domains directly, but you can use “contains” (or “not contains”) with “@mail.ru” to target them. This part of the process is trial and error: setting up a smart list is easy enough, but getting the filters right is less so.
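Here is a small Python sketch of why “contains @mail.ru” is looser than a true domain match: a substring test also hits any address whose domain merely begins with “mail.ru”. The function names and the `joe@mail.rumble.com` address are made up for illustration, and the case-insensitive comparison is an assumption about how you would want the match to behave.

```python
def domain_of(email: str) -> str:
    # Everything after the last "@", lower-cased; empty if there is no "@".
    _, at, domain = email.strip().rpartition("@")
    return domain.lower() if at else ""


def contains_match(email: str, needle: str = "@mail.ru") -> bool:
    # What a "contains" filter does: a plain substring test (case-insensitive here).
    return needle.lower() in email.lower()


def has_domain(email: str, domain: str = "mail.ru") -> bool:
    # What an "ends with"-style domain check aims for: the domain must match exactly.
    return domain_of(email) == domain.lower()


for address in ["ivan@mail.ru", "IVAN@MAIL.RU", "joe@mail.rumble.com", "ann@example.com"]:
    print(address, contains_match(address), has_domain(address))
```

The third address is the false positive to watch for: “contains” flags it, while the exact domain comparison does not. In practice, reviewing the smart list’s results before deleting anything catches cases like this.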
It’s an iterative process. You look for patterns, and every pattern you find cleans the list a little (or, if you’re lucky, a lot). Don’t limit yourself to just the form fields, either. You have the IP address that submitted the form, so look for IPs with lots of fake submissions and filter those out (especially if they have a batch of leads created within seconds of each other; use the “Created” field). Use the inferred country field to help identify fake leads from places you don’t sell to. We had a ton of fake submissions without a first name; an “is not empty” filter nailed them.
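The “lots of leads from one IP within seconds” pattern can be sketched as a sliding window over each IP’s creation timestamps. This assumes you have exported leads as (IP, created time) pairs; `burst_ips` and its thresholds (more than 5 leads within 60 seconds) are hypothetical starting points to tune against your own data, and the IPs below are reserved documentation addresses.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def burst_ips(leads, max_leads=5, window=timedelta(seconds=60)):
    """Return IPs that created more than `max_leads` leads inside any `window`.

    `leads` is an iterable of (ip, created) pairs, where `created` is a datetime.
    """
    by_ip = defaultdict(list)
    for ip, created in leads:
        by_ip[ip].append(created)

    flagged = set()
    for ip, times in by_ip.items():
        times.sort()
        start = 0
        # Slide a window over the sorted timestamps, shrinking it from the left
        # whenever it spans more than `window`.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 > max_leads:
                flagged.add(ip)
                break
    return flagged


t0 = datetime(2016, 12, 22, 14, 0, 0)
sample = [("203.0.113.7", t0 + timedelta(seconds=i)) for i in range(10)]  # 10 leads in 10 s
sample += [("198.51.100.2", t0 + timedelta(hours=i)) for i in range(10)]  # 10 leads, 1 per hour
print(burst_ips(sample))  # {'203.0.113.7'}
```

Both IPs submitted ten leads; only the one that did it within seconds is flagged, which is the distinction the “Created” field lets you draw.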
I think what’s usually considered best practice is to start from the assumption that the full list is bad, and then create smaller specialized lists from it for particular campaigns, using smart lists with filters that select the obviously good addresses. But the second time someone here mistakenly used the big list, we realized that wasn’t a viable solution for us. So instead, we created smart lists to identify the fakes. Once we had a smart list that worked, we applied that filter to the database and deleted them. IIRC, it took an hour or two, because we were limited to deleting one page at a time. Tedious, but so is gardening, yet you have to do it if you want flowers. Come to think of it, gardening is probably an apt analogy: spend a little time every day weeding, and you reduce the number of times you have to spend all Saturday digging in the dirt.
No process is perfect. The best processes combine time-based rejection (not enough time elapsed between sending the form and its submission for a human to have read and filled it out) with randomized values (stored on the server and sent out in the form; if they don’t match on submission, the submission is rejected), but AFAIK this isn’t available with Marketo. CAPTCHAs catch some, and honeypot fields catch some, but fewer than you’d like, because the smarter bots can get past them; nothing catches them all. The only thing that catches the rest is you: getting better at recognizing what real leads from real potential customers look like, which helps you write better filters for your smart lists.
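For a form you host and process yourself (per the above, this isn’t something Marketo provides), the two checks could look roughly like this Python sketch: a one-time random value stored server-side and embedded in a hidden field, plus a minimum fill-out time. `FormGuard` and the 3-second threshold are hypothetical; a real deployment would need shared storage and expiry of unused values rather than an in-memory dict.

```python
import secrets
import time


class FormGuard:
    """Hypothetical in-memory sketch: one-time random value + minimum fill-out time."""

    def __init__(self, min_seconds: float = 3.0, clock=time.monotonic):
        self.min_seconds = min_seconds
        self.clock = clock
        self._issued: dict[str, float] = {}  # random value -> time the form was sent

    def issue(self) -> str:
        # Put this value in a hidden field when rendering the form.
        token = secrets.token_urlsafe(16)
        self._issued[token] = self.clock()
        return token

    def accept(self, token: str) -> bool:
        # pop() makes each value single-use, so replayed submissions fail.
        sent_at = self._issued.pop(token, None)
        if sent_at is None:
            return False  # never issued, or already used: reject
        # Reject submissions that came back too fast for a human to have filled them in.
        return self.clock() - sent_at >= self.min_seconds


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
guard = FormGuard(min_seconds=3.0, clock=lambda: now[0])
t1 = guard.issue()
now[0] = 1.0
print(guard.accept(t1))        # False: submitted 1 s after the form was sent
t2 = guard.issue()
now[0] = 10.0
print(guard.accept(t2))        # True: 9 s elapsed
print(guard.accept(t2))        # False: value already used
print(guard.accept("forged"))  # False: value was never issued
```

Storing the value server-side (rather than trusting anything the browser sends back unchecked) is what makes a mismatch meaningful; the timing check only works because the server, not the client, records when the form went out.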
I should also note: once you work out what you consider to be really reliable filters for bad leads, you can put them into the form processing flow, so the fake leads get dropped before they get into the database (or maybe get a big deduction from their score, if you want to keep them around). In keeping with the gardening analogy, consider that a judicious application of weed-killer, or maybe dusting the roses for aphids.
Many Chinese characters
For this particular case, you could make use of Unicode script detection.
The "not-spoken-here" type of form spam (I'm assuming from your phrasing that you don't do business in China or Taiwan) is frustrating, since it's glaringly out of place to you as a human but perfectly legit from a machine's perspective.
While Marketo Flows don't natively distinguish between Unicode script characters, you can plug in such detection using FlowBoost:
That'll check to see if the token contains characters from Han Chinese script.* If the response is true, just trash the lead.
Note: Velocity tokens, unlike Flow steps, can automatically swap in content based on the Unicode script(s) used in lead fields. But that would only apply if the lead is supposed to be in your system, i.e. not for spam detection.
* Distinguishing Chinese Hanzi spam from legitimate Korean Hanja or Japanese Kanji would require more steps, though if you do business in Japan or Korea you're less likely to dismiss Chinese glyphs as spam anyway.
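This is not the FlowBoost code from the post; it is a Python sketch of the same kind of Han-script check using only the standard library. It relies on Han ideographs having Unicode character names that begin with "CJK UNIFIED IDEOGRAPH" or "CJK COMPATIBILITY IDEOGRAPH". `contains_han` is a hypothetical name, and coverage of newer extension blocks depends on your Python version's Unicode database. As the footnote says, Japanese Kanji and Korean Hanja share these code points, so they match too.

```python
import unicodedata

HAN_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")


def contains_han(text: str) -> bool:
    # True if any character is a Han ideograph, judged by its Unicode name.
    # name(ch, "") returns "" instead of raising for unnamed code points.
    return any(unicodedata.name(ch, "").startswith(HAN_NAME_PREFIXES) for ch in text)


print(contains_han("张伟"))        # True: Chinese Hanzi
print(contains_han("John Smith"))  # False: Latin script
print(contains_han("たなか"))      # False: Japanese Hiragana is a separate script
print(contains_han("田中"))        # True: Japanese Kanji uses the same Han code points
```

The last line is the footnote's caveat in action: script detection alone can't tell Chinese from Japanese Kanji, so telling them apart needs extra signals (e.g. whether Kana also appear in the value).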
Oh gosh, that is amazing!!! Thanks so much, guys, for taking the time to type all of that and help me out! Priceless!!!
Great, I know what I'm gonna do now.
Thanks! And I might solicit you again in the near future, hihi
Hi Sanford, based on the comment above (and let me know if you'd prefer I open another thread): "Note: Velocity tokens, unlike Flow steps, can automatically swap in content based on the Unicode script(s) used in lead fields."
Are you saying that with a Velocity token we could, for example, swap Japanese characters for Latin letters in a form fill?
Not exactly sure what you mean by "swap." What's the exact problem you're trying to solve?
OK, this may actually be a dumb question, because I don't know whether Japanese can be mapped to English in any way. So let's say Cyrillic: as someone types their job title in their own language's characters, could we use a Velocity token to display and store what they are typing as English? The problem we are trying to solve is demographic lead scoring on job titles in form fills (text fields) written in non-Latin scripts. Is there a way, using Velocity tokens or FlowBoost, to avoid adding every title in every language they could possibly enter into our scoring model?
Japanese does have a romanization method (actually, more than one). Broadly, this process is called transliteration. It's not available in Marketo's Velocity install because the underlying third-party Java libraries are not included.
But I think you're overestimating what transliteration does. Transliteration is how you get from 建築家 to "Kenchikuka," not how you get from 建築家 to "architect".
If you want actual translation, you can call the Google or Microsoft translation services via a webhook.
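To see the difference concretely, here is a deliberately simplified Russian Cyrillic-to-Latin table in Python. It loosely follows common English-language conventions but is not a full implementation of any standard such as ISO 9 or BGN/PCGN, and `transliterate_ru` is a hypothetical helper. It turns "Архитектор" into "Arkhitektor", which a scoring rule looking for "architect" still would not match; closing that gap takes translation.

```python
# Simplified lower-case mapping; hard and soft signs are dropped.
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate_ru(text: str) -> str:
    out = []
    for ch in text:
        latin = RU_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # not Russian Cyrillic (spaces, digits, Latin): keep as-is
        elif ch.isupper():
            out.append(latin.capitalize())  # "Ж" -> "Zh"; all-caps input is not handled specially
        else:
            out.append(latin)
    return "".join(out)


print(transliterate_ru("Архитектор"))            # Arkhitektor
print(transliterate_ru("Генеральный директор"))  # Generalnyy direktor
```

The output is readable Latin text, but it is still Russian: the letters changed, the language did not.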
You'd mentioned that this setup explanation wasn't the "main thrust" of the linked post, but it's still very helpful on both accounts! Thank you! https://blog.teknkl.com/detecting-the-language-of-a-filled-out-form/