In a recent post you learned that specific characters must be URL-encoded before storing String/Text fields in Marketo. You don’t need to encode the whole value, and other characters should be left alone for readability.
Below I show how to do that kind of selective URL-encoding in 4 languages: JavaScript, Java, PHP, and C#, and how to decode in Velocity for correct email output.
I’ll cover decoding first because it’s so easy. Just do:
#set( $decodedStuff = $link.decode($lead.encodedStuff) )
$link.decode doesn’t care if the entire input isn’t URL-encoded, as long as the URL-encoded parts are done right.
(This is how the URL decoding function works in all languages that I know of. And it makes sense: otherwise, a string like My%20love%20of%20🥨%20is%20twisted could never be decoded, despite it having a single very clear meaning!)
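To see that tolerance outside Velocity, here is a minimal JavaScript sketch using decodeURIComponent as a stand-in for $link.decode (an assumption for illustration: the two are different implementations, and only the JS behavior is shown here). The variable names are made up for the example.

```javascript
// Mixed input: spaces are percent-encoded, the pretzel is left as a raw character.
const mixed = "My%20love%20of%20🥨%20is%20twisted";

// decodeURIComponent rewrites the valid %XX sequences and passes everything else through.
const decoded = decodeURIComponent(mixed);

console.log(decoded); // My love of 🥨 is twisted
```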
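In all cases, we’re replacing these characters with their URL-encoded equivalents: carriage return (\r), line feed (\n), and any character beyond U+FFFF (emoji, for example), plus the percent sign (%) for reasons covered further down.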
So we expect this original value:
Longtime product user.👍
Hoping to get pricing for an enterprise contract.
To become this encoded value:
Longtime product user.%F0%9F%91%8D%0AHoping to get pricing for an enterprise contract.
(Gotta say it wasn’t so fun to get back into PHP nor into C#, which I’ve never actually used for a full app! But I do it all for you guys and/or your devs.😛)
JS sets the standard for simplicity with String#replace(regex, callback) and encodeURIComponent:
let pattern = /[%\r\n\u{10000}-\u{10FFFF}]/ug;
let replaced = original.replace(pattern, encodeURIComponent);
I went for callback-style in the other languages too, so you can easily see the differences.
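As a quick sanity check, the JS snippet can be run against the sample value from above. This is only a sketch: the variable names are mine, and decodeURIComponent stands in for the Velocity-side decode.

```javascript
// The sample value: an emoji followed by a line feed.
const original = "Longtime product user.👍\nHoping to get pricing for an enterprise contract.";

// Same pattern as above: %, CR, LF, and anything beyond U+FFFF.
const pattern = /[%\r\n\u{10000}-\u{10FFFF}]/ug;

// replace() passes (match, offset, string) to the callback;
// encodeURIComponent only reads the first argument, so it can be passed directly.
const encoded = original.replace(pattern, encodeURIComponent);

console.log(encoded);
// Longtime product user.%F0%9F%91%8D%0AHoping to get pricing for an enterprise contract.

// Decoding restores the original, since spaces and other characters were never touched.
const roundTripped = decodeURIComponent(encoded);
console.log(roundTripped === original); // true
```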
In Java, we use Matcher.replaceAll(function) and URLEncoder.encode:
Pattern pattern = Pattern.compile("[%\\r\\n\\x{10000}-\\x{10FFFF}]");
String replaced = pattern.matcher(original).replaceAll( match -> URLEncoder.encode(match.group(), StandardCharsets.UTF_8) );
Note the Java regex is Unicode-aware by default, but it has that double-escaping requirement. Plus the UTF_8 hint is seemingly redundant but required.
Never going back to PHP professionally, but it’s pretty good here, with preg_replace_callback(pattern, callable) and urlencode:
$pattern = "/[%\r\n\x{10000}-\x{10FFFF}]/u";
$replaced = preg_replace_callback($pattern, function ($matches) { return urlencode($matches[0]); }, $original);
Deciding to use C# as my 4th example was… questionable. Turned out .NET is one of the few “modern” runtimes that doesn’t have Unicode-aware regexes yet. Instead, we have to look for 2 \p{Cs} characters (surrogates) in a row, which implicitly means they’re encoding a character beyond U+FFFF. In turn that means we can’t use a simple character class [] but need to switch to alternation this|or|that.
Then pattern.Replace(string, MatchEvaluator) does the trick:
Regex pattern = new Regex(@"%|\r|\n|\p{Cs}{2}");
MatchEvaluator callback = new MatchEvaluator((Match match) => {
return WebUtility.UrlEncode(match.Value);
});
string replaced = pattern.Replace(original, callback);
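The surrogate-pair idea is easier to see in JavaScript, where dropping the u flag makes the regex work on UTF-16 code units, much like the .NET one. The sketch below is an illustration, not a translation of the C#: it matches a high surrogate followed by a low surrogate, which is slightly stricter than \p{Cs}{2}, and the variable names are made up.

```javascript
const thumbsUp = "👍"; // U+1F44D, beyond U+FFFF

// In UTF-16 the emoji is stored as two code units (a surrogate pair).
console.log(thumbsUp.length); // 2
console.log(thumbsUp.charCodeAt(0).toString(16)); // d83d (high surrogate)
console.log(thumbsUp.charCodeAt(1).toString(16)); // dc4d (low surrogate)

const sample = "Longtime product user.👍\nHoping to get pricing for an enterprise contract.";

// No u flag: alternation plus an explicit high+low surrogate pair.
const surrogateStyle = /%|\r|\n|[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
// With u flag: the code point range fits in a single character class.
const codePointStyle = /[%\r\n\u{10000}-\u{10FFFF}]/ug;

const viaSurrogates = sample.replace(surrogateStyle, encodeURIComponent);
const viaCodePoints = sample.replace(codePointStyle, encodeURIComponent);

console.log(viaSurrogates === viaCodePoints); // true
```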
Why encode %?
You might wonder why % is encoded to %25, since that character wasn’t in our must-encode list. It’s because we can’t risk breaking on user input that looks URL-encoding-ish. $link.decode, decodeURIComponent, et al. will error out on this string:
Do you still do 50% off student subscriptions?
To pass it safely through a decoder it needs to be:
Do you still do 50%25 off student subscriptions?
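Here is a small JavaScript sketch of that failure and the fix, using decodeURIComponent only (I’m not reproducing the Velocity side here). The variable names are hypothetical.

```javascript
const risky = "Do you still do 50% off student subscriptions?";

// "% o" is not a valid %XX sequence, so the decoder throws a URIError.
let decodeError = null;
try {
  decodeURIComponent(risky);
} catch (err) {
  decodeError = err;
}
console.log(decodeError instanceof URIError); // true

// Encoding the lone % first makes the value safe to decode later.
const safe = risky.replace(/%/g, encodeURIComponent);
console.log(safe); // Do you still do 50%25 off student subscriptions?
console.log(decodeURIComponent(safe) === risky); // true
```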
In an earlier version of this post, I had an additional (r/R)eplace("+","%20") in the callbacks for Java and C# and used rawurlencode in PHP instead of urlencode.
That’s because neither Java, C#, nor PHP correctly encodes the space character as %20 in their “frequently used” functions. Only JavaScript does it correctly! Even though we aren’t replacing spaces in this particular case, it felt better to have the languages be aligned. But I decided to clip that out for brevity.
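To illustrate the + versus %20 difference without leaving JavaScript, the sketch below uses URLSearchParams, which serializes in the form-style encoding where a space becomes +. That is my stand-in for the Java/C#/PHP behavior described above, not a claim about their exact output, and the variable names are made up.

```javascript
// Form-style encoding: a space becomes "+".
const formStyle = new URLSearchParams({ v: "a b" }).toString();
console.log(formStyle); // v=a+b

// encodeURIComponent: a space becomes "%20".
const componentStyle = encodeURIComponent("a b");
console.log(componentStyle); // a%20b

// decodeURIComponent does not treat "+" as a space, so the two styles don't mix.
console.log(decodeURIComponent("a+b")); // a+b

// The extra replace step: safe because a literal "+" in the input
// is already encoded as %2B by the form-style encoder.
const aligned = formStyle.replace(/\+/g, "%20");
console.log(aligned); // v=a%20b
```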
I also didn’t show imports for Java and C# (java.net.*, java.util.regex.*, java.nio.charset.* / System.Text.RegularExpressions, System.Net) but you’d probably figure those out, all things considered.