Be aware of decodeURIComponent() limitations (with any 1ˢᵗ or 3ʳᵈ party JS, not just Marketo forms)

SanfordWhiteman

Here’s a fun one. Open a URL that hosts a Marketo form and change the query string to:

?utm_medium=email&utm_campaign=new-years-discount-10%

Here’s how your form will look:

What the? Let’s check the browser console:

⮾️ Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at b.exports (forms2.min.js)
    at h.getPageFields (forms2.min.js)
    at yourpage.html?utm_medium=email&utm_campaign=new-years-discount-10%
    at q (forms2.min.js)

Hmm, that’s not great. Granted, we didn’t URL-encode the param names or values, like a certain blogger always reminds us to. Could that be it?

﹥ decodeURIComponent("utm_medium")
⋖ 'utm_medium'
﹥ decodeURIComponent("email")
⋖ 'email'
﹥ decodeURIComponent("utm_campaign")
⋖ 'utm_campaign'
﹥ decodeURIComponent("new-years-discount-10%")
⮾ Uncaught URIError: URI malformed
     at decodeURIComponent(<anonymous>)
     at <anonymous>

Yep, that was it: decodeURIComponent throws a URIError if a value contains invalid percent-encoding. Make no mistake, a % must be followed by 2 hex digits (0-9 or A-F, case-insensitive) in a URL, per RFC 3986. That URL, as innocent + unambiguous as it looks, is invalid.

Marketo’s forms library still uses decodeURIComponent to parse query strings, as do many others (we’re definitely not alone here!). Since it doesn’t wrap calls in try/catch or do any pre-fixup, a URIError is fatal.
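
For comparison, here’s roughly what “wrapping calls in try/catch” could look like inside a parser. This is a minimal sketch, not what any particular library does: safeDecode is a made-up helper name, and returning the raw string on failure is just one possible fallback.

```javascript
// Hypothetical helper (not part of Marketo's library or any other):
// decode a single query string component, falling back to the raw text
// if the percent-encoding is malformed.
function safeDecode(rawComponent) {
  try {
    return decodeURIComponent(rawComponent);
  } catch (err) {
    if (err instanceof URIError) {
      return rawComponent; // leave the value undecoded rather than crash
    }
    throw err; // anything else is unexpected, so don't swallow it
  }
}

console.log(safeDecode("email"));                  // 'email'
console.log(safeDecode("new-years-discount-10%")); // 'new-years-discount-10%'
console.log(safeDecode("caf%C3%A9"));              // 'café'
```

The catch is that this fallback is all-or-nothing: a value like 50%25-off-10% comes back completely undecoded, valid %25 included.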

Out of curiosity, let’s see what URLSearchParams does with the same query string:

﹥ const searchParams = new URLSearchParams(document.location.search);
﹥ searchParams.get("utm_campaign");
⋖ 'new-years-discount-10%'

Interesting. URLSearchParams — the new(ish) native way to read and write query strings — is forgiving of the dangling % and doesn’t throw an error. It obeys Postel’s Law, in other words, accepting lightly-broken input but not producing broken output.
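
To see both halves of that (loose in, strict out), you can round-trip the bad query string. A quick sketch; the variable names are only for illustration:

```javascript
const loose = new URLSearchParams("utm_medium=email&utm_campaign=new-years-discount-10%");

// Loose input: the dangling % is read as a literal percent sign
console.log(loose.get("utm_campaign")); // 'new-years-discount-10%'

// Strict output: serializing encodes that % as %25
const strict = loose.toString();
console.log(strict); // 'utm_medium=email&utm_campaign=new-years-discount-10%25'

// ...which native decodeURIComponent can handle
console.log(decodeURIComponent("new-years-discount-10%25")); // 'new-years-discount-10%'
```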

But `decodeURIComponent` isn’t completely strict, since it tolerates unencoded input

It’s tempting to conclude decodeURIComponent only supports output that would be generated by its companion function encodeURIComponent, but that’s not so. Check this out:

﹥ decodeURIComponent("😊")
⋖ '😊'

decodeURIComponent is cool with an unencoded emoji, even though that’s also invalid in a URL (it should be %F0%9F%98%8A)!

So it’s strict when input is incorrectly percent-encoded: it also fails on, say, trailing %F or %AZ anywhere in a string, since those aren’t % followed by 2 hex digits. But it’s loose when a value simply isn’t percent-encoded.

Note URLSearchParams is consistently loose, also tolerating the unencoded emoji:

﹥ new URLSearchParams("test=😊").get("test")
⋖ '😊'
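
Here’s a side-by-side sketch of the two behaviors using the sample values from this post. The tryDecode helper is hypothetical, only there so a thrown URIError shows up as a result instead of halting the loop.

```javascript
// Hypothetical helper: report the error name instead of throwing
function tryDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    return err.name; // 'URIError' for malformed percent-encoding
  }
}

const samples = ["10%", "%F", "%AZ", "😊", "%F0%9F%98%8A"];

const comparison = samples.map((sample) => ({
  sample,
  decodeURIComponent: tryDecode(sample),
  URLSearchParams: new URLSearchParams("test=" + sample).get("test"),
}));

console.table(comparison);
// decodeURIComponent: 'URIError' for the first three, '😊' for the last two
// URLSearchParams:    '10%', '%F', '%AZ', then '😊' for the last two
```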

Protect `decodeURIComponent`-dependent parsers from bad percent-encoding

Later, I’ll show how to modify a parser that uses decodeURIComponent so it’s more URLSearchParams-ishly tolerant of bad encoding (though you should migrate to URLSearchParams itself, it’s great).

But what if you didn’t write the parser, so you can’t modify it — and say you accidentally sent out a bad URL and 1000s of people are about to land on your LP?

Not to worry, you have 2 great options and 1... let’s say, interesting option. Choose one and include it as high as possible in <head>, so it runs before any script that parses the query string.

Option 1: Proxy `decodeURIComponent` while still using it under the hood

This option adds a one-line “shim” to replace invalid sequences with valid equivalents before calling native decodeURIComponent. In my tests, it’s lightning-fast (to the tune of 2 million calls/sec) so you don’t have to worry about overhead. It also replaces the troublesome + sign in the same shot, so you don’t need to repeat that step.

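window.decodeURIComponent = new Proxy(window.decodeURIComponent,{
    apply(target, thisArg, [encoded]) {
        // coerce to a string first (the native function accepts non-strings), then turn + into %20 and any % not followed by 2 hex digits into %25
        const dUCSafeEncoded = String(encoded).replace(/(\+|%(?![0-9a-f][0-9a-f]))/ig, (match) => match === "+" ? "%20" : "%25" );
        return Reflect.apply(target, thisArg, [dUCSafeEncoded]);
    }
});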
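
If you’d like to check what the shim does before it replaces the global, the same fixup works as a standalone function. This is a sketch: tolerantDecode is a name made up for this example, and it calls native decodeURIComponent directly instead of going through a Proxy.

```javascript
// Same fixup as the Proxy, as a standalone function (hypothetical name)
function tolerantDecode(encoded) {
  const fixedUp = String(encoded).replace(
    /(\+|%(?![0-9a-f][0-9a-f]))/gi,
    (match) => (match === "+" ? "%20" : "%25")
  );
  return decodeURIComponent(fixedUp);
}

console.log(tolerantDecode("new-years-discount-10%")); // 'new-years-discount-10%'
console.log(tolerantDecode("%AZ"));                    // '%AZ'
console.log(tolerantDecode("spring+sale"));            // 'spring sale'
console.log(tolerantDecode("100%25+real"));            // '100% real'
console.log(tolerantDecode("%F0%9F%98%8A"));           // '😊'
```

Valid sequences like %25 pass through untouched, so well-formed values decode exactly as before (apart from + becoming a space).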

Option 2: Overwrite `decodeURIComponent` to use `URLSearchParams` under the hood

This one is provided more as a curiosity. It overwrites decodeURIComponent to use the URLSearchParams parser. Short and sweet-looking, for sure. But URLSearchParams is designed to parse a full query string at once, not to be called for every component. There’s considerable overhead in parsing and getting the first value from the iterator. (We’re still talking ~400,000 calls/sec, so you’re really unlikely to feel any difference in MOps applications, but I can’t fully recommend it.)

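window.decodeURIComponent = function(encoded){
    // escape any literal & first, otherwise URLSearchParams treats it as a param separator and truncates the value
    const uSP = new URLSearchParams("=" + String(encoded).replace(/&/g, "%26"));
    return uSP.values().next().value;
};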
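
If you want to sanity-check throughput in your own environment, a rough micro-benchmark along these lines would do. It’s a sketch with made-up names (viaRegexFixup, viaSearchParams, callsPerSecond); it times standalone copies of the two approaches rather than overwriting the global, and the numbers will vary by browser and machine.

```javascript
// Standalone stand-ins for Option 1 and Option 2 (hypothetical names)
const viaRegexFixup = (encoded) =>
  decodeURIComponent(
    encoded.replace(/(\+|%(?![0-9a-f][0-9a-f]))/gi, (m) => (m === "+" ? "%20" : "%25"))
  );

const viaSearchParams = (encoded) =>
  new URLSearchParams("=" + encoded).values().next().value;

// Very rough: run fn repeatedly and convert elapsed time to calls/sec
function callsPerSecond(fn, input, iterations = 200000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    fn(input);
  }
  const elapsedMs = performance.now() - start;
  return Math.round(iterations / (elapsedMs / 1000));
}

const sample = "new-years-discount-10%";
console.log("regex fixup:    ", callsPerSecond(viaRegexFixup, sample), "calls/sec");
console.log("URLSearchParams:", callsPerSecond(viaSearchParams, sample), "calls/sec");
```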

Option 3: Canonicalize the query string upfront

This option leverages URLSearchParams’s loose input + strict output to in-place replace the query string (don’t worry, no refresh) with a properly encoded version. Any dangling % becomes %25[1], so native decodeURIComponent won’t have trouble.

const currentURL = new URL(document.location.href);
currentURL.search = currentURL.searchParams.toString();
history.replaceState({},"",currentURL.href);
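
To preview what the rewritten URL will look like without touching the address bar, the same two steps can be run on any URL string. A sketch, assuming a hypothetical helper name (canonicalizeHref) and example.com URLs:

```javascript
// Hypothetical helper: returns the URL with its query string re-serialized
// by URLSearchParams, without calling history.replaceState
function canonicalizeHref(href) {
  const url = new URL(href);
  url.search = url.searchParams.toString();
  return url.href;
}

console.log(canonicalizeHref("https://example.com/lp.html?utm_medium=email&utm_campaign=new-years-discount-10%"));
// 'https://example.com/lp.html?utm_medium=email&utm_campaign=new-years-discount-10%25'

console.log(canonicalizeHref("https://example.com/lp.html?q=a%20b#top"));
// 'https://example.com/lp.html?q=a+b#top'
```

Note the second result: spaces come back as +, so a decodeURIComponent-based parser still needs its own + fixup.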

Fix a `decodeURIComponent`-dependent parser

A ton of forms and analytics code still uses decodeURIComponent; again, Marketo users are not alone. Those query string parsers look like this:[2]

const queryParams = document.location.search
    .substring(1)
    .replace(/\+/g, "%20")
    .split("&")
    .reduce( (map, nextPair) => {
        const hasValue = nextPair.includes("=");
        const rawKey = hasValue ? nextPair.substring(0,nextPair.indexOf("=")) : nextPair,
              rawValue = hasValue ? nextPair.substring(nextPair.indexOf("=")+1) : "";
        const key = decodeURIComponent(rawKey),
              value = decodeURIComponent(rawValue);
        if( key ) {
          if( !map.has(key) ) { 
            map.set(key, value);
          } else if ( Array.isArray(map.get(key)) ) {
            map.get(key).push(value);
          } else {
            map.set(key, [map.get(key),value]);
          }
        }
        return map;
    }, new Map() );

Straightforward enough, right?

- split query string on & to separate params
- split each param on the first = to separate names from values
- set a single value or append to an array of values (remember, ?a=one&a=two is valid)

About the plus sign thing

Perhaps not self-explanatory is replacing + with encoded %20. This is necessary because decodeURIComponent only understands RFC 3986 percent-encoding, where + is just a literal plus sign, while in the wild you’ll see both RFC 3986 and application/x-www-form-urlencoded strings.[3] The latter uses + for spaces because it’s more concise, creating 30 years of confusion and counting!
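
A few console-style checks make the mismatch concrete. The param name and values here are made up for illustration:

```javascript
// decodeURIComponent treats + as a literal plus sign
console.log(decodeURIComponent("spring+sale"));   // 'spring+sale'
console.log(decodeURIComponent("spring%20sale")); // 'spring sale'

// URLSearchParams reads both spellings as a space...
console.log(new URLSearchParams("c=spring+sale").get("c"));   // 'spring sale'
console.log(new URLSearchParams("c=spring%20sale").get("c")); // 'spring sale'

// ...and the two encoders disagree on output
console.log(encodeURIComponent("spring sale"));                    // 'spring%20sale'
console.log(new URLSearchParams({ c: "spring sale" }).toString()); // 'c=spring+sale'

// A real plus sign has to be sent as %2B to survive either parser
console.log(new URLSearchParams("c=1%2B1").get("c")); // '1+1'
console.log(decodeURIComponent("1%2B1"));             // '1+1'
```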

To adapt that code to be URLSearchParams-ish, also replace %, when not followed by 2 hex digits, with %25:

const queryParams = document.location.search
    .substring(1)
    .replace(/(\+|%(?![0-9a-f][0-9a-f]))/ig, (match) => match === "+" ? "%20" : "%25" )
    .split("&")
    .reduce( (map, nextPair) => {
        const hasValue = nextPair.includes("=");
        const rawKey = hasValue ? nextPair.substring(0,nextPair.indexOf("=")) : nextPair,
              rawValue = hasValue ? nextPair.substring(nextPair.indexOf("=")+1) : "";
        const key = decodeURIComponent(rawKey),
              value = decodeURIComponent(rawValue);
        if( key ) {
          if( !map.has(key) ) { 
            map.set(key, value);
          } else if ( Array.isArray(map.get(key)) ) {
            map.get(key).push(value);
          } else {
            map.set(key, [map.get(key),value]);
          }
        }
        return map;
    }, new Map() );
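
To try the adapted parser outside a browser, the same logic can be wrapped in a function that takes the query string as an argument. This is a sketch under a few assumptions: parseQueryTolerant is a hypothetical name, the sample query string is invented, and the leading ? is stripped with a regex so the function also accepts strings without one.

```javascript
// The adapted parser as a function (hypothetical name), taking the query
// string as an argument instead of reading document.location.search
function parseQueryTolerant(search) {
  return search
    .replace(/^\?/, "")
    .replace(/(\+|%(?![0-9a-f][0-9a-f]))/gi, (match) => (match === "+" ? "%20" : "%25"))
    .split("&")
    .reduce((map, nextPair) => {
      const eq = nextPair.indexOf("=");
      const rawKey = eq === -1 ? nextPair : nextPair.substring(0, eq);
      const rawValue = eq === -1 ? "" : nextPair.substring(eq + 1);
      const key = decodeURIComponent(rawKey);
      const value = decodeURIComponent(rawValue);
      if (key) {
        if (!map.has(key)) {
          map.set(key, value);
        } else if (Array.isArray(map.get(key))) {
          map.get(key).push(value);
        } else {
          map.set(key, [map.get(key), value]);
        }
      }
      return map;
    }, new Map());
}

const sampleSearch = "?utm_medium=email&utm_campaign=new-years-discount-10%&a=one&a=two&q=spring+sale";
const parsed = parseQueryTolerant(sampleSearch);
const reference = new URLSearchParams(sampleSearch);

console.log(parsed.get("utm_campaign")); // 'new-years-discount-10%'
console.log(parsed.get("a"));            // [ 'one', 'two' ]
console.log(parsed.get("q"));            // 'spring sale'

// Cross-check against the native parser
console.log(reference.get("utm_campaign") === parsed.get("utm_campaign")); // true
console.log(reference.getAll("a"));                                        // [ 'one', 'two' ]
```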

Should you fix it, though? 🤔

decodeURIComponent’s strictness has an upside. Provided you catch errors during testing, it tells you something’s wrong with whoever/whatever is creating your URLs.

Hiding the problem with new-years-discount-10% encourages broken “URL builders” to stay broken in worse ways, like failing to encode literal & as %26 (which will never throw a parsing error, it’ll just break reporting and/or on-page logic).
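
To make the & case concrete, here’s a sketch with a made-up campaign name, q&a-webinar. Nothing throws; the value just comes out wrong:

```javascript
// A builder that forgets to encode the & in "q&a-webinar"
const broken = new URLSearchParams("utm_source=newsletter&utm_campaign=q&a-webinar");
console.log(broken.get("utm_campaign")); // 'q'
console.log(broken.has("a-webinar"));    // true (a stray param with an empty value)

// The same value, correctly encoded
const correct = new URLSearchParams("utm_source=newsletter&utm_campaign=q%26a-webinar");
console.log(correct.get("utm_campaign")); // 'q&a-webinar'

// encodeURIComponent produces the right thing when building URLs by hand
console.log("utm_campaign=" + encodeURIComponent("q&a-webinar")); // 'utm_campaign=q%26a-webinar'
```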

It might be worth enduring a post mortem if it means your team pays attention to encoding in the future.

Notes

[1] You could debate whether bad encoding should result in stripping the % or encoding the %. That is, it’s easy to say discount-10% was meant to be discount-10%25, but not easy to know whether newsletter-%signup was meant to be newsletter-%25signup or if the % was a typo. I personally respect URLSearchParams’s take on this, that the % was intentional.

[2] Okay, their code doesn’t usually use a Map but a regular JS Object. I couldn’t in good conscience publish code that mangles param order, so my example uses a Map.😛

[3] You don’t need to worry about this with URLSearchParams because it outputs + but accepts either %20 or + as input.
