Here’s a fun one. Open a URL that hosts a Marketo form and change the query string to:

?utm_medium=email&utm_campaign=new-years-discount-10%

 

Here’s how your form will look:

[Screenshot: the form as rendered after loading the malformed URL]

 

What the? Let’s check the browser console:

⮾️ Uncaught URIError: URI malformed
    at decodeURIComponent (<anonymous>)
    at b.exports (forms2.min.js)
    at h.getPageFields (forms2.min.js)
    at yourpage.html?utm_medium=email&utm_campaign=new-years-discount-10%
    at q (forms2.min.js)

 

Hmm, that’s not great. Granted, we didn’t URL-encode the param names or values, like a certain blogger always reminds us to. Could that be it?

﹥ decodeURIComponent("utm_medium")
⋖ 'utm_medium'
﹥ decodeURIComponent("email")
⋖ 'email'
﹥ decodeURIComponent("utm_campaign")
⋖ 'utm_campaign'
﹥ decodeURIComponent("new-years-discount-10%")
⮾ Uncaught URIError: URI malformed
     at decodeURIComponent(<anonymous>)
     at <anonymous>

 

Yep, that was it: decodeURIComponent throws a URIError if a value contains invalid percent-encoding. Make no mistake, a % must be followed by 2 hex digits (0-9 or A-F, case-insensitive) in a URL, per RFC 3986. That URL, as innocent + unambiguous as it looks, is invalid.

 

Marketo’s forms library, like many others (we’re definitely not alone here!), still uses decodeURIComponent to parse query strings. Since it doesn’t wrap calls in try/catch or do any pre-fixup, a URIError is fatal.
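To make “wrap calls in try/catch” concrete, here’s a minimal sketch of a defensive wrapper. It is not Marketo’s code: the tryDecode name and the choice to fall back to the raw string are assumptions made purely for illustration.

```javascript
// Hypothetical helper: decode a query string component, but never throw on bad encoding.
function tryDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    // Only swallow the "URI malformed" case; rethrow anything unexpected.
    if (err instanceof URIError) return encoded;
    throw err;
  }
}

console.log(tryDecode("email"));                  // 'email'
console.log(tryDecode("caf%C3%A9"));              // 'café'
console.log(tryDecode("new-years-discount-10%")); // 'new-years-discount-10%' (raw value, no throw)
```

The trade-off: a value with even one bad sequence comes back entirely undecoded, so "a%20b%" stays "a%20b%" rather than becoming "a b%".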

 

Out of curiosity, let’s see what URLSearchParams does with the same query string:

﹥ const searchParams = new URLSearchParams(document.location.search);
﹥ searchParams.get("utm_campaign");
⋖ 'new-years-discount-10%'

 

Interesting. URLSearchParams, the new(ish) native way to read and write query strings, is forgiving of the dangling % and doesn’t throw an error. It obeys Postel’s Law, in other words, accepting lightly broken input but not producing broken output.

 

 

But decodeURIComponent isn’t completely strict, since it tolerates unencoded input

It’s tempting to conclude decodeURIComponent only supports output that would be generated by its companion function encodeURIComponent, but that’s not so. Check this out:

﹥ decodeURIComponent("😊")
⋖ '😊'

decodeURIComponent is cool with an unencoded emoji, even though that’s also invalid in a URL (it should be %F0%9F%98%8A)!

 

So it’s strict when input is incorrectly percent-encoded: it also fails on, say, trailing %F or %AZ anywhere in a string, since those aren’t % followed by 2 hex digits. But it’s loose when a value simply isn’t percent-encoded.

 

Note URLSearchParams is consistently loose, also tolerating the unencoded emoji:

﹥ new URLSearchParams("test=😊").get("test")
⋖ '😊'
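If you’d like to reproduce the strict-vs-loose comparison in one go, here’s a small sketch. The probe helper and the "v" param name are made up for this example.

```javascript
// Hypothetical helper: run one raw value through both decoders and report what happens.
function probe(raw) {
  let viaDecodeURIComponent;
  try {
    viaDecodeURIComponent = decodeURIComponent(raw);
  } catch (err) {
    viaDecodeURIComponent = err.name; // 'URIError' for malformed percent-encoding
  }
  // Prefixing "v=" makes the raw string the value of a param named "v".
  const viaURLSearchParams = new URLSearchParams("v=" + raw).get("v");
  return { viaDecodeURIComponent, viaURLSearchParams };
}

for (const raw of ["10%", "%F", "%AZ", "😊", "%F0%9F%98%8A"]) {
  console.log(raw, probe(raw));
}
// 10%           → URIError vs. '10%'
// %F            → URIError vs. '%F'
// %AZ           → URIError vs. '%AZ'
// 😊            → '😊' from both
// %F0%9F%98%8A  → '😊' from both
```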

 

 

Protect decodeURIComponent-dependent parsers from bad percent-encoding

Later, I’ll show how to modify a parser that uses decodeURIComponent so it’s more URLSearchParams-ishly tolerant of bad encoding (though you should really migrate to URLSearchParams itself; it’s great).

 

But what if you didn’t write the parser and can’t modify it? And say you accidentally sent out a bad URL and 1000s of people are about to land on your LP?

 

Not to worry, you have 2 great options and 1... let’s say, interesting option. Choose one and include it as high as possible in <head>.

 

Option 1: Proxy decodeURIComponent while still using it under the hood

This option adds a one-line “shim” to replace invalid sequences with valid equivalents before calling native decodeURIComponent. In my tests, it’s lightning-fast (to the tune of 2 million calls/sec) so you don’t have to worry about overhead. It also replaces the troublesome + sign in the same shot, so you don’t need to repeat that step.

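window.decodeURIComponent = new Proxy(window.decodeURIComponent, {
    apply(target, thisArg, [encoded]) {
        // Coerce to a string first (as native decodeURIComponent does) so non-string arguments don't throw on .replace()
        // Then turn + into %20 and any % not followed by 2 hex digits into %25
        const dUCSafeEncoded = String(encoded).replace(/(\+|%(?![0-9a-f][0-9a-f]))/ig, (match) => match === "+" ? "%20" : "%25" );
        return Reflect.apply(target, thisArg, [dUCSafeEncoded]);
    }
});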
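To see what the fix-up regex does without patching a global, here’s the same replacement wrapped in a plain function. This is a sketch for testing outside the browser; lenientDecode is a hypothetical name.

```javascript
// Same fix-up as the Proxy above, as a standalone function (hypothetical name).
function lenientDecode(encoded) {
  const fixed = String(encoded).replace(
    /\+|%(?![0-9a-f][0-9a-f])/gi,
    (match) => (match === "+" ? "%20" : "%25")
  );
  return decodeURIComponent(fixed);
}

console.log(lenientDecode("new-years-discount-10%")); // 'new-years-discount-10%'
console.log(lenientDecode("50%AZ"));                  // '50%AZ'
console.log(lenientDecode("spring+sale%21"));         // 'spring sale!'
console.log(lenientDecode("10%25"));                  // '10%' (valid sequences are untouched)
```

One caveat: the regex only repairs syntax. A sequence that is syntactically valid but isn’t valid UTF-8, like a lone %FF, still throws a URIError.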

 

Option 2: Overwrite decodeURIComponent to use URLSearchParams under the hood

This one is provided more as a curiosity. It overwrites decodeURIComponent to use the URLSearchParams parser. Short and sweet-looking, for sure. But URLSearchParams is designed to parse a full query string at once, not to be called for every component. There’s considerable overhead in parsing and getting the first value from the iterator. (We’re still talking ~400,000 calls/sec, so you’re really unlikely to feel any difference in MOps applications, but I can’t fully recommend it.)

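window.decodeURIComponent = function(encoded){
    // "=" + encoded parses as one param with an empty name
    // Only safe for already-split components: a literal & would start a second param
    const uSP = new URLSearchParams("=" + encoded);
    return uSP.values().next().value;
};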
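Here’s a quick sketch of how that trick behaves on a few inputs, including one place where it diverges from native decodeURIComponent. The decodeViaSearchParams name is hypothetical.

```javascript
// Standalone version of the same idea (hypothetical name).
function decodeViaSearchParams(encoded) {
  // "=" + encoded parses as a single param with an empty name.
  return new URLSearchParams("=" + encoded).values().next().value;
}

console.log(decodeViaSearchParams("new-years-discount-10%")); // 'new-years-discount-10%'
console.log(decodeViaSearchParams("spring+sale"));            // 'spring sale'
console.log(decodeViaSearchParams("a=b"));                    // 'a=b'
console.log(decodeViaSearchParams("q&a"));                    // 'q' (the & starts a second param)
```

That last case shouldn’t come up when the caller has already split the query string on &, which is the usual pattern, but it’s another reason to treat this option as a curiosity.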

 

Option 3: Canonicalize the query string upfront

This option leverages URLSearchParams’s loose input + strict output to replace the query string in place (don’t worry, no refresh) with a properly encoded version. Any dangling % becomes %25[1], so native decodeURIComponent won’t have trouble.

const currentURL = new URL(document.location.href);
currentURL.search = currentURL.searchParams.toString();
history.replaceState({},"",currentURL.href);
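To check what the canonicalized URL looks like before touching history, the same two steps can be run against a plain string. A minimal sketch: canonicalizeQuery is a hypothetical helper and example.com is a placeholder domain.

```javascript
// Hypothetical helper: return the canonicalized form of a URL string.
function canonicalizeQuery(href) {
  const url = new URL(href);
  url.search = url.searchParams.toString();
  return url.href;
}

const before = "https://example.com/yourpage.html?utm_medium=email&utm_campaign=new-years-discount-10%";
const after = canonicalizeQuery(before);

console.log(after);
// https://example.com/yourpage.html?utm_medium=email&utm_campaign=new-years-discount-10%25

// The repaired value now survives native decodeURIComponent.
const rawCampaign = new URL(after).search.split("utm_campaign=")[1];
console.log(decodeURIComponent(rawCampaign)); // 'new-years-discount-10%'

// Side effect: encoded spaces are re-serialized as +.
console.log(canonicalizeQuery("https://example.com/?q=a%20b")); // https://example.com/?q=a+b
```

Because of that side effect, this approach assumes the downstream parser already converts + to a space, as the typical decodeURIComponent-based parsers do.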

 

 

Fix a decodeURIComponent-dependent parser

A ton of forms and analytics code still uses decodeURIComponent; again, Marketo users are not alone. Those query string parsers look like this:[2]

const queryParams = document.location.search
    .substring(1)
    .replace(/\+/g, "%20")
    .split("&")
    .reduce( (map, nextPair) => {
        const hasValue = nextPair.includes("=");
        const rawKey = hasValue ? nextPair.substring(0,nextPair.indexOf("=")) : nextPair,
              rawValue = hasValue ? nextPair.substring(nextPair.indexOf("=")+1) : "";
        const key = decodeURIComponent(rawKey),
              value = decodeURIComponent(rawValue);
        if( key ) {
          if( !map.has(key) ) { 
            map.set(key, value);
          } else if ( Array.isArray(map.get(key)) ) {
            map.get(key).push(value);
          } else {
            map.set(key, [map.get(key),value]);
          }
        }
        return map;
    }, new Map() );

 

Straightforward enough, right?

  • split query string on & to separate params
  • split each param on the first = to separate names from values
  • set a single value or append to an array of values (remember, ?a=one&a=two is valid)
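For comparison, here’s how URLSearchParams exposes the same cases (repeated names, and a name with no =). This is just a sketch with a made-up query string.

```javascript
const params = new URLSearchParams("a=one&a=two&b=three&flag");

console.log(params.getAll("a")); // [ 'one', 'two' ]
console.log(params.get("a"));    // 'one' (get() returns only the first value)
console.log(params.get("b"));    // 'three'
console.log(params.get("flag")); // '' (a name with no = gets an empty value)
```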

 

About the plus sign thing

Perhaps not self-explanatory is replacing + with encoded %20. This is necessary because decodeURIComponent supports only RFC 3986-strict percent-encoding, while in the wild you’ll see both RFC 3986 and application/x-www-form-urlencoded strings.[3] The latter uses + for spaces because it’s more concise, creating 30 years of confusion and counting!
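The difference is easy to see side by side. A short sketch (the "c" param name and sample values are arbitrary):

```javascript
// Decoding: only URLSearchParams treats + as a space.
console.log(decodeURIComponent("spring+sale"));                    // 'spring+sale'
console.log(decodeURIComponent("spring%20sale"));                  // 'spring sale'
console.log(new URLSearchParams("c=spring+sale").get("c"));        // 'spring sale'
console.log(new URLSearchParams("c=spring%20sale").get("c"));      // 'spring sale'

// Encoding: the two APIs pick different representations for a space.
console.log(encodeURIComponent("spring sale"));                    // 'spring%20sale'
console.log(new URLSearchParams({ c: "spring sale" }).toString()); // 'c=spring+sale'

// A literal + has to be escaped so it isn't read back as a space.
console.log(new URLSearchParams({ c: "1+1" }).toString());         // 'c=1%2B1'
```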

 

To adapt that code to be URLSearchParams-ish, also replace %, when not followed by 2 hex digits, with %25:

const queryParams = document.location.search
    .substring(1)
    .replace(/(\+|%(?![0-9a-f][0-9a-f]))/ig, (match) => match === "+" ? "%20" : "%25" )
    .split("&")
    .reduce( (map, nextPair) => {
        const hasValue = nextPair.includes("=");
        const rawKey = hasValue ? nextPair.substring(0,nextPair.indexOf("=")) : nextPair,
              rawValue = hasValue ? nextPair.substring(nextPair.indexOf("=")+1) : "";
        const key = decodeURIComponent(rawKey),
              value = decodeURIComponent(rawValue);
        if( key ) {
          if( !map.has(key) ) { 
            map.set(key, value);
          } else if ( Array.isArray(map.get(key)) ) {
            map.get(key).push(value);
          } else {
            map.set(key, [map.get(key),value]);
          }
        }
        return map;
    }, new Map() );

 

 

Should you fix it, though? 🤔

decodeURIComponent’s strictness has an upside. Provided you catch errors during testing, it tells you something’s wrong with whoever/whatever is creating your URLs.

 

Hiding the problem with new-years-discount-10% encourages broken “URL builders” to stay broken in worse ways, like failing to encode literal & as %26 (which will never throw a parsing error; it’ll just break reporting and/or on-page logic).
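On the builder side, the fix is usually to let the platform do the encoding rather than concatenating strings. A minimal sketch, assuming a hypothetical buildTrackedUrl helper and a placeholder example.com landing page:

```javascript
// Hypothetical URL builder: URL.searchParams handles the encoding of names and values.
function buildTrackedUrl(base, params) {
  const url = new URL(base);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value);
  }
  return url.href;
}

const href = buildTrackedUrl("https://example.com/lp.html", {
  utm_medium: "email",
  utm_campaign: "new-years-discount-10%",
  utm_content: "q&a",
});

console.log(href);
// https://example.com/lp.html?utm_medium=email&utm_campaign=new-years-discount-10%25&utm_content=q%26a

// Both troublesome values round-trip intact.
console.log(new URL(href).searchParams.get("utm_campaign")); // 'new-years-discount-10%'
console.log(new URL(href).searchParams.get("utm_content"));  // 'q&a'
```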

 

It might be worth enduring a post-mortem if it means your team pays attention to encoding in the future.

 
 
Notes

[1] You could debate whether bad encoding should result in stripping the % or encoding the %. That is, it’s easy to say discount-10% was meant to be discount-10%25, but not easy to know whether newsletter-%signup was meant to be newsletter-%25signup or if the % was a typo. I personally respect URLSearchParams’s take on this, that the % was intentional.

 

[2] Okay, their code doesn’t usually use a Map but a regular JS Object. I couldn’t in good conscience publish code that mangles param order, so my example uses a Map.😛

 

[3] You don’t need to worry about this with URLSearchParams because it outputs + but accepts either %20 or + as input.