When “technically valid” goes wrong: don’t put leading spaces in your Marketo hrefs, or you’ll lose click tracking

SanfordWhiteman
Level 10 - Community Moderator
tl;dr: Leading and trailing spaces are valid in an <a href> and the link will still work for the end user. But in a Marketo email, an accidental leading space means the link won’t be tracked.

 

Without reading the official standards, can you describe the differences between a valid URL and a valid href? How about between the href attribute of an HTML A tag and the href IDL attribute of its Location object?

 

My guess is there are only a few people in the world who can recite these off the top of their head: members of WHATWG or W3C. I certainly don’t know all this stuff by heart, but reading standards is fun.

 

Anyway, all these are valid A tags that link to the same destination URL:

<a href=" https://www.example.com">I am valid.</a>
<a href="   https://www.example.com   ">So am I.</a>
<a href="https://www.example.com ">Me too.</a>
<a href="https://www.example.com">And so (obviously!) am I.</a>

But only the last 2, the ones without a leading space, will be tracked by Marketo.

 

Hold up: those spaces are valid?

Indeed.

 

The href HTML attribute is defined as “a valid URL potentially surrounded by spaces.” After stripping leading and trailing spaces, it must be a valid URL string, but the spaces themselves are fine.[1]

 

In other words, a URL can’t start or end with spaces. But even though an <a href> becomes a URL by design, the href itself can have spaces.

 

An even deeper detail is that when an <a> is parsed into a Location object, the Location object’s href property won’t have spaces. This is easy to demonstrate in the browser...

> document.links[0].getAttribute("href")
⋖ ' https://www.example.com'
> document.links[0].href
⋖ 'https://www.example.com/'

... but was difficult to find in the spec(s).

 

Finally, I found that a Location object is said to have a relevant Document, and any Document has a URL. That URL is derived using the Basic URL Parser, which explicitly includes this as its 3rd step:

3. Remove any leading and trailing C0 control or space from input.

So one thing with the name href can have spaces, while another type of href cannot. Confusing!
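
The same stripping can be seen outside the DOM with the standard URL constructor, which runs the URL Standard's parser. This is a minimal sketch, assuming a modern browser console or Node.js; the variable names are only for illustration.

```javascript
// Raw attribute values, as they might appear in an email's HTML source.
const rawHrefs = [
  " https://www.example.com",
  "   https://www.example.com   ",
  "https://www.example.com ",
  "https://www.example.com",
];

// The URL parser removes leading and trailing C0 control or space characters,
// so every variant serializes to the same URL string.
const parsedHrefs = rawHrefs.map((raw) => new URL(raw).href);

console.log(parsedHrefs);
```

All four raw values differ as strings, yet each one parses to 'https://www.example.com/'.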

 

Back to the Marketo problem

So why are links with leading spaces left untracked? (And yes, I learned about this when a client messed up a big send this way.)

 

Because Marketo checks only the raw href attribute to see if something is a tracking-worthy link. If it doesn’t start with a sequence of letters followed by a colon (that includes not just http: and https: but also tel: and such), it’s thought to be some other kind of <a>, like a jump link within the email body, which shouldn’t be tracked.
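
To make that rule concrete, here is a rough approximation of the check as described above. It is not Marketo's actual code: the function name looksTrackable and the exact regular expression are assumptions for illustration.

```javascript
// Hypothetical approximation of the described check; not Marketo's real implementation.
// A "tracking-worthy" raw href starts with letters followed by a colon.
const SCHEME_AT_START = /^[a-z]+:/i;

function looksTrackable(rawHref) {
  // Deliberately no trimming: the raw attribute value is tested as-is.
  return SCHEME_AT_START.test(rawHref);
}

console.log(looksTrackable("https://www.example.com"));  // true
console.log(looksTrackable("https://www.example.com ")); // true (trailing space only)
console.log(looksTrackable(" https://www.example.com")); // false (leading space)
console.log(looksTrackable("tel:+15555550100"));         // true
console.log(looksTrackable("#section-2"));               // false (jump link)
```

Under this reading, a leading space puts an ordinary https link in the same bucket as a jump link, which is why it goes untracked.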

 

Is this a bug? Possibly, but ultimately it’s an application-level decision. Is it worth fretting about rather than just keeping in mind? To me, that’s a no.
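
If you'd rather have a machine keep it in mind, a pre-send scan of the email HTML can flag the offenders. This is only a sketch: findLeadingSpaceHrefs is a hypothetical helper, it handles quoted href values only, and a regex scan is a rough heuristic rather than a real HTML parser.

```javascript
// Hypothetical pre-send check: list quoted href values that start with whitespace.
function findLeadingSpaceHrefs(html) {
  // Captures the quote character, then the raw attribute value up to the matching quote.
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gis;
  const offenders = [];
  for (const match of html.matchAll(hrefPattern)) {
    const rawHref = match[2];
    if (/^\s/.test(rawHref)) {
      offenders.push(rawHref);
    }
  }
  return offenders;
}

const sampleEmail = `
<a href=" https://www.example.com">I am valid.</a>
<a href="https://www.example.com ">Me too.</a>
<a href="https://www.example.com">And so (obviously!) am I.</a>
`;

// Only the first link has a leading space, so it is the only one reported.
console.log(findLeadingSpaceHrefs(sampleEmail));
```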

 
NOTES

[1] I’m not sure exactly why surrounding spaces are allowed; even I haven’t been around that long! Maybe it’s in the W3C mailing list archives from 20 years ago, but I’ve got stuff to do.☺
