In Marketo String/Text fields, surrogate pairs must be URL-encoded, and you should encode line breaks

SanfordWhiteman
Level 10 - Community Moderator

We like to think these are synonyms:

  • the value you tell Marketo to save
  • the value Marketo saves to the database
  • the output of a {{lead.token}}

But in fact, they can differ in interesting and largely undocumented ways.

 

Sometimes the saved value is truncated when the platform encounters a character it doesn’t like. Sometimes the saved value is what you expect, but the corresponding {{lead.token}} replaces original characters with new ones on the fly. And the behavior varies based on whether the update was sourced from a form, the API, or the Marketo UI.

 

The known offenders are:

  • line breaks: that is, real U+000A, not HTML <br> tags
  • surrogate pairs: any Unicode character outside the Basic Multilingual Plane, which UTF-16 stores as 2 paired 16-bit code units, such as common emojis like 👍 and 😛 and fancy arrows like 🡲 and 🡘 (plus tens of thousands of others, though most belong to rare or archaic scripts you’re unlikely to see in marketing data)
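
To make the two offenders concrete, here is a minimal JavaScript sketch that flags them in a string. The name findOffenders is hypothetical, and the regex simply treats every code point from U+10000 up as a surrogate-pair character.

```javascript
// Any code point above U+FFFF needs a surrogate pair in UTF-16.
// The "u" flag makes the character class work on whole code points.
const SURROGATE_PAIR = /[\u{10000}-\u{10FFFF}]/gu;
const LINE_BREAK = /\n/g; // real U+000A, not <br>

// Hypothetical helper: report which offenders a value contains.
function findOffenders(value) {
  return {
    surrogatePairs: value.match(SURROGATE_PAIR) || [],
    lineBreaks: (value.match(LINE_BREAK) || []).length
  };
}

const report = findOffenders("Thanks 👍\nSee you ☹");
console.log(report.surrogatePairs); // [ '👍' ]
console.log(report.lineBreaks);     // 1
```

Note that ☹ (U+2639) is not reported: it fits in a single 16-bit code unit, so it is not a surrogate pair.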

 

Line breaks are turned into spaces in {{lead.tokens}}

This factoid has been lurking in the corners of Marketo Nation for a while. Someone will mention it but add the disclaimer “it must just be my instance?” Or they’ll try to use Velocity replaceAll() to turn line breaks into <br>s (the right move) but conclude “it didn’t work, my code must be wrong.”

 

Well, it didn’t work because the line breaks aren’t there. Marketo replaces them with standard spaces (U+0020). That’s why I provided that code to show every codepoint the other day, to prove it to myself and let you do the same in your instance.

 

If you have a field whose actual value is:

This once
had some
line breaks.

 

That’ll display as expected in the Lead UI:

SanfordWhiteman_0-1697492857306.png

 

It’ll also maintain the line breaks in your CRM and when you fetch via the REST API, because the actual database value has line breaks.
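
If you want to confirm that a value fetched outside of a token really contains U+000A, a quick codepoint dump does the job. This is a minimal JavaScript sketch, assuming you already have the field value as a string; listCodePoints is a hypothetical name, not part of any Marketo library.

```javascript
// Hypothetical helper: list every code point in a string as U+XXXX.
// Array.from iterates by code point, so surrogate pairs stay together.
function listCodePoints(value) {
  return Array.from(value, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(listCodePoints("once\nhad"));
// [ 'U+006F', 'U+006E', 'U+0063', 'U+0065', 'U+000A', 'U+0068', 'U+0061', 'U+0064' ]
```

A U+000A in the list means the line break is present in the value; a U+0020 in its place means it has been swapped for a space.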

 

But both as a {{lead.token}} and in Velocity, it’s gonna be:

This once had some line breaks.

 

That’s it, you can’t reverse it to the actual database value. And that’s the, uh, breaks. Because it means you can’t format it in an email as originally entered.

 

Surrogate Pairs in an API or UI update: Value is truncated immediately before the first SP

This behavior is rarely, if ever, noted! Say you’re using the REST API (not a Marketo form) to update leads and a lead typed this in a Comments box:

Longtime product user.👍
Hoping to get pricing for an enterprise contract.

 

That value will be permanently truncated before the emoji. You’ll only get this:

SanfordWhiteman_1-1697492857309.png

 

And the API call won’t throw an error, either:

{
  "action" : "updateOnly",
  "lookupField" : "id",
  "input" : [{
    "id" :  16526262,
    "comments__c" :  "Longtime product user.\ud83d\udc4d\nHoping to get pricing for an enterprise contract."
  }]
}

{
  "requestId": "15f52#18b35d2945b",
  "result": [
    {
      "id": 16526262,
      "status": "updated"
    }
  ],
  "success": true
}

 

The only evidence that something went weird is the Change Data Value, which has “Missing history details”.[1] The requested value and actual value differ, so logging of the new/old value goes haywire:

SanfordWhiteman_2-1697492857258.png

 

If you make the same change in the Lead UI, it appears to work (no “Missing history details” but a standard Change Data Value):

SanfordWhiteman_3-1697492857289.png

 

SanfordWhiteman_4-1697492857328.png

 

Yet when you refresh, you’ll see it’s truncated just like it is via API:

SanfordWhiteman_5-1697492857342.png
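
Since neither the API response nor the Lead UI complains at save time, one option is to check values before you send them. The following is a minimal JavaScript sketch that models the truncation described in this section; predictApiStoredValue is a hypothetical name, and the model only reflects the behavior observed here, not any documented Marketo rule.

```javascript
// Matches the first code point above U+FFFF (a surrogate pair in UTF-16).
const FIRST_SURROGATE_PAIR = /[\u{10000}-\u{10FFFF}]/u;

// Hypothetical helper: model of the observed API/UI behavior, where the
// value is cut off immediately before the first surrogate pair.
function predictApiStoredValue(requested) {
  const index = requested.search(FIRST_SURROGATE_PAIR);
  return index === -1 ? requested : requested.slice(0, index);
}

const requested =
  "Longtime product user.\ud83d\udc4d\nHoping to get pricing for an enterprise contract.";
const predicted = predictApiStoredValue(requested);

console.log(predicted);               // Longtime product user.
console.log(predicted !== requested); // true: this update would lose data
```

If the predicted and requested values differ, the update is one to encode or hold back rather than send as-is.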

 

Surrogate Pairs in a form fill: SPs are removed

With a true Marketo form fill, the behavior is more gentle. If you put that same value in a Textarea:

SanfordWhiteman_6-1697492857329.png

 

Then the Filled Out Form activity shows the value with the SP removed (but not otherwise modified/truncated):

SanfordWhiteman_9-1697493624284.png

 

And that’s accurate, as the stored value preserves everything but the SP:

SanfordWhiteman_8-1697492857312.png

 

So not too destructive. But nevertheless if someone leaves a 👍 or ☹️ or 😊 in a field, I expect to see it in Marketo and in our CRM.
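
For comparison, the form fill behavior can be modeled in one line of JavaScript. This is only a sketch of what was observed above, with a hypothetical function name: every surrogate pair is dropped and the rest of the value is kept.

```javascript
// Hypothetical helper: model of the observed form fill behavior, where
// surrogate pairs are removed but nothing else is modified or truncated.
function predictFormFillStoredValue(submitted) {
  return submitted.replace(/[\u{10000}-\u{10FFFF}]/gu, "");
}

const submitted =
  "Longtime product user.👍\nHoping to get pricing for an enterprise contract.";

console.log(JSON.stringify(predictFormFillStoredValue(submitted)));
// "Longtime product user.\nHoping to get pricing for an enterprise contract."
```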

 

Solution: URL-encode the known offenders only

The solution is to URL-encode any SPs, line breaks, and (this will make sense when you think it through) the literal % character, leaving all other characters alone:

Longtime product user.%F0%9F%91%8D%0AHoping to get pricing for an enterprise contract.
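
As a rough illustration of the idea (not necessarily the code from the follow-up post), here is a minimal JavaScript sketch of selective encoding. The name encodeOffenders is hypothetical; the only built-ins used are String.prototype.replace, encodeURIComponent and decodeURIComponent.

```javascript
// Encode only the known offenders: surrogate-pair characters,
// line feeds (U+000A) and the literal % character.
const OFFENDERS = /[\u{10000}-\u{10FFFF}]|\n|%/gu;

// Hypothetical helper: everything not matched above is left alone.
function encodeOffenders(value) {
  return value.replace(OFFENDERS, (ch) => encodeURIComponent(ch));
}

const original =
  "Longtime product user.👍\nHoping to get pricing for an enterprise contract.";
const encoded = encodeOffenders(original);

console.log(encoded);
// Longtime product user.%F0%9F%91%8D%0AHoping to get pricing for an enterprise contract.

// Round trip: decoding restores the original value exactly.
console.log(decodeURIComponent(encoded) === original); // true
```

Encoding the literal % is what keeps the round trip safe: a value like "50% off" becomes "50%25 off", so a decoder never mistakes the author's own percent sign for the start of an escape sequence.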

 

My next post will show how to do that selective URL-encoding in JavaScript — it’s simple and easily ported to other languages — and how to decode in Velocity for accurate output.

 

NOTES

[1] This error is mentioned in one official doc but the explanation there is not correct, or at least not anymore!
