2021-06-07-01_45_14-ny-sfdcrj---Remote-Desktop-Connection---__Remote[1].png

The Marketo ↔︎ Salesforce “URL” type supports Unicode: make sure your data warehouse knows this!

SanfordWhiteman
Level 10 - Community Moderator
Level 10 - Community Moderator

It’s well-established that SFDC’s URL field type isn’t up to the task of storing real-world URLs, since it maxes out at 255 characters.[1]

What isn’t discussed much, if at all, is that those 255 characters aren’t limited to ASCII. Indeed, a URL field can contain an IRI, a.k.a. URL with Unicode characters that haven’t been percent-encoded. (And it might be smart to store IRIs: as I mentioned in the recent post on decoding URLs for attribution, IRIs are space-efficient.)

It bears repeating that there’s a clear career advantage to knowing your way around text encoding. You don’t want to be the one — and I’ve been the one myself, to my lasting shame — saying “Oops, we’ll never know what those ???? or ���� or ÿÿÿÿ characters were in the original data.”

When you pull data into your on-premises db, you need to know what you’re gonna get.

 

Don’t make an A— out of U(nicode) and Me

Earlier today I was auditing the column types in one of my clients’ SQL databases, which they use to store backed-up SFDC data. (Not Marketo-only data in this case.)

They were exceeding MSSQL’s max row length of 8060 bytes on occasion, so the goal was to find some columns that were mistakenly using NVARCHAR, when they could’ve just used VARCHAR based on the data being stored.

NVARCHAR takes 2x the bytes to store ASCII data. So if you know — and you do have to be 100% sure! — that all values are ASCII-only, now and forever, you can switch ’em to VARCHAR.[2][3]

General examples are Phone fields and Email fields, where the business meaning of the fields allows you to validate that they are ASCII-only on the way in. An SFDC-specific example is an ID type field, which is always 18 alphanumeric ASCII characters. You’re wasting 18 bytes by using an NVARCHAR for those.

I found lots of low-hanging narrowable fruit, but then also discovered one field that needs to go in the other direction:

2021-06-07-01_45_14-ny-sfdcrj---Remote-Desktop-Connection---__Remote[1].png

 

The database schema was automatically created by the (commercial) software that backs up SFDC, so it’s not anyone’s fault internally. But if any salesperson or API integration happened to put an IRI into the Website field, SFDC would let that happen, and then data would be lost on the way into the warehouse.

So for this field value in SFDC:

image-2

 

You’d get this value in the database:

image-4

 

Luckily, I couldn’t see a single Contact where this had happened yet. But if they start deliberately storing IRIs to fit more in the 255-character limit, the SQL column must be widened to NVARCHAR(255).

So keep this case in mind and/or share it with your DBA!

 

NOTES

[1] Which also affects the corresponding Marketo type, unfortunately.
[2] This client is using SQL Server 2017. SQL Server 2019 introduced VARCHAR columns with UTF-8 support, but jury’s still out on whether UTF-8 VARCHARs are even a good idea.
[3] “Switch” meaning create a new column, copy the values, drop the original column, and change the column name. Simple, but not actually a single step.

222
0