Add basic sanitization by removing Unicode control characters, whitespaces from Email Address field

Today, when a Marketo user enters email addresses into the system, there is no sanitization of characters that RFC 6530 may not allow. While the majority of Unicode is technically valid in an email address, the standard itself notes:


"The local parts of those addresses MAY be made up of any ASCII characters except the control characters that RFC 5321 prohibits, although some of them MUST be quoted as specified there."


Cross-referencing RFC 5321 shows that the characters in question are "control characters (US-ASCII 0-31 and 127; inclusive)". Likewise, whitespace is excluded from the email spec outside of quoted strings: RFC 5321, Section 4.1.2 defines the syntax for mailbox, local-part, and domain, none of which permit unquoted whitespace, and RFC 5322, Section 3.2.3 lists the characters allowed in atoms and dot-atoms (atext), which also exclude whitespace.

 

Recently, I was working with a client who had a record's email address stored as "test@example.com " instead of "test@example.com", which led to email delivery failures. However, there is no way to query for whitespace characters (in this case, Narrow No-Break Space (U+202F)) in Marketo, so there is no way to find and correct the affected records from within the platform.
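To illustrate the kind of check that is currently impossible inside Marketo, here is a minimal Python sketch that could be run against an exported list. The function name find_hidden_characters is hypothetical and not part of any Marketo API; it assumes "whitespace" means anything Python's str.isspace() accepts and "control" means Unicode category Cc.

```python
import unicodedata


def find_hidden_characters(value: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each whitespace or control character.

    Hypothetical helper for auditing exported email values; not a Marketo function.
    """
    hits = []
    for index, ch in enumerate(value):
        # Category "Cc" covers the ASCII controls (0-31, 127) plus the C1 controls.
        if ch.isspace() or unicodedata.category(ch) == "Cc":
            # Control characters have no Unicode name, so supply a fallback label.
            name = unicodedata.name(ch, "<unnamed control character>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    # The trailing character here is Narrow No-Break Space, as in the case above.
    print(find_hidden_characters("test@example.com\u202f"))
```

A report like this makes the otherwise invisible trailing character visible, along with its position in the value.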

 

In this case, it makes sense to strip all types of whitespace and control characters from the Email Address field. This prevents user error (which is common with email addresses in uploaded lists) and provides a better experience, without violating any email standards. Could this please be implemented?
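For reference, the requested behavior could look roughly like the following Python sketch. This is only an illustration of the idea, not how Marketo works or would implement it: sanitize_email is a hypothetical name, and the sketch assumes that quoted local parts containing spaces do not need to be preserved.

```python
import unicodedata


def sanitize_email(raw: str) -> str:
    """Remove every whitespace and control character from an email value.

    Hypothetical sketch: whitespace is whatever str.isspace() accepts
    (including U+202F), and control characters are Unicode category "Cc".
    All other characters, including non-ASCII letters, are left untouched.
    """
    return "".join(
        ch
        for ch in raw
        if not ch.isspace() and unicodedata.category(ch) != "Cc"
    )


if __name__ == "__main__":
    cleaned = sanitize_email("test@example.com\u202f")
    print(repr(cleaned))
```

Because only whitespace and control characters are dropped, internationalized addresses pass through unchanged.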