
Splitting delimited strings in a less-smelly way (the “Header String” way)

Blog Post created by Sanford Whiteman on Jan 27, 2019

This JS string-splitting approach is a sure code smell, but I see it all the time on LPs:

 

var partsOfString = stringifiedLeadInfo.split("|");
var firstName = partsOfString[0];
var lastName = partsOfString[1];
var companyName = partsOfString[2];
var phoneNumber = partsOfString[3];
/* ... and so on and so on... */

 

 

Presumably, when the code was first written, stringifiedLeadInfo was a string like:

 

Sandy|Whiteman|FigureOne, Inc.|212-222-2222

 

But this code is clearly fragile: there's no guarantee that the “magic numbers” 0, 1, 2, and 3 will continue to represent the same data (business-wise) inside the string. If the order shifts around at the source, or if a new data point is added in the middle, all these lines may need to change. That leads to bugs.
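To make the failure mode concrete, here's a minimal sketch. The middle-name field and the readByPosition helper are made up for illustration; they don't come from any real LP.

```javascript
// Hypothetical scenario: the source starts sending a middle initial
// as a new second field, so every later position shifts by one.
var originalInfo = "Sandy|Whiteman|FigureOne, Inc.|212-222-2222";
var changedInfo = "Sandy|Q.|Whiteman|FigureOne, Inc.|212-222-2222";

// Same positional logic as the smelly snippet, wrapped in a function
function readByPosition(stringifiedLeadInfo) {
  var partsOfString = stringifiedLeadInfo.split("|");
  return {
    firstName: partsOfString[0],
    lastName: partsOfString[1],
    companyName: partsOfString[2],
    phoneNumber: partsOfString[3]
  };
}

var before = readByPosition(originalInfo);
var after = readByPosition(changedInfo);

console.log(before.lastName);   // "Whiteman"
console.log(after.lastName);    // "Q." (wrong, and no error is thrown)
console.log(after.phoneNumber); // "FigureOne, Inc."
```

Nothing crashes here, which is the real problem: the wrong values just flow downstream quietly.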

 

Instead, use what I call a header string. It's nothing more than a sample string containing the variable names in the currently expected order:

 

var delim = "|",
    stringifiedLeadHeaders = "firstName|lastName|companyName|phoneNumber",    
    leadHeaders = stringifiedLeadHeaders.split(delim);

var leadInfo = stringifiedLeadInfo
                 .split(delim)
                 .reduce(function(acc,next,idx){
                   acc[leadHeaders[idx] || "Unknown_Property_" + idx] = next;
                   return acc;
                 },{});

 

 

Now, leadInfo is a simple object:

 

{
  firstName: "Sandy",
  lastName: "Whiteman",
  companyName: "FigureOne, Inc.",
  phoneNumber: "212-222-2222"
}

 

 

And you only need to change the header string if the data starts coming in differently. No other lines need to be added or changed. 

 

(I also made the delimiter a variable, ’cuz that could change too. And if new data points appear in the data before you add them to the header, they're given automatic names based on their zero-based position, like Unknown_Property_4 for a fifth value, to help signal the change.)
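If you use this in more than one place, the same logic can be wrapped in a small function. This is only a sketch: parseWithHeaders is a hypothetical name, and the trailing "NY" value is assumed sample data standing in for a field the header string doesn't know about yet.

```javascript
// parseWithHeaders is a hypothetical helper, not a built-in
function parseWithHeaders(stringifiedHeaders, stringifiedInfo, delim) {
  var headers = stringifiedHeaders.split(delim);
  return stringifiedInfo
    .split(delim)
    .reduce(function(acc, next, idx) {
      // Fall back to a generated name when no header exists at this position
      acc[headers[idx] || "Unknown_Property_" + idx] = next;
      return acc;
    }, {});
}

var leadHeaders = "firstName|lastName|companyName|phoneNumber";
// Assumed sample data with one more value than the header string has names
var leadInfo = parseWithHeaders(
  leadHeaders,
  "Sandy|Whiteman|FigureOne, Inc.|212-222-2222|NY",
  "|"
);

console.log(leadInfo.phoneNumber);        // "212-222-2222"
console.log(leadInfo.Unknown_Property_4); // "NY"
```

The extra value lands at zero-based index 4, hence Unknown_Property_4. Once you learn what it means, you add its name to the header string and nothing else changes.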

 

Please use this in your code, or something along these lines (there are other methods with the same effect). It makes the code less painful to read (scrolling through 25 variable assignments ain’t fun), and because of my curious specialty I spend a lot of time reading other people's stuff.

 

Do it in Velocity, too

The equivalent can be done in any language. Always better than magic numbers, IMNSHO. Here's the comparable VTL:

 

#set( $delim = "\|" )
#set( $stringifiedLeadHeaders = "firstName|lastName|companyName|phoneNumber" )
#set( $leadHeaders = $stringifiedLeadHeaders.split($delim) )
#set( $leadHeadersCount = $leadHeaders.size() )
#set( $leadInfo = {} )
#foreach( $next in $stringifiedLeadInfo.split($delim) )
#if( $foreach.index < $leadHeadersCount )
#set( $void = $leadInfo.put($leadHeaders[$foreach.index], $next) )
#else
#set( $void = $leadInfo.put("Unknown_Property_${foreach.index}", $next ) )
#end
#end

 

 

The main difference here (Velocity's verbosity aside) is that Java's String.split always treats the delimiter as a regular expression, not a simple string. Since the pipe symbol "|" has special meaning in regex-land, I escaped it as "\|" to treat it non-specially. Character class "[|]" would also implicitly escape the pipe.

 

(JavaScript's split(delim) also supports regexes, but the language can tell the difference between a "string" and a /regex/ so you don't need to escape strings.)
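A quick sketch of that string-versus-regex difference on the JavaScript side, using a shortened sample string of my own:

```javascript
var info = "Sandy|Whiteman";

// A string delimiter is matched literally, so the pipe needs no escaping
var byString = info.split("|");   // ["Sandy", "Whiteman"]

// A regex literal follows regex rules, so here the pipe must be escaped
var byRegex = info.split(/\|/);   // ["Sandy", "Whiteman"]

// Unescaped, /|/ means "empty OR empty": it matches between every
// character, so the string is split into single characters
var byBareRegex = info.split(/|/);

console.log(byString.length);    // 2
console.log(byRegex.length);     // 2
console.log(byBareRegex.length); // 14
```

That last case is roughly what an unescaped "|" would do in Java's String.split, which is why the VTL version needs the escape.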

 

Better yet, don't give yourself the need to split

It could be argued that all string splitting is smelly, and this improvement is just code cologne.

 

Indeed, the best string-splitting code is the code you don't have to write, because you store multivalued fields as JSON or some other well-known, self-describing format. Private formats with pipes, semicolons, or commas are to be avoided when possible. We'll never completely get away from them, though, and they’re admittedly efficient storage-wise.
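For comparison, a minimal sketch of the JSON route, assuming the field stores the same sample lead as a JSON string (the stored value shown here is my own illustration):

```javascript
// Assumed sample value: the same lead data stored as JSON
var stringifiedLeadInfo =
  '{"firstName":"Sandy","lastName":"Whiteman",' +
  '"companyName":"FigureOne, Inc.","phoneNumber":"212-222-2222"}';

// The property names travel with the data, so no header string,
// delimiter, or positional index is needed
var leadInfo = JSON.parse(stringifiedLeadInfo);

console.log(leadInfo.companyName); // "FigureOne, Inc."
console.log(leadInfo.phoneNumber); // "212-222-2222"
```

The tradeoff is the one already mentioned: the JSON string is noticeably longer than its pipe-delimited cousin.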
