Consolidate and strip extra whitespace using Velocity #defines

SanfordWhiteman
Level 10 - Community Moderator
Recently I completed a Velocity audit for a client – a kind of best practices report card. 
 
They have an experienced Java developer, but he’s not a VTL specialist. So while most of the logic was good, a ton of extraneous whitespace was being output, exactly as-is, in the Text version of emails.
 
Remember, Marketo’s Velocity config is rigorously space-preserving¹, so line breaks and spaces, except for just after a #directive, don’t magically disappear. (This is A Good Thing™ and vital to achieve proper layout, but you need to know about it.)
 
There were long sections like this:

 

## START Define UTM
    ## START Set Campaign
## Alumni
#if( $segment.equals("Alumni") ) 
#set ( $utmCampaign = "december-2019-alum" )

        ## Fellow
#elseif( $segment.equals("Fellow") )
#set ( $utmCampaign = "december-2019-fellow" )

        ## Board
#elseif( $segment.equals("Board") )
#set ( $utmCampaign = "december-2019-board" )

        ## General Fund
#else 
#set ( $utmCampaign = "december-2019-emails" )
#end
    ## END Set Campaign

## START Set Source
## MGH Fund
#set ( $utmSource = "appeals")

## START Set Medium for All
#set ( $utmMedium = "email" )
        
## END Define UTM

 

That excerpt alone produces 3 empty lines, and in the full token there are 20+ more. The Text version ends up looking pretty bad.
 
Thing is, indents and breaks are good programming practice, at least in languages where whitespace is completely ignored by the compiler and is just used for clarity.
 
But template languages (not just Velocity, but others of its ilk) tend to assume that anything other than code is meant to be output. That's a sensible default. Yet in whitespace-sensitive contexts, you dream of a strip-extra-whitespace mode, where “extra” can be defined by the end user. Marketo uses VTL for emails, and outside of Marketo it could be used for, say, CSV files, which are even more space-sensitive.
 
Luckily, we have a secret weapon. A #define block, a.k.a. Block Reference (blockref), can be used to “buffer” all the output from a section of VTL. Then you can keep part, or even none, of the original output, while still executing the code inside.
 
See, lines of code and lines of output inside a #define aren’t evaluated until the moment the blockref is stringified (turned into a String). We just have to capture that moment.
 
Stringification happens:
 
  • When you put a ${reference} to the #define in an output line. (N.B. Formal ${} notation is advisable when you’re in an output context, as opposed to a code context, where the reverse is true.)
  • When you include the ${reference} inside a quoted string, say within a #set or #if directive.
  • When you deliberately call $reference.toString() from code.
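Velocity itself runs on Java, so one way to picture this deferred behavior is a small Java model. The sketch below is a hypothetical toy, not Velocity’s actual blockref implementation. The LazyBlockDemo and LazyBlock classes, and the Map standing in for the Velocity context, are illustrative assumptions. The body is stored unevaluated and only runs, side effects included, when something turns the block into a String. That can happen through concatenation (roughly like ${buffer} in output or in a quoted string) or through an explicit toString() call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class LazyBlockDemo {
    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>(); // stands in for the Velocity context
        LazyBlock buffer = new LazyBlock(ctx, c -> {
            c.put("myDefinedVar", "my value");     // like the #set inside the #define
            return "  Some other output\n";        // like the output lines
        });

        // Nothing has run yet, so the "variable" is still unset.
        System.out.println(ctx.get("myDefinedVar"));   // prints: null

        // Concatenation stringifies the block, much like ${buffer}
        // in an output line or inside a quoted string.
        String rendered = "" + buffer;
        System.out.print(rendered);                    // prints:   Some other output
        System.out.println(ctx.get("myDefinedVar"));   // prints: my value

        // An explicit call also runs the body; here the result is
        // deliberately ignored, like #set( $void = $buffer.toString() ).
        ctx.clear();
        buffer.toString();
        System.out.println(ctx.get("myDefinedVar"));   // prints: my value
    }
}

// Hypothetical toy model of a #define blockref (NOT Velocity's implementation):
// the body is kept unevaluated and only executes when toString() is called.
class LazyBlock {
    private final Map<String, Object> context;
    private final Function<Map<String, Object>, String> body;

    LazyBlock(Map<String, Object> context, Function<Map<String, Object>, String> body) {
        this.context = context;
        this.body = body;
    }

    @Override
    public String toString() {
        return body.apply(context); // executing the body includes its side effects
    }
}
```

The model only mirrors the ordering the bullets describe: no output and no assignments happen until stringification.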
 
That last bullet is the weapon of choice today.
 
Check out this code:

 

#define( $buffer )
None
  Of
    This
      Is
        Output
          Unless
            You
              Say 
                So

## a leisurely comment with whitespace before and after

#if( true )

  #set( $myDefinedVar = "my value" )
  Some other output

#end
#end

 

If that #define is all you have, it won’t output (nor compute/assign) anything. So if you try to output ${myDefinedVar} right afterward, it will be null.
 
But if you output the ${buffer} first:

 

${buffer}
${myDefinedVar}

 

Then you’ll see:

 

None
  Of
    This
      Is
        Output
          Unless
            You
              Say
                So



    Some other output


my value

 

Note how the linebreaks are preserved and, perhaps more interestingly, how the words “Some other output” are indented with 4 spaces, not just 2. That’s because the #set line above was also indented, and that indentation was output too.
 
You can also see that $myDefinedVar has a value. That’s because when the blockref was automatically stringified, that included running the code that #set the global variable.
 
Now, let’s massage some of the whitespace out on-the-fly by deliberately calling toString() and doing a regex replace:

 

${buffer.toString().trim().replaceAll("(?m)^\s*\r?\n","")}
${myDefinedVar}

 

You’ll see:

 

None
  Of
    This
      Is
        Output
          Unless
            You
              Say
                So
    Some other output
my value

 

The regex above strips any empty lines (lines with just a linebreak, or with only spaces and a linebreak), so it does a nice job of consolidating the superfluous whitespace.
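The stringified blockref is an ordinary java.lang.String, so the .trim() and .replaceAll() calls above are standard Java String methods. That means a pattern can be tried out in plain Java before it goes into a token. The sketch below assumes exactly that. The StripEmptyLines class is illustrative, and the sample string is a shortened stand-in for the rendered ${buffer}, not its exact contents. Java source needs doubled backslashes for the same pattern.

```java
class StripEmptyLines {
    // Same pattern as the VTL example. (?m) turns on MULTILINE so ^ matches at
    // the start of every line; \s* can span several blank lines before the
    // final \r?\n, so a whole run of empty lines is removed in one match.
    static String stripEmptyLines(String rendered) {
        return rendered.trim().replaceAll("(?m)^\\s*\\r?\\n", "");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the rendered ${buffer}: indented lines,
        // three blank lines, then a line indented by 4 spaces.
        String rendered = "None\n  Of\n    This\n\n\n\n    Some other output\n\n";
        System.out.println(stripEmptyLines(rendered));
    }
}
```

Running main prints the four lines None, Of, This and Some other output, with their original indents kept and the blank lines gone.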
 
We could instead bring all the indents over to the left:

 

${buffer.toString().trim().replaceAll("(?m)^\s+","")}
${myDefinedVar}

 

You’ll see:

 

None
Of
This
Is
Output
Unless
You
Say
So
Some other output
my value

 

The regex in this case removes any string of leading whitespace, as well as empty lines. Of course this may be more extreme than you want, so tweaking the regex for your content is left as an exercise for the reader.
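The same kind of plain-Java harness works for the left-aligning pattern. The collapseBlankLines() method is a hypothetical gentler tweak, not part of the example above. It keeps indentation but reduces any run of blank lines to a single blank line, and it assumes \n line endings. The StripIndents class name and the sample string are illustrative.

```java
class StripIndents {
    // Same pattern as the VTL example: any run of whitespace at the start of a
    // line. Because \s also matches line breaks, empty lines are swallowed too.
    static String stripIndents(String rendered) {
        return rendered.trim().replaceAll("(?m)^\\s+", "");
    }

    // Hypothetical variant (assumes \n line endings): first blank out lines that
    // contain only spaces/tabs, then collapse 3+ consecutive line breaks into 2,
    // leaving at most one blank line between content lines.
    static String collapseBlankLines(String rendered) {
        return rendered.trim()
                .replaceAll("(?m)^[ \\t]+$", "")
                .replaceAll("\\n{3,}", "\n\n");
    }

    public static void main(String[] args) {
        String rendered = "None\n  Of\n    This\n\n\n\n    Some other output\n\n";
        System.out.println(stripIndents(rendered));
        System.out.println("---");
        System.out.println(collapseBlankLines(rendered));
    }
}
```

Which pattern fits depends on the content. Left-aligning everything suits plain lists, while collapsing blank lines keeps intentional indentation intact.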
 
Finally (and this is the coolest hack IMO) we can just execute the code, and suppress output lines completely, by calling toString() inside a #set directive:

 

#set( $void = $buffer.toString() )
${myDefinedVar}

 

This outputs only the value of $myDefinedVar:

 

my value

 

You see, Velocity executed the code in the #define when we called toString(). The String output was also stored in the variable $void, but we had no obligation to output it. (Note that “$void” isn’t a special name in VTL; I just use it as a reminder that the return value is not meant to be used, since void indicates no return value in lots of other languages.)
 
This would’ve been perfect for the client code I posted at the very top, because that’s just a chain of conditional #sets for later tokens, not meant to generate any output at all.
 
Overall lesson: spend time perfecting your Text parts; they’re part of your presence.
 
 
Notes
[1] Technically it’s because of Marketo’s Velocity version, not just config, because 1.7 doesn’t let you tweak the space gobbling setting. 2.0 does have different options, but if the back end is ever upgraded, the correct move would be to enable the legacy 1.x behavior, because that’s the right fit for text in general.