Disturbingly detailed Velocity: $string.substring(0,$string.length()) isn’t doing what you think

SanfordWhiteman
Level 10 - Community Moderator

Memory still matters!

RAM sticks are cheap and plentiful, and it’s been over a decade since everyday users needed to worry about memory usage on their desktop or laptop.

Yet that’s not the case on the server side. There’s still a pretty low ceiling for per-process/per-tenant RAM allocation in traditional systems[1] and for SaaS apps that don’t explicitly charge you for memory usage.

A leaky or hungry process — remember, even one line of code can be hungry if it’s executed 100,000 times in rapid succession, let alone simultaneously — can still put the hurt on a server.

Let’s examine, with memory in mind, this strange snippet of Velocity from a recent Marketo Nation post:

#set( $total = "abcdef" )
#set( $stringLength = $total.length() )
#set( $totala = $total.substring(0,$stringLength) )
## some stuff using $totala instead of $total... 

The person who posted it didn’t write it: it was pasted from somewhere, so I’m not calling them out specifically. They didn’t even know what it was supposed to do, just figured it was important to set $totala and use that instead of $total.

So let’s try to explain $totala in plain language (or as close as we can get):

1. Get the number of Java chars in $total.[2] Store that number in $stringLength.
2. Start reading from the 0-th index in $total (remembering that character indexes start from 0!).
3. Read up to, but not including, the character at $stringLength.
4. Store all the characters you read in $totala.

So, in the specific case where $total is “abcdef”:

1. Store the number 6 in $stringLength.
2. Start reading from the 0-th index, “a”.
3. Read up to and including the 5th index, “f” (that is, stop before index 6).
4. Store “abcdef” in $totala.
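The same four steps can be traced in plain Java, since these are java.lang.String methods. This is a minimal sketch; the class and method names are invented for illustration and do not come from the original post.

```java
final class SubstringWalkthrough {
    // Mirrors the Velocity snippet: measure the string, then read from
    // index 0 up to (but not including) index stringLength.
    static String copyWhole(String total) {
        int stringLength = total.length();
        return total.substring(0, stringLength);
    }

    public static void main(String[] args) {
        String total = "abcdef";
        System.out.println(total.length());   // 6
        System.out.println(total.charAt(0));  // a
        System.out.println(total.charAt(5));  // f
        System.out.println(copyWhole(total)); // abcdef
    }
}
```

The end index is exclusive, which is why passing the length (6) reads through index 5 without running off the end.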

Hmm.

So this is seemingly creating a new string, with the same characters as the old string, and assigning it to a new variable name.

But that’s not actually what’s happening.

In fact, the result is no different from assigning another variable name to the original reference — in other words, merely creating an alias for $total. You might as well have done this, as it’s shorter and self-explanatory:

#set( $totala = $total )
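The alias claim can be checked in plain Java with a reference comparison (==), which asks whether two variables point at the same object rather than at equal characters. This is a minimal sketch that assumes an OpenJDK-based runtime, where substring hands back the original object when the range covers the whole string. That is how the implementation behaves, not something the Javadoc promises. The class and method names are invented for illustration.

```java
final class SubstringAlias {
    // True when substring(0, length) returned the very same object.
    static boolean isSameObject(String total) {
        String totala = total.substring(0, total.length());
        return totala == total; // reference comparison, not equals()
    }

    public static void main(String[] args) {
        System.out.println(isSameObject("abcdef")); // true on OpenJDK
    }
}
```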

But here’s the interesting part.

This next piece of code does create a new string, $totalb. It also contains the characters “abcdef”, but it isn’t just an alias: it’s truly a different string of characters at a different memory location:

#set( $totalb = $total.substring(0,$math.sub($stringLength,1)) + $total.substring($math.sub($stringLength,1),$stringLength) )

This alternate approach splits the substring-ing into 2 parts. The 1st part is everything but the last character, the 2nd part is only the last character, then it concatenates the 2 parts. But while the final strings have exactly the same characters, this approach uses twice the memory of the first one.
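A plain-Java equivalent of the split-and-concatenate approach shows the difference. This is a minimal sketch with invented class and method names; it relies on the Java language rule that a concatenation evaluated at run time produces a newly created String.

```java
final class SplitAndConcat {
    // Everything but the last character, plus the last character.
    static String rebuild(String total) {
        int last = total.length() - 1;
        return total.substring(0, last) + total.substring(last, total.length());
    }

    public static void main(String[] args) {
        String total = "abcdef";
        String totalb = rebuild(total);
        System.out.println(totalb.equals(total)); // true: same characters
        System.out.println(totalb == total);      // false: a different object
    }
}
```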

Now you may not care about the difference, but if you want to get deep inside the Velocity world, you have to understand why this happens.

 

Velocity is (in large part) Java. And Java is smart.

Velocity is itself a Java app; Velocity code is ultimately translated into Java. But the precise Java code that’s generated may or may not take advantage of Java's memory optimization techniques.

In Java, strings (java.lang.String objects) are immutable. This means if the string firstname is set to “Sandy”, you can’t later set only the first character to uppercase “T”. This code may appear to be only “replacing” the first letter...

String firstname = "Sandy";
firstname = "Tandy";

... but no. In fact, a chunk of memory is allocated for the 5 characters S-a-n-d-y, then another chunk is allocated for the 5 characters T-a-n-d-y. You can't do an in-place update of a string. Again, the code is correctly reassigning firstname, but under the hood, the original string and the new string both exist.
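Keeping a second reference to the original value makes this visible. This is a minimal sketch with invented names; the variable original exists only to show that the first string survives the “update”.

```java
final class ImmutableStrings {
    // Returns {the original object, the value firstname ends up holding}.
    static String[] reassign() {
        String firstname = "Sandy";
        String original = firstname;             // second reference to the same String object
        firstname = firstname.replace('S', 'T'); // builds a new String; "Sandy" is untouched
        return new String[] { original, firstname };
    }

    public static void main(String[] args) {
        String[] result = reassign();
        System.out.println(result[0]); // Sandy
        System.out.println(result[1]); // Tandy
    }
}
```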

For a certain period of time, then, Java can be using much more memory than you think. Periodically, Java’s garbage collector kicks in and can determine that “Sandy” is no longer referenced in the above example, deallocating its memory. But until the GC comes around, you can have runaway memory use, impacting performance or even crashing the Java engine.

Creating a large number of strings in a short time – where the thresholds of large and short depend on overall memory and GC settings – can be a catastrophe, and you may not even think of yourself as “creating” strings when using a single variable like firstname.

 

String interning to the (partial) rescue

To partially alleviate the memory dangers of string immutability, Java uses a method known as string interning.

Here’s how that works: Java keeps a pool of string values. When you set a string to a literal value and that exact value is already in the pool, Java just points to the original block of memory rather than allocating new memory.

This only applies to the exact same character sequence. So here, only 3 characters need to be allocated:

String string1 = "yes";
String string2 = "yes";

The 1st “yes” is set and also interned. When you set the 2nd “yes”, Java looks in its string pool, sees “yes” is already there, and creates another pointer to it.

If string2 were “yep” or “Yes” though, interning wouldn’t help. Has to be the exact same string.
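A reference comparison confirms the shared pointer. This is a minimal sketch with invented class and method names; it relies on the Java language guarantee that identical string literals refer to the same pooled object.

```java
final class InternLiterals {
    // Two identical literals: one pooled object, two pointers.
    static boolean sameLiteralSharesObject() {
        String string1 = "yes";
        String string2 = "yes";
        return string1 == string2;
    }

    // A different character sequence gets its own object.
    static boolean differentLiteralSharesObject() {
        String string1 = "yes";
        String string2 = "yep";
        return string1 == string2;
    }

    public static void main(String[] args) {
        System.out.println(sameLiteralSharesObject());      // true
        System.out.println(differentLiteralSharesObject()); // false
    }
}
```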

While interning isn't a cure-all, you can see how it’s a huge help when many string variables are set/reset/reused over time, but the number of unique values is much lower.

Interning doesn’t always kick in, even when it theoretically could. There are cases where Java is unable to understand that interning is possible, just because of the way code is structured. One example – to finally return to the above situation! – is when the string is concatenated from other substrings.
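In plain Java, the concatenation case looks like this. This is a minimal sketch with invented names: the pieces are passed in as parameters so the compiler cannot pre-compute the result, and String.intern() is the standard method for asking the pool for its copy explicitly.

```java
final class RuntimeStrings {
    // Concatenation done at run time yields a fresh object, not the pooled one.
    static boolean joinedIsPooled(String head, String tail, String pooled) {
        String joined = head + tail;
        return joined == pooled;
    }

    // Explicitly interning the result maps it back to the pooled object.
    static boolean internedIsPooled(String head, String tail, String pooled) {
        String joined = head + tail;
        return joined.intern() == pooled;
    }

    public static void main(String[] args) {
        System.out.println(joinedIsPooled("ye", "s", "yes"));   // false
        System.out.println(internedIsPooled("ye", "s", "yes")); // true
    }
}
```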

So that’s why this code doesn’t use interning, and ends up using 24 bytes in total (each 6-character string is 12 bytes):

#set( $totalb = $total.substring(0,$math.sub($stringLength,1)) + $total.substring($math.sub($stringLength,1),$stringLength) )

While this code reuses the existing 12 bytes:

#set( $totala = $total.substring(0,$stringLength) )

Because the 2nd approach reuses the existing bytes, it’s no different, finally, from:

#set( $totala = $total )

 

Does this matter for Marketo?

It does.

Remember how interning avoids memory catastrophes when the same string value is used over and over? Well, that’s what happens when you send a batch of 100,000 emails and each email runs Velocity like:

#set( $industry = $lead.Industry__c.toUpperCase() )

You’ll have a relatively small number N of unique Industry values in your database, and since interning will be used, the server only needs enough memory for N strings, instead of 100,000 strings.
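The N-versus-100,000 arithmetic can be sketched in plain Java by counting distinct String objects rather than distinct values. Everything here is an assumption made for illustration: the class and method names, the three sample Industry values, and above all the explicit call to intern(). Plain Java does not pool the result of toUpperCase() by itself, and this sketch cannot confirm how the Velocity runtime behind Marketo handles it.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class InternBatch {
    // Counts how many distinct String objects a batch of leads produces.
    static int distinctObjects(String[] industries, int leads, boolean intern) {
        // Identity-based set: two equal strings at different addresses count twice.
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < leads; i++) {
            String upper = industries[i % industries.length].toUpperCase();
            seen.add(intern ? upper.intern() : upper);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] industries = { "Retail", "Finance", "Software" }; // hypothetical values
        System.out.println(distinctObjects(industries, 100_000, true));  // 3
        System.out.println(distinctObjects(industries, 100_000, false)); // 100000
    }
}
```

With pooling, the count tracks the number of unique values; without it, the count tracks the number of emails.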

The more convoluted your code, though, the more likely it is that Java will not be able to recognize that your strings are intern-able. As a result, you can slow down your Marketo instance just by having clunky code.

 

NOTES

[1] That is, not cutting-edge distributed systems which support near-infinite horizontal scaling, but systems where processing is still homed on a discrete server or servers.
[2] UTF-16 code units, a.k.a. 2-byte code units.
