SOLVED

Invalid CSV Escaping & PHP Serialization in form 'Form Fill' Activities in Bulk Activity Export

Go to solution
JoeDownham
Level 1

⚠️  This post has been edited by a moderator for accuracy.

 

When exporting Marketo Activity, specifically Form Fill activities, we have encountered two issues that cause problems when trying to reliably parse the exported CSV file.

 

1. 'checksumFields' Value Contains Unescaped Commas

 

Inside the attributes field, there is a Form Fields entry. This contains a checksumFields value which is a comma-separated list of field names. However, this is not properly escaped for CSV (RFC 4180). As a result:

  • Commas are interpreted as field delimiters.

  • CSV parsers break and produce misaligned columns.

  • Downstream systems cannot reliably import these exports.

Example:

"checksumFields":"FirstName,LastName,Title,Company,Email"

 Because of this, CSV readers see FirstName, LastName, etc., as separate fields.

 

2. 'Form Fields' is PHP Serialized, Not JSON

 

Unlike the rest of the attributes object (which uses JSON-like syntax), the Form Fields value is PHP serialized.
Example snippet:

""Form Fields"":""a:11:{s:6:\"module\";s:11:\"leadCapture\";s:6:\"action\";s:5:\"save2\";s:9:\"FirstName\";s:4:\"John\";s:8:\"LastName\";s:5:\"Doe\";s:5:\"Title\";s:7:\"Manager\";s:7:\"Company\";s:9:\"ExampleCo\";s:5:\"Email\";s:24:\"john.doe@example.com\";s:6:\"formid\";s:4:\"1628\";s:10:\"munchkinId\";s:11:\"000-XXX-000\";s:8:\"_mkt_trk\";s:53:\"id:000-XXX-000&token:_mch-example.com-1719224514646-57494\";s:7:\"formVid\";s:4:\"1628\";s:13:\"_mktoReferrer\";s:58:\"https://www.example.com/en/events/example-event\";s:14:\"checksumFields\";s:69:\"FirstName,LastName,Title,Company,Email,formid,munchkinId,_mkt_trk,formVid,_mktoReferrer\";s:8:\"checksum\";s:64:\"abc123def456ghi789jkl012mno345pqr678stu901vwx234yz567abc890def\";}""

Problems with this approach:

  • PHP serialization is not universally readable outside PHP.

  • Serialization outputs : and ; characters which don't interact well with CSV exports.

  • It introduces complexity for developers working in other languages.

  • It’s inconsistent with the surrounding JSON-style attributes.

 

Impact on Standard CSV Parsing

 

The current CSV export is not RFC 4180 compliant due to the unescaped commas within the 'checksumFields' data. This causes failures with any standard CSV parsing tools, which interpret these commas as new columns. We are saving the export exactly as provided by Marketo via the Bulk Activity Extract API — the issue exists in the raw export data.

 

To work around this, we have switched to using TSV exports for Activity Extracts, which avoid this issue because tabs do not appear within the serialized data.

Ideally, Marketo should ensure CSV exports follow the RFC 4180 standard by escaping commas correctly. Additionally, using JSON encoding for 'Form Fields' would bring consistency and improve interoperability.

 

 

I’m sharing this in the hopes it helps others encountering the same issues when working with these exports.

2 ACCEPTED SOLUTIONS
SanfordWhiteman
Level 10 - Community Moderator

You’re mistaken about RFC 4180 validation. The CSV is RFC-compliant and a proper CSV parser has no problem with it.

 

The attributes field is wrapped in unescaped double quotation marks. It’s not mandatory for a field to be wrapped in dquotes unless it contains characters that would create a delimiter collision. In this case, attributes absolutely does contain such characters, e.g. commas. That’s why it’s wrapped. Once it’s wrapped, the internal commas have no special meaning. It’s not true that “commas are interpreted as field delimiters.” 

 

attributes is a JSON-stringified object. JSON itself contains both unescaped dquotes (") and JSON-escaped dquotes (\"). In both cases the dquote is doubled to "" because that’s how you CSV-escape dquotes. As they’re now CSV-escaped, they do not function as delimiters; they’re the same as any other character. Likewise, as noted above, the commas within the value do not function as delimiters.

 

Note even if attributes were a simple string, all the above would still apply. It so happens that attributes is a JSON object that when parsed, has properties that need to be PHP-unserialized. The PHP-inside-JSON may be distracting, but it doesn’t change the fundamentals: the commas are not interpreted as field delimiters unless your CSV parser is badly broken.

View solution in original post

SanfordWhiteman
Level 10 - Community Moderator

What prompted my original post was the practical issue we encountered when using some standard CSV parsers — notably PHP’s native fgetcsv() and SplFileObject in CSV mode. In our case, rows containing checksumFields inside the Form Fields serialized data led to the parser treating the comma-separated values as separate columns, which caused headers like Email and FirstName to show up unexpectedly as subsequent column values instead of staying within the 'attributes' field. It seems this behaviour arose due to how the inner escaping interacts with parser expectations, despite the CSV being technically valid. I also tried a few other CSV parsing libraries, with similar results (clearly not the right ones!).

I found it hard to believe a function as venerable as PHP’s fgetcsv() didn’t understand basic CSV escaping.

 

So I put together this bit of PHP that shows it doesn’t have a problem with the Filled Out Form export, provided you pass the right arguments of course:

<?php
$row = -1;
$header;
if (($handle = fopen("sample_export.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, null, ",", "\"", "")) !== FALSE) {
      if(++$row == 0){
         $header = $data;
         continue;
      }
            
      $rowObj = array_combine(array_values($header), array_values($data));

      $attributes = json_decode($rowObj["attributes"]);

      $formFields = unserialize($attributes->{"Form Fields"});

      echo "Form fields for data row ${row}" . PHP_EOL;
      echo str_repeat("=",80) . PHP_EOL;
      var_export($formFields);
      echo PHP_EOL . PHP_EOL;
      
    }
    fclose($handle);
}

 

Sample output:

Form fields for data row 1
================================================================================
array (
  'module' => 'leadCapture',
  'action' => 'save2',
  'Unsubscribed' => 'yes',
  'Email' => 'test1@example.com',
  'formid' => '1357',
  'lpId' => '219',
  'subId' => '203',
  'munchkinId' => '410-XOR-673',
  'lpurl' => '//pages.vaneck.com/AUUnsubscribe.html?cr={creative}&kw={keyword}',
  'cr' => '',
  'kw' => '',
  'q' => '',
  '_mkt_trk' => 'id:410-XOR-673&token:_mch-vaneck.com.au-m9Nu6QH22eHRTnpEdjMP79qT5DFVtjbq',
  'formVid' => '1357',
  'mkt_tok' => 'G0rMYjuuJ3XgcjtB0WhKetYzKqQhwG39bjxPyzbC9jzvbMFiLtLCM9bYKgetx4Yed3NXNm9daAxGM3GVSiq5emxHbJeEV7Mr5VEHEcSS5RbFBUm4xrQBDuSRKi26gYq886R',
  '_mktoReferrer' => 'https://pages.vaneck.com.au/auunsubscribe.html?mkt_unsubscribe=1&mkt_tok=G0rMYjuuJ3XgcjtB0WhKetYzKqQhwG39bjxPyzbC9jzvbMFiLtLCM9bYKgetx4Yed3NXNm9daAxGM3GVSiq5emxHbJeEV7Mr5VEHEcSS5RbFBUm4xrQBDuSRKi26gYq886R',
  'checksumFields' => 'Unsubscribed,Email,formid,lpId,subId,munchkinId,lpurl,cr,kw,q,_mkt_trk,formVid,mkt_tok,_mktoReferrer',
  'checksum' => '757c309b14bc0ced72dc7969d852f77f8970bbcf43d2fbc8cc09a955cb3ae94d',
  'formServiceRequestId31337' => '4498#197294d1252',
)

Form fields for data row 2
================================================================================
array (
  'module' => 'leadCapture',
  'action' => 'save2',
  'Unsubscribed' => 'yes',
  'Email' => 'test2@example.com',
  'formid' => '1357',
  'lpId' => '219',
  'subId' => '203',
  'munchkinId' => '410-XOR-673',
  'lpurl' => '//pages.vaneck.com/AUUnsubscribe.html?cr={creative}&kw={keyword}',
  'cr' => '',
  'kw' => '',
  'q' => '',
  '_mkt_trk' => 'id:410-XOR-673&token:_mch-vaneck.com.au-3SCxxRf7HEf5mJ1ykuaUKQNb16wX9BEb',
  'formVid' => '1357',
  'mkt_tok' => '7FEuRL5uRKXPatf4HbW1yyYKGiQDn4CLivGRRjSgNhkahzC14EzjAgxNExh1pM359hLxh9DHa5mXyTqLCX33pmLA408Mf8ikggauyVb8iZU6UYQm0t6n6R6NGeMMn81k2iN',
  '_mktoReferrer' => 'https://pages.vaneck.com.au/auunsubscribe.html?mkt_unsubscribe=1&mkt_tok=7FEuRL5uRKXPatf4HbW1yyYKGiQDn4CLivGRRjSgNhkahzC14EzjAgxNExh1pM359hLxh9DHa5mXyTqLCX33pmLA408Mf8ikggauyVb8iZU6UYQm0t6n6R6NGeMMn81k2iN',
  'checksumFields' => 'Unsubscribed,Email,formid,lpId,subId,munchkinId,lpurl,cr,kw,q,_mkt_trk,formVid,mkt_tok,_mktoReferrer',
  'checksum' => 'f0766eea05c0ebd5d291a81657912451403964898f2cff659931c197d2a83101',
  'formServiceRequestId31337' => '129a3#1972a189e1d',
)

 

View solution in original post

9 REPLIES 9
SanfordWhiteman
Level 10 - Community Moderator

Serialization outputs : and ; characters which don't interact well with CSV exports.


While the unfamiliarity of PHP serialization is undeniable, can’t see how this claim is defensible.

 

: and ; are child’s play for CSV parsers, as they have no special meaning. (And of course they exist in JSON as well.)

 

Based on your earlier comments, I think you’re using a broken CSV parser.

Jo_Pitts1
Level 10 - Community Advisor

@JoeDownham I have to agree with @SanfordWhiteman here.

I've seen interesting issues in CSV files for a global company with different delimiters being used, and different characters being used to denote 10^3 multipliers in numbers.  An import CSV parser wasn't handling this well at all.  It wasn't the fault of the people generating the CSV files, but the fault of the parser interpreting them.

 

However a well written parser should be able to take configuration details to correctly parse this regardless of geography.

 

I wonder - where in the world are you?  If you generate a CSV file by doing a Save-As in Excel what do you see as delimiters?

 

Cheers

Jo

SanfordWhiteman
Level 10 - Community Moderator

I've seen interesting issues in CSV files for a global company with different delimiters being used, and different characters being used to denote 10^3 multipliers in numbers.  An import CSV parser wasn't handling this well at all.  It wasn't the fault of the people generating the CSV files, but the fault of the parser interpreting them.


I’fairness that problem starts with Excel still calling something a CSV even though it uses different regional delimiters. An RFC 4180-only parser is right to be confused by that.

 

Excel seems to treat “CSV” kind of like “KFC” or “SAT” where the “comma” is no longer literal. 😑

Jo_Pitts1
Level 10 - Community Advisor

Agreed.  But I'm just wondering if @JoeDownham has set up their parser to account for Excel-ified CSV files with a ; as the delimiter and then when the parser encounters a RFC 4180 file it is getting confused.

 

Just to be clear, RFC 4180 does specify ONLY a comma as the field delimiter.

 

All this is supposition until we hear back from OP.

 

SanfordWhiteman
Level 10 - Community Moderator

You’re mistaken about RFC 4180 validation. The CSV is RFC-compliant and a proper CSV parser has no problem with it.

 

The attributes field is wrapped in unescaped double quotation marks. It’s not mandatory for a field to be wrapped in dquotes unless it contains characters that would create a delimiter collision. In this case, attributes absolutely does contain such characters, e.g. commas. That’s why it’s wrapped. Once it’s wrapped, the internal commas have no special meaning. It’s not true that “commas are interpreted as field delimiters.” 

 

attributes is a JSON-stringified object. JSON itself contains both unescaped dquotes (") and JSON-escaped dquotes (\"). In both cases the dquote is doubled to "" because that’s how you CSV-escape dquotes. As they’re now CSV-escaped, they do not function as delimiters; they’re the same as any other character. Likewise, as noted above, the commas within the value do not function as delimiters.

 

Note even if attributes were a simple string, all the above would still apply. It so happens that attributes is a JSON object that when parsed, has properties that need to be PHP-unserialized. The PHP-inside-JSON may be distracting, but it doesn’t change the fundamentals: the commas are not interpreted as field delimiters unless your CSV parser is badly broken.

JoeDownham
Level 1

Thanks very much for clarifying — you're absolutely right regarding RFC 4180. I appreciate the correction, and I can see now that I was incorrect on that point. The CSV structure itself is compliant, and I agree that commas inside quoted fields are, in principle, not a problem for a compliant parser.

 

What prompted my original post was the practical issue we encountered when using some standard CSV parsers — notably PHP’s native fgetcsv() and SplFileObject in CSV mode. In our case, rows containing checksumFields inside the Form Fields serialized data led to the parser treating the comma-separated values as separate columns, which caused headers like Email and FirstName to show up unexpectedly as subsequent column values instead of staying within the 'attributes' field. It seems this behaviour arose due to how the inner escaping interacts with parser expectations, despite the CSV being technically valid. I also tried a few other CSV parsing libraries, with similar results (clearly not the right ones!).

 

To clarify, this was using the CSV data exactly as provided by Marketo via the Bulk Activity Extract API — we saved the response directly to a CSV file with no modification and then passed that file into the parser. The problem wasn’t introduced by any intermediate handling.

 

All other activity types have been parsing perfectly with our processes for some time now; it’s only when we started to include Form Fill activity data that the formatting caused our export processing jobs to start failing when such records were present. In my search for a solution, I was incorrectly led to believe that the serialization itself was making the CSV non-compliant. Switching to TSV resolved this issue for us because there’s no delimiter conflict — it’s been a simple and practical solution when dealing with these complex serialized fields.

 

Interestingly, opening the same CSV in Excel works without issue, which reinforces your point that the export conforms to CSV standards.

 

I’m not sure of the benefits of using PHP serialization here in this context, but it definitely adds complexity for anyone trying to work with this data. If this field were JSON-encoded like the rest of the attributes, this wouldn’t really be an issue at all.

SanfordWhiteman
Level 10 - Community Moderator

What prompted my original post was the practical issue we encountered when using some standard CSV parsers — notably PHP’s native fgetcsv() and SplFileObject in CSV mode. In our case, rows containing checksumFields inside the Form Fields serialized data led to the parser treating the comma-separated values as separate columns, which caused headers like Email and FirstName to show up unexpectedly as subsequent column values instead of staying within the 'attributes' field. It seems this behaviour arose due to how the inner escaping interacts with parser expectations, despite the CSV being technically valid. I also tried a few other CSV parsing libraries, with similar results (clearly not the right ones!).

I found it hard to believe a function as venerable as PHP’s fgetcsv() didn’t understand basic CSV escaping.

 

So I put together this bit of PHP that shows it doesn’t have a problem with the Filled Out Form export, provided you pass the right arguments of course:

<?php
$row = -1;
$header;
if (($handle = fopen("sample_export.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, null, ",", "\"", "")) !== FALSE) {
      if(++$row == 0){
         $header = $data;
         continue;
      }
            
      $rowObj = array_combine(array_values($header), array_values($data));

      $attributes = json_decode($rowObj["attributes"]);

      $formFields = unserialize($attributes->{"Form Fields"});

      echo "Form fields for data row ${row}" . PHP_EOL;
      echo str_repeat("=",80) . PHP_EOL;
      var_export($formFields);
      echo PHP_EOL . PHP_EOL;
      
    }
    fclose($handle);
}

 

Sample output:

Form fields for data row 1
================================================================================
array (
  'module' => 'leadCapture',
  'action' => 'save2',
  'Unsubscribed' => 'yes',
  'Email' => 'test1@example.com',
  'formid' => '1357',
  'lpId' => '219',
  'subId' => '203',
  'munchkinId' => '410-XOR-673',
  'lpurl' => '//pages.vaneck.com/AUUnsubscribe.html?cr={creative}&kw={keyword}',
  'cr' => '',
  'kw' => '',
  'q' => '',
  '_mkt_trk' => 'id:410-XOR-673&token:_mch-vaneck.com.au-m9Nu6QH22eHRTnpEdjMP79qT5DFVtjbq',
  'formVid' => '1357',
  'mkt_tok' => 'G0rMYjuuJ3XgcjtB0WhKetYzKqQhwG39bjxPyzbC9jzvbMFiLtLCM9bYKgetx4Yed3NXNm9daAxGM3GVSiq5emxHbJeEV7Mr5VEHEcSS5RbFBUm4xrQBDuSRKi26gYq886R',
  '_mktoReferrer' => 'https://pages.vaneck.com.au/auunsubscribe.html?mkt_unsubscribe=1&mkt_tok=G0rMYjuuJ3XgcjtB0WhKetYzKqQhwG39bjxPyzbC9jzvbMFiLtLCM9bYKgetx4Yed3NXNm9daAxGM3GVSiq5emxHbJeEV7Mr5VEHEcSS5RbFBUm4xrQBDuSRKi26gYq886R',
  'checksumFields' => 'Unsubscribed,Email,formid,lpId,subId,munchkinId,lpurl,cr,kw,q,_mkt_trk,formVid,mkt_tok,_mktoReferrer',
  'checksum' => '757c309b14bc0ced72dc7969d852f77f8970bbcf43d2fbc8cc09a955cb3ae94d',
  'formServiceRequestId31337' => '4498#197294d1252',
)

Form fields for data row 2
================================================================================
array (
  'module' => 'leadCapture',
  'action' => 'save2',
  'Unsubscribed' => 'yes',
  'Email' => 'test2@example.com',
  'formid' => '1357',
  'lpId' => '219',
  'subId' => '203',
  'munchkinId' => '410-XOR-673',
  'lpurl' => '//pages.vaneck.com/AUUnsubscribe.html?cr={creative}&kw={keyword}',
  'cr' => '',
  'kw' => '',
  'q' => '',
  '_mkt_trk' => 'id:410-XOR-673&token:_mch-vaneck.com.au-3SCxxRf7HEf5mJ1ykuaUKQNb16wX9BEb',
  'formVid' => '1357',
  'mkt_tok' => '7FEuRL5uRKXPatf4HbW1yyYKGiQDn4CLivGRRjSgNhkahzC14EzjAgxNExh1pM359hLxh9DHa5mXyTqLCX33pmLA408Mf8ikggauyVb8iZU6UYQm0t6n6R6NGeMMn81k2iN',
  '_mktoReferrer' => 'https://pages.vaneck.com.au/auunsubscribe.html?mkt_unsubscribe=1&mkt_tok=7FEuRL5uRKXPatf4HbW1yyYKGiQDn4CLivGRRjSgNhkahzC14EzjAgxNExh1pM359hLxh9DHa5mXyTqLCX33pmLA408Mf8ikggauyVb8iZU6UYQm0t6n6R6NGeMMn81k2iN',
  'checksumFields' => 'Unsubscribed,Email,formid,lpId,subId,munchkinId,lpurl,cr,kw,q,_mkt_trk,formVid,mkt_tok,_mktoReferrer',
  'checksum' => 'f0766eea05c0ebd5d291a81657912451403964898f2cff659931c197d2a83101',
  'formServiceRequestId31337' => '129a3#1972a189e1d',
)

 

JoeDownham
Level 1

Thanks for clarifying this — your example had made it clear where I went wrong. I hadn’t been explicitly setting the enclosure and escape arguments on fgetcsv(), and because the default settings had been working fine for all the other activity types, I hadn’t considered that they might not be suitable in this case.

 

This has helped me streamline my import process — I’ve been able to remove several of the steps I’d put in place to reformat the data to make it usable, which aren’t needed now with the proper parser setup.

 

One lingering concern is Marketo’s choice to include PHP serialized data inside JSON within a CSV export — especially given this is form fill data and PHP’s own documentation explicitly warns against unserializing user-submitted data (PHP Warning) due to the potential for exploits through object instantiation and autoloading. Of course, this is how the export works today, and I appreciate that’s unlikely to change.

 

I really appreciate you taking the time to explain this so clearly — it’s been very helpful in improving how we handle this data.

SanfordWhiteman
Level 10 - Community Moderator

PHP’s own documentation explicitly warns against unserializing user-submitted data (PHP Warning) due to the potential for exploits through object instantiation and autoloading

Well, this isn’t fully untrusted data, as by definition each field name and value is stringified by the form listener. You can’t force a serialized object in there because the serialized o: will simply become an s:. with literal o and : bytes in it, making it harmless.