19 Replies Latest reply on Apr 30, 2018 2:23 PM by Andrey Bulgakov

    Bulk Lead Extract API questions

    Kurt Koller

      Working with the Bulk API, have a few questions about it.

       

      1. What's the easiest way to just get all leads for an initial sync via the bulk API? You need to have a filter. Say I have an instance with 150k leads created over 3 years. Do I need to do 30 day bulk API requests starting at some arbitrary old date to get all leads? This needs to be done through the API. It seems that in this scenario using the non-bulk Rest API with a paging token would be faster (see #3) even though it would result in more calls. This is what we are currently doing but we'd hoped to replace it with the Bulk API. (If this is how this must be would LOVE to see the 30 limit upped to 31 days so we can logically pull by calendar month)

       

      2. (minor) Why does the Bulk API return the word "null" in a missing filed in a CSV? In a CSV a missing/null value is just missing, not the word "null". Here's an example:

       

      id,email,company,sfdctype,postalcode,inferredpostalcode,inferredmetropolitanarea,country,inferredcountry,persontype

      408,me@example.com,MyCompany,Contact,null,null,null,United States,null,contact

       

      In every other CSV scenario I've worked in, this would be represented as:

       

      id,email,company,sfdctype,postalcode,inferredpostalcode,inferredmetropolitanarea,country,inferredcountry,persontype

      408,me@example.com,MyCompany,Contact,,,,United States,,contact

       

      3. Why does the startedAt date change/why the time weirdness here? I created a lead export of 1 month of leads filtered by createdAt. Here are the stages:

       

      Create:

      {

         "fields": ["id", "email", "company", "createdAt", "updatedAt","unsubscribed", "emailInvalid", "originalSourceType", "sfdcType","postalCode", "inferredPostalCode",

      "inferredMetropolitanArea","country", "inferredCountry", "personType"],

         "format": "CSV",  "filter": {  "createdAt": {

               "startAt": "2017-06-01T00:00:00Z",  "endAt": "2017-07-01T00:00:00Z"  }   }

      }

       

      Response:

       

      {

          "requestId": "bcfe#15d53739e6b",

          "result": [

              {

                  "exportId": "b70642a3-8a89-4bf1-a441-24b214096b78",

                  "format": "CSV",

                  "status": "Created",

                  "createdAt": "2017-07-18T02:07:52Z"

              }

          ],

          "success": true

      }

       

      Status after enqueue:

       

      {

          "requestId": "1796#15d53747d57",

          "result": [

              {

                  "exportId": "b70642a3-8a89-4bf1-a441-24b214096b78",

                  "format": "CSV",

                  "status": "Queued",

                  "createdAt": "2017-07-18T02:07:52Z",

                  "queuedAt": "2017-07-18T02:08:49Z"

              }

          ],

          "success": true

      }

       

      Status after change from "Queued" to "Processing":

       

      {

          "requestId": "16721#15d5374f1fe",

          "result": [

              {

                  "exportId": "b70642a3-8a89-4bf1-a441-24b214096b78",

                  "format": "CSV",

                  "status": "Processing",

                  "createdAt": "2017-07-18T02:07:52Z",

                  "queuedAt": "2017-07-18T02:08:49Z",

                  "startedAt": "2017-07-18T02:09:00Z"

              }

          ],

          "success": true

      }

       

      Status after complete:

       

      {

          "requestId": "17c12#15d53bbac87",

          "result": [

              {

                  "exportId": "b70642a3-8a89-4bf1-a441-24b214096b78",

                  "format": "CSV",

                  "status": "Completed",

                  "createdAt": "2017-07-18T02:07:52Z",

                  "queuedAt": "2017-07-18T02:08:49Z",

                  "startedAt": "2017-07-18T03:26:09Z",

                  "finishedAt": "2017-07-18T03:26:25Z",

                  "numberOfRecords": 43,

                  "fileSize": 7389

              }

          ],

          "success": true

      }

       

      Why does the startedAt date change? I polled status.json every minute or so and it was the same as the status above until it finished at which time the startedAt jumped forward.

       

       

      3a. Also under what circumstances would an export of 1 month of lead records out of an instance that has 1600 leads take more than 1 hour 15 minutes? The final result looks like it took 16 seconds but it entered processing way before that. There is no other API usage, no other Bulk API jobs are queued, etc. I've repeated this 3 times in a row.

       

       

      4. There's a note in the API that the filter type updatedAt*
      * Filter type is unavailable for some subscriptions.  Marketo Support can provide you with this information

      How can we find out via the API whether this is available or not? Just call it and process a filter error? Or will it just return an empty set? I need to know what to do when we run against an instance and we have no idea whether they have this feature or not, how would we find/extract leads that are updated since a date through the bulk API? So far I haven't seen this on instances I've tested, but I'd like to know what to do.

       

       

      Any insight is appreciated.

        • Re: Bulk Lead Extract API questions
          David Everly

          1. Correct, using bulk you would have to pull leads in 30 day increments.  I have created an enhancment request to increase the date range to 31 days.

          2. This design choice is to maintain consistency with bulk lead export from within the Marketo UI.  i.e. a precedent had been set

          3. I am unable to reproduce this behavior, I recommend filing a support case with Marketo.

          3a. There are many variables that impact performance in a multi-tenant system.  You might try enqueuing the job during off hours.

          4. You can determine if the instance supports updatedAt by calling Create Export Lead Job endpoint.  If not supported, tou will receive an error 1035, "Unsupported filter type for target subscription".

          1 of 1 people found this helpful
            • Re: Bulk Lead Extract API questions
              Kurt Koller

              David Everly thanks for the response. With #2, I just pulled an export of leads from the UI with the same fields, and none of them contain the word null in them. Sample line from the CSV for my account in our system, note the ,,, and not ,null,null,null:

               

              1013663,kk@digitalpi.com,Digital Pi,2016-04-26 15:13:15,2017-07-27 23:44:17,,,1,Omega,List import

                • Re: Bulk Lead Extract API questions
                  David Everly

                  Slight correction to #2, design decision to be consistent with REST API.  For example, a call to Get Multiple Leads by Filter Type returns null for non-existent lead fields.  Here is an example with non-existent field "leadRole" for a given lead id "318581":
                  GET /rest/v1/leads.json?filterType=id&filterValues=318581&fields=leadRole

                  {

                      "requestId":"15426#15dc7cc2251",

                      "result":[

                          {

                              "id":318581,

                              "leadRole":null

                          }

                      ],

                      "success":true

                  }

                    • Re: Bulk Lead Extract API questions
                      Kurt Koller

                      OK. Here's my $0.02 on this.

                       

                      I'd say that CSV is a loose standard that existed well before JSON became a thing, and I've never seen a CSV implementation where null is returned.

                       

                      The JSON spec says you can have a string, numeric, or null for a value. The parsers can look for this easily, because a string is always quoted.

                       

                      CSV has never worked like this. CSV has unquoted values and quotes anything with a comma or a quote in it. i.e.

                       

                      sometext,123,"Last Try, The"

                       

                      If I try to parse a CSV I have no way to know if null is a string or a null. Sure, that's a rare occurrence but CSV parsers generally will treat null as a string because the loose standard has always had a missing value for a null. The worst parsers will return an empty string instead of a null in a typed language.

                      • Re: Bulk Lead Extract API questions
                        Kurt Koller

                        I will add logic to our parsers to look for a null string and replace it with an actual null in our database. One thing that gets lost is that sometimes people end up with the text "null" in a field in Marketo because of import issues etc, and encoding nulls the way they are currently encoded in bulk export means we can't detect if it's a real null or a text "null"

                          • Re: Bulk Lead Extract API questions
                            Sanford Whiteman

                            The string "NULL" is already a special value in Marketo -- it's a reserved word used to empty fields (though not, confusingly, to set them to the valueless null/NULL).

                             

                            Also, I don't agree that CSVs don't represent NULL values as strings. A SQL dump to CSV must use ,NULL, to set off valueless NULL from 'NuLl' fwiw....

                              • Re: Bulk Lead Extract API questions
                                Kurt Koller

                                Sanford,

                                 

                                The CSV export from the Marketo UI does exactly what I suggest.

                                  • Re: Bulk Lead Extract API questions
                                    Sanford Whiteman

                                    The point is that double-quoted or unquoted

                                     

                                         NULL

                                     

                                    (or any case variation) is very common in the wider world of CSVs, where it explicitly represents a SQL NULL.

                                     

                                    And since Marketo uses SQL on the back end, what's missing here is a distinction between single-quoted values and SQL special values. Basically, the CSV isn't accurately representing the source, in favor of making it look simpler.

                                     

                                    And when you take into account that the string 'NULL', which has no special meaning in SQL, has special meaning in Marketo, it's clear that the rules for clear import/export are out the door.

                                      • Re: Bulk Lead Extract API questions
                                        Kurt Koller

                                        I won't belabor this, but CSV != SQL dump, and again, the Marketo UI CSV dump of the lead data from the UI (same data) does the (correct in my opinion) ,, behaviour and doesn't include the word null for missing data.

                                          • Re: Bulk Lead Extract API questions
                                            Sanford Whiteman

                                            Sure, CSV isn't only for SQL exports, but it's not not a SQL export, either, since the values have no inherent meaning outside of some data domain. Like you said, CSV is a generic format.  I just don't see either variant as correct, because honoring null as a separate value is appropriate (it can be a separate value on the back end) and that's lost in the UI version, while separating empty values from null is also important, and that's lost in the API version. If this were not oversimplified and gave single-quoted strings or unquoted null, everyone would be happy!

                                              • Re: Bulk Lead Extract API questions
                                                Kurt Koller

                                                Sanford,

                                                 

                                                This is the last post I'll make on this, I'm not sure what your motive is for arguing, but to me this is pretty clear.

                                                 

                                                The CSV export of the _same data_ from the UI is, in my opinion, correct. The reason given for including ,null, in my opinion, is not a good one, beccause CSV != JSON. There is indeed a precedent in Marketo, and that precedent for CSV is that ,null, is not used, and that ,, is used. I believe they made a mistake here, and the mistake is that they applied the JSON null representation into CSV when they had a perfectly reasonable CSV implementation for the same data already. Thanks for your counter opinion, but I personally believe it's incorrect, since the existing CSV export that they have represents it the way I'm expecting it here.

                                • Re: Bulk Lead Extract API questions
                                  Kurt Koller

                                  My other response didn't post for some reason. Thanks for #1 request, and thanks for #4. Appreciate all your answers.

                                  • Re: Bulk Lead Extract API questions
                                    Kurt Koller

                                    Hi there David Everly, regarding:

                                     

                                    4. You can determine if the instance supports updatedAt by calling Create Export Lead Job endpoint.  If not supported, tou will receive an error 1035, "Unsupported filter type for target subscription".

                                     

                                    We do this, it's fine, but do you know, will the servers/APIs that currently do not support the updatedAt filter be getting an update to support that in the future?

                                    • Re: Bulk Lead Extract API questions
                                      Patrick Wilson

                                      David Everly wrote:

                                       

                                      1. Correct, using bulk you would have to pull leads in 30 day increments. I have created an enhancment request to increase the date range to 31 days.

                                      Are you able to confirm if this has come into effect yet? I recently did an export that seemed to allow 31 day date range.