2 Replies Latest reply on Mar 31, 2014 10:05 AM by 36082

    Speeding Up the getMultipleLeads API Call

      Hi, 
           
           I am refactoring some code that downloads lead data from Marketo. The code uses the getMultipleLeads API call to request the lead data. As you may be aware, this call only allows up to 1,000 lead records per request:
           
           http://developers.marketo.com/documentation/soap/getmultipleleads


      We have had to download lead data in the range of 900K to 3 million records, so as you can see, this can be a very slow process.
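A quick count shows the scale of the problem: at the 1,000-record maximum per call, a purely sequential download needs one call per 1,000 records. Here $t$ is a hypothetical average per-call latency, not a measured figure.

```latex
\text{calls} = \left\lceil \frac{N}{1000} \right\rceil
\quad\Rightarrow\quad
N = 900{,}000:\ 900\ \text{calls}, \qquad N = 3{,}000{,}000:\ 3000\ \text{calls}

T_{\text{sequential}} \approx \text{calls} \times t
```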
           
           
           Problem:

           Reduce the time it takes to download large recordsets from Marketo via the getMultipleLeads API call.
           
           
           Solution:

           Use multi-threading to make concurrent getMultipleLeads API calls.


      My Logic:


           1.     Trigger an initial getMultipleLeads API call and get the size of the available recordset. Let’s use 1,050 as the returned recordset count for this example.
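A minimal sketch of reading the recordset size from that first response, assuming the result carries `returnCount` and `remainingCount` elements (verify these names against the WSDL you call). The sample XML and its stream position value are made-up placeholders, trimmed down to the fields used here.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed-down result body; element names are assumptions to check
# against the real getMultipleLeads response.
SAMPLE_RESPONSE = """
<result>
  <returnCount>100</returnCount>
  <remainingCount>950</remainingCount>
  <newStreamPosition>placeholder-position</newStreamPosition>
</result>
"""


def total_available(result_xml: str) -> int:
    """Estimate the full recordset size from the first page of results."""
    root = ET.fromstring(result_xml.strip())
    returned = int(root.findtext("returnCount", default="0"))
    remaining = int(root.findtext("remainingCount", default="0"))
    # Records returned in this page plus records still waiting in the stream.
    return returned + remaining


print(total_available(SAMPLE_RESPONSE))  # 1050
```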


           2.     When the multi-threaded download runs, I want each thread to request 100 records per getMultipleLeads call. Dividing 1,050 by 100 and rounding up, I determine that 11 threads need to be spawned.
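The thread count in step 2 is a ceiling division, with the last thread picking up the remainder. A small sketch (the function name `plan_threads` is hypothetical):

```python
from typing import List, Tuple


def plan_threads(total_records: int, per_thread: int) -> Tuple[int, List[int]]:
    """Return the thread count and how many records each thread should fetch."""
    if total_records <= 0:
        return 0, []
    thread_count = -(-total_records // per_thread)  # integer ceiling division
    sizes = [per_thread] * (thread_count - 1)
    # The last thread gets whatever is left over.
    sizes.append(total_records - per_thread * (thread_count - 1))
    return thread_count, sizes


count, sizes = plan_threads(1050, 100)
print(count, sizes[-1])  # 11 50
```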


           3.     This is the step that currently seems costly and inefficient to me. The key piece of information the multi-threaded solution needs is a stream position for each thread. I get these by calling getMultipleLeads in a loop, once per thread to be spawned (11 times in this example), then capturing the returned stream positions and storing them in a SQL table; let's call it tbl_threadStreamPositions.

                        Loop#01 -  Returns StreamPosition For Thread#02
                        Loop#02 -  Returns StreamPosition For Thread#03
                        Loop#03 -  Returns StreamPosition For Thread#04
                        Loop#04 -  Returns StreamPosition For Thread#05
                        Loop#05 -  Returns StreamPosition For Thread#06
                        Loop#06 -  Returns StreamPosition For Thread#07
                        Loop#07 -  Returns StreamPosition For Thread#08
                        Loop#08 -  Returns StreamPosition For Thread#09
                        Loop#09 -  Returns StreamPosition For Thread#10
                        Loop#10 -  Returns StreamPosition For Thread#11
                        Loop#11 -  Returns No StreamPosition
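The position-collection loop in step 3 might be sketched as follows. `fetch_page` is a hypothetical callable standing in for one getMultipleLeads call (same batch size, reduced attribute list) that returns the response's new stream position, and an in-memory SQLite database stands in for whatever SQL database holds tbl_threadStreamPositions. The walk is inherently sequential, since each call needs the position returned by the previous one; because thread 1 starts without a position, thread_count - 1 calls are enough.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical: performs one getMultipleLeads call starting at `position`
# (None = start of the recordset) and returns the new stream position, or None at the end.
FetchPage = Callable[[Optional[str]], Optional[str]]


def collect_stream_positions(fetch_page: FetchPage, thread_count: int,
                             conn: sqlite3.Connection) -> None:
    """Walk the stream once and store the starting position for threads 2..thread_count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tbl_threadStreamPositions ("
        "thread INTEGER PRIMARY KEY, stream_position TEXT)"
    )
    position: Optional[str] = None
    # Thread 1 needs no position, so only thread_count - 1 calls are made.
    for loop in range(1, thread_count):
        # Each call depends on the previous call's position: this cannot run in parallel.
        position = fetch_page(position)
        if position is None:  # stream ended earlier than expected
            break
        # Loop #n returns the starting position for thread #n+1.
        conn.execute(
            "INSERT OR REPLACE INTO tbl_threadStreamPositions VALUES (?, ?)",
            (loop + 1, position),
        )
    conn.commit()
```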

           I try to speed up this process by limiting the attributes requested in each getMultipleLeads call to one to three,
           for example: FirstName, LastName, Company.
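Building the request body with only the few attributes needed might look like this sketch. The namespace URI bound to the `mkt:` prefix, the `batchSize` element, and the child-element order are assumptions to check against the WSDL (a SOAP schema sequence usually fixes the order), and the required lead selector is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Assumed namespace URI for the mkt: prefix; take the real one from your WSDL.
MKT_NS = "http://www.marketo.com/mktows/"
ET.register_namespace("mkt", MKT_NS)


def build_params(attributes: List[str], batch_size: int,
                 stream_position: Optional[str] = None) -> str:
    """Serialize a paramsGetMultipleLeads element (lead selector omitted in this sketch)."""
    root = ET.Element(f"{{{MKT_NS}}}paramsGetMultipleLeads")
    # The first request of a stream carries no position.
    if stream_position is not None:
        ET.SubElement(root, "streamPosition").text = stream_position
    ET.SubElement(root, "batchSize").text = str(batch_size)
    include = ET.SubElement(root, "includeAttributes")
    # Requesting only a handful of attributes keeps each response small.
    for name in attributes:
        ET.SubElement(include, "stringItem").text = name
    return ET.tostring(root, encoding="unicode")


print(build_params(["FirstName", "LastName", "Company"], 100))
```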


           4.     Once the previous step finishes, my code fires off 11 threads concurrently, each running code that looks roughly like the pseudocode below.

           <mkt:paramsGetMultipleLeads>
                <!-- Thread 1 omits streamPosition and starts at the beginning of the recordset -->
                <if thread <> 1>
                     <streamPosition>{{thread streamposition from tbl_threadStreamPositions}}</streamPosition>
                </if>
                <!-- Each thread requests 100 records per call -->
                <batchSize>100</batchSize>
                <includeAttributes>
                     <loop list="{{list of attributes to request}}" index="local.x">
                          <stringItem>{{local.x}}</stringItem>
                     </loop>
                </includeAttributes>
           </mkt:paramsGetMultipleLeads>
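The concurrent fetch in step 4 could be sketched with a thread pool. `fetch_batch` is a hypothetical callable wrapping one getMultipleLeads call (batchSize 100) from a given stream position, and `positions` mirrors the rows of tbl_threadStreamPositions. The default of 4 workers is arbitrary; capping it is a hedge against API throttling of concurrent calls, so tune it to whatever limits apply to your account.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical: one getMultipleLeads call starting at the given stream
# position (None for thread 1); returns that batch's lead records.
FetchBatch = Callable[[Optional[str]], List[dict]]


def download_concurrently(fetch_batch: FetchBatch, positions: Dict[int, str],
                          thread_count: int, max_workers: int = 4) -> List[dict]:
    """Fetch every batch in parallel and return all leads in stream order."""
    # Thread 1 starts at the beginning; threads 2..N use their stored positions.
    starts: List[Optional[str]] = [None] + [positions[t] for t in range(2, thread_count + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        batches = list(pool.map(fetch_batch, starts))
    return [lead for batch in batches for lead in batch]
```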
           
           
           
           Question:

           - Does this logic look sound?
           - Is there a better way to accomplish this via the API?
           - Has anyone had any luck using ETL tools like Talend, Jitterbit, or MuleSoft to speed up this process?



      Thanks...