Re: Java SOAP API - getMultipleLeads hangs even with small batch sizes

Jon_Wu
Level 4

Java SOAP API - getMultipleLeads hangs even with small batch sizes

getMultipleLeads hangs for me even with a batch size of 1 or 50 instead of 1000. It simply never returns.

I'm trying to get all leads so that I can deduplicate them all. I have logic to grab leads in batches of 1000 where I only return 2 fields for the best performance. This was working fine in May when I wrote it, but that's when I only had about 40k leads vs 500k now.

I'm setting up my request like this:
    import java.util.Calendar;
    import java.util.GregorianCalendar;
    import javax.xml.datatype.DatatypeConfigurationException;
    import javax.xml.datatype.DatatypeFactory;

    // Create the request
    ParamsGetMultipleLeads request = new ParamsGetMultipleLeads();

    // Select leads by last-updated time using LastUpdateAtSelector
    LastUpdateAtSelector leadSelector = new LastUpdateAtSelector();
    GregorianCalendar since = new GregorianCalendar();
    since.set(2014, Calendar.MAY, 12); // time-of-day fields keep the current time
    DatatypeFactory factory;
    try {
      factory = DatatypeFactory.newInstance();
    } catch (DatatypeConfigurationException e) {
      // Fail here instead of hitting a NullPointerException on the next line
      throw new IllegalStateException("Could not create DatatypeFactory", e);
    }
    leadSelector.setOldestUpdatedAt(factory.newXMLGregorianCalendar(since));
    request.setLeadSelector(leadSelector);

    // Return only the two fields needed for deduplication
    ArrayOfString attributes = new ArrayOfString();
    attributes.getStringItems().add("Email");
    attributes.getStringItems().add("PersonContactId");
    request.setIncludeAttributes(attributes);
It appears that the last updated at selector is what's causing an issue here, but it also looks like using that is the only way to fetch all leads. If I set the LastUpdateAtSelector to a few days ago everything works, but when I set it to something like May 2014 so I can get all my leads, it doesn't work.

Any ideas?

This sounds a bit like https://community.marketo.com/MarketoDiscussionDetail?id=90650000000PiMkAAK which was lacking a response.
Anonymous
Not applicable

I would try adding an oldestUpdatedAt selector and see if your performance improves. Please see this blog post:
http://developers.marketo.com/blog/find-leads-updated-on-specific-date-ranges/
Jon_Wu
Level 4

Hi Murtza. Thanks for the quick response.

That's exactly what I'm using right now, and the API just hangs. As shown in my post above, my example includes the following:
    leadSelector.setOldestUpdatedAt(factory.newXMLGregorianCalendar(since));
As is, it's impossible for me to get all the data I need due to this issue.

If I set oldestUpdatedAt to a few days ago, everything works, but with anything much further back the API doesn't respond. Even if it worked with oldestUpdatedAt set to November, it wouldn't help me, since I need all leads ever created to find all duplicates dynamically.
Jep_Castelein2
Level 10

Jon, you want to make sure that each call covers a smaller number of leads, regardless of batch size. The way to do this is to set both a start and an end date (not just the start date) so that each call has a maximum of, say, 50k results. For example, first download all leads created in May, then in June, and so on. Just make sure that you split up large data loads: if 300k leads were created in one day, split that day into 'morning', 'afternoon' and 'evening', or even smaller time periods.
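
A minimal sketch of the date-window splitting described above, in plain Java with no Marketo classes. The names LeadWindows, Window and split are hypothetical helpers, not part of the Marketo client. Each window's start and end would then go into the selector's oldest and latest updated-at fields (the latest-side setter is assumed here to mirror setOldestUpdatedAt).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class LeadWindows {
    /** Half-open time range [start, end). Hypothetical helper type. */
    static final class Window {
        final Instant start;
        final Instant end;

        Window(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + " .. " + end;
        }
    }

    /** Splits [from, to) into consecutive windows that are at most 'step' long. */
    static List<Window> split(Instant from, Instant to, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<Window> windows = new ArrayList<>();
        Instant start = from;
        while (start.isBefore(to)) {
            Instant end = start.plus(step);
            if (end.isAfter(to)) {
                end = to; // clamp the last window to the overall end
            }
            windows.add(new Window(start, end));
            start = end;
        }
        return windows;
    }

    public static void main(String[] args) {
        // One request per day for the first week of August 2014
        Instant from = Instant.parse("2014-08-01T00:00:00Z");
        Instant to = Instant.parse("2014-08-08T00:00:00Z");
        for (Window w : split(from, to, Duration.ofDays(1))) {
            System.out.println(w);
        }
    }
}
```

If a single day still returns too many leads, the same call with a shorter step (for example Duration.ofHours(8)) gives the 'morning', 'afternoon', 'evening' split.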

Alternatively, you can use the manual export via Smart List in the UI, but for a faster export it's also recommended to split it into smaller batches (e.g. 100k per Smart List).

Generally, the REST API is faster than the SOAP API, so that's also something that you could try. 
Jon_Wu
Level 4

Hi Jep,

Thanks for the info. I'm now trying to split up batches by using both sinceLatest and sinceOldest with a 1 day span. I will then try to parallelize this by getting leads from each day in batches, then combining all those into 1 big list to deduplicate everything. However, I'm getting unexpected results.
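
A rough sketch of the plan above (fetch each day in parallel, merge, then look for duplicates), with the actual API call replaced by a stand-in function. ParallelLeadFetch, Lead, fetchAll and duplicateIdsByEmail are hypothetical names, the concurrency cap is an arbitrary parameter rather than a documented Marketo limit, and matching duplicates on lower-cased email is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelLeadFetch {
    /** Hypothetical lead holder with only the fields needed for deduplication. */
    static final class Lead {
        final String id;
        final String email;

        Lead(String id, String email) {
            this.id = id;
            this.email = email;
        }
    }

    /** Fetches every window on at most maxConcurrent threads and merges the results. */
    static <W> List<Lead> fetchAll(List<W> windows, Function<W, List<Lead>> fetchWindow,
                                   int maxConcurrent)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<List<Lead>>> futures = new ArrayList<>();
            for (W window : windows) {
                Callable<List<Lead>> task = () -> fetchWindow.apply(window);
                futures.add(pool.submit(task));
            }
            List<Lead> all = new ArrayList<>();
            for (Future<List<Lead>> future : futures) {
                all.addAll(future.get()); // a failed fetch surfaces as ExecutionException
            }
            return all;
        } finally {
            pool.shutdownNow();
        }
    }

    /** Maps each normalized email to the lead ids sharing it, keeping only real duplicates. */
    static Map<String, Set<String>> duplicateIdsByEmail(List<Lead> leads) {
        Map<String, Set<String>> idsByEmail = new TreeMap<>();
        for (Lead lead : leads) {
            if (lead.email == null) {
                continue;
            }
            String key = lead.email.trim().toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                idsByEmail.computeIfAbsent(key, k -> new TreeSet<>()).add(lead.id);
            }
        }
        idsByEmail.values().removeIf(ids -> ids.size() < 2);
        return idsByEmail;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in fetcher returning canned data instead of calling the API
        Function<String, List<Lead>> fake = day -> day.equals("2014-08-01")
                ? Arrays.asList(new Lead("1", "a@example.com"), new Lead("2", "b@example.com"))
                : Arrays.asList(new Lead("3", "A@example.com"));
        List<Lead> all = fetchAll(Arrays.asList("2014-08-01", "2014-08-02"), fake, 2);
        System.out.println(duplicateIdsByEmail(all));
    }
}
```

Collecting ids into a set per email means a lead returned by more than one window is counted only once. Starting with a small maxConcurrent and raising it gradually is easier to reason about than firing one request per day all at once.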

I'm also looking for an automated solution that I can keep running, so ideally I won't do any manual export, since that would be hard to do continuously. We need batch deduplication because there seem to be some platform bugs that cause our webhooks to fail. We try to deduplicate leads in real time as duplicates are created, but the API has some bugs that cause this to fail a small percentage of the time.

A few questions:

1. Which date does the LastUpdateAtSelector use? I'd expect it to use the updated date that I see in the Marketo UI. However, for something like this:
sinceOldest.set(2014, Calendar.AUGUST, 1);
sinceLatest.set(2014, Calendar.AUGUST, 2);

I get leads with dates such as the following:
Created:
Apr 8, 2014 11:01 PM
Updated:
Dec 21, 2014 10:36 PM

Created:
Apr 8, 2014 11:00 PM
Updated:
Aug 12, 2014 9:30 PM

Created:
Aug 11, 2014 8:37 PM
Updated:
Aug 11, 2014 9:24 PM

What's going on here? The filter is definitely working in that it filters out a few leads at once and some dates return nothing, but the dates on the leads themselves don't make sense.
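
One client-side detail worth ruling out, although it probably does not account for all of the dates above: new GregorianCalendar() starts at the current time in the JVM's default time zone, and set(year, month, date) leaves the hour, minute, second and millisecond fields unchanged. So a window built as shown begins and ends at the current time of day, not at midnight. A small sketch that builds an exact midnight boundary follows; UtcBoundary and midnightUtc are hypothetical names, and using UTC is an assumption, since which time zone the API applies is not stated in this thread.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

class UtcBoundary {
    /** Returns midnight UTC on the given date; 'month' is a Calendar constant such as Calendar.AUGUST. */
    static XMLGregorianCalendar midnightUtc(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();               // drop the current time-of-day fields
        cal.set(year, month, day); // now resolves to 00:00:00.000 UTC
        try {
            return DatatypeFactory.newInstance().newXMLGregorianCalendar(cal);
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("Could not create DatatypeFactory", e);
        }
    }

    public static void main(String[] args) {
        // Boundaries for a one-day window
        System.out.println(midnightUtc(2014, Calendar.AUGUST, 1).toXMLFormat());
        System.out.println(midnightUtc(2014, Calendar.AUGUST, 2).toXMLFormat());
    }
}
```

The two values can be passed to the selector in place of factory.newXMLGregorianCalendar(sinceOldest) and its sinceLatest counterpart.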

2. Is there a limit to how many concurrent connections the API can handle at once? For example, if we've had Marketo running for 500 days, could I make 500 requests at once? I seem to be hitting some limits now but it's really trial and error.

Thanks for all your help!