I would try adding an oldestUpdatedAt selector and see if your performance improves. Please see this blog post.
Hi Murtza. Thanks for the quick response.
That's exactly what I'm using right now, and the API just hangs. As shown in my post above, my example includes the following:
leadSelector.setOldestUpdatedAt(factory.newXMLGregorianCalendar(since));
As is, it's impossible for me to get all the data I need due to this issue.
If I set oldestUpdatedAt to a few days ago, it's all good, but with anything much further back the API doesn't respond. Even if it did work with oldestUpdatedAt set to November, it wouldn't help me, since I need all leads ever created in order to find all duplicates dynamically.
Jon, you want to make sure that each call downloads a smaller number of Leads, regardless of batch size. The way to do this is to set both a start and an end date (not just the start date) so that each call returns a maximum of, say, 50k results. For example, first download all Leads created in May, then in June, and so on. Just make sure that you split up large data loads: if 300k Leads were created in one day, split that day into 'morning', 'afternoon' and 'evening', or even smaller time periods.
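The splitting described above could be sketched in Java roughly as follows. This is only an illustration: `DateWindows` and `Window` are made-up names, not part of the Marketo client, and the step size is whatever keeps each call under the result count you are aiming for.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Marketo library.
class DateWindows {

    /** A half-open time window: start is included, end is excluded. */
    static final class Window {
        final Instant start;
        final Instant end;

        Window(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [from, to) into consecutive windows no longer than step. */
    static List<Window> split(Instant from, Instant to, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        List<Window> windows = new ArrayList<>();
        Instant cursor = from;
        while (cursor.isBefore(to)) {
            Instant next = cursor.plus(step);
            if (next.isAfter(to)) {
                next = to; // the last window may be shorter than step
            }
            windows.add(new Window(cursor, next));
            cursor = next;
        }
        return windows;
    }
}
```

Each window's start and end would then feed the oldest and latest bounds of one request. If a single window still returns too many Leads, the same method can be called again on that window with a smaller step such as `Duration.ofHours(8)`.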
Alternatively, you can use the manual export via Smart List in the UI, but for a faster export it's again recommended to split it up into smaller batches (e.g. 100k per Smart List).
Generally, the REST API is faster than the SOAP API, so that's also something that you could try.
Thanks for the info. I'm now trying to split up batches by using both sinceLatest and sinceOldest with a 1 day span. I will then try to parallelize this by getting leads from each day in batches, then combining all those into 1 big list to deduplicate everything. However, I'm getting unexpected results.
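For reference, the combining step I have in mind looks roughly like this. It is a simplified sketch: `Lead` here is a stand-in with only an id and an email, not the real lead record from the API, and matching on lower-cased email is just one possible duplicate rule.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; not part of the Marketo client.
class LeadDedup {

    /** Stand-in for a lead; the real record carries many more fields. */
    static final class Lead {
        final int id;
        final String email;

        Lead(int id, String email) {
            this.id = id;
            this.email = email;
        }
    }

    /**
     * Groups leads by normalized email and returns only the groups that
     * contain more than one distinct lead id.
     */
    static Map<String, List<Lead>> findDuplicates(List<Lead> leads) {
        Set<Integer> seenIds = new HashSet<>();
        Map<String, List<Lead>> byEmail = new LinkedHashMap<>();
        for (Lead lead : leads) {
            if (lead.email == null) {
                continue; // nothing to match on
            }
            if (!seenIds.add(lead.id)) {
                continue; // same lead returned by two overlapping batches
            }
            String key = lead.email.trim().toLowerCase(Locale.ROOT);
            byEmail.computeIfAbsent(key, k -> new ArrayList<>()).add(lead);
        }
        // Drop every email that maps to a single lead.
        byEmail.values().removeIf(group -> group.size() < 2);
        return byEmail;
    }
}
```

The id check matters because a lead that is updated between two calls can show up in more than one daily batch, and it should not be reported as a duplicate of itself.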
I'm also looking for an automated solution that I can keep running, so ideally I won't do any manual export, since that would be hard to do continuously. We need batch deduplication because it looks like there are some platform bugs that cause our webhooks to fail. We try to deduplicate leads in real time as duplicates are created, but the API has some bugs that cause this to fail a small percentage of the time.
A few questions:
1. Which date does LastUpdateAtSelector filter on? I'd expect it to use the updated date that I see in the Marketo UI. However, for something like this:
sinceOldest.set(2014, Calendar.AUGUST, 1);
sinceLatest.set(2014, Calendar.AUGUST, 2);
I get leads with dates such as the following:
Apr 8, 2014 11:01 PM
Dec 21, 2014 10:36 PM
Apr 8, 2014 11:00 PM
Aug 12, 2014 9:30 PM
Aug 11, 2014 8:37 PM
Aug 11, 2014 9:24 PM
What's going on here? The filter is definitely working in that it filters out a few leads at once and some dates return nothing, but the dates on the leads themselves don't make sense.
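One thing I should rule out on my side: `Calendar.set(year, month, day)` keeps the hour, minute, second and millisecond fields at whatever they already were (the current time for a fresh instance) and works in the JVM's default time zone, so my window edges may not sit exactly at midnight. That could shift results by hours, not months, so I doubt it explains the April and December dates. A sketch of cleaner boundaries is below; `WindowBoundaries` and its methods are my own helper names, and using UTC is an assumption about what the API expects.

```java
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical helper; UTC is an assumption, not a documented requirement.
class WindowBoundaries {

    /** Returns 00:00:00.000 UTC on the given date (month is a Calendar constant). */
    static GregorianCalendar midnightUtc(int year, int month, int dayOfMonth) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear(); // otherwise the time-of-day fields keep the current time
        cal.set(year, month, dayOfMonth);
        return cal;
    }

    /** Converts to the XML calendar type that the selector setters take. */
    static XMLGregorianCalendar toXml(GregorianCalendar cal) {
        try {
            return DatatypeFactory.newInstance().newXMLGregorianCalendar(cal);
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }
}
```

With this, the two boundaries would be `midnightUtc(2014, Calendar.AUGUST, 1)` and `midnightUtc(2014, Calendar.AUGUST, 2)`.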
2. Is there a limit to how many concurrent connections the API can handle at once? For example, if we've had Marketo running for 500 days, could I make 500 requests at once? I seem to be hitting some limits now but it's really trial and error.
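In the meantime I'm capping the number of requests in flight instead of firing all of them at once, along these lines. `BoundedFetcher`, `fetchOne` and `maxParallel` are my own names, and the right value for `maxParallel` is exactly what I'm trying to find out, so treat it as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper; fetchOne stands in for one API call per key (e.g. one day).
class BoundedFetcher {

    /** Runs fetchOne for every key with at most maxParallel calls in flight. */
    static <K, R> List<R> fetchAll(List<K> keys, Function<K, R> fetchOne, int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (K key : keys) {
                Callable<R> task = () -> fetchOne.apply(key);
                futures.add(pool.submit(task)); // queued until a thread is free
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // results come back in key order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

So 500 days would still mean 500 requests in total, but only `maxParallel` of them would be open at any moment.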
Thanks for all your help!