The main issue I am running into is with the "Get Lead Activities" API:
For example, if I request all of the interesting moments from the past 90 days for a single lead, it takes 14 calls to the API, each passing a new "nextPageToken", before Marketo stops responding with "moreResult" = true.
This is extremely slow: in our testing it takes about 28 seconds.
My first request passes a nextPageToken set to 90 days ago. My test lead (id 835054) has 2 interesting moments in that time period.
That result set is far smaller than 300, the maximum batch size for a single response, so one response should be able to return all Interesting Moment activities for this lead in the last 90 days.
However, it appears the Marketo implementation does not paginate by the batchSize parameter, but by some arbitrary period of time represented by each "nextPageToken".
This design causes the performance issue: we have to execute 14 API calls for data that could be returned in 1. 12 of those calls return no data, so that is 12 wasted calls against our API quota, on top of the added processing time.
I need help understanding whether there is anything we can do on our side, using the Marketo REST API, to improve this performance, or whether this API needs a design change/enhancement so that it paginates by returned row count rather than by arbitrary time periods.
Marketo seems to have a less limited, better implementation for its own UI, which can return this lead's activities in a single call. So I am not sure why customers are only given the option of this extremely limited API.
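For anyone trying to reproduce this, the loop described above looks roughly like the following. This is a minimal sketch, not a reference client: `get_json` is a hypothetical helper you supply (any function that takes a path and query parameters, adds your instance URL and access token, and returns the decoded JSON), and the activity type id for Interesting Moments is passed in rather than assumed, since it should be confirmed for your instance.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: (path, query params) -> decoded JSON body.
# Authentication and the instance base URL are the caller's responsibility.
Transport = Callable[[str, Dict[str, str]], dict]


def fetch_lead_activities(
    get_json: Transport,
    since_datetime: str,
    lead_id: int,
    activity_type_id: int,
    max_calls: int = 100,
) -> Tuple[List[dict], int]:
    """Page through Get Lead Activities; return (activities, calls made)."""
    # Step 1: exchange a start datetime for the first paging token.
    token = get_json(
        "/rest/v1/activities/pagingtoken.json",
        {"sinceDatetime": since_datetime},
    )["nextPageToken"]

    activities: List[dict] = []
    calls = 0
    # Step 2: keep requesting pages until moreResult is no longer true.
    while calls < max_calls:
        page = get_json(
            "/rest/v1/activities.json",
            {
                "nextPageToken": token,
                "leadIds": str(lead_id),
                "activityTypeIds": str(activity_type_id),
                "batchSize": "300",
            },
        )
        calls += 1
        # Empty pages may omit "result" entirely, so default to an empty list.
        activities.extend(page.get("result", []))
        if not page.get("moreResult"):
            break
        token = page["nextPageToken"]
    return activities, calls
```

Returning the call count alongside the activities makes it easy to log how many of the requests came back empty, which is the number that matters for the quota.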
If you need to do this kind of work for time ranges of more than a day or so, you should be using Bulk Activity Extract, not the paginated API. Maintain your own offline mirror using the Extract results and query that.
The behavior of the nextPageToken is well-known: it's a cursor through the entire log, not through a filtered subset of the log by only a single activity.
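To make the cursor behavior concrete, here is a toy model of it. This is purely illustrative and is not Marketo's implementation: the in-memory log, the fixed window of 300 log entries per page, and every function name are assumptions made up for the example. It only shows why filtering after advancing the cursor yields mostly empty pages.

```python
from typing import List, Set, Tuple

Entry = Tuple[int, int, int]  # (log position, lead id, activity type id)


def build_log(size: int, matches: Set[int], lead_id: int, type_id: int) -> List[Entry]:
    """Build a log where only the positions in `matches` belong to our lead and type."""
    return [
        (pos, lead_id, type_id) if pos in matches else (pos, -1, 0)
        for pos in range(size)
    ]


def read_page(log: List[Entry], cursor: int, window: int, lead_id: int, type_id: int):
    """Advance the cursor over `window` raw entries, then filter what was scanned."""
    chunk = log[cursor:cursor + window]
    hits = [e for e in chunk if e[1] == lead_id and e[2] == type_id]
    next_cursor = cursor + len(chunk)
    return hits, next_cursor, next_cursor < len(log)


def count_calls(log: List[Entry], window: int, lead_id: int, type_id: int):
    """Return (total calls, empty calls, matching entries) for a full scan."""
    cursor, calls, empty, found, more = 0, 0, 0, [], True
    while more:
        hits, cursor, more = read_page(log, cursor, window, lead_id, type_id)
        calls += 1
        if not hits:
            empty += 1
        found.extend(hits)
    return calls, empty, found


# 4000 log entries across all leads, 2 of which match the filter.
log = build_log(4000, {500, 3000}, lead_id=835054, type_id=46)
calls, empty, found = count_calls(log, window=300, lead_id=835054, type_id=46)
print(calls, empty, len(found))  # prints: 14 12 2
```

In this model the number of calls depends on the size of the whole log in the time range, not on how many rows match the filter, which is the pattern reported in the question.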
Why is that an acceptable solution?
That is an awful solution for API consumers. It puts a significant burden on us to store and maintain entire databases and sync processes for ALL activity records, just to cover the situation where we might need to get the activity details on demand for a single lead.
You mention "The behavior of the nextPageToken is well-known: it's a cursor through the entire log, not through a filtered subset of the log by only a single activity." It's not well known to me. I read the API docs, and this behavior was not mentioned or flagged as a concern to be aware of. I had to find out by testing directly against the API. Your comment states exactly what the problem is: if they know our filter criteria from our API call, why do they query the database without using those criteria and apply the filter AFTER getting the database results, giving this poor API experience? A benefit of databases is that they do the heavy lifting with your predicates, rather than having them applied after the fact. So this answer only seems to reinforce my concern about the design of the API.
Why even offer an API if it's so inflexible that the answer is to copy all the data locally and avoid using the API?
You say it's not feasible, but it is: they make it work fast in other places.
Their own MSI solution does not suffer from this issue. It can retrieve these same 2 activities for this lead in a fraction of the time. It does not use the REST API.
Their own web UI does not use the REST API either. It uses https://instance.marketo.com/leadDatabase/queryDetailLogData, which again does not suffer from this issue and returns all of this lead's activities in under a second.
So why is the REST API not designed to use the same performant approach as their other solutions? The accepted solution here should not be "work around it"; it should be that Marketo fixes their API.
Absolutely, this is all well-trodden ground. But whether or not we agree about these frustrations, developers have to deal with the technical reality. Though the API has progressed a lot over the years on the writing side, the reading side emphasizes bulk downloads.
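If you do go the bulk route for a 90-day range, the first practical step is splitting the range into per-job windows. The sketch below assumes a 31-day limit per export job and a create-job body with a `createdAt` range plus `activityTypeIds`, which is my understanding of Bulk Activity Extract; please verify both against the current docs. The function names are hypothetical, and the enqueue, status polling, and file download steps are deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

MAX_WINDOW_DAYS = 31  # assumption: maximum date range allowed per export job


def export_windows(
    start: datetime, end: datetime, max_days: int = MAX_WINDOW_DAYS
) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into contiguous windows of at most max_days each."""
    windows = []
    cursor = start
    step = timedelta(days=max_days)
    while cursor < end:
        upper = min(cursor + step, end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


def create_job_payload(
    start: datetime, end: datetime, activity_type_ids: List[int]
) -> Dict:
    """Build the JSON body for one export job (field names are assumptions)."""
    return {
        "format": "CSV",
        "filter": {
            "createdAt": {"startAt": start.isoformat(), "endAt": end.isoformat()},
            "activityTypeIds": activity_type_ids,
        },
    }


# Example: a 90-day range becomes three jobs (31 + 31 + 28 days).
end = datetime(2024, 4, 1, tzinfo=timezone.utc)
start = end - timedelta(days=90)
payloads = [create_job_payload(s, e, [46]) for s, e in export_windows(start, end)]
print(len(payloads))  # prints: 3
```

Each payload would then be posted to the export create endpoint, and the resulting files loaded into whatever local store you query for single-lead lookups.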
