Lead API bottleneck & multiple threads

Anonymous

We’re working on an automatic creation/update script for Leads in Marketo. Currently, we pull our data and decide whether to send an update or a create request. I batch these in groups of 300 records or a payload size of ~1 MB to comply with the API restrictions. We wound up going with option A in https://nation.marketo.com/thread/40577-syncing-data-with-marketo.
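For anyone following along, here is a minimal sketch of that batching rule. It is in Python for brevity (our real script is PHP), and the function name, the constants, and the choice to measure size as JSON-encoded bytes are all assumptions made for illustration, not anything taken from Marketo's documentation.

```python
import json

MAX_RECORDS = 300        # per-call record cap described above
MAX_BYTES = 1_000_000    # ~1 MB payload cap; measuring JSON bytes is an assumption


def batch_records(records, max_records=MAX_RECORDS, max_bytes=MAX_BYTES):
    """Yield lists of records, closing a batch before either limit is exceeded."""
    batch, size = [], 0
    for record in records:
        # +1 roughly accounts for the comma separating records in the payload.
        record_bytes = len(json.dumps(record).encode("utf-8")) + 1
        if batch and (len(batch) >= max_records or size + record_bytes > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += record_bytes
    if batch:
        yield batch
```

A single record larger than the byte cap still goes out in a batch of its own; a real script would probably want to log or reject that case.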

We’ve come across a case of needing to periodically update every record in Marketo from our data in our CRM, such as adding or deleting entire columns. The CSV upload functionality in Marketo won’t work for this situation as we need to maintain data so we need to use the API.

Currently, my script takes about 1 minute to process 1,000 records (give or take a few seconds). Roughly 95% of that time is spent sending that data to Marketo and waiting for the response so we can process.

So, we’ve been talking about running multiple threads of the script at the same time to get more data inserted and updated (it’s written in PHP to conform with the rest of our system).

If we do a modulo operation on our audit ID, we can break the records into 10 sets. We then have 10 cron entries, each passing the modulo set that thread should handle.
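A rough sketch of the partitioning idea, again in Python rather than our PHP. The `audit_id` key and the function names are hypothetical. Note that `audit_id % 10` gives remainders 0 through 9, so the cron entries would pass 0-9 rather than 1-10.

```python
NUM_WORKERS = 10  # one cron entry per partition, as described above


def belongs_to_worker(audit_id, worker_index, num_workers=NUM_WORKERS):
    """True when this record falls in the given worker's modulo set."""
    return audit_id % num_workers == worker_index


def records_for_worker(records, worker_index, num_workers=NUM_WORKERS):
    """Filter the full record list down to one worker's share."""
    return [
        r for r in records
        if belongs_to_worker(r["audit_id"], worker_index, num_workers)
    ]
```

Because each audit ID maps to exactly one remainder, no record is handled by two workers, provided every cron entry uses the same `num_workers`.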

Assuming we maintain the ability to process 1,000 records per minute, we’ll then need to handle the 100 calls per 20 seconds. Each call has a max of 300 records so we can process a max of 3,000 per 20 seconds, or 9,000 per minute. This is all assuming peak performance which I’m not 100% sure of yet.
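To make the "100 calls per 20 seconds" handling concrete, here is a minimal sliding-window limiter sketch in Python. The class name and its parameters are made up for illustration. It only tracks calls made by one process; with 10 separate cron processes sharing the same quota, each would either need a smaller per-process budget (for example 10 calls per 20 seconds) or a shared counter, which this sketch does not attempt.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span (single process only)."""

    def __init__(self, max_calls=100, window_seconds=20.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
```

The script would call `acquire()` immediately before each API request.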

My first question is, is there any way to improve Marketo’s response time? What’s the bottleneck on that end?

Secondly, this solution adds a lot of complexity to an already complex process, so I’m wondering if anyone else has experience with something along these lines and what their solution to it was. It’s extremely important that when doing these large data updates we continue to process updates from within the system as well.

9 Replies
Marketo Employee

Re: Lead API bottleneck & multiple threads

>The CSV upload functionality in Marketo won’t work for this situation as we need to maintain data so we need to use the API.

Would you care to explain more why this won't work?

>My first question is, is there any way to improve Marketo’s response time? What’s the bottleneck on that end?

The biggest bottleneck for throughput of lead synchronization to Marketo is the number of custom fields included in each record in the request.  Are you seeing issues with response times, or is this a precautionary question?

Anonymous

Re: Lead API bottleneck & multiple threads

Thanks for your reply!

>Would you care to explain more why this won't work?

We can't use the CSV functionality because we can't specify the Marketo Lead ID as the key for list imports (or at least that's what we've been told).

The way we are using Marketo is a little complicated and atypical: we have contacts with any number of email addresses in our CRM, which we will use in varying situations in Marketo. Our solution to this is to allow duplicate leads in Marketo's DB, all sharing a Contact ID from our CRM. The script manages this by maintaining a 1:M map of ContactID:MarketoID and updating all leads in Marketo attached to that Contact ID when our contact data changes.
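In case it helps picture the mapping, here is a small Python sketch of it (our actual script is PHP). The function names and the sample field name are hypothetical; the only detail carried over from our setup is that each update record is keyed on the Marketo lead `id`.

```python
from collections import defaultdict


def build_contact_map(pairs):
    """Build the 1:M map from (contact_id, marketo_lead_id) pairs."""
    contact_to_leads = defaultdict(list)
    for contact_id, marketo_id in pairs:
        contact_to_leads[contact_id].append(marketo_id)
    return contact_to_leads


def fan_out_update(contact_id, changed_fields, contact_to_leads):
    """Return one update record per Marketo lead attached to the contact."""
    return [
        {"id": lead_id, **changed_fields}
        for lead_id in contact_to_leads.get(contact_id, [])
    ]
```

So a change to one contact with two email addresses turns into two lead records in the next batch.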

>The biggest bottleneck for throughput of lead synchronization to Marketo is the number of custom fields included in each record in the request. Are you seeing issues with response times, or is this a precautionary question?

This is essentially a precautionary question as we're trying to figure out the best way to handle the whole situation. We currently have 90 custom columns in our sandbox - I expect that to go down when we start using production but I'm unsure of where we're going to land with that.

Our sync script is something that runs on a schedule and pulls data from a data warehouse table - this table is maintained outside of our CRM using the CRM's databases. The case I mentioned above where we need to update mass quantities of fields is why I'm asking - the way we've done this in the past with another marketing/email provider was with a CSV load which essentially wiped all our data and started over. We want to avoid that at all costs, and since Marketo's CSV importer is off the table, we thought we would write our own using the API.

However, the sync script seems fast enough that we could just use that but we'd still have data changing that we need to quickly get up to Marketo. So the thought was to have several instances of the sync script fire at the same time on schedule to speed up the entire operation.

Our current account limitations allow for up to 500k leads I believe and we'll probably be increasing that as time rolls on. So, down the road, if we need to add a column for every lead in Marketo we could have a very large number to update and we'll need to maximize things as much as possible.

I realize this is a complicated setup and I may not have explained it very well. Happy to elaborate on things further.

Marketo Employee

Re: Lead API bottleneck & multiple threads

>We can't use the CSV functionality because we can't specify the Marketo Lead ID as the key for list imports (or at least that's what we've been told).

Id should be a valid target for the lookupField in import leads: https://developers.marketo.com/rest-api/endpoint-reference/lead-database-endpoint-reference/#!/Bulk_... If that's what's keeping you from using it, then I don't think you'll have issues. Did you see this in a doc? I can go correct it if you share.

Anonymous

Re: Lead API bottleneck & multiple threads

It was a question we had recently with our technical consultant over the phone. This person didn't know the answer during the call but later sent an email stating it wasn't possible:

>Can the lead ID be used as the key for List imports?

>Response: No, this isn't possible through the CSV import list.

To be clear, we are specifying 'id' in lookupFields for all of our updates to the lead API in our sync script already as that is what we're treating as the unique data point. What I'm talking about above is located here: Import a List of People - Marketo Docs - Product Docs.

Do we just have our wires crossed or are we able to actually use that ID in the CSV?

The whole idea here is so that non-programmers can manage this data without the programming team's help. Since we were told we couldn't do that, we decided to handle everything ourselves through the API. (We could initially use the CSV import once, but any subsequent updates to the Marketo DB structure would have to be handled through the API so our thinking was to just do everything in that regard.)

Marketo Employee

Re: Lead API bottleneck & multiple threads

Sounds just like a miscommunication.  The UI list import doesn't provide lookup field selection, but the Import Leads API does, so you would be able to submit these with id as your lookup field if you're using an API client, but not if you're uploading lists in the UI.

Level 10 - Community Moderator

Re: Lead API bottleneck & multiple threads

Michael's saying they would've used the UI as opposed to the API if they could.

Anonymous

Re: Lead API bottleneck & multiple threads

That is correct.

Level 10 - Community Moderator

Re: Lead API bottleneck & multiple threads

>Assuming we maintain the ability to process 1,000 records per minute, we’ll then need to handle the 100 calls per 20 seconds. Each call has a max of 300 records so we can process a max of 3,000 per 20 seconds, or 9,000 per minute.

Would actually be 30,000 (300 leads * 100 calls) per 20 seconds or 90,000 per minute.  But that's not going to be practically achievable anyway.
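Spelling out the arithmetic with the figures quoted in this thread (300 records per call, 100 calls per 20 seconds, roughly 1,000 records per minute per script instance, 10 instances, a 500k lead cap), as a quick Python sketch. The variable names are arbitrary, and the assumption that throughput scales linearly across 10 instances is optimistic and untested.

```python
import math

RECORDS_PER_CALL = 300
CALLS_PER_WINDOW = 100
WINDOW_SECONDS = 20

# Theoretical ceiling imposed by the rate limit alone.
ceiling_per_window = RECORDS_PER_CALL * CALLS_PER_WINDOW            # 30,000
ceiling_per_minute = ceiling_per_window * (60 // WINDOW_SECONDS)    # 90,000

# Figures observed or proposed earlier in the thread.
observed_per_worker_per_minute = 1_000
workers = 10
# Assumes linear scaling across workers, which may not hold in practice.
observed_per_minute = observed_per_worker_per_minute * workers      # 10,000

# API calls those workers would make each minute (4 calls per worker).
calls_per_minute = workers * math.ceil(
    observed_per_worker_per_minute / RECORDS_PER_CALL
)                                                                    # 40

# Time to touch every lead at the 500k cap.
minutes_for_full_refresh = 500_000 / observed_per_minute            # 50.0
```

At about 40 calls per minute against an allowance of 300 per minute, the rate limit is not the constraint at these numbers; the per-call response time is.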

Like Kenny, I wonder why you say the CSV import won't work.

Also... the sanity of updating every lead daily via API naturally depends on your number of leads. What's your count + rate of growth? Also, when you say "adding... entire columns" do you mean setting the same value for every lead in the db?  What's the final application of this value (token in emails, other Smart Lists, etc.)?

Anonymous

Re: Lead API bottleneck & multiple threads

>Would actually be 30,000 (300 leads * 100 calls) per 20 seconds or 90,000 per minute.  But that's not going to be practically achievable anyway.

Quite right, that was back of the envelope math I threw in there at the last second. Should have double checked or left it out.

>What's your count + rate of growth?

I'm not entirely sure what our numbers are going to be initially. I believe we'll have somewhere around 30,000 but will quickly ramp up to much more once different phases in marketing start.

>Also, when you say "adding... entire columns" do you mean setting the same value for every lead in the db?  What's the final application of this value (token in emails, other Smart Lists, etc.)?

This value could be anything, but I really doubt we'd have the same value for every lead. I would imagine it would be a different value for each lead. We have a TON of data in our CRM spread across multiple databases that gets condensed in our BI warehouse. The sync script is written to just handle whatever is in there and send it to Marketo, so there wouldn't be code-level updates required to make a significant change in Marketo. I.e., someone adds a column in Marketo, BI changes the warehouse table to reflect that, and off goes the script, happily updating everything.