9 Replies Latest reply on Jul 18, 2017 1:52 PM by Michael Tyson

    Lead API bottleneck & multiple threads

    Michael Tyson

      We’re working on an automatic creation/update script for Leads in Marketo. Currently, we pull our data and decide to send an update or create request. I batch these in groups of 300 or payload size of ~1MB to comply with the API restrictions. We wound up going with option A in https://nation.marketo.com/thread/40577-syncing-data-with-marketo.
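
For readers following along, the batching rule described above (flush at 300 records or roughly 1MB, whichever comes first) could look something like the sketch below. It is in Python rather than the poster's PHP, and `batch_leads`, `MAX_RECORDS`, and `MAX_BYTES` are hypothetical names; the exact byte budget is an assumption, and the size check ignores the small overhead of the surrounding JSON envelope.

```python
import json
from typing import Iterable, Iterator

MAX_RECORDS = 300        # per-call record cap mentioned in the post
MAX_BYTES = 1_000_000    # assumed stand-in for the "~1MB" payload budget


def batch_leads(leads: Iterable[dict]) -> Iterator[list]:
    """Yield lists of leads that stay within both the record and size caps."""
    batch = []
    batch_bytes = 0
    for lead in leads:
        # Approximate size of this record once serialized as UTF-8 JSON.
        size = len(json.dumps(lead).encode("utf-8"))
        # Flush first if adding this lead would exceed either limit.
        if batch and (len(batch) >= MAX_RECORDS or batch_bytes + size > MAX_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(lead)
        batch_bytes += size
    if batch:
        yield batch
```

A single record larger than the budget still goes out as its own one-record batch here; whether that should be rejected instead is a design choice this sketch leaves open.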

       

      We’ve come across a case where we need to periodically update every record in Marketo from the data in our CRM, such as when adding or deleting entire columns. The CSV upload functionality in Marketo won’t work for this situation because we need to maintain existing data, so we need to use the API.

       

      Currently, my script takes about 1 minute to process 1,000 records (give or take a few seconds). Roughly 95% of that time is spent sending that data to Marketo and waiting for the response so we can process.

       

      So, we’ve been talking about running multiple threads of the script at the same time to get more data inserted and updated (it’s written in PHP to conform with the rest of our system).

      If we do a modulo operation on our audit ID, we can break the records into 10 sets. We then have 10 cron entries, each passing the modulo set that thread should process.
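
A minimal sketch of that partitioning idea, in Python rather than PHP, assuming each record carries an integer `audit_id` field (a hypothetical field name) and that each cron entry passes a worker index from 0 to 9:

```python
NUM_WORKERS = 10  # one cron entry per worker, as described in the post


def belongs_to_worker(audit_id: int, worker_index: int,
                      num_workers: int = NUM_WORKERS) -> bool:
    """True when this worker owns the record with the given audit ID."""
    return audit_id % num_workers == worker_index


def select_records(records, worker_index, num_workers=NUM_WORKERS):
    """Keep only the records assigned to this worker."""
    return [r for r in records
            if belongs_to_worker(r["audit_id"], worker_index, num_workers)]
```

Because every audit ID has exactly one remainder, each record lands in exactly one set, so the workers never touch the same row. The sets are only as even as the distribution of audit IDs.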

       

      Assuming we maintain the ability to process 1,000 records per minute, we’ll then need to handle the 100 calls per 20 seconds. Each call has a max of 300 records so we can process a max of 3,000 per 20 seconds, or 9,000 per minute. This is all assuming peak performance which I’m not 100% sure of yet.
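
One way to stay under a "100 calls per 20 seconds" limit is a sliding-window limiter that each worker consults before every request. The sketch below is Python, not the poster's PHP, and `SlidingWindowLimiter` is a hypothetical helper. It only tracks calls made by the process that owns it, so if the limit is shared across all workers (an assumption here), each of 10 workers would need a budget of 10 calls per window or a shared counter.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(self, max_calls=100, period=20.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until one more call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.acquire()` immediately before each API request is enough; the request code itself does not need to know about the limit.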

       

       

      My first question is, is there any way to improve Marketo’s response time? What’s the bottleneck on that end?

       

      Secondly, this solution adds a lot of complexity to an already complex process, so I’m wondering if anyone else has experience with something along these lines and what their solution was. It’s extremely important that we continue to process updates from within the system while doing these large data updates.

        • Re: Lead API bottleneck & multiple threads
          Kenny Elkington

          >The CSV upload functionality in Marketo won’t work for this situation as we need to maintain data so we need to use the API.

          Would you care to explain more why this won't work?

           

          >My first question is, is there any way to improve Marketo’s response time? What’s the bottleneck on that end?

          The biggest bottleneck for throughput of lead synchronization to Marketo is the number of custom fields included in each record in the request.  Are you seeing issues with response times, or is this a precautionary question?

            • Re: Lead API bottleneck & multiple threads
              Michael Tyson

              Thanks for your reply!

               

              >Would you care to explain more why this won't work?

              We can't use the CSV functionality because we can't specify the Marketo Lead ID as the key for list imports (or at least that's what we've been told).

               

              The way we are using Marketo is a little complicated and atypical - we have contacts with N email addresses in our CRM that we will use in varying situations in Marketo. Our solution to this is to allow duplicate leads in Marketo's DB, all sharing a Contact ID from our CRM. The script manages this by maintaining a 1:M map of ContactID:MarketoID and updating all leads in Marketo attached to that Contact ID when our contact data changes.
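
To make the 1:M mapping concrete, here is a rough Python sketch of the fan-out step. `build_contact_map` and `fan_out_update` are hypothetical names, and using `id` as the key that carries the Marketo lead ID in each update record is an assumption about how the request is keyed.

```python
from collections import defaultdict


def build_contact_map(pairs):
    """Build {contact_id: [marketo_lead_id, ...]} from (contact, lead) pairs."""
    contact_map = defaultdict(list)
    for contact_id, lead_id in pairs:
        contact_map[contact_id].append(lead_id)
    return contact_map


def fan_out_update(contact_map, contact_id, changed_fields):
    """Return one update record per Marketo lead tied to the contact."""
    return [{"id": lead_id, **changed_fields}
            for lead_id in contact_map.get(contact_id, [])]
```

For example, a contact mapped to three Marketo leads produces three update records carrying the same changed fields, which can then be batched like any other updates.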

               

              >The biggest bottleneck for throughput of lead synchronization to Marketo is the number of custom fields included in each record in the request. Are you seeing issues with response times, or is this a precautionary question?

              This is essentially a precautionary question as we're trying to figure out the best way to handle the whole situation. We currently have 90 custom columns in our sandbox - I expect that to go down when we start using production but I'm unsure of where we're going to land with that.

               

              Our sync script is something that runs on a schedule and pulls data from a data warehouse table - this table is maintained outside of our CRM using the CRM's databases. The case I mentioned above where we need to update mass quantities of fields is why I'm asking - the way we've done this in the past with another marketing/email provider was with a CSV load which essentially wiped all our data and started over. We want to avoid that at all costs, and since Marketo's CSV importer is off the table, we thought we would write our own using the API.

               

              However, the sync script seems fast enough that we could just use that but we'd still have data changing that we need to quickly get up to Marketo. So the thought was to have several instances of the sync script fire at the same time on schedule to speed up the entire operation.

               

              Our current account limitations allow for up to 500k leads I believe and we'll probably be increasing that as time rolls on. So, down the road, if we need to add a column for every lead in Marketo we could have a very large number to update and we'll need to maximize things as much as possible.

               

              I realize this is a complicated setup and I may not have explained it very well. Happy to elaborate on things further.

            • Re: Lead API bottleneck & multiple threads
              Sanford Whiteman

              >Assuming we maintain the ability to process 1,000 records per minute, we’ll then need to handle the 100 calls per 20 seconds. Each call has a max of 300 records so we can process a max of 3,000 per 20 seconds, or 9,000 per minute.

              Would actually be 30,000 (300 leads * 100 calls) per 20 seconds or 90,000 per minute.  But that's not going to be practically achievable anyway.
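
The arithmetic can be checked in a few lines of Python. The ceiling figures come straight from the limits quoted in the thread; the second half assumes 10 parallel workers that each sustain the roughly 1,000 records per minute observed for the single script, which is an untested assumption.

```python
import math

RECORDS_PER_CALL = 300   # max records per request
CALLS_PER_WINDOW = 100   # rate limit: calls allowed per window
WINDOW_SECONDS = 20
WINDOWS_PER_MINUTE = 60 // WINDOW_SECONDS

# Theoretical ceiling implied by the limits.
ceiling_per_window = RECORDS_PER_CALL * CALLS_PER_WINDOW        # 30,000
ceiling_per_minute = ceiling_per_window * WINDOWS_PER_MINUTE    # 90,000

# Assumed workload: 10 workers at the observed ~1,000 records per minute each.
observed_per_minute = 1_000
workers = 10
calls_per_worker_minute = math.ceil(observed_per_minute / RECORDS_PER_CALL)
total_calls_per_minute = calls_per_worker_minute * workers
allowed_calls_per_minute = CALLS_PER_WINDOW * WINDOWS_PER_MINUTE

print(ceiling_per_window, ceiling_per_minute)
print(total_calls_per_minute, allowed_calls_per_minute)
```

Under those assumptions the workers would issue about 40 calls per minute against an allowance of 300, so response time per call, not the rate limit, would remain the constraint.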

               

              Like Kenny, I wonder why you say the CSV import won't work.

               

              Also... the sanity of updating every lead daily via API naturally depends on your number of leads. What's your count + rate of growth? Also, when you say "adding... entire columns" do you mean setting the same value for every lead in the db?  What's the final application of this value (token in emails, other Smart Lists, etc.)?

                • Re: Lead API bottleneck & multiple threads
                  Michael Tyson

                  >Would actually be 30,000 (300 leads * 100 calls) per 20 seconds or 90,000 per minute.  But that's not going to be practically achievable anyway.

                  Quite right, that was back of the envelope math I threw in there at the last second. Should have double checked or left it out.

                   

                  >What's your count + rate of growth?

                  I'm not entirely sure what our numbers are going to be initially. I believe we'll have somewhere around 30,000 but will quickly ramp up to much more once different phases in marketing start.

                   

                  >Also, when you say "adding... entire columns" do you mean setting the same value for every lead in the db?  What's the final application of this value (token in emails, other Smart Lists, etc.)?

                  This value could be anything, but I really doubt we'd have the same value for every lead. I would imagine it would be a different value for each lead. We have a TON of data in our CRM spread across multiple databases that gets condensed in our BI warehouse. The sync script is written to just handle whatever is in there and send it to Marketo, so no code-level updates would be required to make a significant change in Marketo. I.e., someone adds a column in Marketo, BI changes the warehouse table to reflect it, and off goes the script, happily updating everything.