Export All Leads and All Activities

Vinz
Level 1


Hello,

 

Every month, I need to export to Excel all the leads and activities we have accumulated since the beginning of the year.

I am seeking advice on the best option (most efficient and best performing) to handle this.

 

So far I am handling it with a Python script that extracts this data with the bulk API.

However, it runs quite slowly (5-10 min) because the export had to be split into 30-day batches to make it work.
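
For context, the batching looks roughly like the sketch below. It is a simplified illustration, not the actual script: the function name `split_into_windows` and the half-open (start, end) convention are assumptions made for this example.

```python
from datetime import date, timedelta


def split_into_windows(start, end, max_days=30):
    """Split [start, end) into consecutive windows of at most max_days days.

    Each window is a (window_start, window_end) pair with an exclusive end,
    so adjacent windows share a boundary and never overlap.
    """
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=max_days), end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


if __name__ == "__main__":
    # One export job would be created per printed window.
    for window_start, window_end in split_into_windows(date(2024, 1, 1), date(2024, 3, 1)):
        print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window becomes one bulk job, which is why a full year means a dozen or so jobs per object type.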

I am wondering whether there is a better option. We don't have that many leads or activities (fewer than 100k per year).

Do you have any ideas or suggestions?

 

Thanks for your support

Vinz

 

3 REPLIES
SanfordWhiteman
Level 10 - Community Moderator

Re: Export All Leads and All Activities

Why are you exporting the same Leads over and over? This seems like a poor design.

 

Similarly, while Activities do need to be reconciled with back-dated, recently promoted Munchkin sessions, (a) this only applies to web activities, and (b) you still only need to go back 90 days.

 

Not sure why 5-10 minutes once a month is bad. What’s the actual concern here?

Vinz
Level 1

Re: Export All Leads and All Activities

Hi SanfordWhiteman

 

Thanks for taking the time to respond. I appreciate your insights.

I understand your concerns about the design and the frequency of exporting leads. However, I believe that for the scale of data I’m working with—just a few hundred or thousand rows—this shouldn’t pose a significant issue. My main goal is to ensure that the process is efficient, especially since I expect the export to run in less than 15 seconds.

Could you share what methods or best practices you recommend for exporting data effectively? Any specific tools or approaches that have worked well for you would be greatly appreciated.

Thanks again for your help!

 

SanfordWhiteman
Level 10 - Community Moderator

Re: Export All Leads and All Activities

If you’re using the Bulk Extract API, it’s just not feasible to get a year of data in 15 seconds.

 

And this stays true even if you follow the correct practice of only getting the past 1 month of leads, the past 1 month of append-only activities, and the past 3 months of possibly backfilled activities.

 

Even with just those 5 jobs, the Bulk Extract API’s Create-Enqueue-Poll-Download process simply isn’t designed to deliver instant-on streams of low-volume data; it’s designed to deliver high-volume data to large files on disk with a few simple API requests.
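
To make the overhead concrete, here is a rough Python sketch of one Create-Enqueue-Poll-Download cycle. Treat it as an illustration under assumptions: `call` is a hypothetical transport function you would supply yourself (auth, base URL, HTTP), and the endpoint paths, status names, and response shape are written from memory of the Bulk Extract docs, so check them against the current reference before relying on them.

```python
import time

# Statuses assumed to mean "not finished yet"; anything else that is not
# "Completed" is treated as a terminal failure.
PENDING_STATUSES = {"Created", "Queued", "Processing"}


def run_bulk_export(call, kind, filter_body, poll_seconds=60, sleep=time.sleep):
    """Run one bulk export job and return whatever the download call returns.

    call(method, path, body) is a hypothetical helper that performs the
    authenticated request and returns the parsed response.
    kind is "leads" or "activities".
    """
    base = f"/bulk/v1/{kind}/export"

    # 1. Create the job with its date-range filter.
    created = call("POST", f"{base}/create.json", filter_body)
    export_id = created["result"][0]["exportId"]

    # 2. Enqueue it; the job does not start until this call is made.
    call("POST", f"{base}/{export_id}/enqueue.json", None)

    # 3. Poll until the job finishes. The waiting happens here.
    while True:
        status = call("GET", f"{base}/{export_id}/status.json", None)["result"][0]["status"]
        if status == "Completed":
            break
        if status not in PENDING_STATUSES:
            raise RuntimeError(f"export {export_id} ended with status {status}")
        sleep(poll_seconds)

    # 4. Download the finished file.
    return call("GET", f"{base}/{export_id}/file.json", None)
```

Every job pays for at least four requests plus however long it sits in the queue, so five jobs cannot finish in seconds no matter how few rows come back.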

 

In contrast, the paginated REST API can consume a massive number of calls and requires more client-side logic to achieve the same result. For very small lead/activity fetches, the paginated API certainly can be faster (for example, getting all the people who filled out a form in the past 10 days and who are in a certain static List), but you seem to have bound yourself to getting everything from the past year, even if you already have 75% of it from last time. So paginated won’t help you.

 

I suggest pursuing caching strategies so you don’t download the same data over and over when it’s guaranteed not to have changed.
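
A minimal Python sketch of that idea, with assumptions flagged: the helper names `refresh_start` and `merge_into_cache` are made up for this example, the cache is just an in-memory dict keyed by an `id` field, and in practice you would persist it to a file or database between monthly runs.

```python
from datetime import date, timedelta


def refresh_start(today, year_start, lookback_days):
    """Earliest date that must be re-downloaded on this run.

    Everything older than the lookback is assumed unchanged and is served
    from the cache (e.g. about 30 days for leads and append-only activities,
    90 days for possibly backfilled web activities).
    """
    return max(year_start, today - timedelta(days=lookback_days))


def merge_into_cache(cache, fresh_rows, key="id"):
    """Return a new cache with fresh rows upserted by their key.

    Rows already present are overwritten by the fresh copy; rows outside
    the refreshed range are kept untouched.
    """
    merged = dict(cache)
    for row in fresh_rows:
        merged[row[key]] = row
    return merged


if __name__ == "__main__":
    cache = {1: {"id": 1, "type": "Visit Webpage"}}
    fresh = [{"id": 1, "type": "Fill Out Form"}, {"id": 2, "type": "Visit Webpage"}]
    print(refresh_start(date(2024, 10, 15), date(2024, 1, 1), 90))
    print(sorted(merge_into_cache(cache, fresh)))
```

The monthly Excel file is then built from the merged cache, while only the lookback range ever goes through the bulk jobs.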