I am admittedly posting this prior to looking for documentation - the community (especially Sanford) has been really helpful in thinking through solutions to a few of our problems so far. For today's issue, I present to you the case of the growing database. Let's say, for example, that we have 950K records in Marketo, and because of the way we are loading records in via API, that number grows quickly to over 1M. Oh no! We now owe Marketo lots of money, as we have entered a new payment tier.
Then (totally for example!) we decide on a good way to get back below the payment threshold: we run jobs that delete records for leads that hard bounce, opt out of our email program, unsubscribe (since unsubscribes are durable), etc. It works! Everything is going smoothly - everyone is chipper. We are below the payment threshold.
Then, an obstacle! A client calls in and requests an audit of a program they were enrolled in. Easy enough, right? Until we realize that the data they are requesting has been deleted as a result of our new deletion policy. So, the question becomes:
How can we keep those records in order to have the data necessary for emergency audits, historical reports, etc. - without having them count against our known lead count? We already store some key reporting metrics elsewhere - so our main use case is really based on the ability to look back, audit, and be able to re-create the story of what really happened. Any ideas?
What we are doing is exporting all non-marketable leads into an Excel sheet with all columns, and saving the sheet named with the date the data was exported. I would also like to know what other good options there are here.
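To make that dated-export convention concrete, here's a minimal Python sketch. The function name and filename pattern are my own, and it assumes the non-marketable lead rows have already been pulled out of Marketo as dicts (it writes CSV rather than a true Excel file, which Excel opens fine):

```python
import csv
from datetime import date
from pathlib import Path

def export_snapshot(leads, out_dir="."):
    """Write a list of lead dicts to a date-stamped CSV with all columns."""
    # Union of all keys, so every column is captured even if sparse.
    fieldnames = sorted({k for lead in leads for k in lead})
    path = Path(out_dir) / f"non_marketable_{date.today().isoformat()}.csv"
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(leads)
    return path
```

One snapshot file per export date makes it easy to answer "what did we know about this lead on the day we deleted it".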
Glad you asked about lead offboarding strategies because I've been meaning to write a blog post about this area.
The simplest offboarding can be accomplished by sending lead data to a Google Sheet via a webhook, just before deletion. When a lead later reenters the system, call the companion 'hook to look it up in the existing sheet and pull the old data back in.
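In practice the receiving end is usually a Google Apps Script web app that appends a row to the Sheet. As a rough illustration of the same contract, here's a local Python stand-in: the webhook POSTs whatever JSON template you define, and the handler appends one archive row per lead. The field names and file path are hypothetical:

```python
import csv
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ARCHIVE = "offboarded_leads.csv"             # local stand-in for the Sheet
FIELDS = ["email", "firstName", "lastName"]  # hypothetical payload fields

class OffboardHandler(BaseHTTPRequestHandler):
    """Accepts the offboarding webhook POST and appends one archive row."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        with open(ARCHIVE, "a", newline="") as f:
            csv.writer(f).writerow([payload.get(k, "") for k in FIELDS])
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"archived")

    def log_message(self, *_):  # silence per-request console noise
        pass

# To run: HTTPServer(("127.0.0.1", 8080), OffboardHandler).serve_forever()
```

The companion lookup 'hook is the same idea in reverse: a GET keyed on email that returns the archived row.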
But this only applies to flat lead fields and, potentially, custom object records attached to the lead. It doesn't reproduce the program history, activity log, and so on. If you need to recapitulate the entire history of the lead, exactly as you would've seen in their Activity Log, that's unfortunately impossible. Even if you export the entire program tree structure and "entire" log (which isn't actually entire) into a set of timeseries and hierarchical databases, you can't repro the whole thing. (Don't listen to anyone who tells you otherwise.)
Can you tell me what you mean by "an audit of a program" exactly?
Do you have any documentation on how to send the data to a Google Sheet via webhook? I tried searching on your blog but couldn't find one.
Thanks for your response Sanford. I probably used the word 'program' poorly in my original post.
We are a bit unique (I think), because we offer a B2C email service to our B2B customers.
This is not actually our business model - but imagine if you were a masseuse, and you signed up for a service that markets massage balls, foam rollers, yoga mats and various other health products to your customers and sells them on your behalf. If one of those customers comes to you upset about getting the emails, or questioning how you are using their data, or if you find out an email went out that you did not want to go out, you may request an audit from the 'service' provider saying, "How many emails did you send to how many people? How many of them engaged with those emails, and how many products did you sell?"
That's an example of something that may warrant an audit. Now imagine if you decided to turn off the service - making all of your customers non-marketable (in which case they are subject to deletion to minimize known lead count) - and then requested an audit. If the non-marketable leads are gone - the audit cannot be completed - and the business relationship takes a huge hit.
Let me know if that helps.
Curious about what was turned into "massage *****".
You've clarified that you don't mean programs in the Marketo sense, so it's a lot easier to mirror the relevant data. If a core set of activities exposed via API (Sent Email, Visited Web Page, Had Interesting Moment) is needed for these leads, then create a static list like Tombstoned Leads and move people into it when it's time for them to go. Every night, download the list members, plus the Activity Log filtered by just that listId, via API. Delete a lead once you have its lead record + logs safely on your system.
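A sketch of the nightly activity pull, assuming the standard Marketo REST endpoint (GET /rest/v1/activities.json, which accepts listId and activityTypeIds and pages via nextPageToken/moreResult). Here get_json stands in for an authenticated HTTP GET, so the paging logic is visible without credentials:

```python
def pull_list_activities(get_json, list_id, activity_type_ids, first_token):
    """Page through the Activity Log for one static list's members."""
    activities, token = [], first_token
    while True:
        page = get_json(
            "/rest/v1/activities.json",
            {
                "nextPageToken": token,
                "listId": list_id,
                "activityTypeIds": ",".join(str(i) for i in activity_type_ids),
            },
        )
        activities.extend(page.get("result", []))
        token = page.get("nextPageToken")
        # moreResult=False means this was the last page.
        if not page.get("moreResult"):
            return activities
```

The first_token comes from the paging-token endpoint (seeded with a sinceDatetime); persist the last token between nightly runs so each pull starts where the previous one stopped.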
The next move might seem to be transforming and loading the exported streams from their native JSON into a relational database. But that may not be necessary. First, a document database and a timeseries database are better suited than an RDBMS to JSON that's keyed on the lead. But more important, forensic/discovery-type applications don't need to be backed by a database in any of the traditional senses. They can use the raw text (JSON) files plus a continuously updated index: see a product like dtSearch. This would be the lightest-weight form of such an emergency audit tool.
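dtSearch is a commercial product, but the shape of the idea (raw JSON text plus a continuously updated index) can be illustrated with a toy inverted index that maps each term to the (file, line) positions where it appears. This is purely illustrative, not how dtSearch works internally:

```python
import re
from collections import defaultdict

def build_index(paths):
    """Index JSON-lines activity files: term -> {(path, line_number)}."""
    index = defaultdict(set)
    for path in paths:
        with open(path) as f:
            for lineno, line in enumerate(f):
                # Keep @ and . inside tokens so email addresses survive whole.
                for term in re.findall(r"[\w@.]+", line.lower()):
                    index[term].add((path, lineno))
    return index

def search(index, term):
    """Return every (file, line) where the term occurs."""
    return sorted(index.get(term.lower(), set()))
```

An audit query then just jumps to the matching lines of the original export files: the raw JSON stays the system of record, and the index is disposable and rebuildable.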
(I purposely steered clear of talking about a general-purpose data warehouse because it's not tailored to this topic.)
Thanks a ton for your notes and response. I'm not well suited to carry the conversation forward because we don't have the technical resources to implement this. We do understand what you are saying and appreciate the time you put into the discussion.