Data Dumpster Dive - Six Steps to Trash Bad Data


This article has been updated in collaboration with the Marketing Nation Community Manager and the original author, @Jeff_Coveney3. If you have any questions or comments, feel free to shoot me a private message!  



If your database is overloaded with junk and dead leads, your marketing automation platform database is likely over its size limit. Bad data in your database is a lot like trash: you have to get rid of it!


Not only does bad data clog your database and drive up your software subscription costs, it also makes your marketing efforts less effective, costing your organization in more ways than one.


In this post, we cover strategy and execution tips to help you get the dirt out of your data and develop a best-practice data retention policy built on six steps:


  • Step 1 – Get Rid of Duplicates
  • Step 2 – Eliminate People without Email Addresses
  • Step 3 – Purge Recently Deleted Salesforce Records
  • Step 4 – Delete Disqualified Records
  • Step 5 – Remove the Hard Bounces
  • Step 6 – Delete the Inactive (but only after a Wake the Dead campaign!)



Bad Data Hikes Up Software Costs


“You are over your data limit.” How many times have you received this message from Marketo or other software services?

Most SaaS platforms charge based on record count. As your organization matures, you’ll have data living across CRMs, marketing automation solutions, ABM platforms, and others. If 20-30% of your data is bad, that bad data is needlessly driving up the cost of multiple software subscriptions.


Bad Data Kills Marketing Results


Do you have low click-through rates or weak engagement? Does Management always ask why Marketing’s results are lower than industry standards?


Most likely, you are marketing to people who have no need for your service, and they are dragging down your overall results.

For example, we recently worked with a client whose click-through rates hovered around 0.5%, well below the 1-4% we typically see for general mailings. The reason, of course, was that they were mailing to a broad audience that included many records with bad data.


Bad Data Kills Segmentation and Personalization


If your data collection methods are all over the place, there is no good way to segment your database to offer a personalized experience.


For example, if you want to deliver persona-based content to IT executives but aren’t consistently collecting role or title information, you’ll be left with a generalized approach.


Bad Data Slows Down Your System


You know how when you buy a brand-new phone, it’s really fast? Then, six months later, it starts getting slower and slower?

Your marketing automation system and CRM behave the same way. From data syncs to record updates, it takes processing power to handle every record. The bigger your database, the more processing it takes to make major changes, especially for systems that are running inefficiently in the first place.


For example, we recently worked with a client whose database held more than 1 million records. A data change on the Salesforce side affected all records and kicked off a data sync to Marketo that took a long time to complete. We’ve since worked on a plan to pare that database down by 10% to help with these performance issues.


The moral of the story: why perform system updates on data that is bad or outdated?


Before You Get Started – Set Data Strategy


Before you dig into the six-step process, think through some big picture strategies around a holistic data retention policy.

  • Are there any regulations to consider (e.g., financial regulations)? Does your industry set a time limit for keeping, or expunging, data?
  • How long is right for your company from a reporting perspective? One year? Two years? Keep in mind that your retention policy affects your reporting intelligence; in most cases, once the data is gone, so is the reporting. Some companies adopt a general policy of fifteen months to ensure a rolling four quarters of intelligence. You might also vary your policy by type of data. For example, spam data might have a retention policy of one month, while disqualified data might have a retention policy of fifteen months.
  • What is your risk tolerance? Some companies keep EVERYTHING while others like to purge more often.
  • What’s your philosophy on appending? Think about whether to delete your aging data or run it through an appending service such as Oceanos, LeadSpace, or ZoomInfo. (Check out Oceanos’ no-cost data assessment, which we use frequently with our clients.)
  • Do you value the data from hard bounces?
  • What are your data backup plans? Before you delete, have a backup policy in place to retain those deleted records.
  • What is your account-based strategy? If you have a heavy account-based focus, factor that into your data deletion policies. For example, if a VP-level person has bounced, you may want to leverage that intelligence to determine account coverage.
  • For organizations using Salesforce, do you treat Lead records differently than Contact records? For example, Lead records are less risky to delete, while a Contact record might have an Opportunity attached to it.
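If you vary retention by type of data, it helps to write the policy down as a simple lookup table before building any campaigns. The sketch below is a minimal illustration under stated assumptions: the data-type labels, the `RETENTION_DAYS` table, and `is_past_retention` are all hypothetical names, the one-month and fifteen-month windows mirror the examples above, and months are approximated as 30 days. It is not a Marketo or Salesforce feature.

```python
from datetime import date, timedelta

# Hypothetical retention windows in days, keyed by data type.
# Months are approximated as 30 days; the spam (one month) and
# disqualified (fifteen months) values echo the examples in the text.
RETENTION_DAYS = {
    "spam": 30,
    "disqualified": 15 * 30,
    "default": 15 * 30,  # assumed general policy for everything else
}


def is_past_retention(data_type: str, last_activity: date, today: date) -> bool:
    """Return True if a record has outlived the retention window for its type."""
    window = RETENTION_DAYS.get(data_type, RETENTION_DAYS["default"])
    return (today - last_activity) > timedelta(days=window)


today = date(2024, 6, 1)
print(is_past_retention("spam", date(2024, 4, 1), today))          # True (61 days > 30)
print(is_past_retention("disqualified", date(2024, 1, 1), today))  # False (152 days)
```

Keeping the windows in one table makes it easy to review the policy with Sales, Legal, or Finance before anything is actually deleted.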



Step 1 – Get Rid of Duplicates


Duplicates are great with babies but not with your database.


The first step is to identify the duplicates with a quick assessment. If you use a solution like Marketo, run the built-in Smart List that identifies multiple records sharing the same email address.
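If you want a second look outside the platform, the same email-based assessment can be run on an exported list. This is a minimal sketch that assumes each exported record is a dict with hypothetical `id` and `email` keys, and it normalizes case and surrounding whitespace. That normalization is an assumption, and Marketo’s own duplicate matching may behave differently.

```python
from collections import defaultdict


def find_duplicate_groups(records):
    """Group records by normalized email; keep only groups with 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if email:  # records without an email are handled separately (Step 2)
            groups[email].append(rec)
    return {email: recs for email, recs in groups.items() if len(recs) > 1}


records = [
    {"id": 1, "email": "Pat@Example.com"},
    {"id": 2, "email": "pat@example.com "},
    {"id": 3, "email": "lee@example.com"},
    {"id": 4, "email": ""},
]
dupes = find_duplicate_groups(records)
print({email: [r["id"] for r in recs] for email, recs in dupes.items()})
# {'pat@example.com': [1, 2]}
```

This only finds the groups. Deciding which record survives (for example, the one tied to an Opportunity in Salesforce) is the business logic that tools like DemandTools help with.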


Marketo offers a one-time deduplication service. If you would rather perform the one-time deduplication yourself or with a partner, we like DemandTools because it helps you work through the business logic from a Salesforce perspective, since Salesforce is usually the system of record (e.g., a duplicate exists as two Contacts with two different Opportunities). RingLead is another option.


For ongoing deduplication management, check out Validity’s DupeBlocker and RingLead, both of which help prevent duplicates on an ongoing basis.


Make sure to read Josh Hill’s Deduping Leads in Marketo for other deduplication considerations.


Step 2 – Eliminate People without Email Addresses


In your marketing automation platform, records without email addresses are virtually useless. You can’t email them and the records increase the possibility for future duplicates.


Quick Win Candidate – If you adopt only one easy step, make it getting rid of people without email addresses. As a first pass, focus on Salesforce Lead records. Contact records are more complex since they may have Opportunities associated with them.

We saw one client upload more than 10K names without email addresses, which spiked its record count toward the database limit and prompted a letter from Marketo. This step identified those historical records and deleted them. Sales representatives can also add records without email addresses to your CRM, which is a bad practice.

A solution involves identifying these records and deleting them on a one-time and ongoing basis.
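The first-pass rule above (no email address, Salesforce Leads only) can be expressed as a simple filter over an exported list. This sketch assumes hypothetical `id`, `email`, and `sfdc_type` keys and treats a missing or blank email as "no email"; it deliberately skips Contacts, per the advice to start with Leads.

```python
def missing_email_candidates(records):
    """Return IDs of Salesforce Lead records with no email address."""
    candidates = []
    for rec in records:
        email = (rec.get("email") or "").strip()
        if email:
            continue  # has an email; not a candidate for this step
        if rec.get("sfdc_type") == "Lead":  # Contacts are left for a later pass
            candidates.append(rec["id"])
    return candidates


records = [
    {"id": 1, "email": "", "sfdc_type": "Lead"},
    {"id": 2, "email": None, "sfdc_type": "Contact"},
    {"id": 3, "email": "a@b.com", "sfdc_type": "Lead"},
    {"id": 4, "sfdc_type": "Lead"},
]
print(missing_email_candidates(records))  # [1, 4]
```

Running the same rule on a schedule covers the "ongoing" half of the solution, catching records that Sales adds without an email address.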


Step 3 – Purge Recently Deleted Salesforce Records


By default, records that are deleted in Salesforce are NOT deleted in Marketo. On one hand, this serves as a nice backup in case a Sales representative deletes a bunch of data in Salesforce. On the other hand, this deleted data lives in Marketo forever unless you do something specific to delete these records.


Quick Win Candidate – With a simple campaign, clear out your deleted Salesforce records in Marketo on a regular basis.


Step 4 – Delete the Disqualified


This step seems easy but it’s not. If Sales or Marketing has disqualified a record, it might be time to delete it. I say “might” because sometimes a Sales representative disqualifies a record when it should really be recycled. Or, it’s possible that your Disqualified lead lifecycle process is not quite ready for prime time.


Do you trust Disqualified data entered by Sales reps? If the answer is “No,” you’ll need to work on a consistent lead lifecycle process with your Sales team.

This step involves solidifying a Disqualified process with your Sales team so you can manage the data appropriately. You’ll also want to consider adding Disqualified Reasons to your sales process to get a more granular understanding of the data. For example, a record Disqualified with a Disqualified Reason of Spam is different from one with a Disqualified Reason of No Current Budget.


Step 5 – Remove the Bounces


If someone’s email has bounced, that record should be an easy candidate to delete, right? It’s not so simple. You might want to keep these records around to maintain some reporting intelligence. You might also want to mail them a few last times to mine data from those returned emails.


Additionally, a bounce might indicate a spam block rather than a no-longer-there address, so be careful to distinguish between the two.
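One way to tell the two apart, if you can see the raw bounce message, is the SMTP enhanced status code defined in RFC 3463. This is a technique swapped in for illustration rather than the article’s or Marketo’s method. In that scheme, `5.1.x` codes concern addressing (a bad or unknown mailbox), `5.7.x` codes concern security or policy (often a spam or reputation block), and `4.x.x` codes are temporary. The function name and category labels below are hypothetical.

```python
import re

# RFC 3463 enhanced status code: class.subject.detail, e.g. "5.1.1"
STATUS_RE = re.compile(r"\b([245])\.(\d{1,3})\.(\d{1,3})\b")


def classify_bounce(bounce_text: str) -> str:
    """Roughly sort a bounce message into a hypothetical handling category."""
    match = STATUS_RE.search(bounce_text)
    if not match:
        return "review"           # no status code found; have a person look
    cls, subject = match.group(1), match.group(2)
    if cls == "4":
        return "transient"        # temporary failure; not a deletion signal
    if cls == "5" and subject == "1":
        return "address_invalid"  # addressing problem: likely no longer there
    if cls == "5" and subject == "7":
        return "policy_block"     # security/policy: often a spam block
    return "review"


print(classify_bounce("550 5.1.1 <pat@example.com>: User unknown"))  # address_invalid
print(classify_bounce("550 5.7.1 Message rejected as spam"))        # policy_block
print(classify_bounce("421 4.7.0 Try again later"))                 # transient
```

Only the `address_invalid` bucket is a reasonable deletion candidate; a `policy_block` bounce points to a deliverability problem on your side, not a dead contact.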


Deliverability Best Practice: Set up an automated campaign in Marketo to manage bounced emails. The campaign reviews your bounces for fake addresses and domains.


Step 6 – Delete the Inactive – The Big Sweep


How long should the couch potatoes live in your database without activity before you delete?


If you keep your other five filters conservative, these inactivity criteria can serve as the final sweep. We’ve seen companies use very complex criteria to define this last step, since it’s the end of the line for some data. Here are a few criteria to start with:


  • 15 months of inactivity
  • Exclude/include select Lead Sources
  • Exclude/include Sales-generated records
  • Salesforce Lead records only
  • No active opportunities
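Because this is the end of the line for some data, it helps to combine the criteria into one explicit rule that everyone can review. The sketch below is a minimal illustration: the `Record` fields, the `PROTECTED_SOURCES` set, and the choice to exclude (rather than include) Sales-generated records are all assumptions, and fifteen months is approximated as 450 days. It also adds a hypothetical `clicked_wake_the_dead` flag so that anyone who responded to the Wake the Dead emails is kept.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INACTIVITY = timedelta(days=15 * 30)      # about 15 months (30-day months)
PROTECTED_SOURCES = {"Partner Referral"}  # hypothetical Lead Sources to keep


@dataclass
class Record:
    id: int
    sfdc_type: str                 # "Lead" or "Contact" (hypothetical field)
    lead_source: str
    sales_generated: bool
    last_activity: date
    has_active_opportunity: bool
    clicked_wake_the_dead: bool    # responded to the last-chance emails


def is_sweep_candidate(rec: Record, today: date) -> bool:
    """True only if every inactivity criterion points toward deletion."""
    return (
        rec.sfdc_type == "Lead"
        and not rec.has_active_opportunity
        and not rec.sales_generated
        and rec.lead_source not in PROTECTED_SOURCES
        and not rec.clicked_wake_the_dead
        and today - rec.last_activity > INACTIVITY
    )


today = date(2024, 6, 1)
stale = Record(1, "Lead", "Webinar", False, date(2022, 12, 1), False, False)
recent = Record(2, "Lead", "Webinar", False, date(2024, 3, 1), False, False)
print(is_sweep_candidate(stale, today), is_sweep_candidate(recent, today))  # True False
```

Each condition maps to one bullet above, so tightening or loosening the sweep means changing a single line.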


Tip: Before deleting, run a Wake the Dead campaign with a few last-chance emails, using copy like: “Hey, we love you, but is it time to say goodbye? If you want to keep receiving great content, click here. Otherwise, we’ll remove you from our database.”




Keeping bad data around is not a strategy for long-term success. The Dumpster Dive Data Strategy is a methodology that brings clarity to your data retention policies.


Once you adopt it, expect to see a database reduction of 10-40% or more, which will boost your performance and decrease your data costs. Good luck mining.


If you have any questions or need help getting your data strategy up and running, reach out to @Jeff_Coveney3