Help identifying duplicates across instances

Hey Marketo Community,

A company that I am working with is going through a merger and acquisition. One of the objectives in the near to mid term is to consolidate into one Marketo instance.

We are in the process of doing a lot of the prep work, and one of those tasks is to identify how many duplicate email addresses we have across our two instances.

Has anyone had to do something similar? And, if so, what was your process? Currently the game plan is to export just the email addresses from instance A and instance B and then remove duplicates to get a raw count of unique emails. We'll be checking a little under 1 million email addresses for duplicates.

While this particular scenario isn't a really difficult task, I wanted to see if there were more efficient or effective processes for something of this nature, especially if, in the future, we want to de-dupe on more than just email address. That could, and would, bog down the spreadsheet with so many columns and rows of data, or we could run into Excel's own limits: Excel specifications and limits - Excel

Thanks in advance for the help.

Level 10 - Community Moderator

Re: Help identifying duplicates across instances

Excel certainly isn't the ideal tool for a dataset of this size; a SQL database is. For SQL it's tiny, for Excel it's huge.

On the SQL side it's not difficult to get the report you want. The larger question with such things is how to do intelligent merging of the duplicates using the inbound API. Note that it's impossible to merge activity histories across instances.
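To give a rough idea of what the SQL side could look like, here is a minimal sketch using Python's built-in sqlite3 module. It assumes you have already exported the email addresses from each instance into two lists (or read them from CSV files). The table names, the `overlap_report` function, and the trim-and-lowercase normalization are my own assumptions for illustration, not anything Marketo provides.

```python
import sqlite3


def normalize(email):
    # Assumption: trimming whitespace and lowercasing is enough to compare emails.
    return email.strip().lower()


def overlap_report(emails_a, emails_b):
    """Count emails present in both exports and unique emails overall."""
    conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
    conn.execute("CREATE TABLE instance_a (email TEXT)")
    conn.execute("CREATE TABLE instance_b (email TEXT)")
    conn.executemany(
        "INSERT INTO instance_a VALUES (?)",
        ((normalize(e),) for e in emails_a if e.strip()),
    )
    conn.executemany(
        "INSERT INTO instance_b VALUES (?)",
        ((normalize(e),) for e in emails_b if e.strip()),
    )
    # INTERSECT and UNION both return distinct rows, so repeats inside a
    # single instance do not inflate either count.
    in_both = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT email FROM instance_a INTERSECT SELECT email FROM instance_b)"
    ).fetchone()[0]
    unique_total = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT email FROM instance_a UNION SELECT email FROM instance_b)"
    ).fetchone()[0]
    conn.close()
    return {"in_both": in_both, "unique_total": unique_total}
```

Calling `overlap_report(list_a, list_b)` returns the number of addresses found in both instances and the number of unique addresses overall. De-duping on more than email later would mean adding columns to both tables and joining on those instead.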

Re: Help identifying duplicates across instances

Thanks, Sanford. Appreciate the input. I figured that we would have to go with something like this.

Just wasn't sure if there was a nifty Marketo-specific tool that did this.

Champion Moderator

Re: Help identifying duplicates across instances

During the merging, you should be taking care of the following things:

  • De-duping the records based on email address or any other composite key
  • Check which system has the latest subscription preferences
  • If you have multiple product lines, check which system has the latest subscription for each specific product
  • Make sure you do not lose your lead source tracking data. For lead source, give preference to the system that acquired the lead first
  • See which system has the most up-to-date profile data and give preference to that one
  • If you have any custom objects built, you have to migrate those to the master instance as well
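The precedence rules above could be sketched roughly like this for a single pair of duplicate records. This is only an illustration: the field names (`created_at`, `updated_at`, `lead_source`, `unsubscribed`, `subscription_updated_at`) and the `merge_pair` function are hypothetical, not Marketo API names, and it assumes the dates are exported as ISO-8601 strings in the same time zone.

```python
from datetime import datetime


def _ts(record, field):
    # Assumption: dates are ISO-8601 strings such as "2021-07-15".
    return datetime.fromisoformat(record[field])


def merge_pair(rec_a, rec_b):
    """Merge two duplicate records from different instances."""
    # Profile data: prefer the most recently updated record.
    newest_profile = max(rec_a, rec_b, key=lambda r: _ts(r, "updated_at"))
    # Lead source: prefer the system that acquired the lead first.
    first_acquired = min(rec_a, rec_b, key=lambda r: _ts(r, "created_at"))
    # Subscription preferences: prefer the latest change.
    latest_prefs = max(
        rec_a, rec_b, key=lambda r: _ts(r, "subscription_updated_at")
    )

    merged = dict(newest_profile)
    merged["lead_source"] = first_acquired["lead_source"]
    merged["created_at"] = first_acquired["created_at"]
    merged["unsubscribed"] = latest_prefs["unsubscribed"]
    merged["subscription_updated_at"] = latest_prefs["subscription_updated_at"]
    return merged
```

With multiple product lines you would repeat the subscription step per product, and the result would still need to be pushed to the surviving instance through whatever import process you use.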

Just double check the specifics for GDPR, CASL and double opt-in.

Re: Help identifying duplicates across instances

Hey Amit,

We already have a lot of these on our project list and timeline. Didn't think about the custom object point yet though. So I appreciate it!