Dirty Data and GDPR Prep

Discussion created by e2d5dd46e885dee20dbc50e28b7823f59daf0048 on Mar 29, 2018
Latest reply on Mar 29, 2018 by 82505809d622c23f726413215af831a696928bed

If your instance of Marketo is like mine, you have legacy data pulled from multiple platforms, sales computers, and in some cases, written in short hand on business cards and sheets of paper. This makes it that much more fun to prepare for a double opt-in campaign for EU citizens. A few weeks ago I was 100% sure that EU citizens were not properly tagged in our instance, but after a few manual processes (hours of excelling and whiskey to stay sane) we're getting closer and closer to knowing how exposed we are from a legacy data perspective.


Here are a few processes that you can do to clean your data to prepare for an April/May campaign:


  • Inferred/Billing Geo: I found that we had a lot of 'Country' fields empty, but had some useful data in the 'Inferred Country' and 'Billing Country' fields. At first I was a bit skiddish to run smart campaigns to pull data into the 'Country' field, but after a bit of auditing the data seemed sound and we updated our empty Country fields.
  • Cities/States/Zips: Much like inferred/billing country, we were able to pull some insight from filled in 'city', 'state', and 'zip' fields. If you're targeting EU citizens, you can do a look up for these data points and run a smart list for "if fields are/contain" and update Country fields appropriately. If anyone finds it helpful, I can compile the list of cities/states/zips that we found to be in the EU and attach it here.
  • Country Codes: One thing that can be extremely helpful, from a geo perspective, is country codes in email domains. You can find many of the Country Codes here. Unfortunately, Marketo does not have an 'email address: ends with' lookup, so you have to get a little creative. If you pull your database (or subset) to CSV/Excel, you can break these down. To do so:
    • Copy/Clone the email address column, and place it at the end of your data set (you're going to push data to many columns)
    • Highlight the new column, and go to Data>>Text to Columns>>Delimited>>Other: @ >>Next>>Finish
    • You now have a 'Domain' column
    • Highlight the Domain column, and go to Data>>Text to Columns>>Delimited>>Other: . >>Next>Finish
    • You now have between 2 and ∞ columns with data (yay!)
    • Filter Row 1 (freeze for ease of scrolling) and search for where the domains end
    • Sort the rows as you go (from the farthest to the closest columns) and if a domain meets a requirement (ends with .de for Germany, ends with .fr for France, etc.) you can do a smart campaign with "Smart List=if email is:___ and Country is Empty // Flow=Country is:___"


Some of this is intuitive, but I haven't found this flow anywhere in the Marketo docs so I thought I'd share.


If you have any additions/alternative ways to work through dirty data, let us know!