Merging People in Bulk Using the API

Tyron_Pretorius · ‎10-19-2021

Flow Diagram of the Python Duplicate Merging Script

This post will walk you through how to use the merge endpoint to merge people in bulk with Python code that will allow you to set custom rules for how you want to prioritize conflicting values for the same fields on different people.

If it is your first time using the API or you need a quick refresher then check out the Quick-Start Guide to the API to see how to make your first API requests in Postman before transitioning to making requests in code or in the Zapier automation tool.

Also if you are looking for a course for learning the Marketo API then DM me 🙂

You can find the main Python script and all the supporting functions linked in the paragraphs below in the Bulk Merging Github repository.

Supporting Functions

Getting Access Token

The getToken function makes a get request to the token endpoint, passing the client id and client secret, to return the API access token, and the length of time that this token will be valid for.

Merging People

The mergeLead function makes a post request to the merge endpoint and takes in the winning person id, an array of losing person ids, and a boolean value to indicate whether or not the people should be merged in your CRM tool e.g. Salesforce.

When the “mergeInCRM” parameter is set to false the winning person and all losing person ids can be merged in a single call. However, when this parameter is set to true Marketo is restricted to merging only 2 people at a time so if there is a triple duplicate i.e. three people with the same email address, then the winner must first be merged with one of the losers and once this is complete the new resultant merged person must then be merged with the other loser. Therefore, to achieve this functionality a for loop is used to iterate through each of the loser ids and successively merge them with the winning id.

Rules for Winning Field Values

Choosing winning field values when merging manually in Marketo

When manually merging in Marketo with discrepancies between the person field values, the interface allows you to choose which values you want to win and therefore keep on the resultant merged person. The Priority.py script does the exact same thing, using custom rules specified by you to determine which values are prioritized for person fields in a merge.

To do this it uses the following functions to compare the disparate values of the person for the same field.

Get First Created Date

The createdAt function returns the earliest created at date and the list index at which it occurs.

Get Maximum Lead Score

The leadScore function returns the maximum score and the list index at which it occurs.

Get First Non-Null Value

The notNull function returns the first non-null value to appear in the list, excluding null-like values such as “empty”, “unknown”, “n/a”, and ‘none’. If no such values exist this function returns the value and index of the first list element.

Define Field Value Rankings

The priority function makes use of a dictionary, which contains a list for each field where the list is sorted from values of highest priority to lowest priority e.g. 'leadSource': ["Advertising", "Paid Search", "Organic"].

The list of prioritized values is then iterated through starting from the highest priority value e.g. “Advertising”, and each value from the prioritized list is compared to each person value. If either of the person values match the prioritized value then the value and index of this person value are returned.

Else the for loop continues with the next highest priority value to see if this is present in the person values. In this way, values for a person field can be prioritized according to a custom user-specified order so that these values will be retained on the merged person.

Get First False Value

The boolTest function was built for the “Unsubscribed” and “Account Blocked” boolean fields where the priority was to retain a TRUE value for either of these fields if they existed e.g. you do not want to email the merged person if one of the constituent people was unsubscribed. Therefore, this function returns the value and index of the first TRUE value in the list of person values.

Call Appropriate Prioritization Function

The ruler function gets called from the main program within the Priority.py script and then takes the list of person values given for the field being evaluated and passes them into the rules dictionary. The rules dictionary maps a field to the function (one of the functions described above) that should be called to get the highest priority value for that field from the person values that were passed in.

Append Dictionary

The appendDict function does what it says on the tin. It takes an input dictionary and for each value this dictionary has for a key it will append each value to the value(s) already in the master dictionary for the same key.

Updating Winning Person Values

The createUpdateLead function makes a post request to the create/update lead endpoint to update a person with the winning field values after the merge is complete. The winning person id and winning field values are passed in JSON format to the payload of the request to make the update.

Although this endpoint allows for updating 300 people at once in a single call, it was found to be safer to update each winning person with the winning field values straight after the merge was complete in case the script were to fail before 300 people were merged meaning that none of the winning people would have been updated.

Main Script

Flow diagram of the main Python script

Getting Person Data & Access Token

Before running the main merging script, go to the “Possible Duplicates” system smart list, sort by email address (the script assumes that the input data is sorted by email), and then export all the duplicates from a view that contains all the fields you are interested in setting up priority rules for when merging.

N.B. This script was originally configured to pull person data using the API, which returns createdAt date in ISO 8601 (2016-11-03T19:17:57Z) standard and boolean values as either “TRUE” or “FALSE”. When exporting as a CSV the date needs to be converted to ISO 8601 standard and the 1’s or 0’s representing boolean values need to be converted to “TRUE” or “FALSE”. Importing the CSV into a sheet tool will get this done quickly or you can modify the createdAt and boolTest functions in the priority script to work with the default exported values. To save on API calls and make the script faster to run it is recommended to pull in the person information via CSV rather than relying on the API to pull in person information from person ids.

Once the data has been extracted from Marketo it can then be imported into the main script as a dictionary using the Pandas library in Python. Before looping through this dictionary of duplicates, it is necessary to get an access token and ensure that there are more than 60secs left on the token life so that there is enough time to complete the merge and update the winning person in a single iteration of the loop. This is achieved using nested while loops with the inner while loop checking that token life is greater than 60 secs before every run and if this is not true the outer while loop will wait until the access token has expired and then get a new access token with 3600secs of life.

Getting, Storing, & Prioritizing Person Values

Since the input dictionary is sorted by email address the field values from successive rows with the same email are pulled into the field_dict dictionary which stores the person values using the field names as the keys. Once this is done the comparison of the person values for each field is achieved using a for loop to iterate through each one of the field keys in field_dict.

If the values for a particular field match then this value is put in the final_dict dictionary for that field, otherwise the field values are evaluated using the custom rules declared in the Priority.py script using the ruler function (see the “Rules for Winning Field Values” section above).

The first non-null sfdcLeadId value is stored in final_dict and the corresponding index can be used to pull in the 'id' and 'mcUserId__c" values from the same person who has this sfdcLeadId value and who will be the winner of the merge. If none of the people exist in Salesforce then the id , mcUserId__c, and createdAt values of the person who was created first are stored in final_dict.

Similarly, the "Lead_Source_Detail__c" and the three "utm_xxx__c" values are obtained from the person who has the highest priority leadSource value.

Merging People & Updating Winner

Once all the prioritized values have been stored in final_dict, they are logged in a timestamped text file along with the field_dict values before the script proceeds to merge the two people by making a request to the merge endpoint.

The response of the merge is logged and then if the merge is successful the newly merged person will be updated with the prioritized values for the fields stored in final_dict. Sometimes even though the script logic selects a winning id (see section above) Marketo has an overriding rule that if a lead and contact are being merged in SFDC then the contact id will always win.

Therefore in some cases, the supposed “winning” person id specified in final_dict will be the losing id if this “winning” id was a lead and the other id corresponded to a contact. If this is the case "status":"skipped" will be returned in the response meaning it is necessary to update one of the other person ids stored in field_dict with the prioritized values using a while loop to iterate over the remaining ids until the actual winner is found and updated.

You can avoid the need for this while loop and ensure that the winner selected in your Python logic always matches that of Marketo’s inbuilt logic. This can be done by modifying the main merging script to pull in the sfdcType field to ensure that an SFDC contact will always be selected as the winner if the other people to be merged are SFDC leads.

N.B. Out of the 13,000 people I merged with this script there were a stubborn 1000 people who would not be merged because of a vague 611 error returned from SFDC and a lot of these just had to be merged manually. If you are encountering this error and have too many people to merge manually I can give some tips to try that showed some success in merging people with a previous 611 error. Just leave a comment below 🙂

Rinse & Repeat

After merging, a 0.2-second delay is implemented so that the API rate limit of 100 calls in 20 seconds is not exceeded. Finally, the remaining time on the token is calculated at the end of the inner while loop and if this time is greater than 60secs and there are still more people to merge then the next iteration of this while loop will proceed.

If the remaining time is below 60secs, the script breaks out into the outer while loop and then waits for this time to elapse before requesting a new access token and diving back into the inner while loop. This process will be repeated until all the people in the CSV file have been merged.

Take a look at the merging script in action by viewing a sample output file from the merging script.

Merging Automatically Upon Creation With Zapier

Great! You are now able to clean up that HUGE pile of duplicates that has been eyeballing you for the past year by taking this GitHub code and making it your own.

In the scenario, where you are not able to stop the cause of duplicates or you are in the process or working on a fix and you do not want these duplicates to build up then you can merge people automatically upon creation using Zapier.

Take a look at the Merging Automatically with Zapier post to see how you can use the merge endpoint that you know and love so well to merge people as soon as they are created.

The content in this blog has been reviewed by the Community Manager to ensure that it is following Marketing Nation Community guidelines. If you have concerns or questions, please reach out to @Jon_Chen or comment down below.