The same e-mail address was assigned multiple Lead Ids -- why, and how to avoid it?

Anonymous
Not applicable

I'm also asking our friendly integration consultant, but given that I'm working late, I thought I'd try my luck here, too.

[Image: pastedImage_0.png]

How come the same e-mail address is assigned two different lead IDs?

I was under the impression that the email address is a primary look-up key for leads and should not be mapped to more than one lead.

These leads are created using the create-or-update-leads JSON API with no custom merge/lookup key.

SanfordWhiteman
Level 10 - Community Moderator

Are you running upserts in parallel?

Anonymous
Not applicable

Yes, up to 8 of them in parallel. Trying to serialize before upload is unlikely to work on our side as we're looking at more volume than a single host is good for, especially during spiky peak times.

SanfordWhiteman
Level 10 - Community Moderator

You don't have to serialize, but use a sharding algorithm so updates for the same lead are processed by the same worker.  There's no utility to attempting parallel upserts of the same record (and certainly no guarantee that updates will be applied in order).
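
To make the routing idea concrete, here is a minimal sketch in Python. It is not taken from anyone's production code in this thread: the names `normalize_email`, `worker_for`, and `route` are hypothetical, and it assumes each record is a dict with an `email` key. A cryptographic hash is used only because Python's built-in `hash()` is randomized per process, so it would not give the same answer on different hosts.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim and lowercase so trivially different spellings route identically."""
    return email.strip().lower()


def worker_for(email: str, num_workers: int) -> int:
    """Map an e-mail address to a worker index using a process-independent hash."""
    digest = hashlib.sha256(normalize_email(email).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def route(records: list[dict], num_workers: int) -> list[list[dict]]:
    """Split records into one queue per worker, keyed by e-mail address."""
    queues: list[list[dict]] = [[] for _ in range(num_workers)]
    for record in records:
        queues[worker_for(record["email"], num_workers)].append(record)
    return queues
```

With this kind of routing, two upserts for the same address always land in the same queue, so they are never in flight on two workers at once. The fixed `num_workers` is the weak point: it is exactly the "static" flavor that breaks when a host goes down.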

Anonymous
Not applicable

Thanks for the thoughts!

Unfortunately, static sharding has not worked out for us in the past. When one of those dedicated machines blows up, saying "every customer whose customer ID ends in 7 cannot use the service right now" doesn't work very well. Instead, we use amorphous sharding, where any worker host can work on any particular item.

It's also the case that, when we did due diligence and implementation planning, we were told in no uncertain terms that "the primary key is email, all leads will be uniquely identified by email" and we've gone pretty far down that line. (The fact that custom activities require a lead ID and cannot identify a lead by email came as a surprise to us; before we found that out we didn't even cache lead IDs locally.)

I'm OK with ordering being non-guaranteed between requests; that's expected and we do well with that. I'm not OK with an alleged "primary key" field being present more than once. (I'd be more OK with an error message from one of the attempts, although ideally, the requests compete, one wins, and the other writes-after the allocating request.)

Think about it: When I upsert a customer-identified-by-email, which of the leads is updated?

When I select-lead-by-email, which of the leads is returned?

When I run a campaign, which of the leads get the campaign?

I strongly feel that this consistency must be enforced on the Marketo side, because I don't think the system will be very resilient in the face of the same e-mail address mapping to multiple lead IDs.

SanfordWhiteman
Level 10 - Community Moderator

Hey, I didn't say "static sharding"!  Any (request) sharding architecture must take availability into account.  Rerouting in response to runtime conditions would have no ill effect here, whereas --  as you're seeing -- not applying any routing intelligence can be dangerous.
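
One way to get availability-aware routing of this kind is rendezvous (highest-random-weight) hashing. The sketch below is only an illustration under assumptions: the worker names and the `pick_worker` helper are hypothetical, and how the list of live workers is obtained (health checks, a membership service, and so on) is left out entirely.

```python
import hashlib


def _score(worker: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (worker, key) pair."""
    digest = hashlib.sha256(f"{worker}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_worker(email: str, live_workers: list[str]) -> str:
    """Choose the live worker with the highest weight for this e-mail address."""
    if not live_workers:
        raise ValueError("no live workers available")
    key = email.strip().lower()
    return max(live_workers, key=lambda worker: _score(worker, key))
```

When a worker drops out of `live_workers`, only the addresses that were assigned to it move elsewhere; every other address keeps its worker. A brief overlap is still possible while different senders disagree about which workers are alive, so this narrows the race window rather than eliminating it.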

I certainly understand the frustrating consequences of what you're experiencing. Possible cause: a race condition in Marketo's upsert logic due to unlocked operations.  Key uniqueness may be enforced at the application level and tuned for performance over guaranteed safety.  Of course this is merely a guess based on experience and I could be totally off, but I could imagine (as I'm sure you can as well) building a system that produces the results you're seeing under parallel load.

Ultimately, though, I believe you can work around this on the client side and still pump information into Marketo at maximum speed (that is, they will remain the bottleneck). As for due diligence, well, maybe you shoulda called me.

P.S. Also consider the default limit of 100 calls per 20 seconds: this doesn't suggest that 8 parallel workers are feasible (are you cooperatively limiting your calls to 5 per second overall?).  Or were you able to upgrade to a higher request rate as well?
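
For reference, the arithmetic behind that figure is 100 calls / 20 sec = 5 calls/sec across all workers combined. A minimal sliding-window limiter in Python might look like the sketch below. The class name and its parameters are hypothetical, and it only tracks calls within one process; sharing the budget across 8 workers would need shared state or a static split of the quota, neither of which is shown.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds-long window."""

    def __init__(self, max_calls=100, window_seconds=20.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()  # timestamps of calls still inside the window

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits the budget, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

A worker would call `try_acquire()` before each API request and wait briefly whenever it returns `False`.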

Anonymous
Not applicable

I can imagine someone building a system that traded consistency for performance. However, someone who did that would not then go around saying that it is a consistent system. Given that we bought the system based on consistency promises, well, there's one end of that scale that I prefer.

If I had known, I might very well HAVE called you 😉

Trying to do consistent (semi-static) sharding here will be annoying, because we use Kafka for our internal work queue, and it doesn't quite like that idea. Each topic has X partitions, but all the partitions live on each broker in parallel, without coordination between them.

We batch work up to X requests or Y seconds, whichever comes first, per worker. On average, this will be slightly more than 1 request per second. The problem arises when some currently-unknown user generates multiple activity events, they happen to go to different brokers/partitions, and those timeouts then happen to strike at approximately the same time. (Currently X = 200 and Y = 10, and activities and leads are separate queues.)
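
The "X requests or Y seconds, whichever comes first" rule can be sketched roughly as follows. This is an illustration only, not the actual worker code: the `Batcher` class and its method names are hypothetical, and the defaults simply mirror the X = 200 and Y = 10 values mentioned above.

```python
import time


class Batcher:
    """Collect items and release them once a size or an age limit is reached."""

    def __init__(self, max_items=200, max_age_seconds=10.0, clock=time.monotonic):
        self.max_items = max_items
        self.max_age_seconds = max_age_seconds
        self.clock = clock  # injectable for testing
        self._items = []
        self._opened_at = None  # time the oldest pending item arrived

    def add(self, item):
        """Queue an item; return a full batch if the size limit was hit, else None."""
        if not self._items:
            self._opened_at = self.clock()
        self._items.append(item)
        if len(self._items) >= self.max_items:
            return self._flush()
        return None

    def poll(self):
        """Call periodically; return the pending batch once it is old enough, else None."""
        if self._items and self.clock() - self._opened_at >= self.max_age_seconds:
            return self._flush()
        return None

    def _flush(self):
        batch, self._items = self._items, []
        self._opened_at = None
        return batch
```

Because every worker runs its own timer, two workers that each hold an event for the same unknown user can both reach the age limit within the same moment and send overlapping upserts, which is the collision described above.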

Also, the aggregation code is so straightforward right now that if the same user is discovered to be missing a cached lead ID twice in a row, that user can end up in the same batch as two upsert rows. Again, I'm expecting this to "do the right thing," but if it doesn't, at least that would be easier to work around on the sender side.

SanfordWhiteman
Level 10 - Community Moderator

Surely the people making guarantees about consistency + concurrency weren't core architects, though?

Seems kind of weird to me, honestly, that Marketo would be sold as an API-first platform.  SFDC being sold that way I can understand (at this mature point), but Marketo is still a UI-first, marketer-first platform.  My impression at least. As more features get exposed via REST and kinks get worked out (like the one you're encountering) I could see it being more of a "headless" MA engine, but not quite yet.  Or maybe (I assume you've opened a case) a tech will just flip a hidden switch for your instance and you'll be golden.

Anonymous
Not applicable

The follow-up from our helpful Marketo contacts is that this is most likely to happen when the same batch contains the same e-mail address more than once. We're going to pre-filter batches to eliminate duplicates, and hopefully we'll work around this issue. Meanwhile, I'm told the API people will implement detecting this and making sure it's right on the back-end in a future release, which would be best!
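
For anyone hitting the same thing, the pre-filter can be as simple as the Python sketch below. It rests on a few assumptions that are not confirmed anywhere in this thread: each row is a dict with an `email` key, addresses should be compared case-insensitively after trimming whitespace, and when two rows collide the later row's field values should win. The name `dedupe_batch` is hypothetical.

```python
def dedupe_batch(rows: list[dict], key: str = "email") -> list[dict]:
    """Merge rows that share an e-mail address so each address appears once."""
    merged: dict[str, dict] = {}
    for row in rows:
        email = row[key].strip().lower()  # assumed comparison rule
        if email in merged:
            merged[email].update(row)  # later values overwrite earlier ones
        else:
            merged[email] = dict(row)  # copy so the caller's rows stay untouched
        merged[email][key] = email
    # Dicts preserve insertion order, so output follows first appearance.
    return list(merged.values())
```

Running each batch through a function like this before the upsert call guarantees that no single request carries the same address twice. It does nothing about the same address arriving in two concurrent batches from different workers.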

Thanks for your suggestions and comments.