Sanford Whiteman and others gave me some great advice on how I could use lists as a work queue to work around Munchkin v2 limitations on what actions could be done when a lead was associated. See my earlier thread: "When a lead is merged / associated how can I make previously anonymous activity send an email?"
I'm looking for ways to avoid a race condition where people in a list may get processed more than once.
My goal is to put people into a list for specific campaigns, and use the list as a queue that will be flushed by "cron job" / scheduled campaigns that run once an hour, and also by triggered campaigns that hopefully happen more quickly. The triggers may not always happen, so the cron is the fallback to flush the queues regularly.
Imagine I have a Smart Campaign that finds people in a specific list, then in the flow, removes them from the list, and sends an email.
Could this SC be invoked twice at the exact same time e.g. once from a scheduled campaign, and another time from a triggered campaign? If that was the case, maybe the user gets 2 emails.
What if the user is removed from the list in between when the SC criteria makes them eligible and when the flow actually runs? Should I be re-checking list membership before sending the email? Are there any other gotchas here? I can't say only send the email once, b/c sometimes this is for transactional reasons so people may be queued up in the same list multiple times (but no more than 1 time at once).
Thanks!
Unfortunately, you're still going to have race conditions in this scenario. There simply isn't enough atomicity or transaction awareness (not in the transactional email sense but in the SQL transaction sense).
What you need is an Atomic Compare-and-Swap (CAS) or true stack to make this work. You could definitely accomplish this using a webhook and a key-value store with CAS abilities.
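As a rough illustration of the CAS idea (not a real Marketo or vendor API), here is a minimal Python sketch. The `KeyValueStore` class, the `handle_webhook` function, the `"queued"`/`"claimed"` states, and the `lead-42` ID are all made up for the example; an in-memory dict guarded by a lock stands in for whatever key-value store the webhook would actually call.

```python
import threading


class KeyValueStore:
    """Toy in-memory stand-in for a key-value store that offers CAS."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def compare_and_swap(self, key, expected, new):
        """Set key to `new` only if it currently equals `expected`.

        The check and the write happen under one lock, so at most one
        caller can win for a given transition.
        """
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def handle_webhook(store, lead_id, sent_log):
    """Hypothetical webhook handler: send only if this call wins the claim."""
    if store.compare_and_swap(lead_id, "queued", "claimed"):
        sent_log.append(lead_id)  # stand-in for actually sending the email
        return True
    return False


store = KeyValueStore()
store.put("lead-42", "queued")
sent = []

# Simulate the scheduled and the triggered campaign firing together.
threads = [
    threading.Thread(target=handle_webhook, args=(store, "lead-42", sent))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever call arrives second finds the value already `"claimed"` and skips the send. Queuing the same person again later is just another `put(lead_id, "queued")`.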
(Also, just FYI, I'm primarily Sanford Whiteman -- you @'d one of my secondary accounts!)
Hi Jon,
Re: "Could this SC be invoked twice at the exact same time e.g. once from a scheduled campaign, and another time from a triggered campaign? If that was the case, maybe the user gets 2 emails."
Sanford is right. However, it seems like you could minimize this possibility by adding a choice to the flow of each campaign: "Member of Smart Campaign is not <The Other Campaign>".
Re: "What if the user is removed from the list in between when the SC criteria makes them eligible and when the flow actually runs? Should I be re-checking list membership before sending the email?" Are you worried the user who has been removed from the list will get the email when she shouldn't because she is no longer eligible by the time the flow runs? If so, re-checking list membership before sending the email is a reasonable approach.
Denise
Re: "If so, re-checking list membership before sending the email is a reasonable approach."
But the lookup and the subsequent send are not interlocked, and there isn't a unified view of the database that's guaranteed to persist across the steps of a flow. (Think about how many flows would be broken if this kind of isolation were in place!)
This race condition is more like the classical examples from programming. It's a fine-grained example of how checking for a condition, then proceeding as if the condition is still true despite the surrounding system not making that guarantee, is ultimately unreliable -- however unlikely it seems that you'll run into the bad case.
One way to avoid this is to use a system that can deliberately invalidate the condition at exactly the same time (interlocked) it evaluates the condition. That guarantees that any later attempt to read the same condition will fail, even if it's only one clock tick later. Or you can use a system with at-most-once semantics that pops something off a stack and guarantees never to pop it again.
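To make the difference concrete, here is a small Python sketch under made-up names (`queue_list`, `racy_interleaving`, `claim`, `lead-42`). The first function replays the bad ordering by hand, with both campaigns checking before either removes. The second evaluates and invalidates the condition in one interlocked step. A plain Python set and a lock stand in for the list; Marketo itself exposes nothing like this.

```python
import threading

queue_list = {"lead-42"}  # stand-in for the static list used as a queue
list_lock = threading.Lock()


def racy_interleaving(members):
    """Replay the bad ordering: both campaigns check before either removes."""
    emails = []
    scheduled_eligible = "lead-42" in members  # scheduled campaign checks
    triggered_eligible = "lead-42" in members  # triggered campaign checks
    if scheduled_eligible:
        members.discard("lead-42")
        emails.append("scheduled")
    if triggered_eligible:  # stale: the condition no longer holds
        members.discard("lead-42")
        emails.append("triggered")
    return emails


def claim(members, lead_id):
    """Interlocked version: test and invalidate in one step under a lock."""
    with list_lock:
        if lead_id in members:
            members.remove(lead_id)
            return True
        return False


# Run the racy version on a copy so the real list is untouched.
racy_emails = racy_interleaving(set(queue_list))

# Both campaigns try to claim; only the first attempt can succeed.
interlocked_emails = [
    name for name in ("scheduled", "triggered") if claim(queue_list, "lead-42")
]
```

With the check-then-act ordering both campaigns send. With the interlocked claim the second attempt sees an already-empty list and does nothing.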
Seems like without the ability to lock, you basically can't ever implement at-most-once. Outside of Marketo we use Pub/Sub, which has at-least-once delivery as is common with distributed systems, so we have to track each message ID centrally in MySQL with locking to avoid duplicate processing.
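For readers who haven't seen that pattern, here is a minimal sketch of deduplicating at-least-once deliveries by message ID. It is not our production code: it uses Python's built-in `sqlite3` and a primary-key constraint as a stand-in for MySQL with locking, and the `processed` table, `process_once` function, and message IDs are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")


def process_once(conn, message_id, handler):
    """Run handler only the first time a given message ID is seen."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO processed (message_id) VALUES (?)",
                (message_id,),
            )
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the ID was already recorded
    handler(message_id)
    return True


handled = []
# "m1" is delivered twice, as at-least-once delivery allows.
results = [process_once(conn, m, handled.append) for m in ["m1", "m2", "m1"]]
```

Recording the ID before running the handler makes this at-most-once: if the handler fails after the insert, the message is not retried. Recording it afterwards would flip the trade-off back toward possible duplicates.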
It seems like something similar to the list / flow I have in the screenshots in my other post is as close as I'm going to get in Marketo. Thanks for verifying, I just wanted to make sure I wasn't missing some expert strategy. It would be kind of nice if Marketo couldn't run a specific SC for the same user more than once in parallel, but that's probably way too complicated in a distributed system.
Re: "Seems like without the ability to lock, you basically can't ever implement at-most-once."
Only via a webhook which uses a back end that supports it. In case of network or local processing errors, you still have to consider the zero case (when the service gives you the result but you mishandle or can't handle it, and it will never give it to you again).
Hi Denise,
Yes. In addition to the lack of "locking read" support for exclusive access to the list within a window of a few seconds (much like Sanford is talking about, and what I'm thinking about as an engineer), I'm also concerned about the gap of time between member list creation and campaign execution (from the Under the Hood II: Batch Campaigns recording and How Campaign Processing Works).
Since it seems like there could be a big delay between member list creation / initial eligibility via the Smart List and the actual Flow steps, it seems like the secondary check would help. Hopefully that extra check doesn't slow down flows too much, but it seems like that's a fast (< 50 ms) operation so it's probably a good idea. Upon thinking about this race condition more, it seems like I want to remove the user from the list ASAP to reduce the time window between checking eligibility and list removal. Since flow steps like email and webhook sending might take a while, I'm now thinking I'd use a choice step to Remove From Flow as the first step if the user isn't in the list. Then proceed to remove them from the list and then do any flow like webhooks and emails.
Here's the template I think we'll use for each campaign where we use a queue.
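The screenshot isn't reproduced here. As a rough stand-in, the step order described above can be modeled in a few lines of Python. This is only a model of the ordering, not Marketo's API: `run_flow`, `queue_list`, `send_email`, and `lead-42` are names made up for the sketch.

```python
def run_flow(lead_id, queue_list, send_email):
    """Model of the flow's step order for one person."""
    # Step 1: choice step. Still in the list? Continue. Otherwise the
    # default choice removes the person from the flow, since another
    # campaign already handled them.
    if lead_id not in queue_list:
        return "removed from flow"
    # Step 2: remove from the list right away to shrink the race window.
    queue_list.discard(lead_id)
    # Step 3: slower actions (emails, webhooks) run last.
    send_email(lead_id)
    return "sent"


queue_list = {"lead-42"}
sent = []

first = run_flow("lead-42", queue_list, sent.append)   # e.g. triggered campaign
second = run_flow("lead-42", queue_list, sent.append)  # e.g. hourly batch
```

Steps 1 and 2 are still two separate operations, so this narrows the window rather than closing it, which matches what Sanford said above.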
Hi Jon,
I think you mean to say Member of List "Not in" rather than "in" in Flow step 1. Otherwise, looks good.
Denise
Thanks for taking a look Denise.
It reads in a confusing way, but I think it's doing what I want while avoiding "not in" / negation for the best list querying performance, although I haven't had a chance to test yet. The intent is that we don't remove you from the flow if you're still in the list; we then continue on to remove you from the list and send emails or fire webhooks. However, if you aren't in the list, I'd expect the default choice to kick in and remove you from the campaign, to avoid firing the steps in case you were removed from the list by another concurrent campaign between list creation and flow execution.
Does this seem right?
Hi Jon,
Ah, I see what you mean now. The logic works, but I think it's unnecessarily complicated. I think querying a static list for membership is a pretty light load, and I would opt for the easier-to-understand option: Remove from Flow if not a member of the list, otherwise do nothing.
Denise