Re: Another service interruption affecting trigger campaigns - when is Orion going to prevent this?

Dan_Stevens_
Level 10 - Champion Alumni

Another service interruption affecting trigger campaigns - when is Orion going to prevent this?

This morning, we received a troubling notification that our instance experienced a service interruption this week (Monday through Wednesday):

The Marketo system hosting your subscription experienced a Service Interruption which affected the logging of activities. We have identified the issue and current activity logging has been fully restored. 

When: Wednesday, June 21st 5:30 AM PT

Duration: Monday 6/19 - Wednesday 6/21

Service affected: During the affected period, you would have seen a delay in building smart campaign membership, along with a small subset of activities that were not processed. Login to the Marketo application, Landing Pages, and the rest of the platform were not affected.

Cause: The cause has been identified as a storage issue within our Activity Service Servers that are used to log lead activities.

Resolution: To address this, we have added more capacity to our Activity Service Servers, along with an optimization patch for faster campaign membership processing. 

Activities that were not logged during this time are being reinserted. This process can take up to two weeks to complete. During this time, you may see some duplicate activities in lead activity logs. We will be removing those duplicate entries through a data fix, which could take 4 weeks to process. While the duplicate activities are being reinserted and cleaned up, they will not trigger any campaigns. If you have Smart Lists listening for any duplicate activity filters (such as Visit Web Page, Click Link, etc.) with a constraint on the number of times, this may qualify your leads temporarily, until we complete the cleanup. We will be sure to provide a progress update next week.

We want to assure you that we take these incidents very seriously, and we are taking steps to prevent similar situations going forward. If you have any questions about the Service Interruption, please contact Marketo Customer Support team at http://support.marketo.com and they will be happy to answer any questions you might have. Thank you for understanding and being a Marketo Customer.

This is very concerning, especially the amount of time it's going to take to resolve this (which sounds like it will also affect our reporting). Isn't this something that the Orion architecture is supposed to prevent? And if so, what is the status of getting Orion deployed to Marketo's customers? It seems odd that the servers aren't being monitored today to ensure they have sufficient storage.

Just curious, has this impacted any of your instances?

2 REPLIES
Josh_Hill13
Level 10 - Champion Alumni

Yep....

My understanding is that almost everyone is on Orion now. I also had a major slowdown yesterday, although it wasn't clear why and it may not have been related to this particular issue.

As far as I know, Orion sped up the trigger eval queue and Munchkin... but I could be wrong on precisely how.

SanfordWhiteman
Level 10 - Community Moderator

Yeah, something tells me this was more Orion's fault (if you will) than something Orion would prevent.

Even with storage ultra-cheap in general, you can still run out. Can happen when storage is supposed to "auto-grow" as opposed to needing manual intervention to increase caps, since then you stop paying attention to it.
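To make the "stop paying attention" point concrete, here is a minimal sketch of the kind of check that's worth keeping even on auto-grow volumes. It says nothing about how Marketo actually monitors its servers; the function names and the 80% threshold are made up for illustration.

```python
import shutil


def usage_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of capacity in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_attention(used_bytes: int, total_bytes: int, warn_at: float = 0.80) -> bool:
    """True once usage reaches the warning threshold.

    warn_at=0.80 is an arbitrary illustrative value, not a recommendation.
    """
    return usage_fraction(used_bytes, total_bytes) >= warn_at


def check_path(path: str = ".", warn_at: float = 0.80) -> bool:
    """Check the volume that holds `path` using the standard library."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return needs_attention(usage.used, usage.total, warn_at)


if __name__ == "__main__":
    if check_path("."):
        print("Storage is above the warning threshold; look at it now.")
    else:
        print("Storage is below the warning threshold.")
```

The idea is just that the alert fires on the numbers regardless of whether something else is supposed to grow the volume for you.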

P.S. If it makes you feel any better (it won't!), AWS had an hours-long outage last night that they attributed to a failure of their capacity governor (i.e., the thing that was supposed to balance resources across all of their users instead denied access to all of their users).