Hi - Is there a time cutoff between when a job is created/enqueued and the activities returned by the bulk activity export? I think a simple example will explain my question better.
Suppose I run the two bulk activity export jobs below with the corresponding time parameters. Should I expect Job 1 and Job 2 to return exactly the same results?
Job 1: time I launch (i.e. /create and /enqueue) the job: '2018-03-22T00:00:01+00:00'
Job 2: time I launch (i.e. /create and /enqueue) the job, one day later: '2018-03-23T00:00:00+00:00'
Because of the asynchronous nature of certain processes, I don't believe you can guarantee that all writes will be complete in the first example.
However, it will be an accurate representation of the Activity Log (ActLog) as it looked at the point of execution, which is another source of truth (i.e. only those activities could have triggered Smart Campaigns (SCs) in that timeframe).
Hi Sanford Whiteman, many thanks for your response. Much appreciated! Makes sense. Is there any documentation giving firmer timeframes for when we can expect activity data to be final, with no further writes to the activity log? Guaranteed/finalised with no new activities after x hours/days?
Instead of pulling one day of activities at a time with the given startAt/endAt above, I could pull a larger window (e.g. the last 3 days) to avoid missing activities if there is a small delay before they are written to the log. However, the daily bulk export quota may make it difficult to pull larger timeframes.
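A minimal sketch of the "larger window" idea: each daily job re-reads a lookback buffer, and rows already loaded by an earlier overlapping run are dropped. No HTTP calls are made. The helper names are hypothetical, the job body shape (`format` plus a `createdAt` filter with `startAt`/`endAt`) is an assumption modelled on the parameters in this thread, and `marketoGUID` as the unique activity column is also an assumption to check against the Bulk Activity Extract docs.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def buffered_window(day_start: datetime, buffer: timedelta) -> dict:
    """Build a job body for one day of activities plus a lookback buffer.

    The body shape (format + createdAt startAt/endAt) is an assumption.
    """
    start = day_start - buffer            # re-read the previous `buffer` worth of time
    end = day_start + timedelta(days=1)   # end of the day we actually want
    return {
        "format": "CSV",
        "filter": {
            "createdAt": {"startAt": start.isoformat(), "endAt": end.isoformat()}
        },
    }


def merge_new_rows(csv_text: str, seen_ids: set, key: str = "marketoGUID") -> list:
    """Return rows not loaded by an earlier overlapping pull and record their IDs.

    Using `marketoGUID` as the unique activity column is an assumption.
    """
    new_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key] not in seen_ids:
            seen_ids.add(row[key])
            new_rows.append(row)
    return new_rows


if __name__ == "__main__":
    day = datetime(2018, 3, 22, tzinfo=timezone.utc)
    print(buffered_window(day, timedelta(days=2))["filter"]["createdAt"])
    # {'startAt': '2018-03-20T00:00:00+00:00', 'endAt': '2018-03-23T00:00:00+00:00'}

    seen = {"101"}  # already loaded by the previous, overlapping run
    sample = "marketoGUID,leadId\n101,5\n102,6\n"
    print(len(merge_new_rows(sample, seen)))  # 1
```

The quota trade-off is simple arithmetic: with a 2-day buffer each daily run covers 3 days, so every activity is exported roughly three times, about tripling quota usage, while the dedupe step keeps those repeats out of the downstream store.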
You can't ever get a guaranteed "final after x hours/days" cutoff with an eventually consistent (asynchronous-write) system. You can hope for no more than an hour of lag, but there could be outliers; it's just the way it is. Usually you try to set an internal SLA for committed writes, but when you break that internal SLA, nobody knows!
OK, I would have expected a ballpark SLA in the docs that API users could work with. For now I will widen the timeframes to allow for a 24-hour buffer period, and will run some tests to see the difference in the number of results returned.
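One way to run that test is to pull the same startAt/endAt window twice, once shortly after the window closes and once after the buffer has elapsed, and diff the activity IDs. A minimal sketch, assuming each CSV row carries a unique `marketoGUID` column; `compare_pulls` and `ids_in_extract` are hypothetical helpers, not part of any Marketo library.

```python
import csv
import io


def ids_in_extract(csv_text: str, key: str = "marketoGUID") -> set:
    """Collect the activity IDs present in one CSV extract."""
    return {row[key] for row in csv.DictReader(io.StringIO(csv_text))}


def compare_pulls(first_csv: str, later_csv: str) -> dict:
    """Diff two extracts of the same startAt/endAt window pulled at different times."""
    first = ids_in_extract(first_csv)
    later = ids_in_extract(later_csv)
    return {
        "first_count": len(first),
        "later_count": len(later),
        # Only in the later pull: written (or associated) after the first run.
        "late_arrivals": sorted(later - first),
        # Only in the first pull: worth investigating (possibly a partition move).
        "missing_later": sorted(first - later),
    }


if __name__ == "__main__":
    first = "marketoGUID,leadId\n1,10\n2,11\n"
    later = "marketoGUID,leadId\n1,10\n2,11\n3,12\n"
    print(compare_pulls(first, later))
    # {'first_count': 2, 'later_count': 3, 'late_arrivals': ['3'], 'missing_later': []}
```

Repeating this over several days gives a rough, empirical picture of how many activities land after the first pull, which is the closest substitute for the undocumented SLA.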
I asked support about the whole asynchronous thing and was told that it isn't an issue; the only things that would change results are people moving in and out of a partition and anonymous leads being converted to known leads. You won't see anonymous activity in a bulk extract, but if a lead is later converted to known and you pull the same bulk data again, you will see their activity.
It's not true, though. You can demonstrate that the read-after-write results for a form post at 12:00:00 are not the same as the Activity Log if fetched at exactly the same time.
Thank you very much for chiming in, Kurt.
If I am reading your point about anonymous leads correctly (anonymous activity doesn't appear in a bulk extract, but once the lead is converted to known, pulling the same bulk data again would include their activity), that seems like a big deal. I could be missing a large number of activities whenever a person became a known lead a few days after activities they performed while still anonymous. If that's correct, may I ask how you have gotten around this? Other than pulling a larger time period through the bulk API, hitting the /activities endpoint seems like an alternative.
There's no way to "work around" it: it defines the way Munchkin tracking works. Someone's session can become associated with a known lead weeks, months, or years after the anonymous session begins.
With the exception of the async commit discussed above (which is an actual exception, despite the quote from support), the extract accurately reflects a snapshot of the activities in associated web sessions during the given period. It can't know any more than that, because it can't see the future!
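A toy model of that snapshot behaviour, purely illustrative and not Marketo's implementation: an activity is included only if it falls in the window and its session had been associated with a known lead by the time the extract runs. It deliberately ignores the async-commit lag. `WebActivity` and `extract` are hypothetical names, and the half-open `[start, end)` window is an assumption of the model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class WebActivity:
    activity_id: str
    occurred_at: datetime
    associated_at: Optional[datetime]  # None: the session is still anonymous


def extract(activities: List[WebActivity], start: datetime, end: datetime,
            run_at: datetime) -> List[str]:
    """Toy extract: activities in [start, end) whose session was associated by run_at.

    Models only association timing; asynchronous write lag is ignored.
    """
    return [
        a.activity_id
        for a in activities
        if start <= a.occurred_at < end
        and a.associated_at is not None
        and a.associated_at <= run_at
    ]


if __name__ == "__main__":
    acts = [
        WebActivity("a1", datetime(2018, 3, 22, 9), datetime(2018, 3, 22, 9)),  # already known
        WebActivity("a2", datetime(2018, 3, 22, 10), datetime(2018, 4, 15)),    # identified weeks later
        WebActivity("a3", datetime(2018, 3, 22, 11), None),                     # never identified
    ]
    start, end = datetime(2018, 3, 22), datetime(2018, 3, 23)
    print(extract(acts, start, end, run_at=datetime(2018, 3, 23)))  # ['a1']
    print(extract(acts, start, end, run_at=datetime(2018, 5, 1)))   # ['a1', 'a2']
```

Same window, different run time, different result: in this model no buffer size makes a pull "complete", because a re-pull only catches sessions that were associated before it runs.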