10 Replies Latest reply on Mar 23, 2018 9:55 AM by Kurt Koller

    Bulk Activities Export Recency

    Marco Tavecchio

      Hi - Is there a time cutoff between when the job is created/enqueued and the activities returned by the bulk activities export? I think a simple example will better explain my question.


      Suppose I run the two bulk activity export jobs below with the corresponding time parameters. Should I expect Job 1 and Job 2 to return exactly the same results?


      Job 1:

      Time I launch (i.e., /create and /enqueue) the job: '2018-03-22T00:00:01+00:00'

      startAt='2018-03-21T00:00:00+00:00'

      endAt='2018-03-22T00:00:00+00:00'


      Job 2:

      Time I launch (i.e., /create and /enqueue) the job, one day later: '2018-03-23T00:00:00+00:00'

      startAt='2018-03-21T00:00:00+00:00'

      endAt='2018-03-22T00:00:00+00:00'
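      A minimal sketch of the two create-job bodies, assuming the Bulk Activity Extract create endpoint (assumed path `/bulk/v1/activities/export/create.json`) takes a `createdAt` filter with `startAt`/`endAt`. The helper `build_export_body` is hypothetical. It shows that the two requests are identical: the launch time is never sent, so any difference between the results can only come from the state of the Activity Log when each job runs.

```python
import json

# Assumed endpoint path for creating a bulk activity export job; verify in the docs.
CREATE_PATH = "/bulk/v1/activities/export/create.json"


def build_export_body(start_at: str, end_at: str, fmt: str = "CSV") -> dict:
    """Hypothetical helper: create-job body filtering activities by createdAt."""
    return {
        "format": fmt,
        "filter": {"createdAt": {"startAt": start_at, "endAt": end_at}},
    }


# Job 1 (launched 2018-03-22) and Job 2 (launched 2018-03-23) use the same window.
job1 = build_export_body("2018-03-21T00:00:00+00:00", "2018-03-22T00:00:00+00:00")
job2 = build_export_body("2018-03-21T00:00:00+00:00", "2018-03-22T00:00:00+00:00")

# The launch time is not part of the request, so the two bodies are identical.
print(json.dumps(job1) == json.dumps(job2))  # True
```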

        • Re: Bulk Activities Export Recency
          Sanford Whiteman

          Because of the asynchronous nature of certain processes, I don't believe you can guarantee that all writes will be complete in the first example.


          However, it will be an accurate representation of the way the Activity Log (ActLog) looked at the point of execution, which is another source of truth (i.e., only those activities would have triggered Smart Campaigns (SCs) in that timeframe).

            • Re: Bulk Activities Export Recency
              Marco Tavecchio

              Hi Sanford Whiteman - Many thanks for your response, much appreciated! That makes sense. Is there any documentation that gives firmer timeframes for when activity data can be considered final, i.e., guaranteed to receive no further writes to the activity log after x hours/days?

              Instead of pulling one day of activities at a time with the startAt and endAt given above, I could pull a larger window (the last 3 days, etc.) to avoid missing activities if there is a small delay in their being written to the log. However, the daily bulk export quota may make it difficult to pull larger timeframes.
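              A sketch of the larger-window approach, under stated assumptions: each run re-pulls the last few full UTC days and keeps only rows it has not seen before, keyed on the export's `marketoGUID` column (assumed here to be a stable per-activity ID). `lookback_window` and `merge_new_rows` are hypothetical names. Overlapping windows cost more export quota, which is the trade-off the post raises.

```python
from datetime import datetime, timedelta, timezone


def lookback_window(run_time: datetime, days: int = 3) -> tuple:
    """Return (startAt, endAt) covering the `days` full UTC days before run_time.

    run_time must be timezone-aware.
    """
    end = run_time.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


def merge_new_rows(store: dict, rows: list, key: str = "marketoGUID") -> int:
    """Add rows whose key has not been seen yet; return how many were new."""
    added = 0
    for row in rows:
        if row[key] not in store:
            store[row[key]] = row
            added += 1
    return added


run = datetime(2018, 3, 22, 0, 0, 1, tzinfo=timezone.utc)
print(lookback_window(run))
# ('2018-03-19T00:00:00+00:00', '2018-03-22T00:00:00+00:00')

store = {}
print(merge_new_rows(store, [{"marketoGUID": "1"}, {"marketoGUID": "2"}]))  # 2
# A later, overlapping pull repeats "2" and adds a late-arriving "3".
print(merge_new_rows(store, [{"marketoGUID": "2"}, {"marketoGUID": "3"}]))  # 1
```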

            • Re: Bulk Activities Export Recency
              Kurt Koller

              I've asked about the whole asynchronous thing, and I was told that it isn't an issue and that the only things that would change the results are people moving in and out of a partition and anonymous leads being converted to non-anonymous. You won't see anonymous activity in bulk, but if a lead is later converted to known and you fetched the same bulk data again, you would see their activity.

                • Re: Bulk Activities Export Recency
                  Sanford Whiteman

                  It's not true, though. You can demonstrate that the read-after-write results for a form post at 12:00:00 are not the same as the Activity Log if fetched at exactly the same time.

                  • Re: Bulk Activities Export Recency
                    Marco Tavecchio

                    Thank you very much for chiming in, Kurt.

                    "anonymous leads being converted to non anonymous. You won't see anonymous activity in bulk but if a lead is converted later to known then if you got the same bulk data later you would see their activity"

                    If I am reading that correctly, that seems like sort of a big deal. I could be missing a large number of activities if a person became a known lead a few days after the activity they performed while still anonymous. If that is correct, may I ask how you have gotten around this? Other than pulling a larger time period from the bulk API, hitting the /activities endpoint seems like an alternative.

                      • Re: Bulk Activities Export Recency
                        Sanford Whiteman

                        There's no way to "work around" it -- it defines the way Munchkin tracking works. Someone's session can become associated weeks, months, or years after the anonymous session begins.


                        The extract accurately reflects (with the exception of the async commit discussed above, which is an actual exception despite the quote from support) a snapshot of the activities in associated web sessions during the given period. It can't know any more than that because it can't see the future!

                        • Re: Bulk Activities Export Recency
                          Kurt Koller

                          Well, I'm going to maybe blow your mind some more here, because those show up, I believe, as a lead merge activity. We then take the targetid of the merge and go back to the non-bulk API for any leads that have been made known to get their activities. It's not great.
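                          A minimal sketch of the first step described here: collecting the leads touched by merges so their activities can be re-fetched through the REST (non-bulk) API. The activity type ID `32` for merges, treating the `leadId` column as the "targetid", and the 30-IDs-per-request batch size are all assumptions to verify against your own instance; `merge_targets` and `batches` are hypothetical helpers.

```python
MERGE_ACTIVITY_TYPE_ID = 32  # assumed merge activity type ID; confirm in your instance
LEAD_ID_BATCH = 30           # assumed max leadIds per REST activities request


def merge_targets(rows, target_field="leadId"):
    """Distinct lead IDs that received a merge, in first-seen order.

    Which column holds the merge target is an assumption (target_field).
    """
    seen = set()
    targets = []
    for row in rows:
        if int(row["activityTypeId"]) != MERGE_ACTIVITY_TYPE_ID:
            continue
        lead_id = int(row[target_field])
        if lead_id not in seen:
            seen.add(lead_id)
            targets.append(lead_id)
    return targets


def batches(ids, size=LEAD_ID_BATCH):
    """Split lead IDs into chunks small enough for one REST request each."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


rows = [
    {"activityTypeId": "32", "leadId": "5"},
    {"activityTypeId": "1", "leadId": "6"},   # not a merge, ignored
    {"activityTypeId": "32", "leadId": "5"},  # duplicate target
    {"activityTypeId": "32", "leadId": "7"},
]
print(merge_targets(rows))  # [5, 7]
```

                          Each batch would then drive one non-bulk activities request for those leads.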


                          Also, speaking of merges, we need to keep activities sorted and mapped to the proper leads for aggregate reporting, and we need to keep leads sorted, so we have to look at all delete lead and merge lead activities. If we get a delete lead, we mark the lead as deleted. If we get an n-way merge in which leads were merged into another person, we change the leadid on all old activities of the merged leads and mark the old leads as merged/deleted.
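                          The remapping step can be sketched as follows, assuming merge events have already been parsed into hypothetical `(winner_id, [loser_ids])` pairs and deletes into a list of lead IDs; that shape and every name here are illustrative only. Resolving chains matters because a lead merged into B may later see B merged into C, and its old activities should end up on C.

```python
def resolve(lead_id, merged_into):
    """Follow merge chains (A -> B, B -> C) to the surviving lead ID."""
    seen = set()
    while lead_id in merged_into:
        if lead_id in seen:
            raise ValueError(f"merge cycle at lead {lead_id}")
        seen.add(lead_id)
        lead_id = merged_into[lead_id]
    return lead_id


def apply_lead_events(leads, activities, merges, deletes):
    """Mark merged/deleted leads and move activities onto surviving leads.

    leads: dict of lead_id -> status (updated in place)
    activities: list of dicts with a "leadId" key (updated in place)
    merges: list of (winner_id, [loser_ids]) for n-way merges
    deletes: iterable of deleted lead IDs
    """
    merged_into = {}
    for winner, losers in merges:
        for loser in losers:
            if loser != winner:
                merged_into[loser] = winner
                leads[loser] = "merged"
    for lead_id in deletes:
        leads[lead_id] = "deleted"
    for act in activities:
        act["leadId"] = resolve(act["leadId"], merged_into)
    return merged_into


leads = {1: "active", 2: "active", 3: "active", 4: "active"}
activities = [{"leadId": 1}, {"leadId": 2}, {"leadId": 4}]
apply_lead_events(leads, activities, merges=[(2, [1]), (3, [2])], deletes=[4])
print([a["leadId"] for a in activities])  # [3, 3, 4]
print(leads)  # {1: 'merged', 2: 'merged', 3: 'active', 4: 'deleted'}
```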


                          This also means we need to regenerate summaries related to lead/activity counts, since activities can later be remapped to different leads by merges.
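                          Once lead IDs are remapped, the summaries are easiest to rebuild from the activities rather than adjust incrementally. A small sketch, assuming each activity row carries `leadId` and `activityTypeId` (the helper `summarize` is hypothetical):

```python
from collections import Counter, defaultdict


def summarize(activities):
    """Rebuild per-lead and per-lead-per-type activity counts from scratch."""
    per_lead = Counter()
    per_lead_type = defaultdict(Counter)
    for act in activities:
        per_lead[act["leadId"]] += 1
        per_lead_type[act["leadId"]][act["activityTypeId"]] += 1
    return per_lead, per_lead_type


before = [
    {"leadId": 1, "activityTypeId": 1},
    {"leadId": 1, "activityTypeId": 2},
    {"leadId": 2, "activityTypeId": 1},
]
# Lead 1 is later merged into lead 2, so its activities move to lead 2.
after = [dict(act, leadId=2) for act in before]

print(summarize(before)[0])  # Counter({1: 2, 2: 1})
print(summarize(after)[0])   # Counter({2: 3})
```

                          Recomputing from the remapped rows avoids having to track which old counts belonged to which merged lead.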


                          There are a lot of gotchas if you're trying to keep data synced with Marketo.