SOLVED

Marketo Activity Ingestion - Understanding Behaviour of bulk extract

DJ_Erraballi
Level 2

Hi there, 


I have a multi-part question:

Question #1 

 

I am currently debugging some issues with our Marketo activity feeds. I noticed recently that we received an activity from Marketo that looked like this:

 

result = {OrderedDict: 8}  
'marketoGUID' = {str} '149877905'
'leadId' = {str} '3173901'
'activityDate' = {str} '2020-06-05T23:08:07Z'
'activityTypeId' = {str} '1'
'campaignId' = {NoneType} None
'primaryAttributeValueId' = {str} '31707'
'primaryAttributeValue' = {str} 'www.multicare.org/photos/'
'attributes' = {NoneType} None 

This landing page activity had no attributes set at all. (That is where I usually get the web page URL, the referrer URL if present, etc.) Is this expected to occur? We have been ingesting landing page activities since 2019 for multiple clients and this is the first time we have come across the above, so I am wondering whether it is something that is likely to occur again.

Currently, not only do we depend on attributes being set, we also depend on 'Webpage URL' being present in the attributes field in order to report properly on data in Marketo.
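As a rough sketch of a more tolerant read of this field (not anything from Marketo's docs): the helper below returns the 'Webpage URL' when it can and None otherwise, so a row like the one above can be set aside for inspection instead of breaking the report. The function name `webpage_url` is hypothetical, and it assumes `attributes` arrives either as an already-parsed dict or as a JSON string.

```python
import json


def webpage_url(activity):
    """Return the 'Webpage URL' attribute of an activity, or None if unavailable.

    Assumes `activity` is a dict-like row and that 'attributes' is either
    None, a dict, or a JSON-encoded string (an assumption, not a guarantee).
    """
    attrs = activity.get("attributes")
    if isinstance(attrs, str):
        try:
            attrs = json.loads(attrs)
        except json.JSONDecodeError:
            # Unparseable attributes are treated the same as missing ones.
            attrs = None
    if isinstance(attrs, dict):
        return attrs.get("Webpage URL") or None
    return None
```

Rows for which this returns None could be written to a separate "needs review" table rather than dropped, which makes a parsing problem visible early.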


 

Question #2

 

It appears that we are actually missing some data in our extract from Marketo, and we may need to tweak our strategy.

 

Currently we execute a bulk extract with a start date a couple of minutes before our most recent stored activity. If we think of each extract as a time slice, the slices overlap slightly, so every point in time is covered by at least one extract.

 

What worries me is this. If I query something like (dummy values):


startAt: 3:00pm 

endAt: 4:00pm

at 4:01pm, I might get a DIFFERENT set of activities than if I made the same query at 4:16pm. (This is my current working theory for why activities appear to be missing.)

Is it possible that Marketo can add out-of-order activities (with activity dates in a past time range)? If so, is there a recommended buffer to add to our start_at time, to ensure we don't miss any activities? Also, how long after a time range has elapsed could activities still be added to that time range?
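To make the buffer idea concrete, here is a minimal sketch: each run re-requests an overlapping window and upserts by 'marketoGUID', so anything that shows up late inside the overlap is picked up and duplicates are harmless. The names `next_window` and `merge_activities` are hypothetical, and the 24-hour `LOOKBACK` is an arbitrary assumption, not a Marketo recommendation.

```python
from datetime import datetime, timedelta, timezone

# Assumed overlap; pick a value that fits your own tolerance for late data.
LOOKBACK = timedelta(hours=24)


def next_window(last_seen, now, lookback=LOOKBACK):
    """Return (start_at, end_at) for the next extract, overlapping earlier ones."""
    return last_seen - lookback, now


def merge_activities(store, batch):
    """Upsert a batch into `store` keyed by marketoGUID; return how many were new."""
    new_count = 0
    for activity in batch:
        guid = activity["marketoGUID"]
        if guid not in store:
            new_count += 1
        # Overwrite so a re-downloaded copy replaces the stale one.
        store[guid] = activity
    return new_count
```

A nonzero return value on a re-pulled window is a direct measurement of how many activities arrived after the first extract, which helps in choosing the overlap. It cannot catch activities that are added after the overlap has passed.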

3 ACCEPTED SOLUTIONS
SanfordWhiteman
Level 10 - Community Moderator

Yeah, it's almost impossible to solve this. You do have to be aware of it, though: any dashboard is necessarily frozen in time and under some circumstances might be showing you a minority of the activities that would be shown if you re-downloaded later. For example, if you accidentally no-tracked a link, it wouldn't associate people's Munchkin sessions. Then in the future, a tracked link would associate the session and replay all their old activities.


SanfordWhiteman
Level 10 - Community Moderator

By "Control K" do you mean ASCII %0B? Not sure why that would have any effect on your CSV parser.


DJ_Erraballi
Level 2

Sigh, so for anyone looking into this: I am still in the process of debugging and working around it, but the gist is that there was a control character present in the primary attribute value id.

Python has a string method, "splitlines", which is often used with Python CSV readers to parse files. splitlines will actually split on the ^K character (vertical tab, \x0b) as well, which is a fairly underdocumented feature. This likely caused the parsing of the activity to fail in the way I experienced.

Since this issue is client-side I will close it. (I am assuming there are no guarantees that we won't get future files with control characters like this, and that receiving control characters is expected behaviour.)
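For anyone who wants to reproduce this, a minimal sketch with made-up data (the row below is hypothetical, not the real extract): `str.splitlines()` cuts the record at \x0b, whereas handing `csv.reader` a file-like object created with `newline=""` keeps the record whole. `strip_control_chars` is a hypothetical helper for cleaning the value afterwards.

```python
import csv
import io

# Hypothetical extract content; \x0b (Ctrl-K, vertical tab) sits inside a field.
raw = "marketoGUID,leadId,primaryAttributeValue\n1,2,www.example.com/pho\x0btos/\n"

# str.splitlines() treats \x0b as a line boundary, so the data row is cut in two.
broken_lines = raw.splitlines()

# Reading through a file-like object with newline="" only ends lines on
# \n, \r or \r\n, so the control character stays inside its field.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# Map every ASCII control character except tab, LF and CR to None (delete).
_CONTROL_CHARS = {code: None for code in range(32) if code not in (9, 10, 13)}


def strip_control_chars(value):
    """Remove ASCII control characters (other than tab/LF/CR) from a string."""
    return value.translate(_CONTROL_CHARS)


clean_value = strip_control_chars(rows[1][2])
```

The same applies when reading from disk: opening the file with `open(path, newline="")` and passing the file object straight to `csv.reader` avoids `splitlines` entirely.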

