Re: getMultipleLeads - streamPosition ID not respected?

Anonymous
Not applicable

getMultipleLeads - streamPosition ID not respected?

I'm using the getMultipleLeads API call to retrieve multiple leads. There is a local limit on the number of leads I can process in each run, so I need to be able to page by lead record in some way. I tried paging by ID by putting the ID into the streamPosition, like this:
"id:foo:ts:0:os:0:rc:0"
where foo is a number corresponding to the lead's ID, such as 500.
The result is that the os and ts are respected (records are pulled from offset 0 / time 0 onward, as the functionality would seem to indicate) but the ID position is not. Is there some way to make the request so that the ID passed in the streamPosition is respected as "pull records with ID after X"?
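For readers following along, here is a minimal sketch of how the streamPosition string shown above breaks down into its fields. It assumes only the colon-separated key/value layout seen in this thread (id, ts, os, rc); the helper name parse_stream_position is hypothetical, and the layout itself is not a documented contract, so treat this as a debugging aid rather than a supported way to build positions.

```python
from typing import Dict


def parse_stream_position(value: str) -> Dict[str, int]:
    """Split a string like 'id:500:ts:0:os:0:rc:0' into named integer fields.

    Assumption: the value is a flat list of key:number pairs, as in this thread.
    """
    tokens = value.split(":")
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected key:value pairs, got {value!r}")
    # Even positions hold the keys, odd positions hold the numbers.
    return {key: int(number) for key, number in zip(tokens[0::2], tokens[1::2])}


position = parse_stream_position("id:500:ts:0:os:0:rc:0")
print(position)  # {'id': 500, 'ts': 0, 'os': 0, 'rc': 0}
```

A string with a missing colon fails with a ValueError instead of being silently misread.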

I looked into using the timestamp (ts) and the offset/index (os) fields individually and in combination. Unfortunately this results in extremely complex coding to walk backward through records to account for situations where records are deleted (os shifts behind the scenes, so when I make the call using the streamPosition with the ts and os from the previous request, the results are missing the leading-edge records). This is the solution I'm currently working through due to the ID issue noted above, and it requires that I keep a handle on all of the previously seen records to check as I walk backward until I find the correct record to restart from. The gap between fetches could be up to a day long, so I cannot rely on the interval being so short as to make the deletion of a record in between syncs unlikely.

I also see that there is the option to use the leadSelector and pass in a leadSelectorRef key enum of IDNUM, however this takes in a list of IDs, which I would not already have.  It does not appear to provide the ability to do any sort of greater-than operation, so this does not appear to be an option for my case.

Since I know the question will come up 😉
request trying to use ID for streamPosition:
<env:Body>
  <mkt:paramsGetMultipleLeads>
    <streamPosition>id:500:ts:0:os:0:rc:0</streamPosition>
    <batchSize>100</batchSize>
  </mkt:paramsGetMultipleLeads>
</env:Body>


Situation mentioned in the second part:
request 1:
<env:Body>
  <mkt:paramsGetMultipleLeads>
    <streamPosition>id:0:ts:1381251748:os:0:rc:0</streamPosition>
    <batchSize>100</batchSize>
  </mkt:paramsGetMultipleLeads>
</env:Body>
newStreamPosition returned is "id:1214:ts:1381251748:os:100:rc:50"

Suppose the record with ID 1211 is then deleted, and the second request is made:

<env:Body>
  <mkt:paramsGetMultipleLeads>
    <streamPosition>id:1214:ts:1381251748:os:100:rc:50</streamPosition>
    <batchSize>100</batchSize>
  </mkt:paramsGetMultipleLeads>
</env:Body>

Record 1215 will have moved one position earlier in the total list of lead records. As a result it will be at os:99 and will not be returned when this second call is made. In our situation the user can configure runs with periods of up to 1 day. If a record is deleted during that day (for example, because it is a duplicate), the shift happens and we miss the leading record of the next batch, because it has shifted into the previous batch.

Any suggestions?
7 REPLIES
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

Eric,
Stream position elements contain a position reference for one or more logical streams of time-sequenced data. The position reference may be an approximate external specification like a timestamp, or an exact but opaque internal specification of position returned by an operation. Stream positions may be defined as a complex, multi-element type or may be a string. Examples of the streams supported:
1. Sequence of new lead by creation time.
2. Sequence of activity history records for a single lead.
3. Sequence of all activity log records (possibly filtered by activity type).
The StreamPosition is used to retrieve data in batches, and allows the caller to page through the result. The StreamPosition passed into the API is the value of the StreamPosition returned in the previous response.

The key thing to remember here is that you should NOT modify the StreamPosition, but just pass in what the previous call returned.  We have some sample code at http://developers.marketo.com/documentation/soap/getmultipleleads/
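As a rough illustration of that advice, the loop below passes each returned position back unchanged until nothing remains. It is a sketch only: fetch_batch is a hypothetical stand-in for a getMultipleLeads call, assumed to return the lead IDs, the new stream position, and a remaining count, and make_fake_service is an in-memory fake whose position strings merely imitate the ones quoted in this thread.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical signature:
# (stream_position, batch_size) -> (lead_ids, new_stream_position, remaining_count)
FetchBatch = Callable[[str, int], Tuple[List[int], str, int]]


def iter_leads(fetch_batch: FetchBatch, start_position: str, batch_size: int) -> Iterator[int]:
    """Page through all leads, passing each returned position back untouched."""
    position = start_position
    while True:
        leads, position, remaining = fetch_batch(position, batch_size)
        yield from leads
        if not leads or remaining == 0:
            break


def make_fake_service(lead_ids: List[int]) -> FetchBatch:
    """In-memory fake that pages by the 'os' offset, for demonstration only."""
    def fetch_batch(position: str, batch_size: int) -> Tuple[List[int], str, int]:
        offset = int(position.split(":")[5])  # value after 'os' in 'id:..:ts:..:os:..:rc:..'
        batch = lead_ids[offset:offset + batch_size]
        new_offset = offset + len(batch)
        remaining = len(lead_ids) - new_offset
        last_id = batch[-1] if batch else 0
        return batch, f"id:{last_id}:ts:0:os:{new_offset}:rc:{remaining}", remaining
    return fetch_batch


service = make_fake_service(list(range(1, 11)))
print(list(iter_leads(service, "id:0:ts:0:os:0:rc:0", 4)))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```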

Raj
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

Raj,

That methodology fails if a record is deleted, hence the issue.

Imagine the following scenario:
Lead records (IDs for leads): ["1", "2", "3", "4", ... , "5000"]
The first call uses the streamPosition "id:0:ts:0:os:0:rc:0" so that the earliest records are returned, with batchSize = 1000. This returns 1000 records and a new streamPosition of "id:1000:ts:1234567890:os:1000:rc:4000". Before I make the second call to the API for the next batch of lead records, someone from marketing finds a duplicate entry in records 1-1000 and decides to delete that record.
When I make the second call, now that a record has been deleted, I'll receive leads with IDs [1002 - 2001] instead of [1001 - 2000]. Thus the lead record with ID 1001 is not retrieved by the API call. I've validated that this is how the API is behaving. Because of this, and the need to ensure I do not miss any records, I have to keep the previously seen records in hand, move the os back by 1, and check that the first returned record exists in my previously seen data. If it does not, I have to step back again, make the API request again, and check whether that first record has been seen. This loop continues until I've found a matching record, and then I'm sure I've not missed one.
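The walk-back loop described above might look roughly like the sketch below. It makes several assumptions: fetch_page(offset, count) is a hypothetical stand-in for a getMultipleLeads call with the os field set, only deletions happen between runs, and the set of previously seen IDs is available locally. The fake service here is an in-memory list, not the real API.

```python
from typing import Callable, List, Set

# Hypothetical signature: (offset, count) -> lead IDs starting at that offset
FetchPage = Callable[[int, int], List[int]]


def find_resume_offset(fetch_page: FetchPage, stale_offset: int, seen_ids: Set[int]) -> int:
    """Step the offset back until the record just before it is one already seen."""
    candidate = stale_offset
    while candidate > 0:
        probe = fetch_page(candidate - 1, 1)
        if probe and probe[0] in seen_ids:
            return candidate  # everything before this offset was already processed
        candidate -= 1
    return 0


# In-memory stand-in for the lead database.
leads = list(range(1, 11))


def fetch_page(offset: int, count: int) -> List[int]:
    return leads[offset:offset + count]


seen = set(fetch_page(0, 5))  # first run processes leads 1-5, next offset is 5
leads.remove(4)               # a seen record is deleted between runs
resume = find_resume_offset(fetch_page, 5, seen)
print(resume, fetch_page(resume, 5))  # 4 [6, 7, 8, 9, 10]
```

Each step back costs one extra request, which is part of why an ID-based offset would be simpler.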

It would seem to make sense for the id in the streamPosition to act as an ID based offset, but that is not the case currently.  For simple ID-based paging it would be helpful to implement that as a feature for getMultipleLeads.
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

Eric,
Any leads not returned should be assumed to be deleted or merged or not created/updated during the given time period. You should not modify the streamPosition in the response and should just pass back what you receive in your next call.

Raj
 
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

Please try the following:

Leads ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

request 1 "id:0:ts:12345:os:0:rc:0" with batchSize set to 5
returns: ["1", "2", "3", "4", "5"] and a new stream position of "id:5:ts:12345:os:5:rc:5"

Delete lead "4" which has already been retrieved

request 2 "id:5:ts:12345:os:5:rc:5"
returns ["7", "8", "9", "10"]

Lead "6" was not returned, but it has not been deleted.  
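The same steps can be reproduced with a small in-memory simulation. This is only a model of offset-based paging over a plain list, assuming the service resumes purely from the os value; it does not call the real API, and the fetch helper is hypothetical.

```python
from typing import List, Tuple


def fetch(leads: List[int], offset: int, batch_size: int) -> Tuple[List[int], int]:
    """Return one batch starting at offset, plus the offset for the next call."""
    batch = leads[offset:offset + batch_size]
    return batch, offset + len(batch)


leads = list(range(1, 11))

first, next_offset = fetch(leads, 0, 5)
leads.remove(4)  # lead 4 is deleted after it was already retrieved
second, _ = fetch(leads, next_offset, 5)

# Leads that still exist but were never returned by either request.
missed = [lead for lead in leads if lead not in first and lead not in second]

print(first)   # [1, 2, 3, 4, 5]
print(second)  # [7, 8, 9, 10]
print(missed)  # [6]
```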
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

Eric, has this been resolved?
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

No, there is a defect in the way the stream position handles deleted contacts. I have not yet seen that addressed. Instead, the suggestion in the earlier response was to assume any records missing from the calls were actually deleted. That assumption is only true when the deleted record existed within the return set for the new stream position. If the deleted record existed in the previously fetched set that spawned the new stream position, then all records appear to shift and the leading record(s) which should have been returned on a subsequent call are not returned, even though they were not deleted. My post from 10/9/2013 4:45 AM illustrates the issue.
Anonymous
Not applicable

Re: getMultipleLeads - streamPosition ID not respected?

Thanks for your update, Eric. Hope the Marketo folks will respond.