Our automated dedupe script tracks the contact ID it got up to on each run and then does the next chunk on the next run. This used to be fine because we didn't dedupe on ingress, so we were only ever adding contacts. It feels like we should now consider looking at modified_date as well, so that when a contact is updated it gets queued for dedupe again.
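A rough sketch of what that selection could look like, in case it helps the discussion. It is not our actual script: the function name `select_for_dedupe`, the `last_id` / `last_run_at` bookkeeping and the dict-shaped contacts are all made up for illustration, and in practice this would be a query against the contacts table rather than a list filter.

```python
from datetime import datetime

def select_for_dedupe(contacts, last_id, last_run_at):
    """Queue contacts that are new since the last run OR were modified since then.

    contacts: iterable of dicts with "id" and "modified_date" (hypothetical shape).
    last_id: highest contact ID handled by the previous run.
    last_run_at: timestamp of the previous run.
    """
    return [
        c for c in contacts
        if c["id"] > last_id or c["modified_date"] > last_run_at
    ]

contacts = [
    {"id": 1, "modified_date": datetime(2024, 1, 5)},  # old and untouched
    {"id": 2, "modified_date": datetime(2024, 3, 1)},  # updated after the last run
    {"id": 3, "modified_date": datetime(2024, 3, 2)},  # new since the last run
]
queued = select_for_dedupe(contacts, last_id=2, last_run_at=datetime(2024, 2, 1))
queued_ids = [c["id"] for c in queued]
```

Here contact 1 is skipped, while 2 (updated) and 3 (new) are both queued.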
Switching to modified_date could be a bit tricky when we do large contact updates, e.g. at fiscal year end, because after one of those a large share of our contacts will have a recent modified_date all at once and would all be queued for dedupe together.
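One possible way to soften the bulk-update problem is to keep the "chunk per run" behaviour but move the cursor from contact ID to a (modified_date, id) pair, so a big update gets drained over several runs instead of all at once. The sketch below is only an illustration under that assumption: `next_batch`, the cursor tuple and `batch_size` are hypothetical names, and the in-memory sort stands in for an ORDER BY plus LIMIT in a real query.

```python
from datetime import datetime

def next_batch(contacts, cursor, batch_size):
    """Return up to batch_size contacts after the cursor, plus the new cursor.

    cursor is a (modified_date, id) tuple. Ordering on both fields means that
    many contacts sharing one timestamp (e.g. a bulk update) are still worked
    through in a stable order without skipping or repeating any.
    """
    pending = sorted(
        (c for c in contacts if (c["modified_date"], c["id"]) > cursor),
        key=lambda c: (c["modified_date"], c["id"]),
    )
    batch = pending[:batch_size]
    if batch:
        last = batch[-1]
        cursor = (last["modified_date"], last["id"])
    return batch, cursor

# Simulate a bulk update: five contacts all stamped with the same modified_date.
bulk = datetime(2024, 6, 30)
contacts = [{"id": i, "modified_date": bulk} for i in range(1, 6)]

cursor = (datetime.min, 0)
runs = []
while True:
    batch, cursor = next_batch(contacts, cursor, batch_size=2)
    if not batch:
        break
    runs.append([c["id"] for c in batch])
```

With a batch size of 2 the five contacts are handled over three runs. The trade-off is that after a very large update the backlog takes a while to clear, so contacts modified later wait behind it.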