We need to decide how to label the first event in a user's or page's timeline. Our current thinking is:
As we understand it, a user's timeline is coherent: edits happen AFTER creation. We therefore think that a user with no creationTimestamp, or with a creationTimestamp later than the user's firstEditTimestamp, should have it set to firstEditTimestamp. The formula is: eventTimestamp of create-event = MIN(user_registration, date-of-logging-table-create-event, first-edit). This will set both user_creation_timestamp and event_timestamp for the first event in the user's timeline. We'll keep userCreationTimestamp as MIN(user_registration, date-of-logging-table-create-event), and will also keep userFirstEditTimestamp.
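To make the user rule concrete, here is a minimal Python sketch of the MIN formula. The names `UserTimestamps`, `min_known` and `resolve_user_timestamps` are hypothetical, and the sketch assumes that MIN skips missing (null) inputs, which the description above does not state explicitly.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class UserTimestamps(NamedTuple):
    # Hypothetical container; field names are illustrative only.
    create_event_timestamp: Optional[datetime]
    user_creation_timestamp: Optional[datetime]
    user_first_edit_timestamp: Optional[datetime]


def min_known(*timestamps: Optional[datetime]) -> Optional[datetime]:
    """Earliest non-null timestamp, or None if all inputs are null.

    Assumption: nulls are skipped rather than propagated.
    """
    known = [t for t in timestamps if t is not None]
    return min(known) if known else None


def resolve_user_timestamps(
    user_registration: Optional[datetime],
    logging_create: Optional[datetime],
    first_edit: Optional[datetime],
) -> UserTimestamps:
    return UserTimestamps(
        # Create event: MIN(user_registration, logging create event, first edit)
        create_event_timestamp=min_known(user_registration, logging_create, first_edit),
        # Kept separately: MIN(user_registration, logging create event)
        user_creation_timestamp=min_known(user_registration, logging_create),
        user_first_edit_timestamp=first_edit,
    )
```

With a registration date of 2010-05-01 and a first edit on 2009-03-01, the create event would be dated 2009-03-01, so the timeline stays coherent while the recorded registration date is still available for comparison.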
Pages can be created as partial restores of an older page, carrying over its older revisions. So the first edit on a page can predate the page's creation date without anything being erroneous. Since this is part of how MediaWiki works, we'd like to highlight it in the data and use date-of-logging-table-create-event as the page_creation_timestamp. If no such create event exists, we leave page_creation_timestamp null. We also populate page_first_edit_timestamp so it can be used instead of the creation date where needed. If we find other data that lets us tell "old restored first edits" apart from the "edit that created the page", we can add another field to clarify further. As for events, the timestamp of a page create event is set to pageCreationTimestamp, even when it is null, so analysts know the page creation date is unknown (the first edit should still be populated). However, since Druid doesn't accept NULL timestamps, we might use MIN(pageCreation, pageFirstEdit) when loading into Druid.
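The page rule can be sketched the same way. This is only an illustration: `PageTimestamps`, `resolve_page_timestamps` and the `druid_timestamp` field are hypothetical names, and the sketch assumes the Druid fallback MIN skips a null creation date.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class PageTimestamps(NamedTuple):
    # Hypothetical container; field names are illustrative only.
    page_creation_timestamp: Optional[datetime]
    page_first_edit_timestamp: Optional[datetime]
    create_event_timestamp: Optional[datetime]
    druid_timestamp: Optional[datetime]


def resolve_page_timestamps(
    logging_create: Optional[datetime],
    first_edit: Optional[datetime],
) -> PageTimestamps:
    # Assumption: the Druid fallback ignores nulls when taking the minimum.
    known = [t for t in (logging_create, first_edit) if t is not None]
    return PageTimestamps(
        # Only the logging-table create event counts; stays None if absent.
        page_creation_timestamp=logging_create,
        page_first_edit_timestamp=first_edit,
        # The create event mirrors the creation timestamp, even when None.
        create_event_timestamp=logging_create,
        # Non-null value for Druid: MIN(pageCreation, pageFirstEdit).
        druid_timestamp=min(known) if known else None,
    )
```

For a page restored in 2012 with revisions dating back to 2005, the creation timestamp stays at 2012 and the first edit at 2005, so the restore remains visible in the data. Only the Druid value falls back to the earlier date.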
This task can be marked resolved if there's general agreement on the above.