We need to make a decision on how we label the first event in a user's or page's timeline. Our current thinking is:
Users
=====
Users can be created after they make edits, probably because MediaWiki was at times slow to write to the logging table: a user created an account, made edits, and only later ended up having a create row in logging. In our understanding, a user timeline is coherent (edits happen AFTER creation). Therefore we think that for users with no creationTimestamp, or with a creationTimestamp after the user's firstEditTimestamp, the create event should be timestamped at firstEditTimestamp. The formula being: `eventTimestamp of create-event = MIN(user_registration, date-of-logging-table-create-event, first-edit)`. We'll keep userCreationTimestamp as `MIN(user_registration, date-of-logging-table-create-event)`, and will also keep userFirstEditTimestamp.
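A minimal Python sketch of the user rules above, assuming each timestamp arrives as an optional `datetime` and reading `MIN` as skipping missing values (which is what "no creationTimestamp ... firstEditTimestamp" implies). The helpers `min_non_null` and `user_timestamps` and the `createEventTimestamp` key are hypothetical names for illustration, not part of the actual pipeline.

```python
from datetime import datetime
from typing import Optional


def min_non_null(*values: Optional[datetime]) -> Optional[datetime]:
    """Return the earliest non-null timestamp, or None if all are missing."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def user_timestamps(
    user_registration: Optional[datetime],
    logging_create: Optional[datetime],
    first_edit: Optional[datetime],
) -> dict:
    # Hypothetical helper. userCreationTimestamp ignores edits:
    # MIN(user_registration, date-of-logging-table-create-event)
    creation = min_non_null(user_registration, logging_create)
    # The create event also considers the first edit, so the timeline
    # never shows an edit before the user's creation
    create_event = min_non_null(user_registration, logging_create, first_edit)
    return {
        "userCreationTimestamp": creation,
        "userFirstEditTimestamp": first_edit,
        "createEventTimestamp": create_event,
    }


# No user_registration, and the logging row was written a day after the first edit.
ts = user_timestamps(None, datetime(2006, 3, 2), datetime(2006, 3, 1))
print(ts["createEventTimestamp"])   # 2006-03-01 00:00:00
print(ts["userCreationTimestamp"])  # 2006-03-02 00:00:00
```

Keeping the two values separate preserves the raw creation date while still giving the first event a timestamp that keeps the timeline ordered.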
Pages
=====
Pages can be created as partial restores of an older page, with older revisions. So the first edit on a page can happen before the page's creation date without anything being erroneous. Since this is part of how MediaWiki works, we'd like to highlight it in the data and use `date-of-logging-table-create-event` as the `page_creation_timestamp`. If no such create event exists, we leave `page_creation_timestamp` null, and we populate `page_first_edit_timestamp` so it can be used instead of the creation date where needed. If we find other data that lets us distinguish "old restored first edits" from "the edit that created the page", we can add another field to clarify further.
For events, the timestamp of a page create event is set to pageCreationTimestamp, even if it is null, to let analysts know that the page creation time is unknown (the first edit timestamp should still be populated). Since Druid doesn't accept NULL timestamps, we might take `MIN(pageCreation, pageFirstEdit)` when loading into Druid.
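The page rules can be sketched the same way. This is a minimal Python illustration, assuming optional `datetime` inputs; `page_timestamps`, `druid_create_event_timestamp` and the `create_event_timestamp` key are hypothetical names, and the Druid fallback implements the `MIN(pageCreation, pageFirstEdit)` option above, which the text only proposes.

```python
from datetime import datetime
from typing import Optional


def min_non_null(*values: Optional[datetime]) -> Optional[datetime]:
    """Return the earliest non-null timestamp, or None if all are missing."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def page_timestamps(
    logging_create: Optional[datetime],
    first_edit: Optional[datetime],
) -> dict:
    # Hypothetical helper. page_creation_timestamp comes only from the
    # logging-table create event and stays None when no such event exists
    creation = logging_create
    return {
        "page_creation_timestamp": creation,
        "page_first_edit_timestamp": first_edit,
        # The create event keeps the (possibly null) creation timestamp
        # so analysts can see that the creation time is unknown
        "create_event_timestamp": creation,
    }


def druid_create_event_timestamp(row: dict) -> Optional[datetime]:
    # Druid rejects NULL timestamps, so use the earlier of the two values;
    # still returns None if both are missing
    return min_non_null(row["page_creation_timestamp"], row["page_first_edit_timestamp"])


# A partially restored page: old revisions predate the logged creation.
restored = page_timestamps(datetime(2012, 5, 1), datetime(2004, 8, 15))
# A page with no create event in the logging table.
unlogged = page_timestamps(None, datetime(2007, 1, 10))
print(druid_create_event_timestamp(restored))  # 2004-08-15 00:00:00
print(druid_create_event_timestamp(unlogged))  # 2007-01-10 00:00:00
```

Note that for a restored page the Druid value moves back to the oldest restored revision, which is exactly the case the null-preserving create event is meant to keep visible elsewhere.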
This task can be marked resolved if there's general agreement on the above.