When analyzing the data related to the Editor Journey project (aka "Understanding first day"), we'll run into challenges and caveats. This task tracks those.
- The Growth Team has a bunch of test accounts that have been used for this survey. I'll have to put together a list of those and filter them out.
- Accounts created through the API, which are mainly app accounts, are assigned a survey group and therefore appear to have gotten the survey. We don't support the apps at the moment, so those need to be filtered out (they can be identified through the ServerSideAccountCreation schema's data).
- T210003 results in inconsistent hashing for the first few events for some users. We'll be able to collect a list of affected user IDs and exclude those from analysis if necessary.
- T210004 results in the CreateAccount event in the EditorJourney schema being out of order: it shows up as the second event but should always be first. This might require me to ignore those.
- T210417 is worth noting, but should not be an issue for us, as we're not interested in what specifically users were reading/editing prior to creating their account.
- T213974 is worth noting. It is currently not an issue for us, as we do analysis based on namespace and title. If we do analysis that examines the full page title, it will be necessary to either strip the HTML out or figure out a way around it.
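
As a rough illustration of how the exclusions above could be applied, here is a minimal Python sketch. It assumes events are available as a list of dicts, and the key names (`user_id`, `action`) and the `"createaccount"` value are hypothetical placeholders, not the actual EditorJourney field names. The three ID lists (test accounts, API-created accounts, users affected by T210003) are assumed to have been collected separately. The HTML stripping is a simple regex approach, which should be enough for light markup in titles but is not a full HTML parser.

```python
import re
from html import unescape

# Matches anything that looks like an HTML tag, e.g. <i> or </span>.
_TAG_RE = re.compile(r"<[^>]+>")


def filter_events(events, test_user_ids, api_created_user_ids, hash_affected_user_ids):
    """Drop events from users that should be excluded from analysis.

    All three ID collections are assumed inputs gathered elsewhere:
    Growth Team test accounts, accounts created through the API, and
    users affected by the inconsistent hashing in T210003.
    """
    excluded = (
        set(test_user_ids)
        | set(api_created_user_ids)
        | set(hash_affected_user_ids)
    )
    # "user_id" is a hypothetical key name.
    return [e for e in events if e["user_id"] not in excluded]


def create_account_out_of_order(user_events):
    """Return True if a CreateAccount event exists but is not the first event.

    `user_events` is one user's events in logged order; the "action" key
    and the "createaccount" value are hypothetical.
    """
    actions = [e["action"] for e in user_events]
    return "createaccount" in actions and actions[0] != "createaccount"


def strip_html(title):
    """Remove HTML tags from a page title and decode HTML entities."""
    return unescape(_TAG_RE.sub("", title)).strip()
```

For example, `filter_events(events, test_ids, api_ids, hashing_ids)` would return only the events from users not on any of the three lists, and `strip_html` would only be needed if the analysis moves from namespace and title to the full page title.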