Once we have determined in T205762 which other schemas we want to use, we need to dig deeper to make sure that their business logic will allow us to use them. In particular:
- Are they still running and active?
- Does their sampling logic record fewer than 100% of events for Czech and Korean Wikipedias?
- Are they keyed on IDs that will allow us to join them in?
- Do they have timestamps that we can use for analysis? (Yes, all of them do.)
- Are they writing to Hive? We learned that not all schemas are writing to Hive, because of differences between MariaDB and Hive column names and data types. We will need to reconcile or work around this.
Here are those questions as a table to fill in like a checklist:
|Schema||Still active?||Writing to Hive?||ID key||Sampling rate|
|Echo||No (see notes)||N/A||recipientUserId||100%|
|Edit||Yes||No (see notes)||user.id (see notes)||6.25%, but soon to be configurable to 100%|
|GettingStartedRedirectImpression||Yes||Yes||userId||100% of logged-in users, nothing from anonymous users|
|ServerSideAccountCreation||Yes||Yes||userId||100% in theory, but note "There is no guarantee this will be called in a successful account creation process"|
- Echo schema: the echo_event and echo_notification tables in MediaWiki can give us the data we need.
- Edit schema: not currently in Hive, but we are working on that in T202348. The user.id field will then be renamed.
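
To illustrate the ID-key question, here is a minimal sketch of how events from these schemas could be lined up per user even though the ID field is named differently in each one. The field names come from the table above; the helper names (`get_field`, `group_by_user`) and the sample events are made up for illustration and are not real EventLogging data or an existing API. The `user.id` path will need updating once that field is renamed.

```python
from collections import defaultdict

# Field holding the user ID in each schema, as listed in the table.
# "user.id" is treated here as a nested field (an assumption of this sketch).
USER_ID_FIELD = {
    "Echo": "recipientUserId",
    "Edit": "user.id",
    "GettingStartedRedirectImpression": "userId",
    "ServerSideAccountCreation": "userId",
}


def get_field(event, dotted_path):
    """Follow a dotted path such as 'user.id' through nested dicts.

    Returns None if any part of the path is missing.
    """
    value = event
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def group_by_user(events_by_schema):
    """Return {user_id: [(schema, event), ...]}, skipping events with no user ID."""
    grouped = defaultdict(list)
    for schema, events in events_by_schema.items():
        field = USER_ID_FIELD[schema]
        for event in events:
            user_id = get_field(event, field)
            if user_id is not None:
                grouped[user_id].append((schema, event))
    return dict(grouped)


# Hypothetical sample events, for illustration only.
sample = {
    "Edit": [{"user": {"id": 42}}],
    "ServerSideAccountCreation": [{"userId": 42}, {"userId": 7}],
}
joined = group_by_user(sample)
```

In practice the join would more likely be a Hive query over the event tables; the point of the sketch is only that each schema needs its own mapping to a common user ID before the data can be combined.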