Understanding first day: testing and QA
Closed, Resolved · Public

Description

This task is for tracking how we test/QA the new EditorJourney schema, as well as our ability to use its data along with the data from other relevant schemas. If we find bugs, we should file those separately. On this task, we'll just discuss issues related to our ability to test.

These are our plans:

  • @Etonkovidova will test in Korean and English Beta Labs.
    • Testing in mobile to make sure that things are recorded correctly and the is_mobile flag is set to true.
    • Looking for whether events are recorded from the EditorJourney schema according to the business rules laid out on the schema talk page.
    • In particular, it will be important to ensure that the right URLs are and are not being obfuscated according to these namespaces:
      • Obfuscated
        • Article (0)
        • Article talk (1)
        • File (6)
        • File talk (7)
        • Portal (100)
        • Portal talk (101)
        • Draft (118)
        • Draft talk (119)
      • Not obfuscated (all others, including the following)
        • Help (12)
        • Help talk (13)
        • Wikipedia (4)
        • Wikipedia talk (5)
        • User (2)
        • User talk (3)
        • Special (-1)
        • [all others] e.g. Template (10)
    • Verifying that this is only recording events for accounts less than 24 hours old. Events should stop after 24 hours.
    • Verify that we log events when someone goes to change their email in Preferences.
    • Verify that we log events for when someone does an action from the View History page of an article.
    • It would probably be good to test out this sequence of activities, to make sure that it can be reconstructed from the events. Here is an example sequence.
      • 1) User creates account from editing context.
      • 2) After account creation, user lands back on article in editing context. URL should be obfuscated with a hash, but the action should specify "edit".
      • 3) User clicks a link in the article and sees another article. URL should be obfuscated with a different hash.
      • 4) User clicks back button to go back to first article. URL should be obfuscated with the same hash as before.
      • 5) User clicks "Help" in the left nav. URL should not be obfuscated.
      • 6) User searches the title of the article they were originally on. URL should be obfuscated with the same hash as before.
      • 7) User clicks "View history". Event should include a "history" action.
      • 8) User clicks on a username in the history list and goes to a User page. URL should not be obfuscated.
      • 9) User clicks back and returns to article, and then clicks "Talk". URL should be obfuscated with a different hash than the "Article" page.
  • @nettrom_WMF will also test in Korean and English Beta Labs.
    • Focusing on whether the events recorded are usable for answering the questions listed on T205758.
    • Also verifying that the new schema is usable with the other schemas we need to use in conjunction with it to give us the full picture of the user's journey.
    • Can use the queries from T206182 to do test driven development.
  • Other engineers from the team can also help test.
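
The namespace rules and the hash expectations in the example sequence above can be written down as a small check. This is only a minimal sketch: the task does not say how titles are hashed, so SHA-256 over the page title is an assumption, and the helper names and page titles are hypothetical.

```python
import hashlib

# Namespace IDs that should be obfuscated, per the plan above.
OBFUSCATED_NAMESPACES = {0, 1, 6, 7, 100, 101, 118, 119}


def should_obfuscate(namespace: int) -> bool:
    """Return True if pages in this namespace should be hashed."""
    return namespace in OBFUSCATED_NAMESPACES


def obfuscate(title: str) -> str:
    """Stand-in hash (assumption): SHA-256 hex digest of the title."""
    return hashlib.sha256(title.encode("utf-8")).hexdigest()


def logged_title(namespace: int, title: str) -> str:
    """Title as it would appear in an event under the rules above."""
    return obfuscate(title) if should_obfuscate(namespace) else title


# Steps 2-5 and 9 of the example sequence, with made-up page titles.
first = logged_title(0, "Cat")                  # step 2: article, hashed
second = logged_title(0, "Dog")                 # step 3: other article, different hash
back = logged_title(0, "Cat")                   # step 4: back button, same hash as step 2
help_page = logged_title(12, "Help:Contents")   # step 5: Help, not hashed
talk = logged_title(1, "Talk:Cat")              # step 9: Talk, hashed, differs from article
```

The same comparisons (same hash on return, different hash for a different page, plain title outside the listed namespaces) are what we would look for in the recorded events.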

We'd like to test our instrumentation on the Beta Cluster. Here's a relevant link with information about testing EventLogging on the Beta Cluster.

JTannerWMF moved this task from Inbox to Q2 2018-19 on the Growth-Team board. Oct 24 2018, 4:45 PM
MMiller_WMF updated the task description. Oct 24 2018, 10:40 PM
MMiller_WMF added subscribers: Etonkovidova, Catrope, kostajh and 4 others.
MMiller_WMF updated the task description. Nov 7 2018, 12:00 AM

@kostajh says we'll be able to test on Test Wiki as of 9 AM PT on 2018-11-07 (tomorrow). That will be good because then events will stream into Hadoop, which will be easier to query and test with.

MMiller_WMF updated the task description. Nov 8 2018, 9:19 PM
MMiller_WMF updated the task description. Nov 9 2018, 12:16 AM
MMiller_WMF added a comment. Edited Nov 9 2018, 1:18 AM

@kostajh -- I did testing yesterday and today, and found some potential issues. I went over them with @Etonkovidova, @nettrom_WMF, and @Catrope. Here was my process:

  1. Yesterday, I created a new account in Test Wiki (MMiller Test 01 test).
  2. I did about 25 things, and recorded all the things I clicked on with timestamps.
  3. Then this morning, I looked in Hadoop for the events, and compared to see if everything ended up in the database.

There were some discrepancies that we should go over together in my spreadsheet. Here are the issues at a high level:

  • We are treating the Main Page as a normal article page and obfuscating its information. That might cause some unneeded challenges in our analyses later, when we want to see if people visited the homepage. Since it's not a sensitive page, we might want to special-case it so it doesn't get hashed.
  • There are a couple times when an event seems to be recorded twice.
  • We think redirects are being treated in a somewhat weird way. Some of the fields record the identifiers of the link that was clicked on, and some record the identifiers of the destination. We should discuss.
  • Something weird happened when I clicked on "Recent Changes".
  • No event is recorded for logging out, but we would like to know when users log out.
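
The comparison described above (a manual list of clicks with timestamps, checked against the events that reached Hadoop) could be partly automated. A minimal sketch, assuming both sides are reduced to (timestamp, action) pairs in the YYYYMMDDHHMMSS form; the function names, the 5-second tolerance, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS timestamp."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def reconcile(clicks, events, tolerance_seconds=5):
    """Match each manually recorded click to at most one logged event.

    clicks and events are lists of (timestamp, action) tuples.
    Returns (missing, extra): clicks with no matching event, and events
    that no click accounts for (for example, duplicates).
    """
    tolerance = timedelta(seconds=tolerance_seconds)
    unmatched = list(events)
    missing = []
    for click_ts, click_action in clicks:
        # First still-unmatched event with the same action, close enough in time.
        match = next(
            (e for e in unmatched
             if e[1] == click_action
             and abs(parse_ts(e[0]) - parse_ts(click_ts)) <= tolerance),
            None,
        )
        if match is None:
            missing.append((click_ts, click_action))
        else:
            unmatched.remove(match)
    return missing, unmatched


# Made-up data: one click never logged, one event logged twice.
clicks = [("20181108100000", "view"), ("20181108100030", "edit"), ("20181108100100", "logout")]
events = [("20181108100001", "view"), ("20181108100031", "edit"), ("20181108100034", "edit")]
missing, extra = reconcile(clicks, events)
```

Here `missing` would surface things like the absent logout event, and `extra` would surface events recorded twice.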
Etonkovidova updated the task description. Nov 10 2018, 2:46 AM

Comments on the marked items from the task description:
(1)

Testing in mobile to make sure that things are recorded correctly and the is_mobile flag is set to true.

So far, the log db in betalabs has not recorded any mobile events except for View. Testing is ongoing in testwiki; waiting for updates from @MMiller_WMF.

(2)

In particular, it will be important to ensure that the right URLs are and are not being obfuscated according to these namespaces:
Portal (100)
Portal talk (101)
Draft (118)
Draft talk (119)

In betalabs log db:

[log]> select  event_namespace, event_page_title from  EditorJourney_18504997  where  event_namespace in (100, 101, 118, 119);
+-----------------+------------------------------------------------------+
| event_namespace | event_page_title                                     |
+-----------------+------------------------------------------------------+
|             100 | Portal:Featured content                              |
|             100 | Portal:Featured content                              |
|             100 | Portal:Featured content                              |
|             100 | Portal:Featured content                              |
|             100 | Portal:Featured content                              |
|             118 | Draft:ET2 test1 with Declined category               |
|             119 | Creating Draft talk:ET2 test1 with Declined category |
|             119 | Creating Draft talk:ET2 test1 with Declined category |
|             119 | Draft talk:ET2 test1 with Declined category          |
|             119 | Draft talk:ET2 test1 with Declined category          |
|             100 | Portal:Cats                                          |
|             101 | Creating Portal talk:Cats                            |
|             101 | Creating Portal talk:Cats                            |
|             101 | Portal talk:Cats                                     |
|             101 | Portal talk:Cats                                     |
+-----------------+------------------------------------------------------+
15 rows in set (0.02 sec)
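
The rows above show readable titles in namespaces 100, 101, 118, and 119, which the plan lists as obfuscated. A check like this could flag such rows automatically. It is a minimal sketch: the hash length is not stated anywhere in this task, so "contains a run of at least 32 hex digits" is an assumption, and the function name and the sample hash are made up.

```python
import re

# Namespaces the plan lists as obfuscated.
SENSITIVE_NAMESPACES = {0, 1, 6, 7, 100, 101, 118, 119}

# Assumption: an obfuscated title contains a run of at least 32 hex digits.
HEX_RUN = re.compile(r"[0-9a-f]{32,}")


def unobfuscated_rows(rows):
    """Return (namespace, page_title) rows in sensitive namespaces with no hash-like run."""
    return [
        (ns, title)
        for ns, title in rows
        if ns in SENSITIVE_NAMESPACES and not HEX_RUN.search(title)
    ]


rows = [
    (100, "Portal:Featured content"),
    (119, "Creating Draft talk:ET2 test1 with Declined category"),
    (1, "Creating Talk:" + "ab12" * 16),  # made-up hash for illustration
    (12, "Help:Contents"),                # not a sensitive namespace, ignored
]
flagged = unobfuscated_rows(rows)
```

With the sample rows, only the Portal and Draft talk titles are flagged.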

Current status on my end is that I have not found any significant issues with the data.

Etonkovidova updated the task description. Nov 13 2018, 4:50 PM
Etonkovidova updated the task description. Nov 13 2018, 5:26 PM

@kostajh, @Catrope, @Etonkovidova, @nettrom_WMF, and I discussed this today. These are the remaining action items blocking release:

  • @kostajh is working on a patch that does these two things:
    • Logs when a user logs out.
    • Makes the "Main Page" into a special case that is not obfuscated.
  • @kostajh will look into why the Portal, Portal talk, Draft, and Draft talk namespaces are not being obfuscated.
  • @kostajh will look into @nettrom_WMF's issue that he filed about a title hashing issue (T209401).
  • @kostajh will address some of the edge cases in which Special pages, such as Special:RecentChanges get logged with the wrong information. The proposed solution makes the namespace -1 and the page_id 0.
  • @Etonkovidova will check how logging goes when a user tries to edit a semi-protected or protected page.
  • @nettrom_WMF and @kostajh will look into why a certain event got recorded twice, with exactly the same content, but three seconds apart. The event in question was when I clicked to edit/create my User Talk page as User:MMiller Test 01 test.

@nettrom_WMF and @kostajh will look into why a certain event got recorded twice, with exactly the same content, but three seconds apart. The event in question was when I clicked to edit/create my User Talk page as User:MMiller Test 01 test.

I looked at this by pulling up https://www.mediawiki.org/w/index.php?title=User:KHarlan_(WMF)/Blah and looking at network requests after clicking "Edit". I end up with two network requests a few seconds apart, both with the request URL https://www.mediawiki.org/w/index.php?title=User:KHarlan_(WMF)/Blah&action=edit. The first is MediaWiki returning the edit page. The second is generated by VisualEditor, which makes another request to the same URL during its load process. I don't think there's anything we can do about this other than to account for it in analysis.

Re

The event in question was when I clicked to edit/create my User Talk page as User:MMiller Test 01 test.

I just saw exactly the same issue:

 select event_action, event_title, event_path, event_page_title, event_page_id, event_namespace, timestamp  from EditorJourney_18504997 where event_user_id=15541 and event_action='edit' and event_page_title like 'Creating Talk%' \G
*************************** 1. row ***************************
    event_action: edit
     event_title: 6c37f3c5ce600d01e1d237fb79d20dc26a13c62f53ad5871d731159a21f[...]
      event_path: /w/index.php
event_page_title: Creating Talk:96a3549424148ebe70521d68c46ea52f6073072c144b1e298efe6f7[...]
   event_page_id: 0
 event_namespace: 1
       timestamp: 20181113170305
*************************** 2. row ***************************
    event_action: edit
     event_title: 6c37f3c5ce600d01e1d237fb79d20dc26a13c62f53ad5871d7[....]
      event_path: /w/index.php
event_page_title: Creating Talk:96a3549424148ebe70521d68c46ea52f6073072c144b1e298efe6f73e4544[...]
   event_page_id: 0
 event_namespace: 1
       timestamp: 20181113170308
2 rows in set (0.07 sec)

Change 473290 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] Exclude Main_Page from hashing, fix namespace, log userLogout events

https://gerrit.wikimedia.org/r/473290

  • @nettrom_WMF and @kostajh will look into why a certain event got recorded twice, with exactly the same content, but three seconds apart. The event in question was when I clicked to edit/create my User Talk page as User:MMiller Test 01 test.

I checked the EditAttemptStep schema regarding this, and it has three events stored around the same timestamps for each of the three stages of the editor's loading: init, loaded, and ready. These are all part of the same single edit. As @kostajh points out, the VisualEditor makes two requests, so we'll see those in EditorJourney. I'm sure I can figure out some way to combine the two data sources to account for this in our analysis.
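
One way to account for the doubled VisualEditor request in analysis is to collapse events that are identical apart from the timestamp and arrive within a few seconds of each other. This is a rough sketch, not a settled approach: the 5-second window, the function names, and the simplified field names are assumptions.

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS timestamp."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def collapse_duplicates(events, window_seconds=5):
    """Drop an event if an identical one (ignoring timestamp) was seen within the window.

    events is a list of dicts with a "timestamp" key, sorted by timestamp.
    """
    kept = []
    last_seen = {}  # event content -> timestamp of the most recent copy seen
    for event in events:
        key = tuple(sorted((k, v) for k, v in event.items() if k != "timestamp"))
        ts = parse_ts(event["timestamp"])
        previous = last_seen.get(key)
        if previous is None or (ts - previous).total_seconds() > window_seconds:
            kept.append(event)
        # Updated even for dropped copies, so a chain of rapid repeats collapses to one.
        last_seen[key] = ts
    return kept


events = [
    {"action": "edit", "page_id": 0, "namespace": 1, "timestamp": "20181113170305"},
    {"action": "edit", "page_id": 0, "namespace": 1, "timestamp": "20181113170308"},
    {"action": "edit", "page_id": 0, "namespace": 1, "timestamp": "20181113171000"},
]
deduped = collapse_duplicates(events)
```

The first two sample events mirror the three-second gap seen above and collapse into one; the third is far enough away to count as a separate edit attempt.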

Etonkovidova added a comment. Edited Nov 13 2018, 11:17 PM

@nettrom_WMF
(1) While checking how actions on protected pages are recorded, I saw that viewing the source is logged as an edit, e.g.

event_action: edit
event_title: bb01a035ecef3f153f14715e7eeb49fc0a4e217ef[...]
event_path: /w/index.php
event_page_title: View source for bb01a035ecef3f153f14715e7eeb49fc0a4e217ef[...]

(2) Counting distinct timestamps shows a clear discrepancy between the total number of events (67) and the number of distinct timestamps (53):

[log]> select count(*)  from EditorJourney_18504997 where event_user_id=15541\G
*************************** 1. row ***************************
count(*): 67
1 row in set (0.10 sec)

MariaDB [log]> select count(distinct(timestamp))  from EditorJourney_18504997 where event_user_id=15541\G
*************************** 1. row ***************************
count(distinct(timestamp)): 53
1 row in set (0.02 sec)

(3) event_permission_errors is populated indiscriminately for many 'view' actions in event_namespace 0, -1, and 4:

[log]> select event_action, event_title, event_permission_errors  from EditorJourney_18504997 where event_user_id=15542;
+--------------+---------------------------------------+--------------------------------------+
| event_action | event_title                           | event_permission_errors              |
+--------------+---------------------------------------+--------------------------------------+
| view         | CreateAccount                         | badaccess-group0,ns-specialprotected |
| view         | 0502dcc361c9559f1[...]                | badaccess-group0                     |
| view         | 0502dcc361c955[...]                   | badaccess-group0                     |
| view         | 0502dcc361c955[...]                   | badaccess-group0                     |
| view         | Preferences                           | badaccess-group0,ns-specialprotected |
| view         | Preferences                           | badaccess-group0,ns-specialprotected |
| view         | 0502dcc361c9559f16f5166b[...]         | badaccess-group0                     |
| view         | Preferences                           | badaccess-group0,ns-specialprotected |
| view         | RecentChanges                         | badaccess-group0,ns-specialprotected |
| view         | Watchlist                             | badaccess-group0,ns-specialprotected |
| view         | Contributions/ET155                   | badaccess-group0,ns-specialprotected |
| view         | Contributions/ET155                   | badaccess-group0,ns-specialprotected |
| view         | Contributions/ET155&group=exp1 group1 | badaccess-group0,ns-specialprotected |
| view         | 0502dcc361c9559f[...]                 | badaccess-group0                     |
| view         | 0502dcc361c955[...]                   | badaccess-group0                     |
| view         | 8df18a5460737ef[...]                  |
| view         | New user landing page                 | badaccess-group0                     |
| view         | New user landing page                 | badaccess-group0                     |
| view         | Search                                | badaccess-group0,ns-specialprotected |
| view         | NewPagesFeed                          | badaccess-group0,ns-specialprotected |
+--------------+---------------------------------------+--------------------------------------+
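
For item (2) above (67 events, 53 distinct timestamps), it may help to list which timestamps are shared rather than only counting them. A minimal sketch with hypothetical function names and made-up sample data. Note that a shared timestamp does not by itself prove a duplicate, since the timestamp only has one-second resolution.

```python
from collections import Counter


def shared_timestamps(timestamps):
    """Return {timestamp: count} for timestamps carried by more than one event."""
    counts = Counter(timestamps)
    return {ts: n for ts, n in counts.items() if n > 1}


def surplus_events(timestamps):
    """Total events minus distinct timestamps."""
    return len(timestamps) - len(set(timestamps))


# Made-up sample: three events, two of them in the same second.
sample = ["20181113170305", "20181113170305", "20181113170308"]
shared = shared_timestamps(sample)
surplus = surplus_events(sample)
```

Applied to the real column, `surplus_events` should reproduce the gap of 14 (67 minus 53), and `shared_timestamps` would show where to look for double-logged events.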

I just filed an issue at T209454, in which I found an edge case where page titles are not being obfuscated on mobile.

Change 473290 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Exclude Main_Page from hash, configurable namespace, handle logout

https://gerrit.wikimedia.org/r/473290

Change 473653 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[operations/mediawiki-config@master] Configure sensitive namespaces for EditorJourney schema

https://gerrit.wikimedia.org/r/473653

Change 473742 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[operations/mediawiki-config@master] Configure sensitive namespaces for EditorJourney schema

https://gerrit.wikimedia.org/r/473742

Change 473748 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Exclude Main_Page from hash, configurable namespace, handle logout

https://gerrit.wikimedia.org/r/473748

Change 473653 merged by jenkins-bot:
[operations/mediawiki-config@master] Beta labs: Configure sensitive namespaces for EditorJourney schema

https://gerrit.wikimedia.org/r/473653

Change 473811 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] Handle sanitizing sensitive namespace content from redirects

https://gerrit.wikimedia.org/r/473811

Change 473748 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Exclude Main_Page from hash, configurable namespace, handle logout

https://gerrit.wikimedia.org/r/473748

Change 473742 merged by jenkins-bot:
[operations/mediawiki-config@master] Configure sensitive namespaces for EditorJourney schema

https://gerrit.wikimedia.org/r/473742

Mentioned in SAL (#wikimedia-operations) [2018-11-15T19:51:11Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Configure sensitive namespaces for EditorJourney schema (T207307) (duration: 00m 53s)

Change 473836 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] Exclude users where getRegistration() returns null

https://gerrit.wikimedia.org/r/473836

Change 473836 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Exclude users where getRegistration() returns null

https://gerrit.wikimedia.org/r/473836

Change 473861 had a related patch set uploaded (by Catrope; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Exclude users where getRegistration() returns null

https://gerrit.wikimedia.org/r/473861

Change 473861 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Exclude users where getRegistration() returns null

https://gerrit.wikimedia.org/r/473861

Change 474303 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@master] Override getUser for improved handling of user logout events

https://gerrit.wikimedia.org/r/474303

Change 473811 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Handle sanitizing sensitive namespace content from redirects

https://gerrit.wikimedia.org/r/473811

Change 474303 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Override getUser for improved handling of user logout events

https://gerrit.wikimedia.org/r/474303

@kostajh : Good news: Logout events are now captured by the schema! Bad news: user_id = 0 for all events, so we don't know which user logged out.

Is that in beta labs or in production? We will swat the fix to production on Monday so the logout events are captured.

Oh, my bad, sorry! Should've asked first. I somehow ended up thinking that it was in production since I was seeing the logout events in the data.

Change 474687 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Handle sanitizing sensitive namespace content from redirects

https://gerrit.wikimedia.org/r/474687

Change 474688 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Override getUser for improved handling of user logout events

https://gerrit.wikimedia.org/r/474688

Change 474687 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Handle sanitizing sensitive namespace content from redirects

https://gerrit.wikimedia.org/r/474687

Change 474688 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.33.0-wmf.4] Override getUser for improved handling of user logout events

https://gerrit.wikimedia.org/r/474688

Mentioned in SAL (#wikimedia-operations) [2018-11-19T19:44:32Z] <catrope@deploy1001> Synchronized php-1.33.0-wmf.4/extensions/WikimediaEvents/: EditorJourney fixes (T207307) (duration: 00m 46s)

Etonkovidova moved this task from QA to Needs PM Review on the Growth-Team (Current Sprint) board.
MMiller_WMF closed this task as Resolved.

Thank you. We accomplished this.