New Pages Feed: incorporate draft backlog into the feed (1.3)
Closed, Resolved | Public | 5 Story Points

Description

Together with the work in T195545 and T195924, this task is part of the first useful feature change we can roll out for the New Pages Feed. The work in the task is needed to accomplish these user stories:

  • As a reviewer, I need to filter to all drafts (not only those submitted to AfC).
  • As a reviewer, I need to be able to filter to only those drafts that have been submitted to AfC and are awaiting review. This would include drafts that are awaiting their second, third, etc. review, but it would exclude drafts that have been submitted for review, have already been reviewed, and are awaiting resubmission by their authors.

The work in T195545 and T195924 will only apply to new drafts going forward. Therefore, the work in this task is to make the New Pages Feed include existing drafts the same way that it will include future drafts. This is important because AfC reviewers need the New Pages Feed, and the improvements we're making to it, in order to prioritize and make progress on their existing backlog. Without legacy drafts in the New Pages Feed, AfC reviewers would have to use both their new and old systems to review, and it would be difficult for them to switch to the new system without worrying that old drafts will fall through the cracks.

Specifically:

  • At the time the addition of drafts to the New Pages Feed is rolled out, the New Pages Feed should contain all draft pages as they exist at that time, regardless of when they were created or submitted to AfC.
  • Those legacy drafts should be sortable, filterable, and displayed just as all new drafts will be -- as if drafts had been incorporated since the beginning of the New Pages Feed.
  • ORES and copyvio scores will be needed for those legacy drafts (see T196178 and T193809 for more information).

Event Timeline

MMiller_WMF renamed this task from New Pages Feed: incorporate draft backlog into the feed to New Pages Feed: incorporate draft backlog into the feed (1.3). Jun 1 2018, 5:59 PM
MMiller_WMF created this task.
Niharika set the point value for this task to 5. Jun 5 2018, 11:44 PM
Samwilson moved this task from Ready to In Development on the Community-Tech-Sprint board.

Change 442777 had a related patch set uploaded (by Samwilson; owner: Samwilson):
[mediawiki/extensions/PageTriage@master] Add maintenance script to backfull Draft NS pages in the triage queue

https://gerrit.wikimedia.org/r/442777

The first iteration of this is ready for review. We'll probably want to change it later to be able to rebuild tags for Draft pages that are already in the triage queue (when a new tag is next introduced, I mean). But for the current purposes, I think including just the missing Drafts is the way to go.

• Vvjjkkii renamed this task from New Pages Feed: incorporate draft backlog into the feed (1.3) to utbaaaaaaa. Jul 1 2018, 1:06 AM
• Vvjjkkii removed Samwilson as the assignee of this task.
• Vvjjkkii triaged this task as High priority.
• Vvjjkkii updated the task description. (Show Details)
• Vvjjkkii removed the point value for this task.
• Vvjjkkii removed subscribers: gerritbot, Aklapper.
MusikAnimal renamed this task from utbaaaaaaa to New Pages Feed: incorporate draft backlog into the feed (1.3). Jul 1 2018, 1:54 AM
MusikAnimal assigned this task to Samwilson.
MusikAnimal set the point value for this task to 5.
MusikAnimal added a subscriber: Aklapper.
MusikAnimal raised the priority of this task from High to Needs Triage. Jul 1 2018, 1:56 AM
MusikAnimal updated the task description. (Show Details)

Change 442777 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Add maintenance script to backfill Draft NS pages into the triage queue

https://gerrit.wikimedia.org/r/442777

I believe we still need to run this on enwiki. Do we keep this task open until that's relevant, or do we want a reminder to run the script in the deployment ticket for the Growth team when they enable it there? @Catrope what do you think about this? You guys will likely have to run/deal with this when the features are rolled out to enwiki.

@Mooeypoo -- has the script been run on any wikis at this point, such that we could test out whether it adds everything?

Mooeypoo added a comment. Edited Jul 30 2018, 3:22 PM

It hasn't yet. It's ready to, but it needs a place where you had drafts before AfC. From our investigation, that's true for one page on beta, and I'm not sure if there are such pages on test.

We can try and set up a specific test on our standalone wiki, if needed.

There are plenty of drafts on testwiki that aren't in the feed. Here are a few - 1, 2, 3.

Thanks, @Niharika. @Mooeypoo -- could you run this on testwiki? Then when you've done that, @Etonkovidova can check that all the drafts made it into the feed.

Moving to "In Development" so that the script can be run.

The script's been run -

thcipriani@mwmaint1001:~$ mwscript extensions/PageTriage/maintenance/populateDraftQueue.php --wiki=testwiki
Processing drafts in NS 118...
- batch 1
Complete; 7 drafts processed.

The drafts I linked to earlier now appear in the feed.

@Etonkovidova can look this over now.

Thanks @Niharika.

So @Etonkovidova, I think the thing to check is that every draft page in the Test wiki should now be in the feed, regardless of how long ago it was created.

A bit of manual work, but here's the list of draft pages: https://test.wikipedia.org/wiki/Special:PrefixIndex?prefix=&namespace=118

We can also confirm all drafts are known to PageTriage with:

mysql:research@analytics-store.eqiad.wmnet [testwiki]> SELECT COUNT(page_id)
    -> FROM page
    -> WHERE page_namespace = 118
    -> AND NOT EXISTS(
    ->   SELECT 1
    ->   FROM pagetriage_page
    ->   WHERE ptrp_page_id = page_id
    -> );
+----------------+
| COUNT(page_id) |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)

though that just tells us the data is there, not if tags are correct, etc.

@SBisson @Niharika @MusikAnimal -- I am checking things out in testwiki now, and it looks like not all drafts are there. Leon's link says that there should be 19 drafts in the list, but I am only seeing 12. Niharika's example drafts are not there. Is it because a new deploy happened or something, and so now we would need to re-run the script?

@MMiller_WMF That's odd. The drafts were indeed not there. I re-ran the script and they're back in now but we should keep an eye on it for a bit. I see 37 pages in my list when I use AfC and filter by All.

There are currently 37 unreviewed drafts on testwiki. That's consistent with what we see in the UI.

mysql:research@s3-analytics-slave [testwiki]> select count(page_id) from pagetriage_page join page on ptrp_page_id=page_id where page_namespace =118 and page_is_redirect=0 and ptrp_reviewed=0;
+----------------+
| count(page_id) |
+----------------+
|             37 |
+----------------+
1 row in set (0.00 sec)

Out of those, 6 don't have the afc_state tag.

mysql:research@s3-analytics-slave [testwiki]> select page_id, page_title, page_touched
    -> from pagetriage_page join page on ptrp_page_id=page_id
    -> where page_namespace =118 and page_is_redirect=0 and ptrp_reviewed=0
    -> and page_id not in (
    ->   select page_id from pagetriage_page
    ->   join page on ptrp_page_id=page_id
    ->   join pagetriage_page_tags on ptrpt_page_id=page_id
    ->   where page_namespace=118 and ptrpt_tag_id=18
    -> )
    -> order by page_title;
+---------+-----------------+----------------+
| page_id | page_title      | page_touched   |
+---------+-----------------+----------------+
|  100315 | Foo_bar_12345   | 20180712191237 |
|  100293 | Foo_bar_foo_bar | 20180711212842 |
|  100301 | Test_Draft_001  | 20180711221858 |
|  100302 | Test_Draft_002  | 20180712082949 |
|  100303 | Test_Draft_003  | 20180712183204 |
|  100304 | Test_Draft_004  | 20180712183143 |
+---------+-----------------+----------------+
6 rows in set (0.00 sec)

I've edited Draft:Foo_bar_12345. It forced recompilation and initialization of the afc_state tag. It now shows up in Unsubmitted.

The backfill maintenance script is not designed to detect drafts that are in the PageTriage queue but don't have the tag initialized. On testwiki we can just edit those 6 (now 5) pages to force recompilation, but on enwiki production we'll have to make sure the steps are executed in the right order so that all drafts get correctly initialized with the tag. Those steps are:

  1. Make sure NO drafts are in the queue to start with
  2. Create the afc_state tag by running PageTriage/sql/PageTriageTagsPatch-AfC.sql (does it need to be done by a dba?)
  3. Run maintenance/populateDraftQueue.php
  4. Make sure ALL drafts in the queue have the afc_state tag
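
The checks in steps 1 and 4 could be expressed as two small queries. What follows is only a sketch under stated assumptions, not the team's actual procedure: it runs against an in-memory SQLite database with a toy schema that mirrors just the column names used in the queries in this thread, it assumes the afc_state tag has ID 18 as seen on testwiki (the ID may differ on other wikis), and the helper names `drafts_in_queue`, `queued_drafts_missing_tag`, and `make_toy_db` are hypothetical.

```python
import sqlite3

DRAFT_NS = 118       # Draft namespace, as in the queries in this thread
AFC_STATE_TAG = 18   # afc_state tag ID seen on testwiki; assumed, may differ elsewhere


def drafts_in_queue(conn):
    """Step 1 check: number of Draft pages already in the triage queue."""
    row = conn.execute(
        """SELECT COUNT(*) FROM pagetriage_page
           JOIN page ON ptrp_page_id = page_id
           WHERE page_namespace = ?""",
        (DRAFT_NS,),
    ).fetchone()
    return row[0]


def queued_drafts_missing_tag(conn):
    """Step 4 check: IDs of queued Draft pages with no afc_state tag row."""
    rows = conn.execute(
        """SELECT page_id FROM pagetriage_page
           JOIN page ON ptrp_page_id = page_id
           WHERE page_namespace = ?
             AND NOT EXISTS (
               SELECT 1 FROM pagetriage_page_tags
               WHERE ptrpt_page_id = page_id AND ptrpt_tag_id = ?
             )
           ORDER BY page_id""",
        (DRAFT_NS, AFC_STATE_TAG),
    ).fetchall()
    return [r[0] for r in rows]


def make_toy_db():
    """Toy schema and data for illustration only; not the real MediaWiki schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER);
        CREATE TABLE pagetriage_page (ptrp_page_id INTEGER PRIMARY KEY);
        CREATE TABLE pagetriage_page_tags (ptrpt_page_id INTEGER, ptrpt_tag_id INTEGER);
        -- pages 1 and 2 are drafts; page 3 is a main-namespace article
        INSERT INTO page VALUES (1, 118), (2, 118), (3, 0);
        INSERT INTO pagetriage_page VALUES (1), (2), (3);
        -- draft 2 has some other tag but no afc_state row
        INSERT INTO pagetriage_page_tags VALUES (1, 18), (2, 5), (3, 18);
        """
    )
    return conn
```

Before step 3, `drafts_in_queue` would be expected to return 0; after step 3, `queued_drafts_missing_tag` would be expected to return an empty list. Any IDs it does return are drafts that would need a recompilation, as was done by hand on testwiki.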

I've edited the following pages manually (to force recompilation) and they seem to show up correctly based on their state.

Draft:Foo_bar_12345
Draft:Foo_bar_foo_bar
Draft:Test_Draft_001
Draft:Test_Draft_002
Draft:Test_Draft_003
Draft:Test_Draft_004

There are currently 37 unreviewed drafts on testwiki. That's consistent with what we see in the UI.

And they all are shown on Special:NewPagesFeed.

Everything else seems fine too.

MMiller_WMF closed this task as Resolved. Aug 2 2018, 7:53 PM
MMiller_WMF moved this task from QA to Q1 2018-19 on the Community-Tech-Sprint board.

Okay, this looks good to me. Since we're going to need to run this script in English Wikipedia when we go to production, I am resolving this and I created a new task for running it in English Wikipedia: T201087

Niharika added a comment. Edited Aug 3 2018, 5:34 PM

The feed is down to 33 articles now and the examples I linked to are no longer in the feed. I think Elena found the bug: T201098#4477072.

EDIT: No, the database is consistent with the UI:

wikiadmin@10.64.32.136(testwiki)> select count(page_id) from pagetriage_page join page on ptrp_page_id=page_id where page_namespace =118 and page_is_redirect=0 and ptrp_reviewed=0;
+----------------+
| count(page_id) |
+----------------+
|             33 |
+----------------+

That bug might be separate.

@Niharika -- but there are definitely drafts missing from the feed. @MusikAnimal's search of all Draft pages (https://test.wikipedia.org/wiki/Special:PrefixIndex?prefix=&namespace=118&hideredirects=1) shows 42 drafts in test wiki, and there are only 32 in the feed. For instance, this one is missing: https://test.wikipedia.org/wiki/Draft:David_Putman

Why do you think they keep disappearing?

@Niharika -- are you saying that those 10 drafts have been patrolled, so every time we run the script they briefly appear in the feed, but then disappear again?

That seems to be the likely explanation. I don't know where the bug lies but it's not with the backfill script.
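
One way to test that explanation would be to sort every non-redirect draft into three buckets: not in the queue at all, in the queue but marked reviewed, or in the queue and unreviewed. The following is only a sketch under assumptions: it uses an in-memory SQLite database with a toy schema mirroring the column names from the queries above, it assumes `ptrp_reviewed = 0` means unreviewed and any other value means reviewed or patrolled, and `classify_drafts` and `make_toy_db` are hypothetical helpers.

```python
import sqlite3

DRAFT_NS = 118  # Draft namespace, as in the queries in this thread


def classify_drafts(conn):
    """Return {status: [page_id, ...]} for all non-redirect Draft pages."""
    rows = conn.execute(
        """SELECT page_id,
                  CASE
                    WHEN ptrp_page_id IS NULL THEN 'not_in_queue'
                    WHEN ptrp_reviewed = 0 THEN 'unreviewed'
                    ELSE 'reviewed'
                  END AS status
           FROM page
           LEFT JOIN pagetriage_page ON ptrp_page_id = page_id
           WHERE page_namespace = ? AND page_is_redirect = 0
           ORDER BY page_id""",
        (DRAFT_NS,),
    ).fetchall()
    buckets = {"not_in_queue": [], "unreviewed": [], "reviewed": []}
    for page_id, status in rows:
        buckets[status].append(page_id)
    return buckets


def make_toy_db():
    """Toy schema and data for illustration only; not the real MediaWiki schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_is_redirect INTEGER
        );
        CREATE TABLE pagetriage_page (
            ptrp_page_id INTEGER PRIMARY KEY,
            ptrp_reviewed INTEGER
        );
        -- drafts 1-3 are ordinary, draft 4 is a redirect, page 5 is an article
        INSERT INTO page VALUES (1, 118, 0), (2, 118, 0), (3, 118, 0),
                                (4, 118, 1), (5, 0, 0);
        -- draft 1 is unreviewed, draft 2 is reviewed, draft 3 is not queued
        INSERT INTO pagetriage_page VALUES (1, 0), (2, 1), (4, 0), (5, 0);
        """
    )
    return conn
```

If the missing drafts mostly land in the `reviewed` bucket, that would be consistent with them having been patrolled; if they land in `not_in_queue`, something would appear to be removing them from the queue after the script runs.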