
Investigation: Number of unreviewed pages on English Wikipedia skyrocketing in last few months
Closed, Resolved · Public · 5 Story Points

Description

See Kudpung's chart:

No one seems to know exactly why this is the case, but we should make sure it isn't due to a software change. Is there any reason why either:

  • more articles would be set as unreviewed than previously (for example, we include redirects when we didn't before [purely hypothetical])
  • the page patrolling interfaces aren't working correctly (for example, reviewers are not able to successfully mark pages as reviewed)
  • or, the statistics are wrong (never trust your assumptions)

The two page patrolling interfaces are Special:NewPages and Special:NewPagesFeed.

One possible cause is https://gerrit.wikimedia.org/r/#/c/272142/, which switched around a bunch of the UI elements in the reviewing interface, including the "mark as reviewed" button, replacing it with a "skip" button instead.[1][2] (This has since been reverted.)

Are there any other changes to the software that could have caused such a dramatic rise in unreviewed pages?

Event Timeline

kaldari created this task. · Sep 29 2016, 12:32 AM
Restricted Application added a subscriber: Aklapper. · Sep 29 2016, 12:32 AM
kaldari updated the task description. · Sep 29 2016, 12:33 AM

Good catch on the icon change. That went into production in February, and while it has been reverted, that only happened last week. So if that was the cause, we wouldn't see the effect of the revert in Kudpung's graph.

Do we have other dashboards or stats available to compare with Kudpung's chart above? In particular:

  • how many pages are getting reviewed
    • in Special:NewPagesFeed (PageTriage)
    • in Special:NewPages
  • the same stats over time
  • new article creations in general


We want to determine if the problem is coming from:

  • New pages increasing more rapidly than usual?
  • Use of NewPagesFeed declining?
  • Use of NewPages declining?

Note, see also Samtar's new dashboard for NewPagesFeed http://tools.wmflabs.org/nppdash/

kaldari added a comment. · Edited · Oct 3 2016, 11:11 PM

For the Page Curation/PageTriage interface, it looks like reviews per day increased significantly around October 9th of last year and remained high until crashing around May 20, 2016. The rate remained low after that, and after August 30 the data just stops completely:
https://datasets.wikimedia.org/public-datasets/enwiki/page-curation/pc_logging_actions_daily.csv
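One way to put a number on the drop is to compare the mean daily review count before and after the suspected date. The following is only a minimal sketch: the column names `date` and `reviews` are assumptions (the real CSV header may differ), and the sample rows are made-up placeholders, not values from the dataset.

```python
import csv
import io
from datetime import date
from statistics import mean


def load_daily_counts(csv_text, date_col="date", count_col="reviews"):
    """Parse CSV text into (date, count) pairs.

    The column names are hypothetical; adjust them to the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(date.fromisoformat(r[date_col]), int(r[count_col])) for r in reader]


def mean_daily_reviews(rows, start, end):
    """Mean of the daily counts for days in the half-open range [start, end)."""
    counts = [n for day, n in rows if start <= day < end]
    return mean(counts) if counts else None


# Made-up placeholder rows, only to show the shape of the comparison.
sample = """date,reviews
2016-05-18,400
2016-05-19,420
2016-05-20,150
2016-05-21,130
"""

rows = load_daily_counts(sample)
cutoff = date(2016, 5, 20)
before = mean_daily_reviews(rows, date.min, cutoff)
after = mean_daily_reviews(rows, cutoff, date.max)
print(before, after)
```

Comparing means over a few weeks on either side of the cutoff, rather than single days, should make the comparison less sensitive to weekday effects.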

It looks like page patrolling levels remained relatively steady throughout that time (although I also couldn't get data past August 30). There is a slight downward slope over the past couple of years, but no major jumps.

According to https://stats.wikimedia.org/EN/TablesWikipediaEN.htm, the number of new articles created per day has remained relatively steady over the last couple years (although data for September is not yet available).

kaldari added a comment. · Edited · Oct 3 2016, 11:50 PM

In order for new code to have affected the PageTriage review rate on May 20th, it would have had to have been merged between May 10th and May 17th (to get included in the 1.28.0-wmf.2 branch). The only commits made to PageTriage during that time were localization updates: https://phabricator.wikimedia.org/diffusion/EPTR/history/master/. This does not rule out a change in core affecting the rate, however.

I couldn't find any obvious changes in core (between May 10th and 17th) that would have affected this either.

All PageTriage actions are thoroughly logged in the logging table, so it should be possible to figure out if there are any significant differences between the pre-May-20 reviews and the post-May-20 reviews (besides the overall volume change) by carefully reviewing that data and looking for patterns. For example, were there certain reviewers who quit reviewing or significantly reduced their reviewing? Was there a significant change in the types of reviews that were made (e.g. with tags, with deletion)?

To see some example log entries from the database:
select * from logging where log_type = 'pagetriage-curation' LIMIT 10;
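Once the log entries are exported, the per-reviewer comparison described above could be sketched roughly as follows. This assumes each entry has been reduced to a (reviewer, timestamp) pair, with timestamps in MediaWiki's 14-digit YYYYMMDDHHMMSS form so that plain string comparison orders them correctly. The reviewer names and rows are placeholders, not real data.

```python
from collections import Counter

# Midnight UTC on May 20, 2016, in MediaWiki's YYYYMMDDHHMMSS form.
CUTOFF = "20160520000000"


def reviewer_shift(entries, cutoff=CUTOFF):
    """Count reviews per reviewer before and after the cutoff.

    Returns (reviewer, before, after) tuples, sorted so that the
    reviewers with the largest drop in reviews come first.
    """
    before, after = Counter(), Counter()
    for reviewer, ts in entries:
        # Fixed-width digit strings compare in chronological order.
        (before if ts < cutoff else after)[reviewer] += 1
    reviewers = set(before) | set(after)
    changes = [(r, before[r], after[r]) for r in reviewers]
    changes.sort(key=lambda c: (c[2] - c[1], c[0]))
    return changes


# Placeholder entries (hypothetical reviewers), only to show the shape.
entries = [
    ("ReviewerA", "20160518120000"),
    ("ReviewerA", "20160519120000"),
    ("ReviewerA", "20160519130000"),
    ("ReviewerB", "20160519090000"),
    ("ReviewerB", "20160521090000"),
    ("ReviewerA", "20160522100000"),
]

result = reviewer_shift(entries)
for reviewer, n_before, n_after in result:
    print(reviewer, n_before, n_after)
```

These are raw counts, so the two periods need to be the same length (or the counts divided by the number of days) before the numbers are compared.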

Here's a chart showing the sudden decline in reviews via Page Curation:

User:SwisterTwister may have significantly reduced his patrols during the sample period on my graph. He had been making an unusually high number of patrols. In the same period, some other users have mentioned that they have shifted the focus of their work away from patrolling new pages, and several new and/or inexperienced users have been asked to stop patrolling.

Due to the anomaly of two separate user patrol logs (one for Page Curation and one for Twinkle) we currently have no accurate overview of any individual user's patrolling.

Can we have Kaldari's chart with an X axis at 14-day steps since 01 Jan 2016 and a logarithmic line? Perhaps also with an extrapolation to what we can expect by the end of 2016.

MusikAnimal added a comment. · Edited · Oct 4 2016, 1:29 AM

> Due to the anomaly of two separate user patrol logs (one for Page Curation and one for Twinkle) we currently have no accurate overview of any individual user's patrolling.

For new pages, both Page Curation and Twinkle (or just clicking "Mark this page as patrolled") will write to the patrol log. I assume the additional "page curation" log was created so we can track redirects that become articles (correct me if I'm wrong). The pagetriage_log table is also in the replica database, so we could JOIN with the normal logging table to get a complete list of patrols by a particular user.
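As a rough illustration of combining the two sources for one user, here is an in-memory sketch rather than the actual SQL. It assumes both logs have already been exported as (reviewer, page_id, timestamp) tuples. That record shape is hypothetical and does not reflect the real column names of either table. A page that shows up in both logs for the same user is counted once, keeping the earliest timestamp.

```python
def combined_patrols(patrol_log, curation_log, user):
    """Merge two logs into one de-duplicated list of (page_id, timestamp).

    Both inputs are iterables of (reviewer, page_id, timestamp) tuples
    (a hypothetical export format). Timestamps are YYYYMMDDHHMMSS strings.
    """
    earliest = {}
    for source in (patrol_log, curation_log):
        for reviewer, page_id, ts in source:
            if reviewer != user:
                continue
            # Keep only the earliest entry per page for this user.
            if page_id not in earliest or ts < earliest[page_id]:
                earliest[page_id] = ts
    return sorted(earliest.items(), key=lambda item: item[1])


# Placeholder rows, only to show the shape of the inputs.
patrol_log = [
    ("ReviewerA", 101, "20160601100000"),
    ("ReviewerB", 102, "20160601110000"),
    ("ReviewerA", 103, "20160602090000"),
]
curation_log = [
    ("ReviewerA", 101, "20160601100005"),  # same page, also in the patrol log
    ("ReviewerA", 104, "20160603080000"),
]

patrols = combined_patrols(patrol_log, curation_log, "ReviewerA")
print(patrols)
```

Whether duplicates across the two logs should really collapse to one patrol is a judgment call; dropping the de-duplication step would give the raw number of log entries instead.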

kaldari set the point value for this task to 5. · Oct 4 2016, 10:24 PM
kaldari triaged this task as High priority. · Oct 4 2016, 10:37 PM
kaldari edited projects, added Community-Tech-Sprint; removed Community-Tech.
kaldari moved this task from Ready to In Development on the Community-Tech-Sprint board.
kaldari closed this task as Resolved. · Oct 4 2016, 11:03 PM
kaldari claimed this task.

Number of articles reviewed per day by SwisterTwister:

Case closed.