Page MenuHomePhabricator

[wmf.18] enwiki NPP page - no scroll
Closed, ResolvedPublic

Description

The issue is reported on https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Page_Curation&diff=next&oldid=848562822

Go to Special:NewPagesFeed - make sure that the filter selection displays large amount of pages in your feed. Scroll to the bottom of the page - the new portion of feed does not load.

Details

Related Gerrit Patches:
mediawiki/extensions/PageTriage : masterEnsure retrieval and storage of page metadata
mediawiki/extensions/PageTriage : masterAlways retrieve compiled metadata, save with a job in GET requests
mediawiki/extensions/PageTriage : masterAllow deferred writes on GET for pages with missing metadata
mediawiki/extensions/PageTriage : wmf/1.32.0-wmf.18Allow deferred writes on GET for pages with missing metadata
operations/mediawiki-config : masterEnable 'PageTriage' log channel

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 25 2018, 6:41 PM

I was able to briefly reproduce this but no longer can. When I could reproduce it, I didn't see any error in the browser console, but in the Network tab I could see that the API request (e.g. https://en.wikipedia.org/w/api.php?action=pagetriagelist&format=json&namespace=0&showunreviewed=1&dir=newestfirst&limit=20&offset=20180825115736&pageoffset=58264629) was only returning back 6 results on my second scroll down, so PageTriage JS assumed there was no more need to offer additional scrolling. In mediawiki/extensions/PageTriage/modules/ext.pageTriage.models/ext.pageTriage.article.js:

if (response.pagetriagelist.pages && response.pagetriagelist.pages.length > this.apiParams.limit) {
	// Remove the extra page from the list
	response.pagetriagelist.pages.pop();
	this.moreToLoad = true;
} else {
	// We have no more pages to load.
	this.moreToLoad = false;
}

I am sorted with Oldest and am showing Namespace (Article), State (Unreviewed) which has 2846 results as of this posting. I have 7 articles which show. I am on Chrome Version 68.0.3440.87 however if I go on Safari in iOS I get the exact 7 same results and again can't scroll formore results.

Vexations added a subscriber: Vexations.EditedAug 26 2018, 8:59 PM

Firefox 61.0.2 on Mac OSX 10.13.6. Filters set to Unreviewed pages and Show All. Sort By is set to Newest. 2836 pages in your filtered list. I can see the first 20 entries, but can't scroll. When I switch Sort by to Oldest, and hit refresh, I see 5 , but and again, cannot scroll. I get similar results on Chrome and it doesn't seem to matter if I'm logged in or not. The console gives a number of errors: load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:139 JQMIGRATE: Migrate is installed with logging active, version 3.0.1
VM21:141 This page is using the deprecated ResourceLoader module "jquery.ui.widget".
(anonymous) @ VM21:141
VM21:355 This page is using the deprecated ResourceLoader module "mediawiki.api.options".
Use "mediawiki.api" instead.
(anonymous) @ VM21:355
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery(window).on('load'...) called after load event occurred
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140
VM21:136 This page is using the deprecated ResourceLoader module "jquery.ui.core".
Please use "mediawiki.ui.button" or "oojs-ui" instead.
mw.loader.implement.css @ VM21:136
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery.fn.unbind() is deprecated
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery.fn.bind() is deprecated
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery.fn.hover() is deprecated
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140

@Barkeep49 @Vexations thank you for this. We're looking into it. If you don't mind, can you see if you can reproduce this on https://test.wikipedia.org/wiki/Special:NewPagesFeed ? (I'm unable to reproduce there.) I have some ideas about what might be causing this and will post notes later.

It works fine for me on the test wiki. I am now, however, only able to get 4 pages when sorting from oldest (2837 pages in the feed) from en-wiki. Here are my console errors fwiw:

load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:139 JQMIGRATE: Migrate is installed with logging active, version 3.0.1
VM572:251 This page is using the deprecated ResourceLoader module "jquery.ui.position".
(anonymous) @ VM572:251
VM572:141 This page is using the deprecated ResourceLoader module "jquery.ui.widget".
(anonymous) @ VM572:141
VM572:547 This page is using the deprecated ResourceLoader module "mediawiki.api.options".
Use "mediawiki.api" instead.
(anonymous) @ VM572:547
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery(window).on('load'...) called after load event occurred
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140
VM572:296 This page is using the deprecated ResourceLoader module "jquery.tipsy".
mw.loader.implement.css @ VM572:296
VM572:136 This page is using the deprecated ResourceLoader module "jquery.ui.core".
Please use OOUI instead.
mw.loader.implement.css @ VM572:136
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery.fn.unbind() is deprecated
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery.fn.bind() is deprecated
migrateWarn @ load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140
load.php?debug=false&lang=en&modules=jquery&skin=vector&version=0nvcwmf:140 JQMIGRATE: jQuery.fn.hover() is deprecated

Scrolling works in the test wiki set to Unreviewed and Show all, no boxes checked for predictions. I have no problems getting scrolling past 500 entries regardless of whether I sort by Oldest or Newest.

Some notes, no answers yet:

  1. AFAICT, this is not a problem with any of the JavaScript changes, but something to do with the API and database queries returning incorrect results. You can verify this by opening your browser's console, looking at the network tab, and you'll see that PageTriage's List API is returning fewer than 20 results when sorting by Oldest.
  2. This query shows articles in PageTriage older than 2010. You can see that the content you see the default filters of https://en.wikipedia.org/wiki/Special:NewPagesFeed, if you select "Sort by = Oldest", is contained in the results of the query. But there are several entries missing, API PageTriage List is only returning ~15 items.
  3. At first I thought the metadata for a particular item might be missing (possibly related to the changes in T154719: PageTriage opens master connection on GET for ArticleMetadata cache misses), but the query in the above link shows the metadata is present. And you can confirm this by calling the PageTriage List API with a page ID for one of the items that should be visible in the UI and isn't, for example this one and you can see that 1) metadata is present and 2) the PageTriage List API returns it.
  4. The fact that PageTriage appears to work fine in Test and is not working in enwiki points to a configuration issue, but the only difference I see in configuration is wgPageTriageEnableOresFilters which shouldn't have an impact
  5. It's odd that the PageTriage Stats API appears to work properly but the List API is not returning results.
kostajh added subscribers: SBisson, Catrope.EditedAug 27 2018, 2:30 AM

Looking at it some more, it seems like missing metadata is the most likely cause, and that was probably introduced by T154719, although I'm not sure why the logging statement added there isn't registering any warnings.

This query mimics what PageTriage does when its List API module is called. The IDs in the IN() subquery are returned from ApiPageTriageList::getPageIds():

SELECT ptrp_page_id  FROM `pagetriage_page` INNER JOIN `page` ON ((ptrp_page_id = page_id))   WHERE page_is_redirect = '0' AND page_namespace = '0'  ORDER BY ptrp_created ASC, ptrp_page_id ASC LIMIT 21

So, ApiPageTriageList::getPageIds() returns the correct IDs to load data for, but then ArticleMetadata::getMetadata(), when passed these IDs, is not consistently returning data for all of them. For example, this ID (because of this edit) is in the list returned getPageIds(), but no metadata is returned.

ApiPageTriageList::execute() tries to return metadata for all items from getPageIds(), but instead of returning 20 items it returns, say, 15. PageTriage's JS looks to see if fewer than 20 items are returned, and if so, it doesn't offer to load any more data when the user scrolls down.

Prior to T154719, ArticleMetadata::getMetadata() would load and compile metadata in a web request if it couldn't be found, so that's probably why we didn't see this problem before. So, a couple of notes to review with @Catrope and @SBisson tomorrow:

  1. It seems incorrect that ApiPageTriageList::getPageIds() can return IDs for pages that don't have saved metadata.
  2. Since metadata isn't getting saved to the database for some pages in the queue, we probably need to take another look at the hooks PageTriage implements and make sure we're not missing a spot where we should be compiling metadata
  3. In ArticleMetadata::getMetadata(), we have this code:
// Compile the metadata if it is still not available, and if in context of a POST
// request.
if ( $articles ) {
	$acp = ArticleCompileProcessor::newFromPageId( $articles, false, DB_REPLICA );
	if ( $acp && RequestContext::getMain()->getRequest()->wasPosted() ) {
		$pageData += $acp->compileMetadata();
	}

Could we do something like this?

// Compile the metadata if it is still not available, and if in context of a POST
// request.
if ( $articles ) {
	$acp = ArticleCompileProcessor::newFromPageId( $articles, false, DB_REPLICA );
	if ( $acp && RequestContext::getMain()->getRequest()->wasPosted() ) {
		$pageData += $acp->compileMetadata();
	}
	if ( $acp && !RequestContext::getMain()->getRequest()->wasPosted() ) {
		$pageData += $acp->compileMetadata( self::JOB_QUEUE );
	}
}

We'd then modify compileMetadata() to register a job for saving the metadata to the DB -- most of what compileMetadata() currently does is select data from the replica.

Change 455554 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/PageTriage@master] Always retrieve compiled metadata, save with a job in GET requests

https://gerrit.wikimedia.org/r/455554

kostajh claimed this task.Aug 27 2018, 1:15 PM
kostajh triaged this task as High priority.
kostajh edited projects, added Growth-Team (Current Sprint); removed Growth-Team.
kostajh moved this task from Incoming to In Progress on the Growth-Team (Current Sprint) board.

Change 455581 had a related patch set uploaded (by Sbisson; owner: Sbisson):
[operations/mediawiki-config@master] Enable 'PageTriage' log channel

https://gerrit.wikimedia.org/r/455581

Change 455604 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/PageTriage@master] Allow deferred writes on GET for pages with missing metadata

https://gerrit.wikimedia.org/r/455604

Change 455608 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/PageTriage@wmf/1.32.0-wmf.18] Allow deferred writes on GET for pages with missing metadata

https://gerrit.wikimedia.org/r/455608

Change 455581 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable 'PageTriage' log channel

https://gerrit.wikimedia.org/r/455581

Mentioned in SAL (#wikimedia-operations) [2018-08-27T18:34:46Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Enable PageTriage log channel (T202815) (duration: 00m 57s)

Change 455608 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@wmf/1.32.0-wmf.18] Allow deferred writes on GET for pages with missing metadata

https://gerrit.wikimedia.org/r/455608

Mentioned in SAL (#wikimedia-operations) [2018-08-27T18:56:08Z] <catrope@deploy1001> Synchronized php-1.32.0-wmf.18/extensions/PageTriage/includes/ArticleMetadata.php: Allow deferred writes on GET for pages with missing metadata (T202815) (duration: 00m 47s)

@Barkeep49 @Vexations it should be OK now, would you mind confirming please? Thanks for your help with reporting the issue and providing debug info!

Change 455604 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Allow deferred writes on GET for pages with missing metadata

https://gerrit.wikimedia.org/r/455604

Change 455554 abandoned by Kosta Harlan:
Always retrieve compiled metadata, save with a job in GET requests

Reason:
Follow up patch coming tomorrow

https://gerrit.wikimedia.org/r/455554

Checked the fix in enwiki (wmf.18) - the additional data set is fetched successful with scrolling to the bottom for both - newest and oldest - sorting order.

Etonkovidova closed this task as Resolved.Aug 27 2018, 9:52 PM

It works now; I can scroll through the entire feed from Newest as well as Oldest without any issues. Thanks!

Change 455870 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/extensions/PageTriage@master] Ensure retrieval and storage of page metadata

https://gerrit.wikimedia.org/r/455870

Change 455870 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Ensure retrieval and storage of page metadata

https://gerrit.wikimedia.org/r/455870