Page MenuHomePhabricator

Number of "found articles" is not matching the real number of articles
Open, In Progress, LowPublicBUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Navigate to Special:Homepage in arwiki
  • Select "Link recommendation" as the task type and topics "Africa" + "Food and drink", the result count for found articles is 9.
  • Click on the next arrow to navigate to the next suggestion as many times as possible

See video.

What happens?:

The feed shows the No more suggestions message at the 5th suggestion.

What should have happened instead?:

The feed should show 9 suggestions

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:
TBD

Event Timeline

Hmm, yeah it looks like totalCount is 9 in the API response, but only 3 pages are in the result:

{
	"batchcomplete": true,
	"continue": {
		"ggtoffset": 3,
		"continue": "ggtoffset||"
	},
	"growthtasks": {
		"totalCount": 9,
		"qualityGateConfig": {
			"image-recommendation": {
				"dailyLimit": false,
				"dailyCount": 0
			},
			"link-recommendation": {
				"dailyLimit": false,
				"dailyCount": 0
			}
		}
	},
	"query": {
		"pages": [
			{
				"pageid": 366627,
				"ns": 0,
				"title": "الدستور الغذائي",
				"tasktype": "link-recommendation",
				"difficulty": "easy",
				"order": 2,
				"qualityGateIds": [
					"dailyLimit"
				],
				"qualityGateConfig": {
					"image-recommendation": {
						"dailyLimit": false,
						"dailyCount": 0
					},
					"link-recommendation": {
						"dailyLimit": false,
						"dailyCount": 0
					}
				},
				"token": "ie0kv8cbh8ednhc78q4gmqjig4gbi826",
				"topics": [
					[
						"food-and-drink",
						0.68379104
					]
				],
				"contentmodel": "wikitext",
				"pagelanguage": "ar",
				"pagelanguagehtmlcode": "ar",
				"pagelanguagedir": "rtl",
				"touched": "2022-04-28T13:48:29Z",
				"lastrevid": 55228549,
				"length": 6121,
				"protection": [],
				"restrictiontypes": [
					"edit",
					"move"
				],
				"revisions": [
					{
						"revid": 55228549,
						"parentid": 54691052
					}
				],
				"thumbnail": {
					"source": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Schematorganizacyjny.png/332px-Schematorganizacyjny.png",
					"width": 332,
					"height": 237
				},
				"original": {
					"source": "https://upload.wikimedia.org/wikipedia/commons/d/d9/Schematorganizacyjny.png",
					"width": 1255,
					"height": 896
				},
				"pageimage": "Schematorganizacyjny.png"
			},
			{
				"pageid": 1533675,
				"ns": 0,
				"title": "إينجيرا",
				"tasktype": "link-recommendation",
				"difficulty": "easy",
				"order": 0,
				"qualityGateIds": [
					"dailyLimit"
				],
				"qualityGateConfig": {
					"image-recommendation": {
						"dailyLimit": false,
						"dailyCount": 0
					},
					"link-recommendation": {
						"dailyLimit": false,
						"dailyCount": 0
					}
				},
				"token": "vocn601eardaiadl37vf26mlmqkrbmdc",
				"topics": [
					[
						"food-and-drink",
						0.6295542
					]
				],
				"contentmodel": "wikitext",
				"pagelanguage": "ar",
				"pagelanguagehtmlcode": "ar",
				"pagelanguagedir": "rtl",
				"touched": "2022-05-07T23:09:30Z",
				"lastrevid": 58003453,
				"length": 10231,
				"protection": [],
				"restrictiontypes": [
					"edit",
					"move"
				],
				"revisions": [
					{
						"revid": 58003453,
						"parentid": 51554239
					}
				],
				"thumbnail": {
					"source": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/03/Alicha_1.jpg/332px-Alicha_1.jpg",
					"width": 332,
					"height": 249
				},
				"original": {
					"source": "https://upload.wikimedia.org/wikipedia/commons/0/03/Alicha_1.jpg",
					"width": 800,
					"height": 600
				},
				"pageimage": "Alicha_1.jpg"
			},
			{
				"pageid": 8333819,
				"ns": 0,
				"title": "ام رقيقة",
				"tasktype": "link-recommendation",
				"difficulty": "easy",
				"order": 1,
				"qualityGateIds": [
					"dailyLimit"
				],
				"qualityGateConfig": {
					"image-recommendation": {
						"dailyLimit": false,
						"dailyCount": 0
					},
					"link-recommendation": {
						"dailyLimit": false,
						"dailyCount": 0
					}
				},
				"token": "tb6eupqb5oltopjajknh2debmlkg9fk4",
				"topics": [
					[
						"food-and-drink",
						0.4504298
					]
				],
				"contentmodel": "wikitext",
				"pagelanguage": "ar",
				"pagelanguagehtmlcode": "ar",
				"pagelanguagedir": "rtl",
				"touched": "2022-05-07T23:10:02Z",
				"lastrevid": 55799787,
				"length": 3676,
				"protection": [],
				"restrictiontypes": [
					"edit",
					"move"
				],
				"revisions": [
					{
						"revid": 55799787,
						"parentid": 54929751
					}
				],
				"thumbnail": {
					"source": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/%D8%A3%D9%85_%D8%B1%D9%82%D9%8A%D9%82%D8%A9.jpg/177px-%D8%A3%D9%85_%D8%B1%D9%82%D9%8A%D9%82%D8%A9.jpg",
					"width": 177,
					"height": 332
				},
				"original": {
					"source": "https://upload.wikimedia.org/wikipedia/commons/a/a2/%D8%A3%D9%85_%D8%B1%D9%82%D9%8A%D9%82%D8%A9.jpg",
					"width": 512,
					"height": 960
				},
				"pageimage": "أم_رقيقة.jpg"
			}
		]
	}
}

I suspect it is caused by the link recommendation filter removing items from the task set (this happens if there is no database entry backing the link suggestion task), but we are still using the original, unsubstracted count in the API.

There are a number of issues that could cause small discrepancies - filtering out protected articles, deduplication (if you have selected multiple task types and some pages has more than one tasks), possibly a page getting edited between the initial homepage load and the user paging. IMO it's not worth sinking much time into it. Maybe what we should do is to check that we are the last page (ie. we have gotten less tasks in the last API request than we have asked for) and modify the counter if that happens.

kostajh changed the task status from Open to In Progress.May 12 2022, 8:56 AM
kostajh triaged this task as Medium priority.

There are a number of issues that could cause small discrepancies - filtering out protected articles, deduplication (if you have selected multiple task types and some pages has more than one tasks), possibly a page getting edited between the initial homepage load and the user paging. IMO it's not worth sinking much time into it. Maybe what we should do is to check that we are the last page (ie. we have gotten less tasks in the last API request than we have asked for) and modify the counter if that happens.

I think there is some room for improvement in the UX when the user starts out with a small number of tasks, because otherwise this is a pretty confusing experience. And having smaller number of tasks will be more common when T305408: Newcomer tasks: deploy AND selection to all wikis is done.

What seems weird to me is that the result set in the example above is small enough (9) that we should already have all the information about protected and filtered out tasks.

There are a number of issues that could cause small discrepancies - filtering out protected articles, deduplication (if you have selected multiple task types and some pages has more than one tasks), possibly a page getting edited between the initial homepage load and the user paging. IMO it's not worth sinking much time into it. Maybe what we should do is to check that we are the last page (ie. we have gotten less tasks in the last API request than we have asked for) and modify the counter if that happens.

I think there is some room for improvement in the UX when the user starts out with a small number of tasks, because otherwise this is a pretty confusing experience. And having smaller number of tasks will be more common when T305408: Newcomer tasks: deploy AND selection to all wikis is done.

What seems weird to me is that the result set in the example above is small enough (9) that we should already have all the information about protected and filtered out tasks.

Yes, the case in the description is somewhat unusual - the small size of counted result shows the issue.

The discrepancies were much less noticeable before (with only OR). e.g. the count when only "Food and drink" is selected - 488, when only "Africa" is selected - 1,536 (the task type "Link recommendations"). The sum - when both "Food and drink" +"Africa" (with "Match at least one selected topic" option), should be 2,034, but it'd 2,024.

I checked the pilots wikis - it's not difficult to reproduce the issue: cswiki (all task types selected, Business and economics+Computers and internet) the count is 160 - displayed 136 cards. Society+Chemistry - the count is 24, displayed 22 cards.
bnwiki(Copyedit+Link recommendations, Society+Technology) the count is 22, displayed.
However, there are more cases when the count and the number of displayed cards were matched.

Change 793035 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: filter out protected pages

https://gerrit.wikimedia.org/r/793035

There are a number of issues that could cause small discrepancies - filtering out protected articles, deduplication (if you have selected multiple task types and some pages has more than one tasks), possibly a page getting edited between the initial homepage load and the user paging. IMO it's not worth sinking much time into it. Maybe what we should do is to check that we are the last page (ie. we have gotten less tasks in the last API request than we have asked for) and modify the counter if that happens.

From the filtering scenarios you mention (1) Protected articles (2) Task type de-duplication (3) Edits made between homepage load and user paging I focused on (1) because it seemed the most obvious. Some findings:

We're passing a maxLength of 1 in the call to $tasks = $this->protectionFilter->filter( $tasks, 1 ); in SuggestedEdits::getTaskSet() , I guess because that's the call for the "first-render" where only one card will be shown. However I think that's causing the filter to stop looking for invalid tasks after one iteration, so not finding and annotating any invalid task after even if the task set may contain some.

We're returning protected pages as results in the requests to /api.php?action=query&format=json&prop=info%7Crevisions%7Cpageimages&inprop=protection&rvprop=idspiprop=name%7Coriginal%7Cthumbnail&pithumbsize=332&generator=growthtasks&ggtlimit=20&ggttasktypes=references&formatversion=2&uselang=en&ggttopics=europe&ggttopicsmode=OR. I'm not sure if this is a desired behavior because I can see a param inprop=protection which makes me doubt. However, from an API consistency point of view should we filter these in the server instead of in the client like we do now (see cleanupData() function and TODO comment) ? I can see the image and link filters applied in ApiQueryGrowthTasks::run(). Maybe we miss to add something like $tasks = $this->protectionFilter->filter( $tasks, $limit ); there? I'm not sure where this should happen since I can see another TODO comment for this in SearchTaskSuggester::doSuggest().

I pushed a tentative patch gerrit 793035 that filters protected pages from API responses but I'm not sure the assumptions here are right. The patch does not address all discrepancies tho, just the protected pages one. Feedback welcome.

There are a number of issues that could cause small discrepancies - filtering out protected articles, deduplication (if you have selected multiple task types and some pages has more than one tasks), possibly a page getting edited between the initial homepage load and the user paging. IMO it's not worth sinking much time into it. Maybe what we should do is to check that we are the last page (ie. we have gotten less tasks in the last API request than we have asked for) and modify the counter if that happens.

From the filtering scenarios you mention (1) Protected articles (2) Task type de-duplication (3) Edits made between homepage load and user paging I focused on (1) because it seemed the most obvious. Some findings:

We're passing a maxLength of 1 in the call to $tasks = $this->protectionFilter->filter( $tasks, 1 ); in SuggestedEdits::getTaskSet() , I guess because that's the call for the "first-render" where only one card will be shown. However I think that's causing the filter to stop looking for invalid tasks after one iteration, so not finding and annotating any invalid task after even if the task set may contain some.

Note this has some overlap with T308542: Suggested edits: Populate task queue from server-side. I think we could remove the limit on the protection filter without seeing much of a performance hit, since we are looking at up to 20 articles (it used to be 200).

inprop=protection just makes the pageinfo API indicate whether the page is protected or not. I don't think there is protection filter functionality in the core API.

The protection filter makes one SQL query per task. 20 extra queries is not terrible (and the work required to do it with a single query wouldn't be huge either), but also not worth just to get rid of one of several potential sources of discrepancy, IMO.

I think the more flexible way forward would be to make sure the number of total tasks shown (note this is not the number of tasks returned by the API, even if filtering is made fully accurate, it is a number we get from CirrusSearch which is basically "how many results would CirrusSearch give if we wouldn't provide a LIMIT parameter") capped to the actual number of tasks. E.g. we request 20 tasks, the API returns 20 tasks with a totalCount of 50. We display "1 of 50". After paging, you only get 10 new tasks, that's less than what we requested, so we modify the display to "21 of 30". If we get 0 tasks instead, we modify to "end of 20". (A slightly more elegant but slightly more complex approach is to request 21 tasks but only display 20, so we always know in advance when a paging attempt would yield 0 tasks.)

Change 793510 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] [WIP] Newcomer tasks: use task length instead of totalCount

https://gerrit.wikimedia.org/r/793510

Thanks for all the insights. After investigating a bit further I pushed 2 patches that I hope will improve the current status quo:

  • gerrit 793035 filters out protected pages in the server instead of the client for the Homepage module first render and requests to ApiQueryGrowthTasks. This will avoid discrepancies in the total count that are caused by protected pages since we were filtering them in the client but not re-calculating the count after the filter was made.
  • gerrit 793511 uses the length of the data.tasks received in the API requests to "fetch more tasks", instead of using the totalCount provided by the first request. This will adjust the total number of suggestions shown after the first "fetch more tasks" API request is made avoiding showing the "No more suggestions" message in the middle of the feed navigation., It also addresses a problem on the "Next arrow" navigation button. Currently we don't disable or change the state of such icon while the "next page" request is being made, allowing the user to navigate to the next card when we haven't yet receiveed the data for it from the server. This was occurring on the first "next page" and will be fixed by T308542 but it is also happening when navigating to "second next page". It can be easily reproduced by clicking very fast on the next button or using throttling to a slow connection.

I want to add some tests and check some more edge scenarios. Happy to gather more feedback before moving them to code review. @mewoph @Tgr @kostajh

Change 793510 abandoned by Sergio Gimeno:

[mediawiki/extensions/GrowthExperiments@master] [WIP] Newcomer tasks: use task length instead of totalCount

Reason:

Squashed in 793511

https://gerrit.wikimedia.org/r/793510

Change 793511 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] [WIP] Newcomer tasks: prevent navigation to next task before next page has finished loading

https://gerrit.wikimedia.org/r/793511

As discussed during reviews ( gerrit 793035 ) and meetings the filtering of protected pages would benefit from having T259346: Add page protection filter to CirrusSearch in place and improving the protection filter to make just one single query per-taskset as opposed of per-task, see T309535: ProtectionFilter::filter should make a single query per task set.

The changes in gerrit 793511 are:

  • Show the "next button" in an ellipsis state while a request to fetch more tasks is being made, preventing the "End of queue" state to show in between task feed navigation.
BeforeWith patch
Screenshot 2022-05-30 at 15.42.33.png (1×1 px, 125 KB)
Screenshot 2022-05-30 at 15.19.11.png (1×1 px, 195 KB)
icon color is black and it is clickableicon color is grey and not clickable
  • In the pager widget copy text n of M suggestions, update the M with the (real) number of tasks fetched, on the last chunk of tasks fetched. This prevents the "End of queue" state to show at an "unexpected" end of the queue.
BeforeWith patch
... 10 of 17 suggestions, click next, 11 of 17 suggestions, click next, 12 of 17 suggestions, click next, End of 17 suggestions...10 of 17 suggestions, click next, 11 of 12 suggestions, click next, 12 of 12 suggestions, click next, End of 12 suggestions
  • On mobile, when the user navigates to the "End of the queue" and then navigates to the hompeage using the back button (Top corner arrow) the "Suggested edits" module shows the last day edits counter widget. IMO this is miss-leading since there are still tasks in the queue. This patch will force the last task in the queue to render in such scenario. @RHo could you confirm this behavior improves the status quo or which would be the desired outcome? Ty.
BeforeWith patch
Screenshot 2022-05-30 at 15.25.08.png (1×764 px, 83 KB)
Screenshot 2022-05-30 at 15.25.16.png (1×760 px, 127 KB)
Screenshot 2022-05-30 at 15.29.58.png (1×774 px, 84 KB)
Screenshot 2022-05-30 at 15.30.03.png (1×770 px, 109 KB)

@Etonkovidova let me know if the scenarios above were already reported so I can link to exisitng phab tasks.

As discussed during reviews ( gerrit 793035 ) and meetings the filtering of protected pages would benefit from having T259346: Add page protection filter to CirrusSearch in place and improving the protection filter to make just one single query per-taskset as opposed of per-task, see T309535: ProtectionFilter::filter should make a single query per task set.

The changes in gerrit 793511 are:

  • Show the "next button" in an ellipsis state while a request to fetch more tasks is being made, preventing the "End of queue" state to show in between task feed navigation.
BeforeWith patch
Screenshot 2022-05-30 at 15.42.33.png (1×1 px, 125 KB)
Screenshot 2022-05-30 at 15.19.11.png (1×1 px, 195 KB)
icon color is black and it is clickableicon color is grey and not clickable
  • In the pager widget copy text n of M suggestions, update the M with the (real) number of tasks fetched, on the last chunk of tasks fetched. This prevents the "End of queue" state to show at an "unexpected" end of the queue.
BeforeWith patch
... 10 of 17 suggestions, click next, 11 of 17 suggestions, click next, 12 of 17 suggestions, click next, End of 17 suggestions...10 of 17 suggestions, click next, 11 of 12 suggestions, click next, 12 of 12 suggestions, click next, End of 12 suggestions
  • On mobile, when the user navigates to the "End of the queue" and then navigates to the hompeage using the back button (Top corner arrow) the "Suggested edits" module shows the last day edits counter widget. IMO this is miss-leading since there are still tasks in the queue. This patch will force the last task in the queue to render in such scenario. @RHo could you confirm this behavior improves the status quo or which would be the desired outcome? Ty.

Hi @Sgs - thanks for tagging me on this task. No, I do not agree with the proposal to change the right arrow to an ellipsis that cannot be selected. Firstly, the ellpsis icon in an action button is used to indicate "more options" typically. Secondly, it is undesirable to convey state in a call-to-action button.

My recommendation is that we keep the right arrow but have it shown as disabled, and it is the task counter above the article card "1 of X suggestions" that is changed to 1 of ... suggestions. Ideally, we would replace the article card with a "finding articles" loading progress message instead that explicitly tells the person what is going on. That is, something like the mock below (copy tbc):

image.png (1×728 px, 78 KB)

BeforeWith patch
Screenshot 2022-05-30 at 15.25.08.png (1×764 px, 83 KB)
Screenshot 2022-05-30 at 15.25.16.png (1×760 px, 127 KB)
Screenshot 2022-05-30 at 15.29.58.png (1×774 px, 84 KB)
Screenshot 2022-05-30 at 15.30.03.png (1×770 px, 109 KB)

@Etonkovidova let me know if the scenarios above were already reported so I can link to exisitng phab tasks.

Hi @Sgs - thanks for tagging me on this task. No, I do not agree with the proposal to change the right arrow to an ellipsis that cannot be selected. Firstly, the ellpsis icon in an action button is used to indicate "more options" typically. Secondly, it is undesirable to convey state in a call-to-action button.

My recommendation is that we keep the right arrow but have it shown as disabled, and it is the task counter above the article card "1 of X suggestions" that is changed to 1 of ... suggestions. Ideally, we would replace the article card with a "finding articles" loading progress message instead that explicitly tells the person what is going on. That is, something like the mock below (copy tbc):

image.png (1×728 px, 78 KB)

Thanks for the feedback. I will drop the usage of the ... ellipsis in the loading state inside the arrow CTA button and just use the disabled state (greyed). For the loading progress message I have filed T309614: Newcomer tasks: add intermediate loading state, moving it to low priority for now though.

Also speaking with @kostajh we agreed that it would be easier to test and evaluate the outcome of this task if we tackle T309535: ProtectionFilter::filter should make a single query per task set first since it will remove/reduce the discrepancy caused by protected pages.

Sgs lowered the priority of this task from Medium to Low.May 31 2022, 2:22 PM

Change 793035 abandoned by Sergio Gimeno:

[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: filter out protected pages

Reason:

Will be solved in I584d98f1841b8b25e656d1f8859ce0c4c7d6870e with optimized query

https://gerrit.wikimedia.org/r/793035

This comment was removed by Sgs.

Change 809627 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: avoid showing ellipsis in next arrow

https://gerrit.wikimedia.org/r/809627

Change 793511 abandoned by Sergio Gimeno:

[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: adjust the number of tasks shown in the taskfeed pager to match the total number of tasks received

Reason:

Needs re-work

https://gerrit.wikimedia.org/r/793511

Change 809627 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: avoid showing ellipsis in next arrow

https://gerrit.wikimedia.org/r/809627

Change 809967 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] Newcomer tasks: adjust the number of tasks shown in the taskfeed pager to match the total number of tasks received

https://gerrit.wikimedia.org/r/809967