
Orphaned entries in categorylinks
Open, Medium, Public


On plwiki, the categorylinks table contains some orphaned entries: the category is empty, and the listed pages are no longer in it. This is probably related to page moves, redirects, tagging redirects for deletion, or moving pages over redirects.

Event Timeline

Wargo created this task. May 23 2019, 8:40 AM
Restricted Application added a subscriber: Aklapper. May 23 2019, 8:40 AM
Marostegui edited projects, added MediaWiki-General; removed DBA. May 23 2019, 8:45 AM
Marostegui added a subscriber: Marostegui.

I can confirm that this gives the same results in production, so labs and production are consistent with each other, which was my initial concern.
I am not sure what the fix is, and I don't think we DBAs can act on it, as it seems more wiki-related.

Can you give more details on what needs to be done?

Wargo added a comment. May 23 2019, 8:56 AM

Can you give more details on what needs to be done?

Entries for these pages (linked in categorylinks) should not exist, because the pages were removed from this category. So the MediaWiki code that updates categorylinks does not always work after a page is decategorised.

Thanks, let's tag Wikimedia-Rdbms to see if Platform Engineering can check it out.

This might be caused by T221980. It seems that in general, neither categorylinks table rows nor recentchanges entries are correctly deleted when a page is deleted to make way for a move from another title.
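If the leftover rows do point at page IDs that no longer exist, a query along these lines could surface them. This is only a sketch against a simplified, assumed schema (just `page.page_id`, `categorylinks.cl_from` and `categorylinks.cl_to`), run on an in-memory SQLite database with made-up demo rows rather than on a real wiki replica. The helper names `find_orphaned_categorylinks` and `build_demo_db` are hypothetical.

```python
import sqlite3


def find_orphaned_categorylinks(conn):
    """Return (cl_from, cl_to) rows whose source page has no row in `page`."""
    query = """
        SELECT cl.cl_from, cl.cl_to
        FROM categorylinks AS cl
        LEFT JOIN page AS p ON p.page_id = cl.cl_from
        WHERE p.page_id IS NULL
        ORDER BY cl.cl_from, cl.cl_to
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a tiny in-memory database with one valid and one orphaned link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        INSERT INTO page VALUES (1, 'Existing_page');
        -- Page 1 still exists, so this link is fine.
        INSERT INTO categorylinks VALUES (1, 'Demo_category');
        -- Page 2 was deleted, but its link row was left behind.
        INSERT INTO categorylinks VALUES (2, 'Demo_category');
        """
    )
    return conn


if __name__ == "__main__":
    print(find_orphaned_categorylinks(build_demo_db()))
```

Note that this only catches rows whose page is gone entirely. Rows for pages that still exist but have since been removed from the category would need a comparison against the parsed page content instead.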

JJMC89 added a subscriber: JJMC89. Jul 12 2019, 3:58 AM
Krinkle added a subscriber: Krinkle.

Adding CPT for awareness. Assuming this is still true, then the answer to my question at T221795 is likely "Yes, we still need the recount/doubting logic". In which case that might be something we can fix "once and for all" - which might be easier now with the new RevisionStore, DerivedPageDataUpdater abstractions etc.

WDoranWMF triaged this task as Medium priority. Aug 6 2019, 6:34 PM

Is this visible in the "Categories" area of the page view? Is there any other user visibility?

Data corruption is bad, but data corruption that users can see is very bad.

The list of categories associated with a page, I believe, comes from ParserOutput through ParserCache.

The category links table is intended to contain all such associations from all existent pages the last time those pages were re-parsed after an edit (or recursive template update).

This table is used in various user-facing ways, such as:

  • List of pages that are in this category, as shown on Category: pages.
  • Special:RandomInCategory
  • Special:RecentChangesLinked
  • Aggregate information based on category membership, such as Special:Unusedcategories, Special:Wantedcategories.

Any of these might wrongly include or exclude a category, or page in that category, if link table entries are corrupt or outdated – e.g. due to failed links update or due to a to-be-discovered bug in the way this data is synced from Parser to the link tables.
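To make the two failure modes concrete, the comparison between what the parser reports and what the link table holds can be thought of as a set difference. The sketch below is illustrative only: `diff_category_links` is a hypothetical helper, and both inputs are plain lists of category names for a single page, not real ParserOutput or database objects.

```python
def diff_category_links(parsed_categories, stored_categories):
    """Compare categories from a fresh parse with rows stored for one page.

    "stale" holds categories still in the link table but absent from the
    parse (the page wrongly appears in them); "missing" holds categories
    the parse reports but the table lacks (the page is wrongly excluded).
    """
    parsed = set(parsed_categories)
    stored = set(stored_categories)
    return {
        "stale": sorted(stored - parsed),
        "missing": sorted(parsed - stored),
    }


if __name__ == "__main__":
    print(diff_category_links(["Birds", "Animals"], ["Animals", "To_delete"]))
```

The orphaned entries described in this task would fall under "stale": the row is still stored although the page no longer declares the category.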

Yann added a subscriber: Yann. Sep 8 2019, 5:07 PM

Wrong count right now on : 81 more than real count.
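A mismatch like this could be checked by comparing a stored per-category counter with the number of link rows actually present. The sketch below assumes a simplified schema (`category.cat_title`, `category.cat_pages`, `categorylinks.cl_to`) and uses made-up demo data in SQLite. Whether the wrong count reported above comes from this counter is an assumption, and `category_count_drift` is a hypothetical helper name.

```python
import sqlite3


def category_count_drift(conn):
    """Return (title, stored, actual, stored - actual) for mismatched categories."""
    query = """
        SELECT c.cat_title, c.cat_pages, COUNT(cl.cl_from) AS actual
        FROM category AS c
        LEFT JOIN categorylinks AS cl ON cl.cl_to = c.cat_title
        GROUP BY c.cat_title, c.cat_pages
        HAVING c.cat_pages != COUNT(cl.cl_from)
        ORDER BY c.cat_title
    """
    return [
        (title, stored, actual, stored - actual)
        for title, stored, actual in conn.execute(query)
    ]


def build_demo_db():
    """Build an in-memory database with one correct and one drifted counter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (cat_title TEXT PRIMARY KEY, cat_pages INTEGER);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        INSERT INTO category VALUES ('Drifted', 3);
        INSERT INTO category VALUES ('Correct', 1);
        INSERT INTO categorylinks VALUES (1, 'Drifted');
        INSERT INTO categorylinks VALUES (2, 'Correct');
        """
    )
    return conn


if __name__ == "__main__":
    for row in category_count_drift(build_demo_db()):
        print(row)
```

A positive last column means the stored counter is higher than the number of link rows found.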

AMooney changed the task status from Open to Stalled. Mar 12 2020, 1:21 PM
Krinkle removed a subscriber: Krinkle. Apr 16 2020, 8:13 PM
Aklapper changed the task status from Stalled to Open. Oct 19 2020, 4:32 PM

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks...Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task needs retesting, then the TestMe tag should be added. If this task is either out of scope and nobody should ever work on this, or nobody else managed to reproduce the problem described in this task, then this task should have the "Declined" status. If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)