
Orphaned entries in categorylinks
Open, Normal, Public

Description

On plwiki there are some orphaned entries in the categorylinks table. The category is empty, and these pages are no longer in this category. This is probably related to page moves, redirects, tagging redirects for deletion, or moving pages over redirects.

https://quarry.wmflabs.org/query/31908

Event Timeline

Wargo created this task. May 23 2019, 8:40 AM
Restricted Application added a subscriber: Aklapper. May 23 2019, 8:40 AM
Marostegui edited projects, added MediaWiki-General; removed DBA. May 23 2019, 8:45 AM
Marostegui added a subscriber: Marostegui.

I can confirm that production returns the same results, so labs and production are consistent, which was my initial concern.
I am not sure what the fix is, and I doubt we DBAs can act on it, as it seems to be more of a wiki-side issue.

Can you give more details on what needs to be done?

Wargo added a comment. May 23 2019, 8:56 AM

> Can you give more details on what needs to be done?

The categorylinks rows for these pages should not exist, because the pages were removed from this category. So the MediaWiki code that updates categorylinks does not always work after a page is removed from a category.

Thanks, let's tag Wikimedia-Rdbms to see if the Core Platform Team can check it out.

This might be caused by T221980. It seems that, in general, neither categorylinks rows nor recentchanges entries are correctly deleted when a page is deleted to make way for a move from another title.
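A minimal sketch of that failure mode, assuming an in-memory SQLite database and a heavily simplified schema (only `page_id`, `page_namespace`, `page_title` on `page`, and `cl_from`, `cl_to` on `categorylinks`; the real MediaWiki tables have more columns). The helper `delete_page` and its `purge_links` flag are hypothetical: with `purge_links=False` it deletes the page row but skips the link-table cleanup, as suspected above. The LEFT JOIN in `orphaned_links` then finds rows whose `cl_from` no longer points at any page.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    # Simplified schema (assumption): real MediaWiki tables have many more columns.
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER NOT NULL,
            page_title TEXT NOT NULL
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER NOT NULL,   -- page_id of the categorised page
            cl_to TEXT NOT NULL,        -- category title without the namespace prefix
            PRIMARY KEY (cl_from, cl_to)
        );
    """)
    return conn

def delete_page(conn, page_id, purge_links):
    """Hypothetical helper: delete a page row, optionally also its categorylinks rows."""
    conn.execute("DELETE FROM page WHERE page_id = ?", (page_id,))
    if purge_links:
        conn.execute("DELETE FROM categorylinks WHERE cl_from = ?", (page_id,))

def orphaned_links(conn):
    """Return (cl_from, cl_to) rows whose cl_from matches no existing page."""
    return conn.execute("""
        SELECT cl.cl_from, cl.cl_to
        FROM categorylinks AS cl
        LEFT JOIN page AS p ON p.page_id = cl.cl_from
        WHERE p.page_id IS NULL
        ORDER BY cl.cl_from, cl.cl_to
    """).fetchall()

if __name__ == "__main__":
    conn = build_db()
    # Page 2 is a redirect tagged for deletion, so it sits in a maintenance category.
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)",
                     [(1, 0, "Old_title"), (2, 0, "New_title")])
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)",
                     [(1, "Some_topic"), (2, "Redirects_for_deletion")])
    # Move Old_title over the redirect: the redirect is deleted without
    # purging its categorylinks rows, then page 1 takes the new title.
    delete_page(conn, 2, purge_links=False)
    conn.execute("UPDATE page SET page_title = ? WHERE page_id = ?", ("New_title", 1))
    print(orphaned_links(conn))  # [(2, 'Redirects_for_deletion')]
```

This only catches rows left behind by deleted pages; rows for pages that still exist but no longer declare the category cannot be found by a join alone and need the page to be re-parsed.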

JJMC89 added a subscriber: JJMC89. Jul 12 2019, 3:58 AM
Krinkle added a subscriber: Krinkle.

Adding CPT for awareness. If this is still true, the answer to my question at T221795 is likely "Yes, we still need the recount/doubting logic". In that case, this might be something we can fix "once and for all", which might be easier now with the new RevisionStore, DerivedPageDataUpdater abstractions, etc.

WDoranWMF triaged this task as Normal priority. Aug 6 2019, 6:34 PM

Is this visible in the "Categories" area of the page view? Is there any other user visibility?

Data corruption is bad, but data corruption that users can see is very bad.

The list of categories associated with a page, I believe, comes from ParserOutput through ParserCache.

The categorylinks table is intended to contain all such associations for all existing pages, as of the last time each page was re-parsed after an edit (or a recursive template update).
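One way to picture that sync step: after each re-parse, diff the categories the parser now reports against the rows currently stored, then delete the stale rows and insert the missing ones. This is a hedged sketch in plain Python, not MediaWiki's actual links-update code; `links_update` and the `stored` dict (a stand-in for `categorylinks` rows keyed by page ID) are hypothetical. If the delete half is skipped or fails, the page keeps rows for categories it has left, which matches the symptom reported in this task.

```python
from typing import Dict, Set, Tuple

def links_update(stored: Dict[int, Set[str]], page_id: int,
                 parsed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Sync stored[page_id] with the categories from the latest parse.

    Returns (removed, added); a caller would use these to adjust
    per-category counters incrementally.
    """
    current = stored.get(page_id, set())
    removed = current - parsed   # stale rows that must be deleted
    added = parsed - current     # missing rows that must be inserted
    if parsed:
        stored[page_id] = set(parsed)
    else:
        stored.pop(page_id, None)  # page is no longer in any category
    return removed, added

if __name__ == "__main__":
    stored = {42: {"Foo", "Bar"}}
    removed, added = links_update(stored, 42, {"Bar", "Baz"})
    print(sorted(removed), sorted(added))  # ['Foo'] ['Baz']
```

Because counters are adjusted from the returned diff rather than recomputed, any update that is lost leaves both the rows and the counts out of step, which is why periodic recounting keeps coming up.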

This table is used in various user-facing ways, such as:

  • List of pages that are in this category, as shown on Category: pages.
  • Special:RandomInCategory
  • Special:RecentChangesLinked
  • Aggregate information based on category membership, such as Special:Unusedcategories, Special:Wantedcategories.

Any of these might wrongly include or exclude a category, or a page in that category, if link table entries are corrupt or outdated, e.g. due to a failed links update or a yet-to-be-discovered bug in the way this data is synced from the Parser to the link tables.

Yann added a subscriber: Yann. Sep 8 2019, 5:07 PM

The count is currently wrong on https://commons.wikimedia.org/wiki/Category:Copyright_violations: it is 81 higher than the real count.
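A count that is too high suggests the stored per-category counter has drifted from the membership rows. A minimal sketch of recount-style reconciliation, assuming a simplified `category` table with only `cat_title` and `cat_pages` (the real table also tracks subcategories and files separately) next to simplified `page` and `categorylinks` tables: count only rows whose page still exists, compare with the stored counter, and overwrite it. The function `recount_category` is hypothetical, not MediaWiki's API, and it cannot detect rows for existing pages that no longer declare the category.

```python
import sqlite3

# Simplified schema (assumption): real MediaWiki tables have many more columns.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL,
                            PRIMARY KEY (cl_from, cl_to));
CREATE TABLE category (cat_title TEXT PRIMARY KEY, cat_pages INTEGER NOT NULL);
"""

def recount_category(conn, title):
    """Recompute cat_pages for one category; return (old, new) counts."""
    (old,) = conn.execute(
        "SELECT cat_pages FROM category WHERE cat_title = ?", (title,)).fetchone()
    # Count only membership rows that still point at an existing page.
    (new,) = conn.execute("""
        SELECT COUNT(*) FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ?
    """, (title,)).fetchone()
    if old != new:
        conn.execute("UPDATE category SET cat_pages = ? WHERE cat_title = ?",
                     (new, title))
    return old, new

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?)", [(1, "A"), (2, "B")])
    # The row for page 3 dangles: page 3 was deleted without link cleanup.
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)",
                     [(1, "Copyright_violations"), (2, "Copyright_violations"),
                      (3, "Copyright_violations")])
    # The stored counter has drifted upward.
    conn.execute("INSERT INTO category VALUES (?, ?)", ("Copyright_violations", 5))
    print(recount_category(conn, "Copyright_violations"))  # (5, 2)
```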