Orphaned entries in categorylinks
Closed, Duplicate (Public)

Description

On plwiki there are some orphaned entries in the categorylinks table: the category is empty and the listed pages are no longer in it. This is probably related to page moves, redirects, tagging redirects for deletion, and moving over redirects.

https://quarry.wmflabs.org/query/31908

Event Timeline

Marostegui subscribed.

I can confirm that the query returns the same results in production, so labs and production are consistent, which was my initial concern.
I am not sure what the fix is, and I doubt we DBAs can act on it, as it seems to be more of a wiki-level issue.

Can you give more details on what needs to be done?

> Can you give more details on what needs to be done?

Entries for these pages (listed in categorylinks) should not exist, because the pages were removed from this category. So the MediaWiki code that updates categorylinks does not always work after decategorisation.

This might be caused by T221980. It seems that in general, neither categorylinks table rows nor recentchanges entries are correctly deleted when a page is deleted to make way for a move from another title.
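To illustrate the failure mode described above, here is a minimal sketch that looks for categorylinks rows whose source page no longer exists. It runs against an in-memory SQLite database with a heavily simplified schema: only the column names `cl_from`, `cl_to`, `page_id` and `page_title` are taken from MediaWiki, while the sample rows and the helper names `build_demo_db` and `find_orphaned_categorylinks` are made up for illustration. It only catches rows left behind by a deleted page; rows for pages that still exist but were decategorised would need a comparison against the parser output instead.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for the page and categorylinks tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        -- Hypothetical sample data: page 1 exists, page 2 was deleted
        -- but its categorylinks row was never cleaned up.
        INSERT INTO page VALUES (1, 'Existing_page');
        INSERT INTO categorylinks VALUES (1, 'Some_category');
        INSERT INTO categorylinks VALUES (2, 'Some_category');
        """
    )
    return conn


def find_orphaned_categorylinks(conn):
    """Return (cl_from, cl_to) rows whose cl_from matches no row in page."""
    query = """
        SELECT cl.cl_from, cl.cl_to
        FROM categorylinks AS cl
        LEFT JOIN page AS p ON p.page_id = cl.cl_from
        WHERE p.page_id IS NULL
        ORDER BY cl.cl_from, cl.cl_to
    """
    return conn.execute(query).fetchall()


orphans = find_orphaned_categorylinks(build_demo_db())
print(orphans)
```
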

Krinkle subscribed.

Adding CPT for awareness. Assuming this is still true, then the answer to my question at T221795 is likely "Yes, we still need the recount/doubting logic". In which case that might be something we can fix "once and for all" - which might be easier now with the new RevisionStore, DerivedPageDataUpdater abstractions etc.

Is this visible in the "Categories" area of the page view? Is there any other user visibility?

Data corruption is bad, but data corruption that users can see is very bad.

The list of categories associated with a page, I believe, comes from ParserOutput through ParserCache.

The category links table is intended to contain all such associations from all existent pages the last time those pages were re-parsed after an edit (or recursive template update).

This table is used in various user-facing ways, such as:

  • List of pages that are in this category, as shown on Category: pages.
  • Special:RandomInCategory
  • Special:RecentChangesLinked
  • Aggregate information based on category membership, such as Special:Unusedcategories, Special:Wantedcategories.

Any of these might wrongly include or exclude a category, or page in that category, if link table entries are corrupt or outdated – e.g. due to failed links update or due to a to-be-discovered bug in the way this data is synced from Parser to the link tables.
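As a rough sketch of what "corrupt or outdated" means here, the check below compares the categories a page's latest parse reports with the categories stored for it in the link table. This is not MediaWiki's actual update code; the function name `diff_category_links` and both inputs are hypothetical, and how the two lists would be obtained is left open.

```python
def diff_category_links(parsed_categories, stored_categories):
    """Compare parser-reported categories with stored link-table rows.

    Returns (stale, missing):
      stale   - categories stored in the link table but absent from the parse
      missing - categories reported by the parse but absent from the link table
    """
    parsed = set(parsed_categories)
    stored = set(stored_categories)
    stale = sorted(stored - parsed)
    missing = sorted(parsed - stored)
    return stale, missing


# Hypothetical example: the page was removed from "Old_category" and added
# to "New_category", but the link table was never updated.
stale, missing = diff_category_links(
    parsed_categories=["Kept_category", "New_category"],
    stored_categories=["Kept_category", "Old_category"],
)
print(stale, missing)
```

A stale entry makes a page appear in a category listing it no longer belongs to, while a missing entry hides it from a listing it should be in.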

Aklapper changed the task status from Stalled to Open. Oct 19 2020, 4:32 PM

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.

(Smallprint, as general orientation for task management: If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead. If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... > Edit Subtasks. If this task is stalled on an upstream project, then the Upstream tag should be added. If this task requires info from the task reporter, then there should be instructions which info is needed. If this task needs retesting, then the TestMe tag should be added. If this task is either out of scope and nobody should ever work on this, or nobody else managed to reproduce the problem described in this task, then this task should have the "Declined" status. If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)

Closing this in favor of T85696 because the very same problem occurs on other wikis, and nobody is working on this task.