Page MenuHomePhabricator

Deleted pages are not being removed from links tables, which also messes up category counts
Open, HighPublicBUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
{{PAGESINCATEGORY:Wikipedia:Nuweg}} isn't decreased

Long term solution should come from T221795, but we need a fix for the fact they're not decreasing now.
What should have happened instead?:

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

Event Timeline

The update of the category table (for the counters) and also the deletion from categorylinks is not done.

That is part of the LinksUpdate and that was refactored, which could be a reason for a error in that area.

I can reproduce this locally, git bisect says this is caused by the linksupdate refactor (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/743278). The patch doesn't seem to revert cleanly on master. I'm not really sure how to debug this component, sticking var_dump( 'here' ); die();s to functions that seem relevant seems to have no effect.

The LinksUpdate lost the page id to work on:

[DBQuery] Wikimedia\Rdbms\Database::beginIfImplied (MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows) [0s] localhost: BEGIN
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.011s] localhost: SELECT  cl_to,cl_sortkey_prefix  FROM `categorylinks`    WHERE cl_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  el_to  FROM `externallinks`    WHERE el_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  il_to  FROM `imagelinks`    WHERE il_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  iwl_prefix,iwl_title  FROM `iwlinks`    WHERE iwl_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  ll_lang,ll_title  FROM `langlinks`    WHERE ll_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  pl_namespace AS `ns`,pl_title AS `title`  FROM `pagelinks`    WHERE pl_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  pp_propname,pp_value  FROM `page_props`    WHERE pp_page = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksTable::fetchExistingRows [0.001s] localhost: SELECT  tl_namespace AS `ns`,tl_title AS `title`  FROM `templatelinks`    WHERE tl_from = 0  
[DBQuery] MediaWiki\Deferred\LinksUpdate\LinksUpdate::updateLinksTimestamp [0s] localhost: UPDATE  `page` SET page_links_updated = '20220114230240' WHERE page_id = 1385

Without the existing rows it cannot determine the delta and does not delete anything.

Edit: The new LinksTable lost the page id, while the LinksUpdate itself has the correct one (its the 1385 in my case) to update page table.

LinksTable is getting the page id from a PageIdentity

Some quick thoughts on eyeballing the code:

  • The issue is presumably coming from LinksTable::getFromConds() line 353: return [ $this->getFromField() => $this->getSourcePageId() ];
  • LinksTable::getSourcePageId() line 276 returns the getId() of the private PageIdentity $sourcePage
  • LinksTable::injectBaseDependencies() is meant to set $sourcePage, and is meant to be called by the factory on instantiation.
  • But the call to PageIdentity::getId() in practice will be routed to a Title::getArticleID()
  • Title::getArticleID() line 2871 calls Title::getFieldFromPageStore() without any special flags, and casts its return to an instantiation
  • Title::getFieldFromPageStore() line 4113 returns false if the page doesn't exist, which will include when it's just been deleted?

Change 754046 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@wmf/1.38.0-wmf.17] Revert \"LinksUpdate refactor\" and follow-ups

https://gerrit.wikimedia.org/r/754046

14:48:02 <taavi> this incident probably deserves a post-mortem for our response, I don't think the time it took us to respond is acceptable nor is the fact that the patch authors aren't available and I don't know who to escalate this to

Change 754047 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] Use immutable page identity in LinksUpdate (needed for deletion)

https://gerrit.wikimedia.org/r/754047

Some quick thoughts on eyeballing the code:

  • The issue is presumably coming from LinksTable::getFromConds() line 353: return [ $this->getFromField() => $this->getSourcePageId() ];
  • LinksTable::getSourcePageId() line 276 returns the getId() of the private PageIdentity $sourcePage
    • PageIdentity::getId() -> Title::getArticleID() which shouldn't return 0 except if canExist() returns false?
  • LinksTable::injectBaseDependencies() is meant to set this, and is meant to be called by the factory on instantiation.

As time of the constructor of LinksDeletionUpdate and parent LinksUpdate the page id is set. But it is used on call of LinksUpdate::doUpdate and than the page id of the title is reset and LinkCache/PageStore also missing the page.
Caches are cleared to make the page showing redlinks. The clean up in the LinksUpdate is deferred after all of that and has to store its id to make sure it deletes the correct things. That was hold by LinksUpdate::$mId in the past (and could be done also now), but the new code used a page identity which is aware of all the changes done in the meantime, possible even when cloned due to refresh code.

If the page id is copy over to a new object it seems to work. Title::toPageIdentity is documented for this with: "The ProperPageIdentity returned by this method is guaranteed to be immutable." Which sounds great (at least at the moment).


[Linter extension was affected similiar, fixed some days ago - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Linter/+/752743]

Some quick thoughts on eyeballing the code:

  • The issue is presumably coming from LinksTable::getFromConds() line 353: return [ $this->getFromField() => $this->getSourcePageId() ];
  • LinksTable::getSourcePageId() line 276 returns the getId() of the private PageIdentity $sourcePage
    • PageIdentity::getId() -> Title::getArticleID() which shouldn't return 0 except if canExist() returns false?
  • LinksTable::injectBaseDependencies() is meant to set this, and is meant to be called by the factory on instantiation.

As time of the constructor of LinksDeletionUpdate and parent LinksUpdate the page id is set. But it is used on call of LinksUpdate::doUpdate and than the page id of the title is reset and LinkCache/PageStore also missing the page.
Caches are cleared to make the page showing redlinks. The clean up in the LinksUpdate is deferred after all of that and has to store its id to make sure it deletes the correct things. That was hold by LinksUpdate::$mId in the past (and could be done also now), but the new code used a page identity which is aware of all the changes done in the meantime, possible even when cloned due to refresh code.

If the page id is copy over to a new object it seems to work. Title::toPageIdentity is documented for this with: "The ProperPageIdentity returned by this method is guaranteed to be immutable." Which sounds great (at least at the moment).


[Linter extension was affected similiar, fixed some days ago - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Linter/+/752743]

Aha, that makes sense. Thanks for the analysis and the patch. For now we're going to deploy the revert, to give Tim/Daniel/CPT time to review your fix rather than deploying it without their input.

Change 754046 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.17] Revert \"LinksUpdate refactor\" and follow-ups

https://gerrit.wikimedia.org/r/754046

Summarizing the current status from IRC:

  • We're deploying the revert to wmf.17, and have moved the train back to wmf.16 temporarily while it gets synced out in a non-atomic way. After testing we'll move forward again to wmf.17.
  • Umherirrender's analysis/patch looks good, but we want to leave that for Tim to review and if there are any other follow-ups from this.
  • Even after the revert is deployed, we need some cleanup script or queries to reset the links table.

Some quick thoughts on eyeballing the code:

  • The issue is presumably coming from LinksTable::getFromConds() line 353: return [ $this->getFromField() => $this->getSourcePageId() ];
  • LinksTable::getSourcePageId() line 276 returns the getId() of the private PageIdentity $sourcePage
    • PageIdentity::getId() -> Title::getArticleID() which shouldn't return 0 except if canExist() returns false?
  • LinksTable::injectBaseDependencies() is meant to set this, and is meant to be called by the factory on instantiation.

As time of the constructor of LinksDeletionUpdate and parent LinksUpdate the page id is set. But it is used on call of LinksUpdate::doUpdate and than the page id of the title is reset and LinkCache/PageStore also missing the page.
Caches are cleared to make the page showing redlinks. The clean up in the LinksUpdate is deferred after all of that and has to store its id to make sure it deletes the correct things. That was hold by LinksUpdate::$mId in the past (and could be done also now), but the new code used a page identity which is aware of all the changes done in the meantime, possible even when cloned due to refresh code.

If the page id is copy over to a new object it seems to work. Title::toPageIdentity is documented for this with: "The ProperPageIdentity returned by this method is guaranteed to be immutable." Which sounds great (at least at the moment).


[Linter extension was affected similiar, fixed some days ago - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Linter/+/752743]

Aha, that makes sense. Thanks for the analysis and the patch. For now we're going to deploy the revert, to give Tim/Daniel/CPT time to review your fix rather than deploying it without their input.

I am very okay with it, because the code area is new and a patch set there needs some more eyes before running in production.

I would assume there are now orphans rows on the link table in production for the deletioned pages between the first deployment and the revert.
The (speedy) deletion categories should repair itself when they gets empty (by Category::refreshCountsIfEmpty).
Not sure if the maintenance script to recount category counters is ready for production (recountCategories.php).

Mentioned in SAL (#wikimedia-operations) [2022-01-15T00:42:39Z] <jforrester@deploy1002> Started scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958

Mentioned in SAL (#wikimedia-operations) [2022-01-15T00:46:37Z] <jforrester@deploy1002> Finished scap: Revert "LinksUpdate refactor" and follow-ups for T299244 re. T293958 (duration: 03m 58s)

OK, provisionally we think that this is 'fixed' (no longer making things worse) in current production, but (a) the counts are still going to be wrong until they naturally drift towards correctitude or an outside force like a script fixes them, and (b) the code isn't reverted or fixed in the dev branch so it'll break prod in the same way next week unless it's fixed. Un-tagging as a blocker from this week's train.

  • Even after the revert is deployed, we need some cleanup script or queries to reset the links table.

That seems to be refreshLinks.php with --dfn-only, there is periodic job already, which could be started out of the row.

Mentioned in SAL (#wikimedia-operations) [2022-01-15T02:54:14Z] <legoktm> started mwscript refreshLinks.php --wiki=enwiki --dfn-only (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T02:55:10Z] <legoktm> started mwscript refreshLinks.php --wiki=commonswiki --dfn-only (T299244)

  • Even after the revert is deployed, we need some cleanup script or queries to reset the links table.

That seems to be refreshLinks.php with --dfn-only, there is periodic job already, which could be started out of the row.

D'oh. Spent a while trying to write one (P18740), that is much better. Starting it everywhere now.

Mentioned in SAL (#wikimedia-operations) [2022-01-15T03:01:32Z] <legoktm> started refreshLinks --dfn-only via systemd units for s2-s6 (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T03:05:26Z] <legoktm> started refreshLinks --dfn-only via systemd units for s7-s8 (T299244)

refreshLinks is running against enwiki, commonswiki in tmux under my name.

s2-s8 except s4 (commons) are running via the periodic systemd units.

I'll check back in a few hours and start recountCategories runs if that seems needed.

Mentioned in SAL (#wikimedia-operations) [2022-01-15T05:20:39Z] <legoktm> started recountCategories.php --wiki=enwiki --mode pages (T299244)

Legoktm renamed this task from {{PAGESINCATEGORY:Wikipedia:Nuweg}} not decreased when page is deleted to Deleted pages are not being removed fron links tables, which also messes up category counts.Sat, Jan 15, 5:21 AM

Mentioned in SAL (#wikimedia-operations) [2022-01-15T06:18:43Z] <<legoktm>> finished running recountCategories on s8 wikis (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T06:19:38Z] <<legoktm>> finished running recountCategories on s5 wikis (T299244)

s1: done
s2: still running refreshLinks, at svwiki
s3: recountCategories
s4: still running refreshLinks
s5: done
s6: recountCategories
s7: still running refreshLinks, at frwiktionary
s8: done

Using: mwscriptwikiset recountCategories.php s8.dblist --mode pages && mwscriptwikiset recountCategories.php s8.dblist --mode files && mwscriptwikiset recountCategories.php s8.dblist --mode subcats && dologmsg '!log <legoktm> finished running recountCategories on s8 wikis (T299244)'

Mentioned in SAL (#wikimedia-operations) [2022-01-15T06:21:46Z] <<legoktm>> finished running recountCategories on s6 wikis (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T06:41:50Z] <<legoktm>> finished running recountCategories on s3 wikis (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T07:51:28Z] <legoktm> finished running recountCategories on s2 wikis (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T07:58:57Z] <legoktm> finished running recountCategories on s7 wikis (T299244)

Mentioned in SAL (#wikimedia-operations) [2022-01-15T08:55:53Z] <legoktm> finished running recountCategories on s4 wikis (T299244)

With that, in theory the databases are all back in the right state. Still leaving as UBN because it's an active train blocker for wmf.18 as the code was not reverted out of master.

Umherirrender renamed this task from Deleted pages are not being removed fron links tables, which also messes up category counts to Deleted pages are not being removed from links tables, which also messes up category counts.Sat, Jan 15, 12:46 PM

Not sure if my comments at T293958#7624144 are related to this but crosslinking just in case. I note lots of wikibase jobs (60Million, but maybe i'm reading the graph wrong) and the linter jobs are apparently not getting through.

Change 754117 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Add regression test for LinksDeletionUpdate loss of page_id

https://gerrit.wikimedia.org/r/754117

Change 754047 merged by jenkins-bot:

[mediawiki/core@master] Use immutable page identity in LinksDeletionUpdate

https://gerrit.wikimedia.org/r/754047

Change 754117 merged by jenkins-bot:

[mediawiki/core@master] Add regression test for LinksDeletionUpdate loss of page_id

https://gerrit.wikimedia.org/r/754117

This needs an incident report still. Not sure who it should be assigned to for that.

Ladsgroup lowered the priority of this task from Unbreak Now! to High.Sat, Jan 22, 12:53 PM
Ladsgroup added a subscriber: Ladsgroup.

True but since it's not a production issue anymore, I removed it as a subtask of train blocker + reduced its priority to High.