
MediaWiki\Linter\MissingCategoryException: Cannot find id for 'large-tables'
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   MediaWiki\Linter\MissingCategoryException: Cannot find id for 'large-tables'
exception.trace
from /srv/mediawiki/php-1.41.0-wmf.7/extensions/Linter/includes/CategoryManager.php(219)
#0 /srv/mediawiki/php-1.41.0-wmf.7/extensions/Linter/includes/Database.php(182): MediaWiki\Linter\CategoryManager->getCategoryId(string, NULL)
#1 [internal function]: MediaWiki\Linter\Database->serializeError(MediaWiki\Linter\LintError)
#2 /srv/mediawiki/php-1.41.0-wmf.7/extensions/Linter/includes/Database.php(287): array_map(array, array)
#3 /srv/mediawiki/php-1.41.0-wmf.7/extensions/Linter/includes/RecordLintJob.php(60): MediaWiki\Linter\Database->setForPage(array)
#4 /srv/mediawiki/php-1.41.0-wmf.7/extensions/EventBus/includes/JobExecutor.php(79): MediaWiki\Linter\RecordLintJob->run()
#5 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#6 {main}
Impact
Notes
  • 22k+ in a spike between 17:02 and 17:05 UTC, around the time the train was rolled back?

Details

Request URL
https://jobrunner.discovery.wmnet/rpc/RunSingleJob.php

Event Timeline

Definitely occurred during train rollback for T336504: Transcluding Special:Prefixindex can force the default skin.

Two spikes during rollback. (I'm not entirely clear on why we're restarting PHP twice at this point, but I think that's probably related to why there were two separate spikes.)

2023-05-11-11:28:18.png (255×636 px, 16 KB)

Arlolra claimed this task.
Arlolra subscribed.

It looks like both the creation of the lint category and the change that started populating it from Parsoid went out in the same deploy, wmf-8.

When rolling back, the protection that keeps unknown categories out of the job queue,
https://github.com/wikimedia/mediawiki-extensions-Linter/blob/master/includes/Hooks.php#L228-L232
was not helpful, because jobs that had been added by the rolled-forward instances (wmf-8) were being picked up by the rolled-back instances (wmf-7).

The errors can be prevented in the future by either staggering the deploys or having Parsoid also provide an id hint so that path gets used in the case of a rollback,
https://github.com/wikimedia/mediawiki-extensions-Linter/blob/master/includes/CategoryManager.php#L235-L238
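
To make the failure mode concrete, here is a rough Python model of the mixed-version situation described above. It is not the extension's code. The function names, the category tables, and the numeric ids are all made up for illustration. The lookup order (known id first, then hint, then exception) is an assumption inferred from the trace, where getCategoryId was called with a NULL second argument.

```python
# Stand-in for MediaWiki\Linter\MissingCategoryException (hypothetical model).
class MissingCategoryError(Exception):
    pass


def get_category_id(known_ids, name, hint=None):
    """Assumed lookup order: known id, then the producer's hint, else fail."""
    if name in known_ids:
        return known_ids[name]
    if hint is not None:
        return hint
    raise MissingCategoryError(f"Cannot find id for '{name}'")


# Hypothetical category tables; the numeric ids are invented.
WMF_7_IDS = {"obsolete-tag": 1}
WMF_8_IDS = {"obsolete-tag": 1, "large-tables": 2}


def enqueue(producer_ids, queue, name, hint=None):
    """Enqueue-side guard: only categories the *producer* knows get queued."""
    if name in producer_ids:
        queue.append({"type": name, "hint": hint})


def run_jobs(consumer_ids, queue):
    """Process queued jobs with the *consumer's* category table."""
    results = []
    for job in queue:
        try:
            results.append(get_category_id(consumer_ids, job["type"], job["hint"]))
        except MissingCategoryError as err:
            results.append(str(err))
    return results


# wmf-8 enqueues without a hint; a rolled-back wmf-7 runner picks the job up.
queue = []
enqueue(WMF_8_IDS, queue, "large-tables")
print(run_jobs(WMF_7_IDS, queue))

# Same scenario, but the producer also supplies an id hint.
hinted = []
enqueue(WMF_8_IDS, hinted, "large-tables", hint=2)
print(run_jobs(WMF_7_IDS, hinted))
```

In this model the guard only helps when the producer itself does not know the category. Once a newer producer has queued the job, an older consumer can only succeed if the job carries a hint, which is the second mitigation suggested above. Staggering the deploys avoids the situation altogether.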