LogicException: Domain 'mowiktionary' is not recognized.
Open, Needs Triage · Public · PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   LogicException: Domain 'mowiktionary' is not recognized.
exception.trace
from /srv/mediawiki/php-1.40.0-wmf.22/includes/jobqueue/JobQueueGroup.php(136)
#0 /srv/mediawiki/php-1.40.0-wmf.22/extensions/Cognate/src/LocalJobSubmitJob.php(26): JobQueueGroup->push(Cognate\CacheUpdateJob)
#1 /srv/mediawiki/php-1.40.0-wmf.22/extensions/EventBus/includes/JobExecutor.php(79): Cognate\LocalJobSubmitJob->run()
#2 /srv/mediawiki/rpc/RunSingleJob.php(77): MediaWiki\Extension\EventBus\JobExecutor->execute(array)
#3 {main}
Impact
Notes

mowiktionary was closed in T14255.

Details

Request URL
https://jobrunner.discovery.wmnet/rpc/RunSingleJob.php

Event Timeline

Might be similar to T322588: Run `refreshGlobalimagelinks.php --pages=nonexisting` from the GlobalUsage extension. I'm guessing Cognate is storing wiki associations that need to be cleared up as part of https://wikitech.wikimedia.org/wiki/Close_a_wiki or https://wikitech.wikimedia.org/wiki/Delete_a_wiki.

In the specific case of https://mo.wiktionary.org, this actually still exists but redirects to ro.wiktionary.org. Given it closed in 2007, afaik well before the existence of the Cognate extension, that raises some additional questions as to how this data got in there.

According to Logstash this has been sporadically happening for a while:

[Logstash screenshot: image.png (553×1 px, 97 KB)]

This is indeed still part of cognate_sites (and some pages are part of cognate_pages):

wikiadmin2023@10.64.32.11(cognate_wiktionary)> SELECT * FROM cognate_sites WHERE cgsi_dbname = 'mowiktionary';
+---------------------+--------------+----------------+
| cgsi_key            | cgsi_dbname  | cgsi_interwiki |
+---------------------+--------------+----------------+
| 3956001222954918560 | mowiktionary | mo             |
+---------------------+--------------+---------------
wikiadmin2023@10.64.32.11(cognate_wiktionary)> SELECT COUNT(*) FROM cognate_pages WHERE cgpa_site = 3956001222954918560;
+----------+
| COUNT(*) |
+----------+
|        9 |
+----------+
1 row in set (6.578 sec)

To fix these, we would need to modify (and rename?) Cognate's maintenance/populateCognateSites.php so that it can also purge old sites (right now it only appends previously unknown sites) and, more importantly, so that it purges cognate_pages rows based on the currently existing sites. Then we could run it periodically when adding/removing wikis (after updating the sites data).
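The purge logic proposed above could be sketched roughly as follows. This is a Python sketch against an in-memory SQLite stand-in for the cognate_wiktionary database, not the real PHP maintenance script; the table and column names (cgsi_key, cgsi_dbname, cgsi_interwiki, cgpa_site) come from the queries in this task, while the cgpa_title column and the current_dbnames list are hypothetical illustrations (the real script would read the sites data):

```python
import sqlite3

# In-memory stand-in for the cognate_wiktionary schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cognate_sites (cgsi_key INTEGER PRIMARY KEY,
                            cgsi_dbname TEXT, cgsi_interwiki TEXT);
CREATE TABLE cognate_pages (cgpa_site INTEGER, cgpa_title TEXT);
""")
db.executemany("INSERT INTO cognate_sites VALUES (?, ?, ?)", [
    (3956001222954918560, "mowiktionary", "mo"),  # closed wiki, from this task
    (1, "rowiktionary", "ro"),
])
db.executemany("INSERT INTO cognate_pages VALUES (?, ?)", [
    (3956001222954918560, "stale"),
    (1, "current"),
])

# Hypothetical set of wikis that currently exist (would come from the sites data).
current_dbnames = {"rowiktionary"}

# Step 1: purge site rows for wikis that no longer exist.
db.execute(
    "DELETE FROM cognate_sites WHERE cgsi_dbname NOT IN (%s)"
    % ",".join("?" * len(current_dbnames)),
    tuple(current_dbnames),
)
# Step 2: purge page rows whose site key no longer has a cognate_sites row.
db.execute(
    "DELETE FROM cognate_pages"
    " WHERE cgpa_site NOT IN (SELECT cgsi_key FROM cognate_sites)"
)
db.commit()
print(db.execute("SELECT COUNT(*) FROM cognate_pages").fetchone()[0])  # prints 1
```

Step 2 is the expensive part on the production tables, since cognate_pages has no index starting with cgpa_site (see below).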

To me a separate maintenance script sounds better, since the task overlaps with more than one of the existing scripts (it seems related to populateCognateSites, populateCognatePages and purgeDeletedCognatePages).

I wonder if the maintenance script should look for pages with the to-be-deleted site itself (which isn’t cheap – it requires a full table scan, you can see it take >5 seconds in T329601#8617579), or if it should take a list of page rows to delete as an option (and finding those rows would be left to the person running the script, who could e.g. run this expensive query on the analytics replicas instead of the production ones). Though I suppose that second approach would rule out running the script periodically.

(I definitely don’t think we want another index on cognate_pages starting with cgpa_site, that wouldn’t be worth it. If we want the maintenance script to be autonomous, it’s probably better to have it scan the table in batches of e.g. 100k rows at a time to limit the runtime of each query, similar to what we did with UnexpectedUnconnectedPagePrimer for T300770.)
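The batched-scan approach mentioned above could look roughly like this. Again a Python/SQLite sketch under stated assumptions, not the real implementation: it walks cognate_pages in primary-key batches so no single query has to scan the whole table, deleting rows whose site key is orphaned. The cgpa_id primary key and the batch size of 100 are illustrative only (the real table's key may differ, and production batches would be much larger, e.g. 100k as suggested above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cognate_sites (cgsi_key INTEGER PRIMARY KEY);
CREATE TABLE cognate_pages (cgpa_id INTEGER PRIMARY KEY, cgpa_site INTEGER);
""")
db.execute("INSERT INTO cognate_sites VALUES (1)")
# 500 valid rows plus 7 orphaned rows pointing at a nonexistent site key 2.
db.executemany("INSERT INTO cognate_pages (cgpa_site) VALUES (?)",
               [(1,)] * 500 + [(2,)] * 7)

BATCH = 100  # illustrative; production would use a much larger batch
valid_sites = {k for (k,) in db.execute("SELECT cgsi_key FROM cognate_sites")}
last_id = 0
deleted = 0
while True:
    # Scan one primary-key batch, so each query's runtime stays bounded.
    rows = db.execute(
        "SELECT cgpa_id, cgpa_site FROM cognate_pages"
        " WHERE cgpa_id > ? ORDER BY cgpa_id LIMIT ?",
        (last_id, BATCH),
    ).fetchall()
    if not rows:
        break
    orphaned = [pid for pid, site in rows if site not in valid_sites]
    if orphaned:
        db.executemany("DELETE FROM cognate_pages WHERE cgpa_id = ?",
                       [(pid,) for pid in orphaned])
        deleted += len(orphaned)
    last_id = rows[-1][0]
db.commit()
print(deleted)  # prints 7
```

In the real script each batch would also wait for replication between deletes, as MediaWiki maintenance scripts conventionally do.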