
XMLDumps broken on deployment-mwmaint02 due to Jade Extension related content
Open, Needs Triage, Public

Description

In T281418 the Jade Extension was undeployed from the beta cluster.

However, it seems Jade content is still present in the database, and rendering it breaks the XML dumps:

wikiadmin@172.16.4.172(enwiki)> select distinct page_content_model from page;
+------------------------+
| page_content_model     |
+------------------------+
| wikitext               |
| css                    |
| javascript             |
| Scribunto              |
| json                   |
| flow-board             |
| MassMessageListContent |
| text                   |
| sanitized-css          |
| JsonConfig.Dashiki     |
| NewsletterContent      |
| JadeJudgment           | <<<<<<<<<
| JadeEntity             | <<<<<<<<<
| story                  |
+------------------------+
14 rows in set (0.101 sec)

wikiadmin@172.16.4.172(enwiki)> select count(1) from page   where page_content_model in ('JadeEntity', 'JadeJudgement');
+----------+
| count(1) |
+----------+
|    24893 |
+----------+
1 row in set (0.072 sec)

In this task we want to fix this, perhaps by patching our way out?

Event Timeline

Change 955819 had a related patch set uploaded (by Milimetric; author: Milimetric):

[operations/mediawiki-config@master] Map Jade content handler to UnknownContentHandler

https://gerrit.wikimedia.org/r/955819

Just a quick note that this breaks testing of dumps for enwiki in deployment-prep. We can work around it by testing only on other wikis, but it would be nice for this to be cleaned up.

We can also just delete all the pages, clear them, or set the content model to json instead. Whatever you prefer.

@Ladsgroup: would you prefer that to the settings change? I'm happy to delete, but as I understand it, the maintenance delete scripts won't work without a content handler. So I guess I could update all the content models to json and then delete?

I was wondering if we could simply change the content model to json for all of them and call it a day (that is, avoid more code and edge cases in prod while preserving the old Jade content). But given the number of pages and revisions that would need to be cleaned up, we can go with the settings change.

Change 955819 merged by jenkins-bot:

[operations/mediawiki-config@master] Map Jade content handler to UnknownContentHandler

https://gerrit.wikimedia.org/r/955819

Change 956482 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Use FallbackContentHandler instead of UnknownContentHandler

https://gerrit.wikimedia.org/r/956482

Change 956482 merged by jenkins-bot:

[operations/mediawiki-config@master] Use FallbackContentHandler instead of UnknownContentHandler

https://gerrit.wikimedia.org/r/956482

So the patches went out and I checked that they are on snapshot03, but unfortunately I still see the error:

2023-09-12 05:20:33: enwiki (ID 14793) 683 pages (694.3|694.3/sec all|curr), 1000 revs (1016.6|1016.6/sec all|curr), ETA 2023-09-12 05:30:22 [max 600437]
MWUnknownContentModelException from line 192 of /srv/mediawiki/php-master/includes/content/ContentHandlerFactory.php: The content model 'JadeJudgment' is not registered on this wiki.
See https://www.mediawiki.org/wiki/Content_handlers to find out which extensions handle this content model.
#0 /srv/mediawiki/php-master/includes/content/ContentHandlerFactory.php(247): MediaWiki\Content\ContentHandlerFactory->validateContentHandler('JadeJudgment', NULL)
#1 /srv/mediawiki/php-master/includes/content/ContentHandlerFactory.php(181): MediaWiki\Content\ContentHandlerFactory->createContentHandlerFromHook('JadeJudgment')
#2 /srv/mediawiki/php-master/includes/content/ContentHandlerFactory.php(93): MediaWiki\Content\ContentHandlerFactory->createForModelID('JadeJudgment')
#3 /srv/mediawiki/php-master/includes/export/XmlDumpWriter.php(474): MediaWiki\Content\ContentHandlerFactory->getContentHandler('JadeJudgment')
#4 /srv/mediawiki/php-master/includes/export/XmlDumpWriter.php(402): XmlDumpWriter->writeSlot(Object(MediaWiki\Revision\SlotRecord), 1)
#5 /srv/mediawiki/php-master/includes/export/WikiExporter.php(554): XmlDumpWriter->writeRevision(Object(stdClass), Array)
#6 /srv/mediawiki/php-master/includes/export/WikiExporter.php(492): WikiExporter->outputPageStreamBatch(Object(Wikimedia\Rdbms\MysqliResultWrapper), Object(stdClass))
#7 /srv/mediawiki/php-master/includes/export/WikiExporter.php(316): WikiExporter->dumpPages('page_id >= 1900...', false)
#8 /srv/mediawiki/php-master/includes/export/WikiExporter.php(208): WikiExporter->dumpFrom('page_id >= 1900...', false)
#9 /srv/mediawiki/php-master/maintenance/includes/BackupDumper.php(355): WikiExporter->pagesByRange(190001, 195001, false)
#10 /srv/mediawiki/php-master/maintenance/dumpBackup.php(82): BackupDumper->dump(1, 1)
#11 /srv/mediawiki/php-master/maintenance/includes/MaintenanceRunner.php(685): DumpBackup->execute()
#12 /srv/mediawiki/php-master/maintenance/run.php(51): MediaWiki\Maintenance\MaintenanceRunner->run()
#13 /srv/mediawiki/multiversion/MWScript.php(159): require_once('/srv/mediawiki/...')
#14 {main}

Perhaps the override isn't being respected, or the usage isn't quite right?

From what I'm seeing, the content handler is registered:

ladsgroup@deployment-deploy03:~$ mwscript eval.php --wiki=enwiki
> var_dump( $wgContentHandlers );
array(17) {
  ["GadgetDefinition"]=>
  string(66) "MediaWiki\Extension\Gadgets\Content\GadgetDefinitionContentHandler"
  ["SecurePoll"]=>
  string(56) "\MediaWiki\Extension\SecurePoll\SecurePollContentHandler"
  ["sanitized-css"]=>
  string(63) "MediaWiki\Extension\TemplateStyles\TemplateStylesContentHandler"
  ["MassMessageListContent"]=>
  string(59) "MediaWiki\MassMessage\Content\MassMessageListContentHandler"
  ["flow-board"]=>
  string(32) "Flow\Content\BoardContentHandler"
  ["Scribunto"]=>
  string(53) "MediaWiki\Extension\Scribunto\ScribuntoContentHandler"
  ["JsonSchema"]=>
  string(57) "MediaWiki\Extension\EventLogging\JsonSchemaContentHandler"
  ["NewsletterContent"]=>
  string(63) "MediaWiki\Extension\Newsletter\Content\NewsletterContentHandler"
  ["story"]=>
  array(2) {
    ["class"]=>
    string(51) "MediaWiki\Extension\Wikistories\StoryContentHandler"
    ["services"]=>
    array(4) {
      [0]=>
      string(26) "Wikistories.StoryConverter"
      [1]=>
      string(26) "Wikistories.StoryValidator"
      [2]=>
      string(25) "Wikistories.StoryRenderer"
      [3]=>
      string(18) "TrackingCategories"
    }
  }
  ["wikitext"]=>
  array(2) {
    ["class"]=>
    string(22) "WikitextContentHandler"
    ["services"]=>
    array(6) {
      [0]=>
      string(12) "TitleFactory"
      [1]=>
      string(13) "ParserFactory"
      [2]=>
      string(17) "GlobalIdGenerator"
      [3]=>
      string(17) "LanguageNameUtils"
      [4]=>
      string(16) "MagicWordFactory"
      [5]=>
      string(20) "ParsoidParserFactory"
    }
  }
  ["javascript"]=>
  string(24) "JavaScriptContentHandler"
  ["json"]=>
  string(18) "JsonContentHandler"
  ["css"]=>
  string(17) "CssContentHandler"
  ["text"]=>
  string(18) "TextContentHandler"
  ["unknown"]=>
  string(22) "FallbackContentHandler"
  ["JadeEntity"]=>
  string(22) "FallbackContentHandler"
  ["JadeJudgement"]=>
  string(22) "FallbackContentHandler"
}

I think the onContentHandlerForModelID hook is doing something unbecoming.

That was a red herring, but I found the problem:

ladsgroup@deployment-deploy03:~$ mwscript eval.php --wiki=enwiki
> $factory = \MediaWiki\MediaWikiServices::getInstance()->getContentHandlerFactory();

> $wee = \Wikimedia\TestingAccessWrapper::newFromObject( $factory );

> var_dump( $wee->handlerSpecs );
array(17) {
  ["GadgetDefinition"]=>
  string(66) "MediaWiki\Extension\Gadgets\Content\GadgetDefinitionContentHandler"
  ["SecurePoll"]=>
  string(56) "\MediaWiki\Extension\SecurePoll\SecurePollContentHandler"
  ["sanitized-css"]=>
  string(63) "MediaWiki\Extension\TemplateStyles\TemplateStylesContentHandler"
  ["MassMessageListContent"]=>
  string(59) "MediaWiki\MassMessage\Content\MassMessageListContentHandler"
  ["flow-board"]=>
  string(32) "Flow\Content\BoardContentHandler"
  ["Scribunto"]=>
  string(53) "MediaWiki\Extension\Scribunto\ScribuntoContentHandler"
  ["JsonSchema"]=>
  string(57) "MediaWiki\Extension\EventLogging\JsonSchemaContentHandler"
  ["NewsletterContent"]=>
  string(63) "MediaWiki\Extension\Newsletter\Content\NewsletterContentHandler"
  ["story"]=>
  array(2) {
    ["class"]=>
    string(51) "MediaWiki\Extension\Wikistories\StoryContentHandler"
    ["services"]=>
    array(4) {
      [0]=>
      string(26) "Wikistories.StoryConverter"
      [1]=>
      string(26) "Wikistories.StoryValidator"
      [2]=>
      string(25) "Wikistories.StoryRenderer"
      [3]=>
      string(18) "TrackingCategories"
    }
  }
  ["wikitext"]=>
  array(2) {
    ["class"]=>
    string(22) "WikitextContentHandler"
    ["services"]=>
    array(6) {
      [0]=>
      string(12) "TitleFactory"
      [1]=>
      string(13) "ParserFactory"
      [2]=>
      string(17) "GlobalIdGenerator"
      [3]=>
      string(17) "LanguageNameUtils"
      [4]=>
      string(16) "MagicWordFactory"
      [5]=>
      string(20) "ParsoidParserFactory"
    }
  }
  ["javascript"]=>
  string(24) "JavaScriptContentHandler"
  ["json"]=>
  string(18) "JsonContentHandler"
  ["css"]=>
  string(17) "CssContentHandler"
  ["text"]=>
  string(18) "TextContentHandler"
  ["unknown"]=>
  string(22) "FallbackContentHandler"
  ["JadeEntity"]=>
  string(22) "FallbackContentHandler"
  ["JadeJudgement"]=>
  string(22) "FallbackContentHandler"
}

> var_dump( $wee->handlerSpecs['JadeJudgment'] );
NULL

Wanna take a guess why?
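
For readers following along, the mismatch can also be spotted mechanically. Below is a minimal sketch, assuming the two lists are copied by hand from the outputs above (the page_content_model values from the first query and the keys of $wgContentHandlers). The function name findNearMisses and the distance threshold are made up for illustration and are not part of MediaWiki. It flags content models that exist in the database, are not registered, and sit within a small edit distance of a registered key:

```php
<?php
// Content models found in the page table (copied from the first query in this task).
$modelsInDb = [
    'wikitext', 'css', 'javascript', 'Scribunto', 'json', 'flow-board',
    'MassMessageListContent', 'text', 'sanitized-css', 'JsonConfig.Dashiki',
    'NewsletterContent', 'JadeJudgment', 'JadeEntity', 'story',
];

// Keys of $wgContentHandlers (copied from the var_dump output above).
$registeredKeys = [
    'GadgetDefinition', 'SecurePoll', 'sanitized-css', 'MassMessageListContent',
    'flow-board', 'Scribunto', 'JsonSchema', 'NewsletterContent', 'story',
    'wikitext', 'javascript', 'json', 'css', 'text', 'unknown',
    'JadeEntity', 'JadeJudgement',
];

/**
 * Hypothetical helper: for every wanted model that is not registered,
 * look for a registered key that is spelled almost the same.
 */
function findNearMisses(array $wanted, array $registered, int $maxDistance = 2): array {
    $nearMisses = [];
    foreach (array_diff($wanted, $registered) as $model) {
        foreach ($registered as $key) {
            // levenshtein() is PHP's built-in edit distance function.
            if (levenshtein($model, $key) <= $maxDistance) {
                $nearMisses[$model] = $key;
            }
        }
    }
    return $nearMisses;
}

$nearMisses = findNearMisses($modelsInDb, $registeredKeys);
foreach ($nearMisses as $model => $key) {
    echo "'$model' is in the database, but the config registers '$key'\n";
}
```

With the lists from this task, it reports that the database has 'JadeJudgment' while the config registers 'JadeJudgement'. JsonConfig.Dashiki is also missing from the registered keys, but nothing similarly spelled is registered, so this check does not flag it.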

Change 957260 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Fix typo in Jade content type name

https://gerrit.wikimedia.org/r/957260

Any review of this would be very welcome ^

Change 957260 merged by jenkins-bot:

[operations/mediawiki-config@master] Fix typo in Jade content type name

https://gerrit.wikimedia.org/r/957260
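
For anyone landing here later, this is a sketch of what the corrected override presumably boils down to. It is not copied from the merged change, and the actual mediawiki-config patch may express it through a different setting. The keys are the content model names exactly as they appear in the page table above, and the handler class is the one shown in the var_dump output:

```php
<?php
// Sketch only, not the literal contents of change 957260.
// Keys must match page_content_model in the database exactly.
$wgContentHandlers['JadeEntity'] = 'FallbackContentHandler';
$wgContentHandlers['JadeJudgment'] = 'FallbackContentHandler'; // no "e" after "Judg"
```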