Fatal exception when cloning/saving banners with translatable messages
Closed, Resolved. Public. 4 Estimated Story Points.

Description

Getting internal errors when trying to clone/save certain banners. It appears to be related to the translatable messages, since I haven't seen the problem with our normal banners, which don't have such messages.

Attempting to clone peter_WMCH_dsk_lg gave:

Internal error
[V@uMVwpAIC8AAFj3YUsAAACM] 2016-09-28 09:24:39: Fatal exception of type "MWException"

I then tried making a new banner at peter_WMCH_dsk_lg_alt1, and copying and pasting the content from the previous one. This worked at first, but attempting to save any changes to it after the initial one gives a similar error:

Internal error
[V@uLSwpAIDUAAEfWOxQAAABM] 2016-09-28 09:20:11: Fatal exception of type "MWException"

Event Timeline

Pcoombe triaged this task as High priority. Sep 28 2016, 9:34 AM

@AndyRussG note that we can get the details of these exceptions from exception.log or fatal.log in fluorine:/a/mw-log/, by searching for those exception IDs.

In logstash, I see a bunch of stuff like this:

{"id":"V@0v0gpAAEIAAd5oBDcAAAEJ","type":"MWException","file":"/srv/mediawiki/php-1.28.0-wmf.20/includes/Revision.php","line":1555,"message":"Content of revision (CNBanner:B16WMDE_01_160927_size2-no-interval-message/en) could not be loaded for validation!

Pcoombe renamed this task from Fatal exception when cloning/saving banners to Fatal exception when cloning/saving banners with translatable messages. Sep 30 2016, 8:31 AM
Pcoombe raised the priority of this task from High to Unbreak Now!.
Pcoombe added a subscriber: kai.nissen.

I was able to reproduce this on the beta cluster, by following the steps in the description of the duplicate bug (T147002):

  • Add a new banner
  • Enter some code or text into the banner code textarea
  • Save
  • Play with the banner settings
  • Save
  • Enter a translatable message into the banner code textarea
  • Save
  • Edit the translatable message, edit other options, or just hit "Save" again
  • Exception is thrown

Here's the error and backtrace... Same as what we're seeing on production:

[V@8EDQpEFhUAAB0FC5gAAAAA] /wiki/Special:CentralNoticeBanners/edit/TestBannerT147002 MWException from line 1555 of /srv/mediawiki/php-master/includes/Revision.php: Content of revision (CNBanner:TestBannerT147002-helloi18n/en) could not be loaded for validation!

Backtrace:

#0 /srv/mediawiki/php-master/includes/Revision.php(1410): Revision->checkContentModel()
#1 /srv/mediawiki/php-master/includes/page/WikiPage.php(2732): Revision->insertOn(DatabaseMysqli)
#2 /srv/mediawiki/php-master/includes/page/WikiPage.php(2572): WikiPage->insertProtectNullRevision(string, array, array, boolean, string, User)
#3 /srv/mediawiki/php-master/extensions/CentralNotice/includes/BannerMessage.php(143): WikiPage->doUpdateRestrictions(array, array, boolean, string, User)
#4 /srv/mediawiki/php-master/extensions/CentralNotice/includes/BannerMessage.php(116): BannerMessage->protectMessageInCnNamespaces(WikiPage, User)
#5 /srv/mediawiki/php-master/extensions/CentralNotice/special/SpecialCentralNoticeBanners.php(866): BannerMessage->update(string, string, User, string)
#6 /srv/mediawiki/php-master/extensions/CentralNotice/special/SpecialCentralNoticeBanners.php(835): SpecialCentralNoticeBanners->processSaveBannerAction(array)
#7 /srv/mediawiki/php-master/includes/htmlform/HTMLForm.php(656): SpecialCentralNoticeBanners->processEditBanner(array, CentralNoticeHtmlForm)
#8 /srv/mediawiki/php-master/includes/htmlform/HTMLForm.php(553): HTMLForm->trySubmit()
#9 /srv/mediawiki/php-master/extensions/CentralNotice/special/SpecialCentralNoticeBanners.php(417): HTMLForm->tryAuthorizedSubmit()
#10 /srv/mediawiki/php-master/extensions/CentralNotice/special/SpecialCentralNoticeBanners.php(76): SpecialCentralNoticeBanners->showBannerEditor()
#11 /srv/mediawiki/php-master/includes/specialpage/SpecialPage.php(522): SpecialCentralNoticeBanners->execute(string)
#12 /srv/mediawiki/php-master/includes/specialpage/SpecialPageFactory.php(583): SpecialPage->run(string)
#13 /srv/mediawiki/php-master/includes/MediaWiki.php(283): SpecialPageFactory::executePath(Title, RequestContext)
#14 /srv/mediawiki/php-master/includes/MediaWiki.php(861): MediaWiki->performRequest()
#15 /srv/mediawiki/php-master/includes/MediaWiki.php(522): MediaWiki->main()
#16 /srv/mediawiki/php-master/index.php(43): MediaWiki->run()
#17 /srv/mediawiki/w/index.php(3): include(string)
#18 {main}

I wonder if this might be related to our other mystery bug, T144952? The error is being thrown after Revision::getContent() returns falsy for a translatable banner message.

I'm unable to reproduce the error locally with the Translate Extension enabled... Same result as you got, @Ejegg, right...?

Still unable to reproduce locally, but I was able to at least debug through the same code where the error occurs. Just had to add the following to my LocalSettings.php and follow the steps above anew.

require "$IP/extensions/Translate/Translate.php";
$wgGroupPermissions['user']['translate'] = true;
$wgGroupPermissions['user']['translate-messagereview'] = true;
$wgGroupPermissions['sysop']['pagetranslation'] = true;
$wgTranslateWorkflowStates = array(
	'new' => array( 'color' => 'FF0000' ), // red
	'needs_proofreading' => array( 'color' => '0000FF' ), // blue
	'ready' => array( 'color' => 'FFFF00' ), // yellow
	'published' => array(
		'color' => '00FF00', // green
		'right' => 'centralnotice-admin',
	),
);

P.S. Also thx @Ejegg for suggesting the above settings! ^

Here is my theory so far:

  • When you add a new translatable message, a new page is created in the CNBanner namespace by BannerMessage::update().
  • Immediately after, in the same request, BannerMessage::protectMessageInCnNamespaces() tries to protect the new page.
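  • This second call tries to retrieve and verify the content of the new page. It goes first to the ObjectCache, then falls back to the database, but reads from a replica rather than the master. If the replica hasn't received the new revision yet, the content comes back empty and the exception is thrown; in our case, it seems, it never retries against the master db.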
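
To make the theory above concrete, here is a minimal, self-contained sketch of the suspected read-after-write race. Everything in it (FakeCluster, loadContentForValidation(), the banner title) is a made-up stand-in for illustration, not MediaWiki or CentralNotice code; only the exception message mirrors the one in the logs. It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical in-memory stand-in for a master database plus one lagging replica.
final class FakeCluster {
    /** @var array<string,string> rows written to the master */
    private array $master = [];
    /** @var array<string,string> rows the replica has caught up on */
    private array $replica = [];

    // Writes go to the master only; the replica stays stale until replicate().
    public function write(string $title, string $text): void {
        $this->master[$title] = $text;
    }

    // Simulates replication catching up.
    public function replicate(): void {
        $this->replica = $this->master;
    }

    // Returns the stored text, or null if this server has no such row yet.
    public function read(string $title, bool $fromMaster): ?string {
        $rows = $fromMaster ? $this->master : $this->replica;
        return $rows[$title] ?? null;
    }
}

// Hypothetical analogue of the validation step: fail if the content is missing.
function loadContentForValidation(FakeCluster $db, string $title, bool $fromMaster): string {
    $text = $db->read($title, $fromMaster);
    if ($text === null) {
        throw new RuntimeException(
            "Content of revision ($title) could not be loaded for validation!"
        );
    }
    return $text;
}

$db = new FakeCluster();
$title = 'CNBanner:ExampleBanner-hello/en'; // illustrative title, not a real banner

// Step 1: the message page is created (write lands on the master).
$db->write($title, 'Hello');

// Step 2: same request, protection step reads from the stale replica.
try {
    loadContentForValidation($db, $title, false);
    $replicaResult = 'loaded';
} catch (RuntimeException $e) {
    $replicaResult = 'failed';
}

// The same read against the master succeeds immediately.
$masterResult = loadContentForValidation($db, $title, true);

// Once replication catches up, the replica read succeeds too.
$db->replicate();
$afterReplication = loadContentForValidation($db, $title, false);
```

If this is roughly what happens, a setup with a single database and no separate replica would never hit the stale read, which would be consistent with the bug not reproducing locally.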

Working on verifying that this is what's up...

Adding @Krinkle and @aaron because Revision.php... :) Thx in advance 4 any suggestions!!

If the theory mentioned above is correct, then this patch (deployed last week for T138310) also fixed this bug. Since there are no errors of this sort in the logs since September 29, I'm marking the task resolved! :) Thanks all!!