Page MenuHomePhabricator

High rate of timeouts on jsonTruncated channel upon group1 1.35.0-wmf.26 promotion
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: n/a

message

Impact

Roll back of group1 to 1.35.0-wmf.25.

Notes

Timeout errors are common during deployment but not on this channel and not for a prolonged duration (> 10 minutes).

See logstash snapshot for error details as many fields are blank.

Event Timeline

dduvall triaged this task as Unbreak Now! priority.Apr 1 2020, 7:42 PM

Mentioned in SAL (#wikimedia-operations) [2020-04-01T19:48:57Z] <marxarelli> rollback of 1.35.0-wmf.26 from group1 (T247773). blocked by T249162

Pchelolo added a subscriber: Pchelolo.

Stack trace can be parsed out of the error, here's a portion demonstrating the problem:

#1 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(80): TimedMediaHandlerHooks::activePlayerMode()\n
#2 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(34): TimedMediaIframeOutput::outputIframe(Title)\n
#3 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): TimedMediaIframeOutput::iframeHook(Title, NULL, RequestContext)\n
#4 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)\n
#5 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(188): Hooks::run(string, array)\n
#6 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(214): Article::newFromTitle(Title, RequestContext)\n
#7 /srv/mediawiki/php-1.35.0-wmf.26/includes/actions/Action.php(172): Article::newFromWikiPage(WikiFilePage, RequestContext)\n
#8 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(3289): Action::getActionName(RequestContext)\n
#9 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(2999): OutputPage->getJSVars()\n
#10 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(87): OutputPage->getRlClient()\n
#11 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(34): TimedMediaIframeOutput::outputIframe(Title)\n
#12 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): TimedMediaIframeOutput::iframeHook(Title, NULL, RequestContext)\n
#13 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)\n
#14 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(188): Hooks::run(string, array)\n
#15 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(214): Article::newFromTitle(Title, RequestContext)\n
#16 /srv/mediawiki/php-1.35.0-wmf.26/includes/actions/Action.php(172): Article::newFromWikiPage(WikiFilePage, RequestContext)\n
#17 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(3289): Action::getActionName(RequestContext)\n
#18 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(2999): OutputPage->getJSVars()\n
#19 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(87): OutputPage->getRlClient()\n
#20 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(34): TimedMediaIframeOutput::outputIframe(Title)\n
#21 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): TimedMediaIframeOutput::iframeHook(Title, NULL, RequestContext)\n
#22 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)\n
#23 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(188): Hooks::run(string, array)\n
#24 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(214): Article::newFromTitle(Title, RequestContext)\n
#25 /srv/mediawiki/php-1.35.0-wmf.26/includes/actions/Action.php(172): Article::newFromWikiPage(WikiFilePage, RequestContext)\n#26 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(3289)

It's an infinite loop between 'ArticleFromTitle' hook TimedMediaHandler

I'm guessing the errors referred to here are the ones at https://logstash.wikimedia.org/goto/22fe5c56d14caa0a2fdf52fee60e5d3e.

It appears the messages are all containing stack traces like

#0 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(576): {closure}(integer)
#1 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(2946): OutputPage->addModuleStyles(array)
#2 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(87): OutputPage->getRlClient()
#3 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(34): TimedMediaIframeOutput::outputIframe(Title)
#4 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): TimedMediaIframeOutput::iframeHook(Title, NULL, RequestContext)
#5 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#6 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(188): Hooks::run(string, array)
#7 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(214): Article::newFromTitle(Title, RequestContext)
#8 /srv/mediawiki/php-1.35.0-wmf.26/includes/actions/Action.php(172): Article::newFromWikiPage(WikiFilePage, RequestContext)
#9 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(3289): Action::getActionName(RequestContext)
#10 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(2999): OutputPage->getJSVars()
#11 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(87): OutputPage->getRlClient()
#12 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(34): TimedMediaIframeOutput::outputIframe(Title)
#13 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): TimedMediaIframeOutput::iframeHook(Title, NULL, RequestContext)
#14 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#15 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(188): Hooks::run(string, array)
#16 /srv/mediawiki/php-1.35.0-wmf.26/includes/page/Article.php(214): Article::newFromTitle(Title, RequestContext)
#17 /srv/mediawiki/php-1.35.0-wmf.26/includes/actions/Action.php(172): Article::newFromWikiPage(WikiFilePage, RequestContext)
#18 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(3289): Action::getActionName(RequestContext)
#19 /srv/mediawiki/php-1.35.0-wmf.26/includes/OutputPage.php(2999): OutputPage->getJSVars()
#20 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(87): OutputPage->getRlClient()
#21 /srv/mediawiki/php-1.35.0-wmf.26/extensions/TimedMediaHandler/includes/TimedMediaIframeOutput.php(34): TimedMediaIframeOutput::outputIframe(Title)
#22 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(174): TimedMediaIframeOutput::iframeHook(Title, NULL, RequestContext)
#23 /srv/mediawiki/php-1.35.0-wmf.26/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
...

The culprit seems to be rMW07f57bd27156: WikiPage/Article split. Rely on Article inside Action, in particular where it changed

@@ -152,7 +167,12 @@ final public static function getActionName( IContextSource $context ) {
                        return 'view';
                }
 
-               $action = self::factory( $actionName, $context->getWikiPage(), $context );
+               $action = self::factory(
+                       $actionName,
+                       Article::newFromWikiPage( $context->getWikiPage(), $context ),
+                       $context
+               );
                if ( $action instanceof Action ) {
                        return $action->getName();
                }

The added call to Article::newFromWikiPage() results in a call to Article::newFromTitle(), which calls the 'ArticleFromTitle' hook, and TimedMediaHandler's handler for that hook winds up calling some stuff that results in OutputPage::getJSVars() calling Action::getActionName() again.

If Action was happy being passed a WikiPage before, then probably instead of trying to convert the WikiPage to Article it should just have changed Action::factory() to accept WikiPage instead of Article.

Change 585320 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Revert "WikiPage/Article split. Rely on Article inside Action"

https://gerrit.wikimedia.org/r/585320

If Action was happy being passed a WikiPage before, then probably instead of trying to convert the WikiPage to Article it should just have changed Action::factory() to accept WikiPage instead of Article.

The purpose of these changes was to make Action only accept Article and not WikiPage so that we could get rid of the Page interface around Action. Action can't depend on just WikiPage, cause a lot of actions depend on the UI-related parts of the Article. So we would've needed to instantiate Article in a lot of places instead.

Action can't depend on just WikiPage, cause a lot of actions depend on the UI-related parts of the Article. So we would've needed to instantiate Article in a lot of places instead.

If we had been passing a WikiPage in all along from Action::getActionName(), how were all those things working until now? Or were we just lucky that whatever code paths were requiring the Article bits weren't called from any code path that was creating the Action with a WikiPage?

If we had been passing a WikiPage in all along from Action::getActionName(), how were all those things working until now? Or were we just lucky that whatever code paths were requiring the Article bits weren't called from any code path that was creating the Action with a WikiPage?

Exactly. The Action constructed from getActionName is thrown away right after it is used to call 'getName' on it, which is normally just returning a hard-coded string. The subclasses were constructed with Article/WikiPage depending on what they were using internally.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/585320 provides a partial revert of the commit that caused this. Two new helper methods were introduced by that commit, Action::getWikiPage and Action::getArticle. The partial revert preserves both methods since since then a couple of changes in the extensions were introduced that use them.

Alternatively this partial revert, we could do a full revert of Ib8eb8a141b4b7cb72ff65afbcd17385fcc6b3b1c, however that would also require reverts of:

Implemented at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/585343

My personal preference would probably to go for the partial revert path, however I would like to wait for some code review.

Change 585338 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Revert "Remove direct access to Action::page and Article::page properties"

https://gerrit.wikimedia.org/r/585338

Change 585343 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@master] Revert "WikiPage/Article split. Rely on Article inside Action"

https://gerrit.wikimedia.org/r/585343

Change 585361 had a related patch set uploaded (by Art-Baltai; owner: Art-Baltai):
[mediawiki/core@master] Break infinity loop

https://gerrit.wikimedia.org/r/585361

Change 585320 merged by jenkins-bot:
[mediawiki/core@master] Revert "WikiPage/Article split. Rely on Article inside Action"

https://gerrit.wikimedia.org/r/585320

Change 585343 abandoned by Ppchelko:
Revert "WikiPage/Article split. Rely on Article inside Action"

Reason:
Alternative route was taken Ifa642a631caa2d265ee097711dc8727f84435ef0

https://gerrit.wikimedia.org/r/585343

Change 585338 abandoned by Ppchelko:
Revert "Remove direct access to Action::page and Article::page properties"

Reason:
Alternative route was taken Ifa642a631caa2d265ee097711dc8727f84435ef0

https://gerrit.wikimedia.org/r/585338

Change 585516 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/core@wmf/1.35.0-wmf.26] Revert "WikiPage/Article split. Rely on Article inside Action"

https://gerrit.wikimedia.org/r/585516

Partial review approach was taken, let's hope it will work.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/585516 is a backport for wmf.26

Change 585516 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.26] Revert "WikiPage/Article split. Rely on Article inside Action"

https://gerrit.wikimedia.org/r/585516

Mentioned in SAL (#wikimedia-operations) [2020-04-02T16:49:43Z] <jforrester@deploy1001> Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: T249162 Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)

Jdforrester-WMF assigned this task to Pchelolo.
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Provisionally marking as Resolved.

Change 585612 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/extensions/TimedMediaHandler@master] Switch iframHook from ArticleFromTitle hook to MediaWikiPerformAction

https://gerrit.wikimedia.org/r/585612

Pchelolo lowered the priority of this task from Unbreak Now! to Medium.

I'm reopening the ticket and lowering the priority to "medium" since it has been fixed by rolling back the changes, thus it's not a train blocker any longer, but real fix wasn't delivered yet. Real fix is implemented in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/TimedMediaHandler/+/585612

Having to instantiate an Action object just to check whether the action exists seems a bit exuberant. Presumably Action::factory( $action, $somePage, $someContext )->getName() === $action always holds unless the factory returns false / null, so it would be enough to call Action::getClass instead of Action::factory in Action::getActionName?

Looking at codesearch the only exception is SpecialPageAction mapping historysubmit to other actions and Action::getActionName already has special handling for that.

@Tgr T249265#6025206 has some more info about it and a POC patch for how to get rid of instantiation

Change 585612 merged by jenkins-bot:
[mediawiki/extensions/TimedMediaHandler@master] Switch iframeHook from ArticleFromTitle hook to MediaWikiPerformAction

https://gerrit.wikimedia.org/r/585612

Change 585361 abandoned by Art-Baltai:
Break infinity loop

Reason:
NOT ACTUAL

https://gerrit.wikimedia.org/r/585361