Page MenuHomePhabricator

PHP Notice: Array to string conversion
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
labels.normalized_message
[{reqId}] {exception_url}   PHP Notice: Array to string conversion
error.stack_trace
from /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(64)
#0 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(64): MWExceptionHandler::handleError(int, string, string, int, array)
#1 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(413): MediaWiki\Html\TemplateParser->{closure}(array, array)
#2 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(69): LightnCandy\Runtime::sec(array, array, null, array, bool, Closure)
#3 [internal function]: MediaWiki\Html\TemplateParser->{closure}(array, array, string)
#4 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(553): call_user_func(Closure, array, array, string)
#5 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(81): LightnCandy\Runtime::p(array, string, array, int, string)
#6 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(413): MediaWiki\Html\TemplateParser->{closure}(array, array)
#7 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(81): LightnCandy\Runtime::sec(array, array, null, array, bool, Closure)
#8 [internal function]: MediaWiki\Html\TemplateParser->{closure}(array, array, string)
#9 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(553): call_user_func(Closure, array, array, string)
#10 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(89): LightnCandy\Runtime::p(array, string, array, int, string)
#11 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(441): MediaWiki\Html\TemplateParser->{closure}(array, array)
#12 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(90): LightnCandy\Runtime::sec(array, array, null, array, bool, Closure)
#13 [internal function]: MediaWiki\Html\TemplateParser->{closure}(array, array, string)
#14 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(553): call_user_func(Closure, array, array, string)
#15 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(93): LightnCandy\Runtime::p(array, string, array, int)
#16 [internal function]: MediaWiki\Html\TemplateParser->{closure}(array, array, string)
#17 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(553): call_user_func(Closure, array, array, string)
#18 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(174): LightnCandy\Runtime::p(array, string, array, int, string)
#19 /srv/mediawiki/php-1.44.0-wmf.1/vendor/zordius/lightncandy/src/Runtime.php(441): MediaWiki\Html\TemplateParser->{closure}(array, array)
#20 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(177) : eval()'d code(174): LightnCandy\Runtime::sec(array, array, null, array, bool, Closure)
#21 /srv/mediawiki/php-1.44.0-wmf.1/includes/Html/TemplateParser.php(297): MediaWiki\Html\TemplateParser->{closure}(array, array)
#22 /srv/mediawiki/php-1.44.0-wmf.1/includes/skins/SkinMustache.php(95): MediaWiki\Html\TemplateParser->processTemplate(string, array)
#23 /srv/mediawiki/php-1.44.0-wmf.1/includes/skins/SkinTemplate.php(180): SkinMustache->generateHTML()
#24 /srv/mediawiki/php-1.44.0-wmf.1/includes/skins/Skin.php(683): SkinTemplate->outputPage()
#25 /srv/mediawiki/php-1.44.0-wmf.1/includes/Output/OutputPage.php(3193): Skin->outputPageFinal(MediaWiki\Output\OutputPage)
#26 /srv/mediawiki/php-1.44.0-wmf.1/includes/actions/ActionEntryPoint.php(163): MediaWiki\Output\OutputPage->output(bool)
#27 /srv/mediawiki/php-1.44.0-wmf.1/includes/MediaWikiEntryPoint.php(200): MediaWiki\Actions\ActionEntryPoint->execute()
#28 /srv/mediawiki/php-1.44.0-wmf.1/index.php(58): MediaWiki\MediaWikiEntryPoint->run()
#29 /srv/mediawiki/w/index.php(3): require(string)
#30 {main}
Impact

A high error rate since promotion of 1.44.0-wmf.1 to testwikis and subsequently group0.

Notes

Caused by c12f61709bed / https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/1082092

Event Timeline

dduvall triaged this task as Unbreak Now! priority.

Call stack pretty early calls new code in includes/skins/Skin.php:683 introduced recently in rMWc7656257159a3d85422f4558f5add6df1564f3c3

Change #1084254 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/core@master] Revert "Skin: [BREAKING CHANGE] Remove support for rendering outside body element"

https://gerrit.wikimedia.org/r/1084254

Change #1084255 had a related patch set uploaded (by Zabe; author: Jforrester):

[mediawiki/core@wmf/1.44.0-wmf.1] Revert "Skin: [BREAKING CHANGE] Remove support for rendering outside body element"

https://gerrit.wikimedia.org/r/1084255

Call stack pretty early calls new code in includes/skins/Skin.php:683 introduced recently in rMWc7656257159a3d85422f4558f5add6df1564f3c3

Pretty sure that's a red herring. SkinTemplate->outputPage() is in pretty much any error stack and this is a LightnCandy error. It's possible previously this bug was being swallowed silently.

Has anyone been able to replicate this locally?

I'm looking to see if I can get a reproduction locally to investigate this some more.

Call stack pretty early calls new code in includes/skins/Skin.php:683 introduced recently in rMWc7656257159a3d85422f4558f5add6df1564f3c3

Pretty sure that's a red herring. SkinTemplate->outputPage() is in pretty much any error stack and this is a LightnCandy error. It's possible previously this bug was being swallowed silently.

Has anyone been able to replicate this locally?

No luck. The only clue now is it happen the most on user talkpages

Change #1084254 merged by jenkins-bot:

[mediawiki/core@master] Revert "Skin: [BREAKING CHANGE] Remove support for rendering outside body element"

https://gerrit.wikimedia.org/r/1084254

Call stack pretty early calls new code in includes/skins/Skin.php:683 introduced recently in rMWc7656257159a3d85422f4558f5add6df1564f3c3

Pretty sure that's a red herring. SkinTemplate->outputPage() is in pretty much any error stack and this is a LightnCandy error. It's possible previously this bug was being swallowed silently.

Has anyone been able to replicate this locally?

No luck. The only clue now is it happen the most on user talkpages

I'm very troubled by this. The error suggests that some hook is creating data in a non-compatible way with a template.

If the issue does disappear with https://gerrit.wikimedia.org/r/1084254 it means it is likely still happening but being suppressed which could be masking a bigger issue, but if we haven't got replication steps I'm not sure how we restore the patch and avoid the issue happening again.

The logstash error doesn't tell me much about the state in which the error occurred - e.g. which skin and whether the user is logged in. I suspect there is some hook being triggered in very specific circumstances.

Is there any way to get notices to render in the browser so I can perhaps narrow it down to the template where the issue is occurring ?

Change #1084255 abandoned by Zabe:

[mediawiki/core@wmf/1.44.0-wmf.1] Revert "Skin: [BREAKING CHANGE] Remove support for rendering outside body element"

https://gerrit.wikimedia.org/r/1084255

I have identified the issue and reproduced locally. It is not related to the reverted Skin patch.

The logstash error doesn't tell me much about the state in which the error occurred - e.g. which skin and whether the user is logged in. I suspect there is some hook being triggered in very specific circumstances.

This would have helped a lot... The eval() blackhole really compounded the problem here and made this quite painful to debug.

Change #1084753 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/skins/MinervaNeue@master] Revert "Use array instead of string for class list"

https://gerrit.wikimedia.org/r/1084753

Change #1084759 had a related patch set uploaded (by Ammarpad; author: Ammarpad):

[mediawiki/skins/MinervaNeue@wmf/1.44.0-wmf.1] Revert "Use array instead of string for class list"

https://gerrit.wikimedia.org/r/1084759

Change #1084753 merged by jenkins-bot:

[mediawiki/skins/MinervaNeue@master] Revert "Use array instead of string for class list"

https://gerrit.wikimedia.org/r/1084753

Change #1084759 merged by jenkins-bot:

[mediawiki/skins/MinervaNeue@wmf/1.44.0-wmf.1] Revert "Use array instead of string for class list"

https://gerrit.wikimedia.org/r/1084759

Mentioned in SAL (#wikimedia-operations) [2024-10-30T18:21:29Z] <dduvall@deploy2002> Started scap sync-world: Backport for [[gerrit:1084759|Revert "Use array instead of string for class list" (T378531)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-30T18:23:54Z] <dduvall@deploy2002> ammarpad, dduvall: Backport for [[gerrit:1084759|Revert "Use array instead of string for class list" (T378531)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-30T18:27:16Z] <dduvall> monitoring testwiki error rates for a few minutes to see if the error related to T378531 subsides (current rate is 23 errors in the last 15 minutes)

I backported https://gerrit.wikimedia.org/r/c/mediawiki/skins/MinervaNeue/+/1084759 prior to train deployment today. However, I'm still seeing errors.

I went ahead and completed the backport sync since it seemed harmless, but I'm hesitant to continue train without a fix to this.

Mentioned in SAL (#wikimedia-operations) [2024-10-30T18:40:34Z] <dduvall@deploy2002> Finished scap sync-world: Backport for [[gerrit:1084759|Revert "Use array instead of string for class list" (T378531)]] (duration: 19m 04s)

@dduvall, where can I test this? Test wiki and a couple of wikis where there's error are still showing MinervaNeue – (ecd660e) 19:20, 28 October 2024 for me.

@Ammarpad I may have been too hasty! It seems like errors are finally subsiding after the full sync.

@dduvall, where can I test this? Test wiki and a couple of wikis where there's error are still showing MinervaNeue – (ecd660e) 19:20, 28 October 2024 for me.

This is a bit curious and does not match the current state of staging.

dduvall@deploy2002:/srv/mediawiki-staging/php-1.44.0-wmf.1$ git submodule status skins/MinervaNeue/
 7b819ad803cd779f9f777ae6fd60305a9b4ee335 skins/MinervaNeue (remotes/origin/wmf/1.44.0-wmf.1)

@dduvall, where can I test this? Test wiki and a couple of wikis where there's error are still showing MinervaNeue – (ecd660e) 19:20, 28 October 2024 for me.

This is a bit curious and does not match the current state of staging.

dduvall@deploy2002:/srv/mediawiki-staging/php-1.44.0-wmf.1$ git submodule status skins/MinervaNeue/
 7b819ad803cd779f9f777ae6fd60305a9b4ee335 skins/MinervaNeue (remotes/origin/wmf/1.44.0-wmf.1)

So there appear to be two conflicting entries in the git info cache. Perhaps this is a bug in scap somewhere. I'll investigate further and file a bug if necessary.

dduvall@deploy2002:/srv/mediawiki-staging/php-1.44.0-wmf.1$ jq . < cache/gitinfo/info-skins-MinervaNeue.json 
{
  "@directory": "/srv/mediawiki-staging/php-1.44.0-wmf.1/skins/MinervaNeue",
  "head": "7b819ad803cd779f9f777ae6fd60305a9b4ee335",
  "headSHA1": "ecd660e0f7b61c9bc0311acd9412628750c5e219",
  "headCommitDate": "1730139607",
  "branch": "7b819ad803cd779f9f777ae6fd60305a9b4ee335",
  "remoteURL": "https://gerrit.wikimedia.org/r/mediawiki/skins/MinervaNeue"
}

In any case, the backport was definitely deployed and the errors have subsided.

Thanks for the details.

We can consider this task as resolved since Logstash shows there are no more errors

<sidequest>

dduvall@deploy2002:/srv/mediawiki-staging/php-1.44.0-wmf.1$ git submodule status skins/MinervaNeue/
 7b819ad803cd779f9f777ae6fd60305a9b4ee335 skins/MinervaNeue (remotes/origin/wmf/1.44.0-wmf.1)

So there appear to be two conflicting entries in the git info cache. Perhaps this is a bug in scap somewhere. I'll investigate further and file a bug if necessary.

dduvall@deploy2002:/srv/mediawiki-staging/php-1.44.0-wmf.1$ jq . < cache/gitinfo/info-skins-MinervaNeue.json 
{
  "@directory": "/srv/mediawiki-staging/php-1.44.0-wmf.1/skins/MinervaNeue",
  "head": "7b819ad803cd779f9f777ae6fd60305a9b4ee335",
  "headSHA1": "ecd660e0f7b61c9bc0311acd9412628750c5e219",
  "headCommitDate": "1730139607",
  "branch": "7b819ad803cd779f9f777ae6fd60305a9b4ee335",
  "remoteURL": "https://gerrit.wikimedia.org/r/mediawiki/skins/MinervaNeue"
}

The "headSHA1" value here is pointing at the merge commit rSMINecd660e0f7b6: Merge "Use array instead of string for class list". The "head" is pointing at rSMIN7b819ad803cd: Revert "Use array instead of string for class list". We would typically expect the "head" and "headSHA1" values to match except in the event of a non-public commit having been added to the repo (typically an embargoed security patch). When that happens "head" will be the hash of the actual HEAD of the branch/submodule and "headSHA1" will be the hash found by git merge-base HEAD origin, the first common hash in both the working tree and the upstream origin.

$ pwd
/srv/mediawiki-staging/php-1.44.0-wmf.1/skins/MinervaNeue
$ cat .git
gitdir: ../../.git/modules/skins/MinervaNeue
$ cat ../../.git/modules/skins/MinervaNeue/HEAD
7b819ad803cd779f9f777ae6fd60305a9b4ee335
$ git show --pretty=fuller --no-patch 7b819ad803cd779f9f777ae6fd60305a9b4ee335
commit 7b819ad803cd779f9f777ae6fd60305a9b4ee335 (HEAD, origin/wmf/1.44.0-wmf.1)
Author:     Ammarpad <ammarpad@yahoo.com>
AuthorDate: Wed Oct 30 12:30:08 2024 +0000
Commit:     Ammarpad <ammarpad@yahoo.com>
CommitDate: Wed Oct 30 12:33:30 2024 +0000

    Revert "Use array instead of string for class list"

    This reverts commit c12f61709bed7d8c7b2d85b5b3e850dcb17d6d46.

    Reason for revert: Does not work and broke the train

    Bug: T378531
    Change-Id: I3a91169276b8ebffbdf409aac4e3731298047196
$ git merge-base HEAD origin
ecd660e0f7b61c9bc0311acd9412628750c5e219
$ git show --pretty=fuller --no-patch ecd660e0f7b61c9bc0311acd9412628750c5e219
commit ecd660e0f7b61c9bc0311acd9412628750c5e219 (master)
Merge: 3f733bea c12f6170
Author:     jenkins-bot <jenkins-bot@gerrit.wikimedia.org>
AuthorDate: Mon Oct 28 18:20:07 2024 +0000
Commit:     Gerrit Code Review <gerrit@wikimedia.org>
CommitDate: Mon Oct 28 18:20:07 2024 +0000

    Merge "Use array instead of string for class list"
$ git l | head -5
* 7b819ad8 Revert "Use array instead of string for class list" (8 hours ago, Ammarpad) (HEAD, origin/wmf/1.44.0-wmf.1)
*   ecd660e0 Merge "Use array instead of string for class list" (2 days ago, jenkins-bot) (master)
|\
| * c12f6170 Use array instead of string for class list (9 days ago, Fomafix)
* |   3f733bea Merge "Use PHP type hints in PHPUnit tests" (2 days ago, jenkins-bot)

git l is my custom alias for git log --pretty=format:'%C(yellow)%h%C(reset) %s %C(bold green)(%C(reset)%C(green)%ar%C(reset)%C(bold green),%C(reset) %C(magenta)%an%C(reset)%C(bold green))%C(reset)%C(cyan)%d%C(reset)' --abbrev-commit --graph

I think what is happening here is that git merge-base HEAD origin is checking against the default upstream branch ("master" in this case) rather than the "wmf/1.44.0-wmf.1" branch that is more expected.

$ git merge-base HEAD origin
ecd660e0f7b61c9bc0311acd9412628750c5e219
$ git merge-base HEAD origin/wmf/1.44.0-wmf.1
7b819ad803cd779f9f777ae6fd60305a9b4ee335

</sidequest>

For future reference, anything with LightnCandy in the stack trace should be an indication of an issue with rendering a template. This can be replicated quite easily by putting {{{data-footer}}} in Vector's skin.mustache file (eg. trying to render an array as a string).

@dduvall I'm a little concerned that we reverted a change based on an educated guess (T378531#10274398) and just assumed that was correct without further investigation and probing. If Ammarpad hadn't investigated this further, we would have likely backported the wrong change and the train would have been blocked by an additional day. Is there anything we can do to improve our handling/analysis of UBNs like this in future?

Thanks @Ammarpad for taking the time to investigating this and find the correct fix. I appreciate you!

@dduvall I'm a little concerned that we reverted a change based on an educated guess (T378531#10274398) and just assumed that was correct without further investigation and probing. If Ammarpad hadn't investigated this further, we would have likely backported the wrong change and the train would have been blocked by an additional day. Is there anything we can do to improve our handling/analysis of UBNs like this in future?

I can't think of any additional safeguards in the deployment process. There was a peer reviewed patch (multiple reviewers) which was merged into master and a backported patch submitted to wmf/1.44.0-wmf.1 which was eventually abandoned. While that's definitely not an ideal scenario, we did avoid a deployment incident.

My impression is that the initial confusion was due to the use of eval() which made it very difficult to discern the problem based on static analysis alone. If someone can think of a way to refactor the execution of compiled templates to give us useful stacktraces (e.g. write to tmpfs, include and log contents upon exceptions? I'm not a PHP dev, so just theorizing, sorry...) that would improve triage of such errors in the future.

Thanks @Ammarpad for taking the time to investigating this and find the correct fix. I appreciate you!

Hear hear. Thanks from me as well, @Ammarpad

@dduvall I'm a little concerned that we reverted a change based on an educated guess (T378531#10274398) and just assumed that was correct without further investigation and probing.

This is a mischaracterization.

  • The comment you're pointing to was not an educated guess, it was an observation—code changed this week which caused this error to surface.
  • The revert was not assumed correct:
    • It was made by a MediaWiki developer in the absence of any discussion on that task
    • It was peer review by another MediaWiki developer
    • The code was correct in the sense that it squashed the error – as you mentioned after the patch was made, "It's possible previously this bug was being swallowed silently."

And it seems like investigation and probing is what happened here after the revert was made. Thanks @Ammarpad for that!

If Ammarpad hadn't investigated this further, we would have likely backported the wrong change and the train would have been blocked by an additional day. Is there anything we can do to improve our handling/analysis of UBNs like this in future?

I'll +1 @dduvall—from the rollout side, seems like everything went correctly.

  • There was logspam—a message in the logs that seems harmless but can overwhelm logstash and makes spotting real problems harder
  • We filed the task, tagged it, added as a blocker to the train task
  • There was no movement on the task
  • We tried try to spur conversation on the task by adding additional observations we have—code rolling out this week is in the stack trace—that's often an indication that a revert would fix the problem
  • A MediaWiki developer made a revert
  • That revert was peer reviewed and merged
  • We were prepared to backport that fix
  • @Ammarpad made a better fix <3