Page MenuHomePhabricator

TypeError: Cannot assign null to property Shellbox\\Command\\Command::$includeStderr of type bool in Shellbox (server) v4.5.0
Open, Needs TriagePublic

Description

Earlier today, we tried to deploy Shellbox server using a newly built image, reflecting what has now been released as Shellbox v4.5.0. While updates to 5 of the 6 server variants showed no obvious issues issues, shellbox-syntaxhighlight very much did, returning nearly all 500 errors while running the new image.

So, what's up?

While we assumed these were almost definitely the result of errors handled by Server::handleException (i.e., returned as 500 with json body containing exception details) there were no corresponding exceptions / errors surfaced in MediaWiki error channels logstash, which was puzzling. I'll get to why shortly.

In any case, without obvious logs to work from, I briefly redeployed the suspect image to a single pod in codfw with catch_workers_output=1 in FPM ini settings, which allowed me to recover (P93613):

Exception of class TypeError: Cannot assign null to property Shellbox\\Command\\Command::$includeStderr of type bool
#0 /srv/app/src/Command/BoxedCommand.php(474): Shellbox\Command\Command->setClientData(Array)
#1 /srv/app/src/Action/ShellAction.php(92): Shellbox\Command\BoxedCommand->setClientData(Array)
#2 /srv/app/src/Action/ShellAction.php(27): Shellbox\Action\ShellAction->createCommand(Array)
#3 /srv/app/src/Action/MultipartAction.php(60): Shellbox\Action\ShellAction->execute(Array)
#4 /srv/app/src/Server.php(134): Shellbox\Action\MultipartAction->baseExecute(Array)
#5 /srv/app/src/Server.php(75): Shellbox\Server->guardedExecute('/srv/app/config...')
#6 /srv/app/src/Server.php(64): Shellbox\Server->execute('/srv/app/config...')
#7 /srv/app/index.php(3): Shellbox\Server::main('/srv/app/config...')
#8 {main}

In short, I believe this is a combination of https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/1267941 and https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/1285399.

  • MediaWiki vendor contains Shellbox v4.4.0, and thus Command::includeStderr is not initialized with its (new) default value (false).
  • Pygments::highlight (i.e., the majority of syntaxhighlight Shellbox invocations) executes the Shellbox call without explicitly setting includeStderr, which then serializes to null in the command data sent to Shellbox server (json).
  • The call to Command::setClientData on the server using the deserialized data attempts to set a bool property to null, and sadness ensues.

Why did we only see this in syntaxhighlight?

While there are presumably other BoxedCommand execution code paths that don't explicitly set includeStderr, I suspect they're just a lot more rare than Pygments::highlight. Which is to say, if we'd not rolled back within minutes, we'd probably see other examples.

Why didn't we see anything in logstash?

The ShellboxError thrown by the client will be converted to a PygmentsException which is then only logged via wfWarn.

What should we do about it?

I suspect the simplest option is to just update MediaWiki vendor to Shellbox v4.5.0 before updating the server.

That should ensure that the Command (and BoxedCommand) properties are always serialized by the client with values that will satisfy type checks when deserialized at the server. That should also be safe to do before the server is updated (i.e., those values should indeed be backward compatible).

The way that would fail us is if we also have the same problem in reverse - i.e., if the serialized response values from a server running older code will not type check when used in newer client code. I'm cautiously optimistic that's less likely, but if it does happen, we'll need to do something more involved.

Event Timeline

Moving this to our Radar (Pending) since it should not block work in T427820: Migrate Shellbox image to Bookworm (i.e., we have options available for triggering rebuilds at older commits), but it will have implications for how we proceed on other work that touches Shellbox.

https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/1285399/8/src/Command/Command.php#b26 specifically going from private $includeStderr;, adding a type, and a default to private bool $includeStderr = false;, along with the declare( strict_types = 1 ); in the other patch would indeed seem to tally up.

Upon closer inspection, I suspect the only Command property in the serialized client data that was not otherwise guaranteed to take a value that will satisfy type checks upon assignment in setClientData on v4.5.0 is includeStderr. Meaning, as long as that's the only issue, we probably could get away with just changing L470 in setClientData to $value ?? false. That would make that function compatible with v4.4.0-serialized client data, at the expense of introducing what feels like kind of a hack.