Page MenuHomePhabricator

PHP Warning "headers already sent" from exception reporter for fatals that happen post-send
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

Request URL: GET www.wikidata.org /w/api.php?action=wbgetentities& …
Request ID: XQE22ApAIDQAAAwmDKoAAADJ

message
PHP Warning: Cannot modify header information - headers already sent by (output started at /srv/mediawiki/php-1.34.0-wmf.8/includes/MediaWiki.php:745)
trace
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.34.0-wmf.8/includes/libs/HttpStatus.php(112): header(string)
#2 /srv/mediawiki/php-1.34.0-wmf.8/includes/GlobalFunctions.php(1651): HttpStatus::header(integer)
#3 /srv/mediawiki/php-1.34.0-wmf.8/includes/exception/MWExceptionRenderer.php(44): wfHttpError(integer, string, string)
#4 /srv/mediawiki/php-1.34.0-wmf.8/includes/exception/MWExceptionHandler.php(121): MWExceptionRenderer::output(WMFTimeoutException, integer)
#5 /srv/mediawiki/php-1.34.0-wmf.8/includes/exception/MWExceptionHandler.php(195): MWExceptionHandler::report(WMFTimeoutException)
#6 /srv/mediawiki/php-1.34.0-wmf.8/includes/exception/MWExceptionHandler.php(167): MWExceptionHandler::handleException(WMFTimeoutException)
#7 [internal function]: MWExceptionHandler::handleUncaughtException(WMFTimeoutException)

Impact

This is a meta error. It happens while handling another higher impact issue. The meta issue itself does not directly impact end-users. The accidental side-effect of this error is that the user does not get to see an error page (because we have already closed the response stream), which happens to be the desired behaviour.

The error would be resolved by having the code not try something that we know is not valid, and incidentally something we don't actually want.

The impact is that it causes noise in Logstash. The type of error it causes (headers already sent) is not generally innocent and easy to ignore, which means the noise is hard to filter out in Logstash, which in turn makes regressions harder to detect, both for humans, and during deployments by Scap (e.g. it could cause an aborted deployment due to a random spike in traffic).

Event Timeline

The ExceptionHandler reporter might be able to use the same detection for this issue that we use in MediaWiki.php, which is !( $output->isDisabled() || headers_sent() ).

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:06 PM

I've looked at this again today and managed to narrow it down to api.php requests. It only happens in production for /w/api.php … requests.

Example
PHP Warning: Cannot modify header information - headers already sent by (output started at /srv/mediawiki/php-1.35.0-wmf.24/includes/MediaWiki.php:847)

#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.35.0-wmf.24/includes/GlobalFunctions.php(1615): header(string)
#2 /srv/mediawiki/php-1.35.0-wmf.24/includes/exception/MWExceptionRenderer.php(55): wfHttpError(integer, string, string)
#3 /srv/mediawiki/php-1.35.0-wmf.24/includes/exception/MWExceptionHandler.php(103): MWExceptionRenderer::output(WMFTimeoutException, integer)
#4 /srv/mediawiki/php-1.35.0-wmf.24/includes/exception/MWExceptionHandler.php(177): MWExceptionHandler::report(WMFTimeoutException)
#5 /srv/mediawiki/php-1.35.0-wmf.24/includes/exception/MWExceptionHandler.php(149): MWExceptionHandler::handleException(WMFTimeoutException)
#6 [internal function]: MWExceptionHandler::handleUncaughtException(WMFTimeoutException)

It seems all paths out of MWExceptionRenderer::output() for web contexts (not CLI) handle headers_sent(), except for the case under defined( 'MW_API' ) which outsources its output handling to wfHttpError() which doesn't expect to be called post-send.

Given that MWExceptionRenderer already has its own handling for printing a HTTP 500 error with a minimal HTML document, I wonder if it would make sense to stop using wfHttpError() for this. Or perhaps to sprinkle another header_sent() check in the mix?

Unfortunately there's little indication as to how some exception is making it past the Action API's normal exception handling mechanisms to get to the uncaught exception handler, although the fact that it's WMFTimeoutException is suggestive.

It seems there were only two instances in the past week:

2020-03-23 17:30:47 [XnjyCwpAEKYAAwS626AAAABU] mw1402 enwiki 1.35.0-wmf.24 exception ERROR: [XnjyCwpAEKYAAwS626AAAABU] /w/api.php?a...   WMFTimeoutException from line 39 of /srv/mediawiki/wmf-config/set-time-limit.php: the execution time limit of 60 seconds was exceeded
[Exception WMFTimeoutException] (/srv/mediawiki/wmf-config/set-time-limit.php:39) the execution time limit of 60 seconds was exceeded
  #0 /srv/mediawiki/php-1.35.0-wmf.24/includes/MediaWiki.php(848): {closure}(integer)
  #1 /srv/mediawiki/php-1.35.0-wmf.24/api.php(119): MediaWiki->doPostOutputShutdown()
  #2 /srv/mediawiki/w/api.php(3): require(string)
  #3 {main}
2020-03-19 03:53:23 [XnLsdwpAIDUAALOehCsAAAAV] mw1341 wikidatawiki 1.35.0-wmf.23 exception ERROR: [XnLsdwpAIDUAALOehCsAAAAV] /w/api.php?...   WMFTimeoutException from line 39 of /srv/mediawiki/wmf-config/set-time-limit.php: the execution time limit of 60 seconds was exceeded
[Exception WMFTimeoutException] (/srv/mediawiki/wmf-config/set-time-limit.php:39) the execution time limit of 60 seconds was exceeded
  #0 /srv/mediawiki/php-1.35.0-wmf.23/includes/MediaWiki.php(847): {closure}(integer)
  #1 /srv/mediawiki/php-1.35.0-wmf.23/api.php(119): MediaWiki->doPostOutputShutdown()
  #2 /srv/mediawiki/w/api.php(3): require(string)
  #3 {main}

In both cases it happened after the API completed, during the post-output shutdown. All the older instances I checked were also WMFTimeoutException thrown from within doPostOutputShutdown(), most with the same trace but a few made it a bit further. So that explains it bypassing the API's normal error handling.

Given that MWExceptionRenderer already has its own handling for printing a HTTP 500 error with a minimal HTML document, I wonder if it would make sense to stop using wfHttpError() for this.

It looks like the special case for MW_API was added in rSVN111955: Don't output skinned errors in case of unhandled API exceptions. Unfortunately there is no bug link or further details. But from a practical standpoint, there's little point in serving a 12K error page when a 400-some byte error page will have the same effect.[1]

Looking at MWExceptionRenderer, it seems that its AS_PRETTY mode does skinned output while AS_RAW is even more minimal than wfHttpError().

Or perhaps to sprinkle another header_sent() check in the mix?

That seems reasonable enough. Or I wonder whether we should just have wfHttpError() output nothing if the headers were already sent, since it can't set the intended status code either.

[1]: Most likely of making the client fail to parse it as the JSON, XML, or PHP-serialized data it was expecting.

Unfortunately there's little indication as to how some exception is making it past the Action API's normal exception handling mechanisms to get to the uncaught exception handler, […].
In both cases it happened after the API completed, during the post-output shutdown. All the older instances I checked were also WMFTimeoutException thrown from within doPostOutputShutdown(),

Yeah, I think we would see the same issue for any fatal error or uncaught exception thrown during post-send, or during shutdown. Mainly from deferred updates, and destruct handlers I guess.

The good thing is that overall fatal errors are sufficiently rare now, and post-send or shutdown ones even more so, that it is indeed mostly just timeout exceptions we're seeing. And from API traffic indeed only timeouts in recent weeks. That's a fairly good position for us to be in.

I wonder whether we should just have wfHttpError() output nothing if the headers were already sent,

Yeah, the exception renderer behaves that way as well. I'm unsure whether to do that for wfHttpError though, I feel like that should be the caller's responsibility not to call this method and thus require the caller to be aware and handle accordingly (if needed) the post-send scenario. I worry it would otherwise duplicate and push down that responsibility too much and also hide other bugs if higher levels aren't required to consider it.

Perhaps making things like wfHttpError() throw in such scenario could be explored though. That allows both kinds of benefits to be had.

Change 583384 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] exception: No longer try to send error page post-send on api.php

https://gerrit.wikimedia.org/r/583384

Change 583432 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] exception: Remove use of wfHttpError() from MWExceptionRenderer

https://gerrit.wikimedia.org/r/583432

Change 583384 merged by jenkins-bot:
[mediawiki/core@master] exception: No longer try to send error page post-send on api.php

https://gerrit.wikimedia.org/r/583384