Page MenuHomePhabricator

DeferredUpdates should not silently ignore critical exceptions
Closed, ResolvedPublic

Description

Currently, any critical exceptions thrown from any DeferrableUpdate are silently ignored by DeferredUpdates::runUpdate.

Expected:
Exceptions thrown by a DeferrableUpdate cause a code 500 response (if still possible), and should go to the error log per default. They should also cause unit tests to fail.

Observed:
Critical exceptions (such as RuntimeExceptions or LogicExceptions) thrown by a DeferrableUpdate are silently ignored. They are only visible in the debug log if the "exception" channel is explcitly enabled.

Analysis:

  • DeferredUpdates::runUpdate calls MWExceptionHandler::rollbackMasterChangesAndLog
  • MWExceptionHandler::rollbackMasterChangesAndLog calls MWExceptionHandler::logException, which calls error() on a Logger returned by LoggerFactory::getInstance( 'exception' ).
  • The logger used by MWExceptionHandler is per default a LegacyLogger. LegacyLogger::error forwards to LegacyLogger::log, which, per default, ends up calling MWDebug::debugMsg, but not LegacyLogger::emit(), because shouldEmit() returns false.
  • shouldEmit() returns false unless $wgDebugLogGroups[] contains 'exception', which is this LegacyLogger's default channel as per the call to LoggerFactory::getInstance().

Proposal:

  • LegacyLogger::log should default to the wfErrorLog channel if $level is >= LogLevel::ERROR. In addition, MWDebug::warning should be called instead of MWDebug::debugMsg if $level is >= LogLevel::WARNING.
  • It's unclear if level LogLevel::WARNING should be considered a "developer warning" which just causes tests to fail, or a "production warning" that also goes into error logs. In any case, unit tests should fail when warning are triggered (unless the warnign was specifically suppressed).
  • Alternatively, if the behavior of LegacyLogger should not be changed, DeferredUpdates::runUpdate should be changed to call wfLogWarning() for all exceptions that are not ErrorPageErrors (and maybe even for them).

Event Timeline

unit tests should fail when warning are triggered

For the record, this is already the case. Any "php error" (notice, warning, error) is turned into an exception by PHPUnit and fails the test. Any "mediawiki error" (not debug/info) that goes to mw-exception.log or mw-error.log in Jenkins, will make the job fail per T50002. Our Jenkins jobs, after completing the PHPUnit run, assert that these files are empty/non-existent.

However, if they are turned into debug-level errors, then, naturally, those don't cause tests to fail given we have (and should have) plenty of debug messages being output normally.

@Krinkle wrote:

Any "php error" (notice, warning, error) is turned into an exception by PHPUnit and fails the test.

I know, I wrote that code :) My point is, calling LegacyLogger::log with LogLevel::WARNING does not do this. But should.

Any "mediawiki error" (not debug/info) that goes to mw-exception.log or mw-error.log in Jenkins, will make the job fail per T50002

That is correct for jenkins, but local tests will not fail. Also, for a vanilla install, such errors will be invisible on the live system, unless someone enables and monitors mw-exception.log.

Change 423035 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Make DeferredUpdates re-throw exceptions during testing.

https://gerrit.wikimedia.org/r/423035

@Krinkle wrote:

Any "php error" (notice, warning, error) is turned into an exception by PHPUnit and fails the test.

Just to clarify, I was referring to [convertErrorsToExceptions](https://phpunit.de/manual/4.8/en/appendixes.configuration.html).

We currently enable $wgDebugLogFile (which writes all severities and all channels of MediaWiki logging) in Jenkins' LocalSettings file for MediaWiki jobs. We also have separate files for specific channels (php-exception, php-error, and ratelimit).

Perhaps we can also configure it so that of all channels, messages with severity of WARNING and higher, are written emitted as php errors. That would naturally make them show up in stderrr and/or caught by PHPUnit.

I'm not entirely sure how yet, but I believe this would be possible using the modern $wgMWLoggerDefaultSpi configuration format (instead of the older wgDebugLogGroups config, which just feeds into that).

Change 423035 merged by jenkins-bot:
[mediawiki/core@master] Make DeferredUpdates re-throw exceptions during testing.

https://gerrit.wikimedia.org/r/423035

@Krinkle wrote:

Perhaps we can also configure it so that of all channels, messages with severity of WARNING and higher, are written emitted as php errors. That would naturally make them show up in stderrr and/or caught by PHPUnit.

I think that would be a good idea. I looked into doing that, but got very confused by LegacyLogger.

For the record: a stop gap solution is merged, but it's a Volkswagen solution: DeferredUpdates changes behavior based on whether it's invoked during a unit test.

In any case, critical errors thrown in DeferredUpdates should not be invisible per default on 3rd party installations. This ticket should stay open until that problem is resolved.

In any case, critical errors thrown in DeferredUpdates should not be invisible per default on 3rd party installations. This ticket should stay open until that problem is resolved.

Just want to add a giant +1 to that (after spending a couple hours trying to track down errors which i assumed didn't exist because they didn't show up in my php error log despite E_DEPRECATED things from the same code showing up there).

Exceptions from DeferredUpdates:

  • … do go do the error log. They are logged as exceptions same as any other exception, with the same severity and structured logging keys. It uses channel=DeferredUpdates which is similar to channel=fatal, and is included by default in $wgDebugLogFile same as for fatals. Or, if you have a separate debug log file for only certain groups, then you'd include it in the channels you whitelist (e.g. "error", "exception", "fatal" and "DeferredUpdates"). Our various Logstash dashboards do this already, and our Jenkins CI config and DevelopmentSettings.php config also do this by default.
  • … they do do abort the web request and result in HTTP 5xx (if pre-send). DeferredUpdates.php catches the exceptions only to give other updates a chance to run. Whichever exception was first thrown, is re-thrown and otherwise behaves the same.
  • … they behave the same (if post-send) as fatal errors from the JobQueue.

As I understand it, the main issue that led to this task was that unit tests weren't aborted correctly, which has since been fixed.

What is left for this task? Do we want to reconsider merging the DeferredUpdates channel into the exception channel? (Note this was in the past intentionally split).

CCicalese_WMF subscribed.

Moving on workboard from Backlog to Next, since the Backlog column was hidden some time ago. Is there work remaining to do on this task?

Is this resolved?

I don't know... Krinkle said above that the original issue has since been fixed, but I don't know anything about that.

To test, make a DeferredUpdate that throws an exception and schedule it with DeferredUpdates::addUpdate() in an integration test. It should make the test fail. Ideally, we'd have a test that tests this :)

--- a/tests/phpunit/integration/includes/user/UserFactoryTest.php
+++ b/tests/phpunit/integration/includes/user/UserFactoryTest.php
@@ -24,6 +24,11 @@ class UserFactoryTest extends MediaWikiIntegrationTestCase {
        $user = $factory->newFromName( $name );
 
        $this->assertInstanceOf( User::class, $user );
+
+       DeferredUpdates::addCallableUpdate( function () {
+           throw new Exception( 'Boo!' );
+       } );
+
        $this->assertSame( $name, $user->getName() );
    }
PHPUnit
# php phpunit.php integration/includes/user/UserFactoryTest.php 
Using PHP 7.2.32
PHPUnit 8.5.9 by Sebastian Bergmann and contributors.

E.........                                                        10 / 10 (100%)

Time: 6.36 seconds, Memory: 40.25 MB

There was 1 error:

1) UserFactoryTest::testNewFromName
Exception: Boo!

/var/www/mediawiki/tests/phpunit/integration/includes/user/UserFactoryTest.php:29

In addition, in case it somehow escapes PHPUnit, it also goes to mw-error.log, which CI fails jobs for if non-empty:

mw-debug.log
[DeferredUpdates] [debug] DeferredUpdates::run: started MWCallableUpdate_UserFactoryTest->testNewFromName #889 []
[DeferredUpdates] [debug] DeferredUpdates::run: ended MWCallableUpdate_UserFactoryTest->testNewFromName #889 []
mw-error.log
[exception] [error] [{exception_id}] {exception_url}   Exception from line 29 of /var/www/mediawiki/tests/phpunit/integration/includes/user/UserFactoryTest.php: Boo! {"exception":… "Call Stack: … 7. UserFactoryTest->run() \/var\/www\/mediawiki\/vendor\/phpunit\/phpunit\/src\/Framework\/TestSuite.php:601\n    5.5336   36983984   8. UserFactoryTest->run() \/var\/www\/mediawiki\/tests\/phpunit\/MediaWikiIntegrationTestCase.php:446\n    5.5337   36983984   9. PHPUnit\\Framework\\TestResult->run() \/var\/www\/mediawiki\/vendor\/phpunit\/phpunit\/src\/Framework\/TestCase.php:763\n    5.5443   37074752  10. UserFactoryTest->runBare() \/var\/www\/mediawiki\/vendor\/phpunit\/phpunit\/src\/Framework\/TestResult.php:691\n    5.6658   37699328  11. UserFactoryTest->onNotSuccessfulTest() \/var\/www\/mediawiki\/vendor\/phpunit\/phpunit\/src\/Framework\/TestCase.php:1133\n"},"exception_id":"f4e6cc7f3a7686349515a499","exception_url":"[no req]","caught_by":"other"}
tstarling claimed this task.
tstarling subscribed.

Sounds like it's resolved to me.