
WikimediaDebug features (e.g. logging and profiling) for MediaWiki JobQueue jobs
Open, Needs Triage, Public

Description

Debugging MediaWiki jobs is a pain point, in part because there is no way to trigger detailed debug logging like we can for web requests. This seems fairly easy to fix: MediaWiki just needs to pass the value of X-Wikimedia-Debug to EventBus, and the job executor service then needs to turn it back into an actual header. (That header would affect other jobs in the same batch; that's probably fine.)

Alternatively, we can make it part of the job specification, like we do with X-Request-ID. That wouldn't allow routing to mwdebug, but would allow "Verbose logging" etc. to work.
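A minimal sketch of the second approach, propagating the debug value through the job envelope. The field name `x_wikimedia_debug` and the function shapes are hypothetical; the real MediaWiki/EventBus job schema differs:

```python
# Hypothetical sketch: carry the X-Wikimedia-Debug value inside the job
# parameters, the same way X-Request-ID is carried today. Names below are
# illustrative, not the actual MediaWiki/EventBus schema.

DEBUG_HEADER = "X-Wikimedia-Debug"

def enqueue(job_params: dict, request_headers: dict) -> dict:
    """Producer side: copy the debug header value into the job envelope."""
    envelope = dict(job_params)
    if DEBUG_HEADER in request_headers:
        envelope["x_wikimedia_debug"] = request_headers[DEBUG_HEADER]
    return envelope

def executor_headers(envelope: dict) -> dict:
    """Executor side: restore the stored value as a request header."""
    headers = {}
    if "x_wikimedia_debug" in envelope:
        headers[DEBUG_HEADER] = envelope["x_wikimedia_debug"]
    return headers
```

The executor would attach the restored header to the internal request it makes to MediaWiki, so "Verbose logging" and similar flags take effect for that job.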

Event Timeline

Note that this isn't as simple as just forwarding the header, as it's parsed by ATS and jobs don't pass through the CDN layer.

The routing part isn't that simple. It would fix logging though.

I enhanced one of my bots to check for, and report, errors it encounters.

One error I encountered, I was able to recover from by making the bot log back in.

[error] => Array
    (
        [code] => assertuserfailed
        [info] => You are no longer logged in, so the action could not be completed.
        [docref] => See https://en.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes.
    )

But another error I encountered seems to be a report of an internal problem with the job queue.

[error] => Array
    (
        [code] => internal_api_error_JobQueueError
        [info] => [0f9b54ac-f0e9-4040-9546-b893e162e7b5] Caught exception of type JobQueueError
        [errorclass] => JobQueueError
    )

At the moment, my bot ignores these errors, and they only happen sporadically. Most API purge requests, including the ones made right after these errors, still succeed.
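A sketch of how a bot might triage the two error codes shown above. The error codes are real Action API response values; the policy choices (re-login vs. retry) are an assumption on my part, not documented API behavior:

```python
# Hypothetical triage for MediaWiki Action API error responses.
# `response` is the decoded JSON body of an API call.

def triage(response: dict) -> str:
    """Return 'ok', 'relogin', 'retry', or 'fail' for an API response."""
    error = response.get("error")
    if error is None:
        return "ok"
    code = error.get("code", "")
    if code == "assertuserfailed":
        # Session expired: log back in, then repeat the request.
        return "relogin"
    if code.startswith("internal_api_error_"):
        # Server-side exception (e.g. JobQueueError); likely transient,
        # so retrying later is reasonable.
        return "retry"
    return "fail"
```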

Let me know if my bot should do anything special in response to these errors. I'm reporting this here because this task relating to MediaWiki jobs is still open, while other tasks related to my bot's API purge troubles (T351729 and T354460) were resolved and closed. Please do open a new Phab task for this, if appropriate.

> Please do open a new Phab for this, if appropriate.

Opening a new task would be more appropriate, but I think it's fine to ignore that error (the exception is "Could not enqueue jobs", caused by a bunch of 503s from EventBus, so probably a server being down or overloaded temporarily).
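Since the failure is transient, the usual client-side mitigation is a bounded retry with backoff rather than special handling. A sketch, with the retry policy entirely my assumption:

```python
import time

def purge_with_retry(purge, attempts=3, base_delay=1.0):
    """Retry a purge on transient internal API errors.

    `purge` is a callable returning the decoded API response dict.
    The attempt count and exponential backoff are illustrative choices,
    not a documented recommendation.
    """
    resp = {}
    for attempt in range(attempts):
        resp = purge()
        error = resp.get("error")
        code = error.get("code", "") if error else ""
        if not code.startswith("internal_api_error_"):
            return resp  # success, or a non-transient error to handle elsewhere
        time.sleep(base_delay * (2 ** attempt))  # back off before retrying
    return resp  # still failing after all attempts; give up
```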

Krinkle renamed this task from "Forward X-Wikimedia-Debug header to MediaWiki jobs" to "WikimediaDebug features (e.g. logging and profiling) for MediaWiki JobQueue jobs". Mar 19 2025, 11:11 PM
Krinkle updated the task description.

For external web requests, we separate mwdebug from appservers. That is, we don't just enable WikimediaDebug features on a request that is otherwise routed and handled in the same cluster, but we run it on a separate host. The two proposals in the task description both involve activating WikimediaDebug features within the main jobrunner cluster. For one, Tideways/XHProf is not installed there (and can't be, for performance reasons).

A smaller first step might be to document (and then abstract/formalize) a way to run jobs from the command-line. We already provide ways to enable WikimediaDebug features via mwscript from mwdebug or mwmaint CLI.

For example, depending on what you need, the following might work:

https://wikitech.wikimedia.org/wiki/WikimediaDebug#Debug_logging_(CLI)

krinkle at KrinkleMac in ~/Development/mediawiki
$ php maintenance/eval.php -d 1

> $job = new CdnPurgeJob(['urls' => ['https://example.org']]); $job->run();
[info] [squid] MediaWiki\Deferred\CdnCacheUpdate::purge: https://example.org

Idem in prod on mwmaint:

[23:28 UTC] krinkle at mwmaint1002.eqiad.wmnet in ~
$ mwscript eval.php testwiki -d 1
> $job = new CdnPurgeJob(['urls' => ['https://example.org']]); $job->run();
[info] [squid] CdnCacheUpdate::purge: https://example.org
…

> DeferredUpdates::doUpdates();

Or, to write to a file for later analysis (uses the same format as the udp2log files on the mwlog host):

[23:28 UTC] krinkle at mwmaint1002.eqiad.wmnet in ~
$ PHP='MW_DEBUG_LOCAL=1 php' mwscript eval.php testwiki
> $job = new CdnPurgeJob(['urls' => ['https://krinkle.wikimedia.org']]);
> $job->run();
> DeferredUpdates::doUpdates();
> ^D

[23:29 UTC] krinkle at mwmaint1002.eqiad.wmnet in ~
$ head /tmp/wiki.log 
2025-03-19 23:28:49.035124 [] mwmaint1002 testwiki 1.44.0-wmf.21 localisation DEBUG: LocalisationCache using store LCStoreCDB
2025-03-19 23:28:49.050544 [] mwmaint1002 testwiki 1.44.0-wmf.21 memcached DEBUG: MainWANObjectCache using store MemcachedPeclBagOStuff
2025-03-19 23:28:49.053109 [] mwmaint1002 testwiki 1.44.0-wmf.21 memcached DEBUG: MicroStash using store MemcachedPeclBagOStuff
2025-03-19 23:28:49.066457 [] mwmaint1002 testwiki 1.44.0-wmf.21 session DEBUG: SessionManager using store RESTBagOStuff
…
2025-03-19 23:29:00.664653 [] mwmaint1002 testwiki 1.44.0-wmf.21 squid INFO: CdnCacheUpdate::purge: https://krinkle.wikimedia.org
…
2025-03-19 23:29:02.651213 [] mwmaint1002 testwiki 1.44.0-wmf.21 rdbms DEBUG: LBFactory shutdown completed