Page MenuHomePhabricator

context.ab_tests global logging context causing validation errors for the mediawiki.api_requests stream
Closed, ResolvedPublic2 Estimated Story Points

Description

We're seeing ~250,000 validation errors every 15 minutes on the mediawiki.api_requests stream.

Context

The EventGate validation errors dashboard is showing a decent chunk of these sorts of errors:

"message": "'' should NOT have additional properties",
"$schema": "/error/2.1.0",
"tags": [
  "input-kafka-eventgate-analytics-validation-error-eqiad",
  "kafka",
  "es",
  "eventgate",
  "normalized_message_untrimmed"
],
"emitter_id": "eventgate-production",
"errored_schema_uri": "/mediawiki/api/request/1.0.0",
"dt": "2025-11-26T05:35:09.027Z",
"error_type": "ValidationError",
"normalized_message": "'' should NOT have additional properties",

Thanks @tchin for flagging.

This appears due to the call chain of ApiMain's logRequest where

wfDebugLog( 'api-request', '', 'private', $logCtx );

ultimately results in injecting the context into the message destined for EventBus (in addition to Logstash events as seen if expanding entries in this OpenSearch dashboard snapshot query).

The approximate call chain doing this injection is.

LoggingContext.php:65, MediaWiki\Logger\LoggingContext->get()
ContextProcessor.php:18, MediaWiki\Logger\Monolog\ContextProcessor->__invoke()
Logger.php:388, Monolog\Logger->addRecord()
Logger.php:611, Monolog\Logger->info()
GlobalFunctions.php:733, wfDebugLog()
ApiMain.php:2155, MediaWiki\Api\ApiMain->logRequest()
ApiMain.php:949, MediaWiki\Api\ApiMain->executeActionWithErrorHandling()
ApiMain.php:918, MediaWiki\Api\ApiMain->execute()
ApiEntryPoint.php:138, MediaWiki\Api\ApiEntryPoint->execute()

One option to allow us to support A/B testing on the Action API right now would be to not add the context.ab_tests data to the logging context until we have figured out a nice way to exclude it from being a part of the /mediawiki/api/request/1.0.0 stream. Another could be to rev the stream (although that requires patching core) so the field is supported, although the intent of context.ab_tests was really for operators looking in Logstash, not for downstream data processing. It looks like we're getting maybe 10K of these messages every 15 minutes right now.

Thanks @dr0ptp4kt for the additional info. The patch above should fix this for now, while we work on the correct fix for the issue.

The approximate call chain doing this injection is. <snip />

The call chain is correct enough but there's some config-driven indirection here that should be captured:

The api-request Monolog channel is defined here. You'll notice that the eventbus logger is set to debug level.

Aside: The documentation for this config variable is incorrect. The eventbus logger is disabled by default.

Monolog loggers are initialized in mediawiki-config/wmf-config/logging.php on a per-channel basis. The eventbus logger for the api-request channel will be initialized here.

Event Timeline

jnuche triaged this task as Unbreak Now! priority.Nov 26 2025, 9:18 AM
phuedx lowered the priority of this task from Unbreak Now! to Needs Triage.Nov 26 2025, 9:18 AM
phuedx updated the task description. (Show Details)

Change #1211611 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/MetricsPlatform@wmf/1.46.0-wmf.4] Hooks: Only add global logging context for pageviews

https://gerrit.wikimedia.org/r/1211611

Change #1211611 merged by jenkins-bot:

[mediawiki/extensions/MetricsPlatform@wmf/1.46.0-wmf.4] Hooks: Only add global logging context for pageviews

https://gerrit.wikimedia.org/r/1211611

Mentioned in SAL (#wikimedia-operations) [2025-11-26T09:34:02Z] <phuedx@deploy2002> Started scap sync-world: Backport for [[gerrit:1211611|Hooks: Only add global logging context for pageviews (T409965 T411074)]]

Mentioned in SAL (#wikimedia-operations) [2025-11-26T09:36:17Z] <phuedx@deploy2002> phuedx: Backport for [[gerrit:1211611|Hooks: Only add global logging context for pageviews (T409965 T411074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-11-26T09:47:27Z] <phuedx@deploy2002> Finished scap sync-world: Backport for [[gerrit:1211611|Hooks: Only add global logging context for pageviews (T409965 T411074)]] (duration: 13m 29s)