Page MenuHomePhabricator

Decorate mediawiki.api_request events with experiment enrollment information
Open, HighPublic3 Estimated Story Points

Description

Background

Soon after the change for T409965: Enable experiment enrollment in the MediaWiki Action API was deployed to group1 wikis, there was a spike in validation errors on the mediawiki.api_request stream. In response, we (temporarily) disabled adding experiment enrollment information to the global logging context for MediaWiki Action API requests. This means that experiment enrollment information is not included in any lines logged during an API request.

Why did this happen?

This happened because API requests are sent as events to the stream via the Monolog logging library and we starting adding experiment enrollment information to the global logging context, which is mixed into all log lines. This information was added to the events sent to the stream, triggering a validation error.

AC

  • We determine if the mediawiki.api_request stream is required
  • The /mediawiki/api/request/1.0.0 schema is updated to include an context.ab_tests property (or, make it not fail when adding context.ab_tests)
  • We re-enable adding experiment enrollment information to the global logging context for MediaWiki Action API requests

Notes

  1. This will happen again when another developer adds to the global logging context. The schema should be made more resilient

Event Timeline

JVanderhoop-WMF subscribed.

What is the impact of not doing? What teams would this block (and in what time frame)?

Relevant docs: https://wikitech.wikimedia.org/wiki/Data_Platform/Data_Lake/Traffic/mediawiki_api_request

What is the impact of not doing? What teams would this block (and in what time frame)?

@JVanderhoop-WMF: I've updated the task to highlight that this impacts all logging during API requests and not just logging that ends up with an event flowing on the mediawiki.api_request stream. The impact of not fixing this is that there's no insight into which experimental treatments were assigned during a MediaWiki Action API request.

@phuedx and this would be particularly bad for Growth's future experiments (Revise tone structured task, slated for Jan), correct?

JVanderhoop-WMF updated the task description. (Show Details)
JVanderhoop-WMF set the point value for this task to 5.

This would also be bad for RelEng and SRE when they're trying to figure out what's happening for a set of API requests. This reduces the overall observability of they system.

Change #1228500 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/TestKitchen@master] Hooks: Fix context.ab_test global logging context

https://gerrit.wikimedia.org/r/1228500

Change #1228509 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/TestKitchen@master] Hooks: Add global logging context for all requests

https://gerrit.wikimedia.org/r/1228509

phuedx changed the point value for this task from 5 to 3.

I've been bold and picked up this task. At the time of writing, we're seeing 600+ validation errors/min on the mediawiki.api_request stream.

As for why we're seeing a non-trivial number of validation errors on that stream despite the functionality being limited to pageviews in T409965#11408520 (onwards)? My theory is that these are action=parse MediaWiki Action API requests.

Review and merge https://gitlab.wikimedia.org/repos/data-engineering/schemas-event-primary/-/merge_requests/28

Are we sure that adding TestKitchen stuff to the api request logging schema is the right thing to do? I can see how it would fix the problem, but somethings seems a bit fishy.

Why did you want these in mediawiki.api_request in the first place? And/or why do you want it in the global logging context? I guess you want to be able to see experiment details in logstash?

This happened because API requests are sent as events to the stream via the Monolog logging library

Honestly: probably the right fix would be to stop doing this, and instead just emit the api_request as events rather than hijacking monolog. :) I can't totally recall, but I think the api request log might be the only thing in MediaWiki that is using monolog to send events.

This happened because API requests are sent as events to the stream via the Monolog logging library

Honestly: probably the right fix would be to stop doing this and instead just emit the api_request as events rather than hijacking monolog. :)

Agreed. I was erring on the side of being quick – fixing the validation error – rather than diving into introducing a hook etc etc. You're right though.

I can't totally recall, but I think the api request log might be the only thing in MediaWiki that is using monolog to send events.

Would you object to removing the supporting code as part of fixing the issue?

If you are just trying to fix the validation errors, and don't actually want this data in the api_request event, a quick fix might be to modify the EventBus monolog adapter and just delete the offending fields before the event is produced. It looks like it is already doing this for a private field.

This way you wouldn't have to add irrelevant backwards incompatible coupling between the api_request schema and testkitchen stuff. CC @tchin too

Change #1228500 abandoned by Phuedx:

[mediawiki/extensions/TestKitchen@master] Hooks: Fix context.ab_test global logging context

https://gerrit.wikimedia.org/r/1228500

Change #1228509 abandoned by Phuedx:

[mediawiki/extensions/TestKitchen@master] Hooks: Add global logging context for all requests

https://gerrit.wikimedia.org/r/1228509

A_smart_kitten subscribed.

(as IIUC tasks should generally have e.g. a codebase/team project attached to them, and it seems like this task might involve the TestKitchen extension)