Page MenuHomePhabricator

Port \GrowthExperiments\Mentorship\Hooks\MentorHooks::onPageSaveComplete to PageUpdated event
Open, MediumPublic

Description

As an experiment, the \GrowthExperiments\Mentorship\Hooks\MentorHooks::onPageSaveComplete hook should be migrated to use the new PageUpdated event.

Example from the Linter extension: EXP: use new event listener mechanism
TODO: link documentation here

Acceptance criteria:

  • Investigate and document here what would happen if there were a problem with that hook and how we would notice that.
  • Change the \GrowthExperiments\Mentorship\Hooks\MentorHooks::onPageSaveComplete to the PageUpdated event
  • (optional) Share any feedback/thoughts/concerns/feelings you may have about how the new system.

Open questions:

  • Currently, in that hook handler we schedule two deferred updates. Should this now ideally be two handlers for the PageUpdated event?

Event Timeline

Sgs triaged this task as Medium priority.Nov 26 2024, 6:10 PM

Change #1100778 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/GrowthExperiments@master] refactor(Mentorship): migrate PageSaveComplete hook to domain events

https://gerrit.wikimedia.org/r/1100778

Michael added subscribers: Urbanecm_WMF, daniel.

Investigate and document here what would happen if there were a problem with that hook and how we would notice that.

If that subscriber would not be executed at all, then mentees would not be set to active and the numbers of active mentees would probably be steadily declining.
If there were only intermittent problems with executing this subscriber, then this would probably be less of an issue, because a mentee that edits once is likely to also edit a second time and they would be set to active then.

How we would notice that: Not sure. We don't seem to have any metrics on the number of active/inactive mentees? -- Should we add such tracking, and if so: how/where? Curious about @Urbanecm_WMF's thoughts in particular.

Share any feedback/thoughts/concerns/feelings you may have about how the new system.

  • I think this hook in particular does not have very strict requirements as to when it is executed. If there would be benefits to moving it to a job instead of a deferred update, then this would be also OK. (cc @daniel)
NOTE: this might need further work due to T381299#10383200

Investigate and document here what would happen if there were a problem with that hook and how we would notice that.

If that subscriber would not be executed at all, then mentees would not be set to active and the numbers of active mentees would probably be steadily declining.
If there were only intermittent problems with executing this subscriber, then this would probably be less of an issue, because a mentee that edits once is likely to also edit a second time and they would be set to active then.

FTR, the active flag is a performance optimalization thing. It allowed us to release Recent Changes filters based on mentorship ("include only your mentees") at the biggest Wikipedias, where it was processing more rows than allowed (see T318457 for more details). We also use it in Personalized praise (deployed at few wikis only), to avoid processing more mentees than necessary. So, failing to update the flag would result in missing mentees in those two features.

How we would notice that: Not sure.

In theory, at least a permanent failure scenario should be caught by MentorHooksTest::testMarkMenteeAsActive, which asserts an edit resets the is active flag back to true. Locally, breaking the hook indeed does result in a test failure. If the hook was to fail intermittently, that would be much harder to assert by a test, of course.

We don't seem to have any metrics on the number of active/inactive mentees? -- Should we add such tracking, and if so: how/where?

Ad how/where: This metric should be easy to calculate using the MetricsFactory, which computes several mentorship-related metrics once a day or so. It should be fairly easy to add a new IMetric there for this purpose.

Ad whether: At least the number of active mentees seems potentially useful (the inactive ones are expected to basically grow forever). But, I am wondering whether it is sufficient to rely on the test I mentioned above.

When reviewing @Michael's code for the migration, I noticed the method we implement is called handlePageUpdatedEventAfterCommit(). I'm wondering, why is it called after commit? Does that mean the method is invoked whenever a commit occurs on the database level? Does that apply even within maintenance scripts, which have the outer transaction scope (meaning they're one free to start and end transactions as they please)? In other words: if a maintenance script is editing, would the handlePageUpdatedEventAfterCommit() method be invoked whenever the maintenance script committed the changes to the database (whether that is right away, in batches, or at the end of the execution as normally)? Or is this meant to say "runs within a deferred update"?

@daniel @Michael Curious to hear whether you have any thoughts about this.

When reviewing @Michael's code for the migration, I noticed the method we implement is called handlePageUpdatedEventAfterCommit(). I'm wondering, why is it called after commit? Does that mean the method is invoked whenever a commit occurs on the database level? Does that apply even within maintenance scripts, which have the outer transaction scope (meaning they're one free to start and end transactions as they please)? In other words: if a maintenance script is editing, would the handlePageUpdatedEventAfterCommit() method be invoked whenever the maintenance script committed the changes to the database (whether that is right away, in batches, or at the end of the execution as normally)? Or is this meant to say "runs within a deferred update"?

@daniel @Michael Curious to hear whether you have any thoughts about this.

Thank you for the question, I think it is another datapoint in favor of needing to figure out how to document/communicate the purpose better.

As I understand it, the event is triggered when a page is edited (and in a couple of related-ish circumstances, see T381299). However, "when a page is edited" can mean different points in time:

  • during the transaction of that page being edited (so throwing an exception would cause the edit to fail)
  • after the transaction of that page being edited, but before the response is being sent to the user
  • after the response has been sent to the user, but still in the same request (deferred updates)
  • in a job after the request has been completed

As I understand it, defacto it is currently handled in a deferred update, but I understand this being intentionally not part of the contract yet. What is part of the contract is that this happens when the edit has already been committed to the database. I would expect this to also fire for maintenance scripts, for example we would want to invalidate the link-recommendation for a page that is being edited with maintenance/edit.php, right?

See also T380013#10357537 and following comments for conversations around that issue. Ideas welcome :)

Change #1100778 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] refactor(Mentorship): migrate PageSaveComplete hook to domain events

https://gerrit.wikimedia.org/r/1100778

Not sure if this need to wait for the conclusion of T381299: Emit PageUpdatedEvents on all occasion the page content changes., because we may need to change our code in response to that. Should we move it to stalled, or should it move forward to QA?

Not sure if this need to wait for the conclusion of T381299: Emit PageUpdatedEvents on all occasion the page content changes., because we may need to change our code in response to that. Should we move it to stalled, or should it move forward to QA?

Boldly moving to QA, as the migration happened. If a change is needed, we can always add a new task for it.