Page MenuHomePhabricator

[timebox: 3 days] Impact module: Support larger wgGEUserImpactMaxEdits
Closed, ResolvedPublic3 Estimated Story Points

Description

Background

The Growth-Team tried increasing the maximum number of edits in Impact module to 10,000 edits at the English Wikipedia, see T341599: Impact Module: improvements for former newcomers. Unfortunately, that increase triggered a lot of fatal errors, see T398418: TypeError: array_map(): Argument #2 ($array) must be of type array, int given. We fixed them by reverting the changes, but we need to adjust the Impact module in a way so that it works with larger values for wgGEUserImpactMaxEdits (10,000 is the current goal).

The underlying technical problem is related to the refreshUserImpactJob. On every edit, we:

  1. compute user impact data,
  2. store it into the database,
  3. pass the computed value to the refreshUserImpactJob,
  4. the job computes its own version of user impact data and updates the database

The issue happens, because the job (generated at step 3) is larger than the maximum size, and is refused by the event gate.

Questions to resolve
  1. Why are we using the "compute-schedule-compute-again" approach? If we want the computation to happen within the job, wouldn't it make more sense to just schedule the job and let it compute?
  2. Is it necessary for us to pass the impact data blob to the job? Can we avoid that to ensure the job is not becoming too large?
  3. What is the maximum size of a job params?
Acceptance Criteria

Determine the next steps that are needed to increase the wgGEUsereImpactMaxEdits value.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Michael renamed this task from Impact module: Support larger wgGEUserImpactMaxEdits to [timebox: 3 days] Impact module: Support larger wgGEUserImpactMaxEdits.Jul 7 2025, 4:07 PM
Michael triaged this task as High priority.
Michael set the point value for this task to 3.
KStoller-WMF lowered the priority of this task from High to Medium.Jul 30 2025, 4:15 PM

Is there an estimate of when this task might move forward?

@kostajh - we don't have an estimate, as this keeps getting pushed off since it doesn't directly relate to our annual plan commitments.

That being said, I think it's a valuable improvement for junior and experienced editors, and if this is something that is high priority for Trust & Safety Product, then perhaps Growth can consider a small hypothesis under WE4.2 to support the User Info card priority? Do you think that makes sense?

@kostajh - we don't have an estimate, as this keeps getting pushed off since it doesn't directly relate to our annual plan commitments.

That being said, I think it's a valuable improvement for junior and experienced editors, and if this is something that is high priority for Trust & Safety Product, then perhaps Growth can consider a small hypothesis under WE4.2 to support the User Info card priority? Do you think that makes sense?

Sorry I missed replying to this. I'm not sure about having this as a hypothesis in WE4.2. Could we try increasing the limit on some production Wikipedias (e.g. enwiki) and seeing if it results in any issues?

Could we try increasing the limit on some production Wikipedias (e.g. enwiki) and seeing if it results in any issues?

I know when we attempted this previously T341599: Impact Module: improvements for former newcomers we ran into issues: T398418: TypeError: array_map(): Argument #2 ($array) must be of type array, int given.

This has been on hold until we wrap up some annual plan efforts, but I would love to see this move forward!

@Sgs asked me about step (4) from the description (why I think that the job unconditionally re-computes the data). What I see happening here is this:

  1. GrowthExperimentUserImpactUpdater::refreshUserImpactData computes the data for the first time and saves them
  2. GrowthExperimentUserImpactUpdater::refreshUserImpactData schedules RefreshUserImpactJob with staleBefore set to currentTime + 1 second
  3. Within the job, staleBefore is guaranteed to be at least a second later than the generatedAt timestamp of the impact data computed in step 1
  4. Because of that, RefreshUserImpactJob::isFresh is guaranteed to return false
  5. The check at line 125 of RefreshUserImpactJob is guaranteed to be true => unconditional recomputation happens at the following line

I do not know why we're passing over user impact data only to recompute it.

Why are we using the "compute-schedule-compute-again" approach? If we want the computation to happen within the job, wouldn't it make more sense to just schedule the job and let it compute?

I believe the reason is an intent to make the data generated from a web request and the data generated from the refreshUserImpactData maintenance script (T324675) more consistent. In 881414: Process more articles when fetching page view data two methods are introduced, one for requesting the page view data from a web request content and another from a job context. The job one attempts consecutive calls to page views service for 5min until it fetches up to 1k articles page view data.

In particular, there are 3 calls to GrowthExperimentUserImpactUpdater::refreshUserImpactData from 2 callers (they all check GrowthExperimentUserImpactUpdater::userIsInCohort before) :

  • UserImpact\MediaWikiEventIngress\PageRevisionUpdatedIngress::handlePageRevisionUpdatedEvent: for updating a single user impact after standard edit.
  • ImpactHooks::onManualLogEntryBeforePublish: for updating one or two user impacts after thanking or being thanked.

and as the original code from 881414: Process more articles when fetching page view data states in the description:

Enqueue a job after receiving thanks or making an edit, so that users who are not in the refreshUserImpact cron job data cohort will also have most of their page view data accurately processed.

So it seems the second computation is triggered to reduce the chances of discrepancy on the articles/pageviews shown for users who aren't in the script cohort

Is it necessary for us to pass the impact data blob to the job? Can we avoid that to ensure the job is not becoming too large?

From a theoretical pov the impact data blob pre-computed is used in the job to decide which version is fresher: whatever was pre-computed OR whatever the job can retrieve from the cache at the time of running. However afaiu our current code will not trigger that comparison in any meaningful way because as Martin pointed:

  1. GrowthExperimentUserImpactUpdater::refreshUserImpactData schedules RefreshUserImpactJob with staleBefore set to currentTime + 1 second

that makes the job not re-use at all the pre-computed impact. This is also true in the case of the refreshUserImpactData script where there's no pre-computed blob added to the job

I'm still in the process of figuring out (3), what's the physical limit for the job params, however since it seems that we aren't currently making any good use of the pre-computed data in the job, we could just omit adding the full blob as a param like we do in the job context and give a try to a bumped limit in a test/pilot wiki. If there rest of engineers agree with the diagnose I can write that patch which is trivial and give a try in one wiki on Monday. cc @Urbanecm_WMF @Michael

Thank you for all this investigation! This will be helpful now, but also in the future when we look into using DPE's new API.

[...] since it seems that we aren't currently making any good use of the pre-computed data in the job, we could just omit adding the full blob as a param like we do in the job context and give a try to a bumped limit in a test/pilot wiki. If there rest of engineers agree with the diagnose I can write that patch which is trivial and give a try in one wiki on Monday. cc @Urbanecm_WMF @Michael

This makes sense to me. Let's do that!

Change #1218265 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] UserImpact: stop using pre-computed impact in the user impact job

https://gerrit.wikimedia.org/r/1218265

Change #1218265 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpact: stop using pre-computed impact in the user impact job

https://gerrit.wikimedia.org/r/1218265

I plan to do the QA of the change in this task (only for testwiki) before bumping the limit in other wiki. I will post the results.

Change #1219582 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@wmf/1.46.0-wmf.7] UserImpact: stop using pre-computed impact in the user impact job

https://gerrit.wikimedia.org/r/1219582

Change #1219582 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.46.0-wmf.7] UserImpact: stop using pre-computed impact in the user impact job

https://gerrit.wikimedia.org/r/1219582

Mentioned in SAL (#wikimedia-operations) [2025-12-18T14:41:15Z] <sgimeno@deploy2002> Started scap sync-world: Backport for [[gerrit:1219582|UserImpact: stop using pre-computed impact in the user impact job (T398500)]]

Mentioned in SAL (#wikimedia-operations) [2025-12-18T14:43:27Z] <sgimeno@deploy2002> sgimeno: Backport for [[gerrit:1219582|UserImpact: stop using pre-computed impact in the user impact job (T398500)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-12-18T14:50:46Z] <sgimeno@deploy2002> Finished scap sync-world: Backport for [[gerrit:1219582|UserImpact: stop using pre-computed impact in the user impact job (T398500)]] (duration: 09m 31s)

The change and limit bump can be tested in testwiki. I will monitor the job performance over the next days before bumping the limit in a real wiki.

A possible side effect of this change is introducing discrepancies between what's immediately shown after making an edit in the impact module and what will be shown after the refresh job finishes. Specially on the list of most viewed articles. I've tested with Special:Impact/Sergio_Enrollment_2 and haven't perceived it but maybe this discrepancy is more obvious for users making their first edits. Maybe @Etonkovidova can help confirming there's no discrepancy.

The change and limit bump can be tested in testwiki. I will monitor the job performance over the next days before bumping the limit in a real wiki.

A possible side effect of this change is introducing discrepancies between what's immediately shown after making an edit in the impact module and what will be shown after the refresh job finishes. Specially on the list of most viewed articles. I've tested with Special:Impact/Sergio_Enrollment_2 and haven't perceived it but maybe this discrepancy is more obvious for users making their first edits. Maybe @Etonkovidova can help confirming there's no discrepancy.

Didn't find any issues on testwiki wmf.7; the impact stats for a newly made edit are shown immediately.

Moving to Test in Production to await for deployment to other wikis. Testing notes: check in production

I will look into T413819 which is coincident with this change, thanks for testing!

I will look into T413819 which is coincident with this change, thanks for testing!

Thanks, @Sgs! I am closing this task as resolve - nothing else was found.

the message growthexperiments-homepage-impact-scores-best-streak-info-text-third-person reflects the new limit (10, 000)

The message is correct on testwiki wmf.10.

@Sgs Does this mean we can "unstall" T341599: Impact Module: improvements for former newcomers? Or is another step necessary?

Indeed, I wanted to check first T413819: DivisionByZeroError: Division by zero (in ComputedUserImpactLookup) but that issue is most probably unrelated, the changes we made in this task affect all wikis in the sense that all re-computations of the impact data are now from scratch. If we made some mistake in our changes here we should be able to observe something off in our Homepage performance dashboard already which isn't the case.

I will comment on T341599 suggesting a paced rollout.