Provide a mechanism to regenerate link recommendation task pool after configuration changes
Open, Needs Triage, Public

Description

Most of the changes made to Growth feature configuration via Special:EditGrowthConfig are effective immediately.

But some changes related to link recommendations only take effect after the task pool has been rebuilt. This applies to changes to:

  • minimum threshold per link recommendation
  • section names in which links should no longer be recommended
  • minimum/maximum links per task

The task pool is currently rebuilt regularly as users complete link recommendation tasks, or as non-link recommendation edits are made to relevant articles. That means it can take a long time for the pool to be completely rebuilt.

Ideally, there would be a mechanism that would allow for rapidly removing pre-configuration-change tasks and replacing them with post-configuration-change tasks. Since that's an expensive process, we need some kind of protection around who can trigger it and how often it can be done. And we'd want some kind of monitoring so users can see how far along the process is.

We currently store a hash of the dataset IDs used for creating a cached link recommendation; we should probably also store a hash of the link-recommendation settings from Special:EditGrowthConfig.
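For illustration, a minimal sketch of what such a settings hash could look like (the function name and config keys below are assumptions, not the actual GrowthExperiments names):

```php
/**
 * Sketch only: derive a stable hash of the settings that affect task
 * validity. The key names here are illustrative, not the real config keys.
 */
function getLinkRecommendationConfigHash( array $config ): string {
	$relevant = [
		'minimumScore' => $config['minimumScore'] ?? null,
		'excludedSections' => (array)( $config['excludedSections'] ?? [] ),
		'minimumLinksPerTask' => $config['minimumLinksPerTask'] ?? null,
		'maximumLinksPerTask' => $config['maximumLinksPerTask'] ?? null,
	];
	// Normalize ordering so logically identical configurations hash identically.
	sort( $relevant['excludedSections'] );
	ksort( $relevant );
	return hash( 'sha256', json_encode( $relevant ) );
}
```

Storing this hash next to the existing dataset-ID hash would let the regeneration process cheaply tell which cached recommendations predate a config change.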

Acceptance Criteria

  1. EditGrowthConfig users should be able to adjust link-recommendation settings and see the changes take effect within 24 hours
  2. Monitoring should be available to see how far along task pool regeneration is after a config change
  3. ...?
Completion checklist

Functionality

  • The patches have been code reviewed and merged
  • The task passes its acceptance criteria

Engineering

  • There are existing and passing unit/integration tests
  • Tests for every involved patch should pass
  • Coverage for every involved project should have improved or stayed the same

Design & QA

  • If the task is UX/Design related: it must be reviewed and approved by the UX/Design team
  • Must be reviewed and approved by Quality Assurance.

Documentation

  • Related and updated documentation done where necessary

Event Timeline

Editing the configuration is already limited to admins and interface editors; I don't think we need more protection than that.

I think the new functionality we need is:

  • Move most of the revalidateLinkRecommendations.php maintenance script into a shared utility class, and create a RevalidateLinkRecommendationsJob which does the same thing as the maintenance script. Alternatively, instead of revalidation, we can just drop all existing tasks (in some staggered way) and rely on the existing task generation process for replacing them.
    • Find out if a long-running job is a problem (I think it would take a few hours to 1-2 days, depending on the size of the wiki), and if so, find a way to break it up into smaller chunks. (GWToolset has an example of a parent job periodically rescheduling itself and scheduling child jobs to do chunks of the work; I'm not sure it's considered a good pattern, though. Another approach is the one taken by GlobalRename, where status is tracked in a DB table and jobs reschedule their own follow-ups until the status table is cleared. That's also not necessarily a good example, as it can be fragile; but maybe a similar approach using a maintenance script to schedule jobs would work. A minimal sketch of the self-rescheduling pattern follows this list.)
  • Trigger the job when the configuration is edited in a relevant way (e.g. the minimum threshold is raised).
  • Make sure that whenever the configuration is edited, ongoing jobs which have been triggered by a previous edit get shut down. (If the jobs run in small chunks, this might not be necessary.) Maybe possible using job deduplication?
  • Adjust the "Changes to this setting won't take effect immediately" message shown on relevant configuration form fields as needed.
  • Maybe speed up the revalidation script by making sure it only revalidates tasks when needed. In some cases this is easy (for the score threshold, we can just check the scores of each recommendation against the new threshold; we might even skip requesting a new recommendation if, after filtering out the links with a low score, we are still left with enough links); in some cases it might be too much effort to be worth it (when changing the section exclusion list, we'd have to know which section each link is in, and we'd have to update the service to add that information).
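A minimal sketch of the self-rescheduling, chunked-job pattern mentioned above, assuming the shared utility extracted from revalidateLinkRecommendations.php is called RevalidationHelper (that class, its revalidateBatch() method and the batch size are placeholders, not existing code):

```php
use MediaWiki\MediaWikiServices;

class RevalidateLinkRecommendationsJob extends Job implements GenericParameterJob {

	private const BATCH_SIZE = 100;

	public function __construct( array $params ) {
		parent::__construct( 'revalidateLinkRecommendations', $params );
	}

	public function run() {
		// Hypothetical shared utility extracted from the maintenance script.
		$helper = new RevalidationHelper();
		$offset = $this->params['offset'] ?? 0;

		// Revalidate one small batch, starting where the previous job stopped.
		$processed = $helper->revalidateBatch( $offset, self::BATCH_SIZE );

		// A full batch suggests there is more work, so the job re-queues itself;
		// a short batch means the pool has been processed and the chain stops.
		if ( $processed === self::BATCH_SIZE ) {
			MediaWikiServices::getInstance()->getJobQueueGroup()->lazyPush(
				new self( [ 'offset' => $offset + $processed ] )
			);
		}
		return true;
	}
}
```

Keeping each job small sidesteps the long-running-job question, at the cost of having to carry continuation state (here, an offset) between jobs.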

The job batching/continuing/controlling problem seems generic enough that it would benefit from wider discussion.

> Editing the configuration is already limited to admins and interface editors; I don't think we need more protection than that.

What if a well-intentioned admin makes multiple changes that trigger link recommendation regeneration in rapid succession? That seems potentially problematic.

> The job batching/continuing/controlling problem seems generic enough that it would benefit from wider discussion.

Right, we can discuss that more in T299021, and treat this task as a specific use case to fix.

> What if a well-intentioned admin makes multiple changes that trigger link recommendation regeneration in rapid succession? That seems potentially problematic.

It would require some sort of deduplication or semaphore, which is a good idea anyway, as the process might take a long time on large wikis and there is no way to tell when it's finished. Depending on the implementation used, it might or might not be easy to add. I don't think it's easy to avoid the problem via permissions, though. If we want to limit changes to developers, we just shouldn't expose it in community configuration; otherwise, I don't think an interface admin is that much less likely to make multiple changes than an admin.
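Continuing the job sketch above, MediaWiki's built-in job-queue deduplication could cover the rapid-successive-edits case; roughly (the triggeredBy parameter is a made-up example of a volatile parameter):

```php
class RevalidateLinkRecommendationsJob extends Job implements GenericParameterJob {
	// ... constructor and run() as sketched earlier ...

	public function ignoreDuplicates() {
		// Let the queue skip insertion when an identical, not-yet-claimed
		// copy of this job is already enqueued.
		return true;
	}

	public function getDeduplicationInfo() {
		$info = parent::getDeduplicationInfo();
		// Drop volatile parameters (hypothetical name) so that rapid repeat
		// config edits produce jobs the queue treats as duplicates.
		unset( $info['params']['triggeredBy'] );
		return $info;
	}
}
```

A semaphore (e.g. a flag in a DB table or the main stash) would be the alternative if we also need to suppress new work while a previous regeneration run is still in flight.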

OK, we've discussed this task and have some ideas. I believe we can move it into "Triaged" until we're ready to focus on T315732: [EPIC] Structured Tasks: Patroller Focus, or maybe since that epic is in progress, the correct place for this task is the Current Sprint board? @KStoller-WMF @MShilova_WMF what do you think?

Let's move to the current sprint with the hope that we can start this within the next month.

We don't seem to have the capacity to work on this now, so I'm moving the task out of current sprint and shifting to a later epic.

Michael raised the priority of this task from Low to Needs Triage. Wed, Dec 18, 6:34 PM
Michael added a project: GrowthExperiments.

Idea for how this could be implemented (a rough code sketch follows the list):

  1. Store hash (rev-id?) for link recommendation config in the DB in its own column (we are considering doing this anyway)
  2. When the link recommendation config changes, then schedule job with the new changed hash
  3. When the job runs, it gets the current hash of the config and compares that to its own. If the hash is different, the job exits (config changed again -> outdated job)
  4. Then the job asks the database (link-recommendation table) for 10 (100?) suggestions which do not have the hash of the job
  5. Then the job revalidates them and stores the revalidated suggestion together with the new hash
  6. If the job got all recommendations it requested from the DB, then it schedules itself again (because there might still be more to revalidate)
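A sketch of how steps 2-6 could map onto the job's run() method, as a variant of the job class sketched earlier in this task (loadCurrentConfigHash(), fetchRecommendationsWithStaleHash() and revalidateAndStore() are hypothetical helpers whose implementation depends on the schema chosen in step 1):

```php
use MediaWiki\MediaWikiServices;

class RevalidateLinkRecommendationsJob extends Job implements GenericParameterJob {

	private const BATCH_SIZE = 100;

	public function __construct( array $params ) {
		// 'configHash' is the hash produced by the triggering config edit (step 2).
		parent::__construct( 'revalidateLinkRecommendations', $params );
	}

	public function run() {
		// Step 3: exit if the config changed again after this job was scheduled;
		// a newer job carrying the newer hash will take over.
		$currentHash = $this->loadCurrentConfigHash();
		if ( $currentHash !== $this->params['configHash'] ) {
			return true;
		}

		// Step 4: fetch a small batch of cached recommendations whose stored
		// hash differs from the current config hash.
		$rows = $this->fetchRecommendationsWithStaleHash( $currentHash, self::BATCH_SIZE );

		// Step 5: revalidate each one and store it together with the new hash.
		foreach ( $rows as $row ) {
			$this->revalidateAndStore( $row, $currentHash );
		}

		// Step 6: a full batch means there is probably more left, so reschedule.
		if ( count( $rows ) === self::BATCH_SIZE ) {
			MediaWikiServices::getInstance()->getJobQueueGroup()->lazyPush(
				new self( [ 'configHash' => $currentHash ] )
			);
		}
		return true;
	}

	// loadCurrentConfigHash(), fetchRecommendationsWithStaleHash() and
	// revalidateAndStore() are hypothetical helpers, omitted here.
}
```

Because a job scheduled by an outdated edit exits immediately at step 3, rapid successive config changes degrade gracefully without extra locking.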