Denylist for language agnostic revert risk model
Open, Medium, Public

Assigned To

Authored By

	fkaelin
	Jul 29 2023, 2:40 AM

Description

Add a denylist of words to the model. First, research what this entails. Adding regexes to the current model might be tricky, since the model needs to stay fast.

Caveats:

Regexes are not available for all languages, but the premise of the language-agnostic model is that it can support all languages.
Testing a large number of regexes against every revision might be slow. The current throughput of the model hosted on Liftwing is 202.34 queries per second when queried by 25 parallel workers.
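To make the second caveat concrete, here is a minimal sketch of how the per-revision cost of regex matching could be measured. The word list, the function names and the choice to combine all words into one alternation are assumptions for illustration; they are not the ORES regexes or the model's actual code.

```python
import re
import time

# Hypothetical denylist; the real ORES lists are per-language and much larger.
DENYLIST = ["badword", "spamlink", "vandal"]

# Assumption: all words are escaped and joined into a single alternation,
# so each revision text is scanned once instead of once per word.
PATTERN = re.compile("|".join(re.escape(w) for w in DENYLIST), re.IGNORECASE)


def count_hits(text: str) -> int:
    """Return the number of denylist matches found in the text."""
    return len(PATTERN.findall(text))


def queries_per_second(texts, repeats=100):
    """Rough single-process throughput of count_hits over the given texts."""
    start = time.perf_counter()
    for _ in range(repeats):
        for t in texts:
            count_hits(t)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return repeats * len(texts) / elapsed
```

A number obtained this way covers only the matching step in one process, so it is not directly comparable to the 202.34 queries per second measured end to end with 25 parallel workers. It would only indicate how much time the denylist adds per revision.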

Tasks:

Incorporate the regexes from ORES into the model and benchmark performance. [Low]
If too slow, explore other options such as multiple-substring matching algorithms, e.g. Aho-Corasick. [Medium]
Look into building an automated denylist of words in all languages. [High]
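As an illustration of the second task, the following is a minimal pure-Python sketch of Aho-Corasick multiple-substring matching. The class and method names are hypothetical, and a production version would more likely use an optimized library. It finds every occurrence of every denylist word in a single pass over the text.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over a list of words (illustrative only)."""

    def __init__(self, words):
        self.goto = [{}]   # per-node transitions: char -> node id
        self.fail = [0]    # failure link per node
        self.out = [[]]    # words recognized when reaching each node
        for word in words:
            self._add(word)
        self._build_failure_links()

    def _add(self, word):
        # Insert the word into the trie, creating nodes as needed.
        node = 0
        for ch in word:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[node][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(word)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit words that end at the failure target (proper suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Return (start_index, word) for every match, in order of match end."""
        node = 0
        hits = []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for word in self.out[node]:
                hits.append((i - len(word) + 1, word))
        return hits
```

For example, `AhoCorasick(["he", "she", "his", "hers"]).find("ushers")` returns `[(1, "she"), (2, "he"), (2, "hers")]`. Note that this matches plain substrings: unlike a regex, it applies no word boundaries or case folding, so the caller would need to normalize the text and filter the hits if those semantics matter.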

Related Objects

		Status	Subtype	Assigned	Task
		In Progress		diego	T314384 Develop a ML-based service to predict reverts on Wikipedia(s)
		Open		MunizaA	T343061 Denylist for language agnostic revert risk model

Event Timeline

fkaelin triaged this task as Medium priority. Jul 29 2023, 2:40 AM

fkaelin created this task.

fkaelin moved this task from Backlog to Staged on the Research board.

fkaelin set Due Date to Aug 31 2023, 4:00 AM.

@MunizaA hi! Just checking whether this task is on track to be delivered by Thursday (the due date on the task), or whether you need more time. Thanks!

fkaelin changed Due Date from Aug 31 2023, 4:00 AM to Nov 30 2023, 5:00 AM. Oct 6 2023, 1:51 AM

KCVelaga_WMF subscribed. Jan 23 2024, 7:48 PM

@MunizaA / @fkaelin: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

Removing due date and moving to backlog to prioritize.

fkaelin removed Due Date. Tue, Apr 16, 2:57 PM
