
Denylist for language agnostic revert risk model
Open, Medium, Public

Description

Add a denylist of words to the model. First, research what this would involve. Adding regexes to the current model might be tricky, since the model needs to stay fast.

Caveats:

  • Regexes are not available for all languages, but the premise of the language-agnostic model is that it supports all languages.
  • Testing a large number of regexes against every revision might be slow. Current throughput for the model hosted on Liftwing is 202.34 queries per second when queried by 25 parallel workers.
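To put the throughput figure in context, here is a rough sketch that converts it into an implied mean latency per request and times a regex pass against that budget. It assumes each of the 25 workers sends its next query only after the previous one returns (a closed loop), which the task does not state. The helper names, patterns, and sample text are hypothetical placeholders, not the ORES regexes.

```python
import re
import time


def mean_latency_ms(queries_per_second, workers):
    """Mean per-request latency implied by closed-loop throughput.

    Assumption: every worker always has exactly one request in flight,
    so latency = workers / throughput (Little's law).
    """
    return workers / queries_per_second * 1000.0


def time_patterns_ms(patterns, text, repeats=100):
    """Average wall-clock milliseconds to test every pattern against text."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    start = time.perf_counter()
    for _ in range(repeats):
        for pattern in compiled:
            pattern.search(text)
    return (time.perf_counter() - start) / repeats * 1000.0


# Throughput figures taken from the caveat above.
budget = mean_latency_ms(202.34, 25)

# Hypothetical stand-ins for a denylist and a revision's text.
sample_patterns = [r"\bfoo\b", r"\bbar\w*"]
sample_text = "some revision text " * 50
cost = time_patterns_ms(sample_patterns, sample_text)

print(f"implied mean latency: {budget:.1f} ms, regex pass: {cost:.3f} ms")
```

Under the closed-loop assumption the current mean latency works out to roughly 124 ms per request, so the regex pass can be judged by how much it adds to that figure.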

Tasks:

  • Incorporating the regexes from ORES into the model and benchmarking performance. [Low]
  • If too slow, explore other options such as multiple-substring matching algorithms, e.g. Aho-Corasick. [Medium]
  • Look into building an automated denylist of words in all languages [High]
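For the second task, a minimal pure-Python sketch of Aho-Corasick matching is given here to illustrate the idea: all denylist words are compiled into one automaton, and the text is scanned once regardless of how many words there are. The function names and the sample words are hypothetical, and a production version would more likely use an optimized library than this illustrative implementation.

```python
from collections import deque


def build_automaton(words):
    """Build Aho-Corasick tables (goto, fail, output) for a list of words."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> words that end at this state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)
    # Breadth-first pass to compute failure links, shallow states first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit words that end at the fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_matches(text, automaton):
    """Return (start_index, word) for every occurrence of any word in text."""
    goto, fail, output = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_matches("ushers", automaton))
```

Note that this matches plain substrings. Any regex that relies on word boundaries, character classes, or case folding would need extra handling (for example, lowercasing the text and checking the characters around each match), so the results would not automatically be equivalent to the regex-based approach.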

Event Timeline

fkaelin triaged this task as Medium priority. Jul 29 2023, 2:40 AM
fkaelin created this task.
fkaelin moved this task from Backlog to Staged on the Research board.
fkaelin set Due Date to Aug 31 2023, 4:00 AM.

@MunizaA hi! Just checking whether this task is on track to be delivered by Thursday (the due date on the task), or do you need more time? Thanks!

fkaelin changed Due Date from Aug 31 2023, 4:00 AM to Nov 30 2023, 5:00 AM. Oct 6 2023, 1:51 AM

@MunizaA / @fkaelin: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

Removing due date and moving to backlog to prioritize.