Store/read informals, badwords, stopwords and other language assets on a wiki page
Open, Lowest, Public

Description

Suggested by ԱշոտՏՆՂ

We'd store our word lists on a wiki page, periodically re-read them from the wiki, and keep a snapshot in revscoring.

Event Timeline

Halfak renamed this task from Make ORES/revscoring read mediawiki pages for swear words and bad words to Store/read informals, badwords, stopwords and other language assets on a wiki page. Mar 4 2017, 5:45 PM
Halfak updated the task description.
Halfak subscribed.

Just renamed this to be a little more clear to me. I'm not quite sure how we'd set this up. We need a lot of determinism in revscoring to have things work. But it's possible that we can store snapshots in revscoring to account for changes on the wiki.

Halfak triaged this task as Lowest priority. Mar 4 2017, 5:55 PM
Halfak moved this task from Unsorted to New development on the Machine-Learning-Team board.

Yeah. I think this is really interesting. We'd need to do some thinking about how it could work with our pipelines for building models.

It is not uncommon for a good-faith edit to add a new expression (or a badly written regex) to such a list and thereby break, to some extent, the tools that use it (e.g. by increasing their false positives).
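
One way to catch this kind of edit early is a cheap static check on each list entry before it is used. The sketch below is only an illustration: `check_entries` and the `known_good` word list are hypothetical names, not part of revscoring, and wrapping each entry in word boundaries is an assumption about how the list would be applied.

```python
import re


def check_entries(entries, known_good):
    """Return (entry, reason) pairs for entries that look unsafe.

    `entries` are regex fragments as they might appear on the wiki page.
    `known_good` is a hypothetical list of harmless words that no entry
    should ever match.
    """
    problems = []
    for entry in entries:
        try:
            # Assumption: each entry is matched as a whole word, ignoring case.
            pattern = re.compile(r"\b(?:" + entry + r")\b", re.IGNORECASE)
        except re.error as exc:
            problems.append((entry, "does not compile: {}".format(exc)))
            continue
        # A pattern that hits harmless words would inflate false positives.
        hits = [word for word in known_good if pattern.search(word)]
        if hits:
            problems.append((entry, "matches known-good text: " + ", ".join(hits)))
    return problems


if __name__ == "__main__":
    for entry, reason in check_entries(
        ["jerk", "(idiot", r"\w*ass\w*"], ["class", "assessment", "hello"]
    ):
        print(entry, "->", reason)
```

A check like this only catches syntax errors and obvious over-matching. It says nothing about how the list affects a trained model.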

We can probably handle the breaking changes by having a manual step where we pull a new version of a badwords list from the wiki. If the fitness measures go down, we know something was broken.
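
A minimal sketch of that manual step, under some loud assumptions: the candidate list has already been fetched from the wiki, "fitness" is stood in for by plain accuracy on a small labeled sample, and `maybe_update_snapshot` is a hypothetical helper rather than anything in revscoring. A real pipeline would presumably rebuild the model and compare its actual fitness statistics instead.

```python
import re


def compile_badwords(words):
    # Assumption: entries are regex fragments matched as whole words.
    return re.compile(r"\b(?:" + "|".join(words) + r")\b", re.IGNORECASE)


def accuracy(words, labeled):
    """Fraction of (text, is_bad) pairs that the word list classifies correctly."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    pattern = compile_badwords(words)
    correct = sum(
        (pattern.search(text) is not None) == is_bad for text, is_bad in labeled
    )
    return correct / len(labeled)


def maybe_update_snapshot(snapshot, candidate, labeled, tolerance=0.0):
    """Accept the candidate list only if fitness does not drop.

    Returns (list_to_keep, accepted).
    """
    old_score = accuracy(snapshot, labeled)
    new_score = accuracy(candidate, labeled)
    if new_score + tolerance >= old_score:
        return candidate, True
    return snapshot, False


if __name__ == "__main__":
    labeled = [
        ("you are a jerk", True),
        ("this is a class", False),
        ("nice edit", False),
        ("what an idiot", True),
        ("total moron", True),
    ]
    snapshot = ["jerk", "idiot"]
    broken = snapshot + [r"\w*ass\w*"]  # over-broad regex from a wiki edit
    kept, accepted = maybe_update_snapshot(snapshot, broken, labeled)
    print(accepted, kept)
```

Because the snapshot in revscoring only changes when this step is run and passes, model builds stay deterministic between pulls.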