Store/read informals, badwords, stopwords and other language assets on a wiki page
Open, Lowest, Public

Assigned To

None

Authored By

	Ladsgroup
	Feb 23 2017, 10:34 PM

Description

Suggested by ԱշոտՏՆՂ

We'd store our lists of words on a wiki, periodically re-read them from the wiki, and keep a snapshot in revscoring.

Event Timeline

Ladsgroup created this task. Feb 23 2017, 10:34 PM

Restricted Application added a subscriber: Aklapper. Feb 23 2017, 10:34 PM

Ladsgroup updated the task description. Feb 23 2017, 10:34 PM

Just renamed this to be a little more clear to me. I'm not quite sure how we'd set this up. We need a lot of determinism in revscoring to have things work. But it's possible that we can store snapshots in revscoring to account for changes on the wiki.
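One way the snapshot idea could look, as a rough sketch only: parse the page text into a word list, normalise it, and record a digest next to the wiki revision ID so a model build can pin exactly one version. The page format (one entry per line, optional `*` bullets) and the names `parse_word_list` and `make_snapshot` are assumptions for illustration, not part of revscoring.

```python
import hashlib
import json


def parse_word_list(wikitext):
    """Extract one entry per line, skipping blanks and stripping '*' bullets.

    The page layout is an assumption; a real page may need a stricter parser.
    """
    words = []
    for line in wikitext.splitlines():
        entry = line.strip().lstrip("*").strip()
        if entry:
            words.append(entry)
    return words


def make_snapshot(words, rev_id):
    """Build a deterministic snapshot: sorted, de-duplicated, with a digest."""
    unique = sorted(set(words))
    payload = json.dumps(unique, ensure_ascii=False).encode("utf-8")
    return {
        "rev_id": rev_id,  # wiki revision the list was read from
        "words": unique,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
```

Because the list is sorted and de-duplicated before hashing, two reads of the same content give the same digest regardless of ordering on the page, which is the kind of determinism mentioned above.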

Halfak triaged this task as Lowest priority. Mar 4 2017, 5:55 PM

Halfak moved this task from Unsorted to New development on the Machine-Learning-Team board.

bmansurov subscribed. Dec 14 2018, 4:41 PM

Restricted Application added a project: artificial-intelligence. Dec 14 2018, 4:41 PM

Ladsgroup unsubscribed. Apr 17 2019, 7:19 PM

Is this still wanted nowadays?

Yeah. I think this is really interesting. We'd need to do some thinking about how it could work with our pipelines for building models.

Here are some examples of existing lists, of varying quality and formats, used by other tools:

There are also some edit filters which contain such lists:

https://pt.wikipedia.org/wiki/WP:Filtro_de_edi%C3%A7%C3%B5es

And for typos:

https://en.wikipedia.org/wiki/WP:Lists_of_common_misspellings/For_machines#The_Machine-Readable_List

It is not uncommon for a good-faith edit to add a new expression (or a badly written regex) to such lists and thereby break, to some extent, the tools that use them (e.g. by increasing their false positives).
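A minimal sketch of how such breakage might be caught before a list is used: compile each pattern, and reject any that fail to compile or that match text already known to be fine. The function name `validate_patterns` and the idea of a "known-good" sample are assumptions for illustration; only the standard `re` module is used.

```python
import re


def validate_patterns(patterns, known_good_texts):
    """Split patterns into accepted and rejected, with a reason per rejection."""
    accepted, rejected = [], []
    for pattern in patterns:
        try:
            compiled = re.compile(pattern, re.IGNORECASE)
        except re.error as exc:
            # Badly written regex: it cannot even be compiled.
            rejected.append((pattern, "does not compile: %s" % exc))
            continue
        # Overly broad regex: it fires on text we consider harmless.
        hits = [text for text in known_good_texts if compiled.search(text)]
        if hits:
            rejected.append((pattern, "matches known-good text: %r" % hits[0]))
        else:
            accepted.append(pattern)
    return accepted, rejected
```

A check like this only catches the patterns that trip on the chosen sample, so it would complement a fitness check on the model rather than replace one.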

We can probably handle breaking changes by having a manual step where we pull a new version of a badwords list from the wiki. If the fitness measures go down, we know something was broken.
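The manual pull step could be gated roughly like this: score the current and the candidate list with the same fitness function and keep the candidate only if the score does not drop. Everything here is hypothetical, including the name `pull_if_not_worse`, the `tolerance` parameter, and the toy accuracy function that stands in for a real model fitness measure.

```python
def pull_if_not_worse(current_words, candidate_words, evaluate, tolerance=0.0):
    """Return (words, score) for whichever list should be kept."""
    current_score = evaluate(current_words)
    candidate_score = evaluate(candidate_words)
    if candidate_score + tolerance >= current_score:
        return candidate_words, candidate_score
    return current_words, current_score


# Toy stand-in for a real fitness measure: accuracy of naive substring
# matching on a tiny hand-labelled sample of (text, is_bad) pairs.
SAMPLES = [
    ("you are a jerk", True),
    ("nice article", False),
    ("the class was fun", False),
]


def toy_accuracy(words):
    correct = sum(
        any(w in text for w in words) == is_bad for text, is_bad in SAMPLES
    )
    return correct / len(SAMPLES)


# A candidate that adds an over-eager entry ("ass" matches "class")
# lowers the toy score, so the current list is kept.
kept, score = pull_if_not_worse(["jerk"], ["jerk", "ass"], toy_accuracy)
```

In practice `evaluate` would retrain or re-score a model against a labelled test set; the gate itself stays the same.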

Maintenance_bot moved this task from New development to Backlog/Revscoring on the Machine-Learning-Team board. Jan 19 2021, 11:38 PM
