Add language support for Albanian
Closed, Resolved · Public

Description

In parallel we need a list of "badwords" and "informal words".

  • Bad words are words that are commonly associated with vandalism. They are generally used to insult or to be vulgar. This includes curse words, racial slurs, and assertions about, or prejudices against, sexual preferences.
  • Informal words are words that are unwelcome in the article namespace but acceptable on talk pages. These include words such as 'hello' or 'hahaha', which are fine in discussions but not in articles.
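
To make the two categories concrete, here is a minimal sketch of how reviewed lists could be used to count matches in an edit's text. It is an illustration under stated assumptions, not the ORES or BWDS implementation: `build_matcher` and `count_matches` are hypothetical helper names, and the bad-word entries are placeholders rather than real Albanian words ('hello' and 'hahaha' come from the definition above).

```python
import re


def build_matcher(words):
    """Compile a case-insensitive, whole-word regex for a non-empty word list."""
    if not words:
        raise ValueError("word list must not be empty")
    # Longest alternatives first, so longer entries are tried before shorter ones.
    ordered = sorted(words, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


# Placeholder entries only (assumption); the reviewed lists live on the wiki page.
BADWORDS = {"badword_a", "badword_b"}
INFORMAL = {"hello", "hahaha"}

bad_re = build_matcher(BADWORDS)
informal_re = build_matcher(INFORMAL)


def count_matches(text):
    """Count how many bad-word and informal-word matches appear in the text."""
    return {
        "badwords": len(bad_re.findall(text)),
        "informal": len(informal_re.findall(text)),
    }
```

For example, `count_matches("Hello there, hahaha! badword_a")` counts two informal matches and one bad-word match. Counts like these are the kind of signal a damaging or goodfaith model could use, which is why the lists need review by native speakers.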

See - https://www.mediawiki.org/wiki/ORES/BWDS_review

Event Timeline

Sumit created this task. Jun 20 2017, 9:18 AM
Restricted Application added a project: artificial-intelligence. Jun 20 2017, 9:18 AM
Restricted Application added a subscriber: Aklapper.
Sumit updated the task description. Jun 20 2017, 9:23 AM

@Halfak @Ladsgroup I'm not familiar with how bad-word lists are generated automatically or how they are reviewed. Any pointers on where to look?

Currently, @Ladsgroup has a nice framework for running the Bad-Words-Detection-System. I have an old pull request to modernize it, but that's not ready for use quite yet.

It looks like we don't have output for Albanian yet. https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq @Ladsgroup, could you start a new run of BWDS?

Started, ping me if it's not there after 24 hours.

Looks like the generated list is there.

Instructions: https://www.mediawiki.org/wiki/ORES/BWDS_review

Ping: @Margott, @Liridon, and @Arianit.

Thanks!

Halfak assigned this task to Sumit. Jun 26 2017, 4:40 PM
Sumit added a comment. Jul 1 2017, 1:52 PM

Hi @Margott @Liridon @Arianit, can you please go to https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq and sort the words in the generated list into badwords and informal words? Refer to the task description for the definitions of badwords and informal words.

Let us know if you run into any issues, and thanks for your help!

Sumit added a comment. Jul 1 2017, 2:05 PM

Left the following note on their talk pages:

Hi,
Can you go to https://phabricator.wikimedia.org/T168369 and see if you can help sort a list of about 250 Albanian words into badwords and informal words? We need these lists to help build damaging and goodfaith models for Albanian Wikipedia. A good way to do that would be to edit https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq by copying the generated list into both the badwords and informal words sections, then removing the words that do not belong in each category. Your help is much appreciated! Let me know, or leave a comment on the task itself, if you run into any issues. Thanks!

I'll get to it in a week or so if it is not done by then.

Hi, we're done, please see here https://www.mediawiki.org/wiki/Research:Revision_scoring_as_a_service/Word_lists/sq

Let us know if you see any problems.

Thanks!

Sumit added a comment. Jul 10 2017, 7:18 PM

Thanks @Arianit for your support! We'll get to adding support for Albanian now.

Halfak closed this task as Resolved.