Scale up the number of observations for idwiki to 100k
Closed, Resolved · Public

Description

This is a task related to Wikimedia's latest experiments using Artificial Intelligence to judge (score) the quality of human edits on wiki pages.

Your task is to scale up the number of observations for idwiki (the internal name for the database behind the Indonesian-language Wikipedia) to 100,000 by providing a patch on GitHub.

For a previous example of how to add new observations, see: https://github.com/wiki-ai/editquality/pull/48/files

To change the number of observations, update the query in Quarry (a tool for running SQL queries against Wikipedia and other databases from your browser) at https://quarry.wmflabs.org/query/12494, then re-run the feature extraction and the steps that follow it.
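The contents of the Quarry query are not reproduced in this task, so the following is only a rough sketch of the general idea: drawing a fixed-size random sample of revision IDs with SQL. It runs against an in-memory SQLite table rather than the real idwiki database; the table contents, the helper name `sample_rev_ids`, and the use of `ORDER BY RANDOM()` are assumptions for illustration, not the actual query.

```python
import sqlite3


def sample_rev_ids(conn, limit):
    """Return up to `limit` randomly chosen rev_id values (hypothetical helper)."""
    rows = conn.execute(
        "SELECT rev_id FROM revision ORDER BY RANDOM() LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in rows]


# Toy stand-in for a revision table; the real data lives in the idwiki database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO revision (rev_id) VALUES (?)",
    [(i,) for i in range(1, 1001)],
)

# Scaling up the number of observations amounts to raising this limit.
sample = sample_rev_ids(conn, 100)
print(len(sample))
```

In this sketch, scaling up the sample only means changing the limit passed to the query; the feature extraction would then need to be re-run on the new set of revision IDs.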

Event Timeline

Halfak created this task. Sep 30 2016, 10:34 PM
Restricted Application added a subscriber: Aklapper. Sep 30 2016, 10:34 PM
Halfak triaged this task as Low priority. Oct 3 2016, 4:50 PM
Halfak moved this task from Untriaged to Ideas on the Machine Learning Platform board.
Restricted Application added a subscriber: TerraCodes. Nov 1 2016, 7:22 PM
Ladsgroup added a subscriber: Ladsgroup.

I plan to mentor this in GCI.

@Ladsgroup: Could you provide more information for contributors? Which code base is this about, any documentation explaining how to change the "number of observations" for certain wikis? Thanks!

Okay. As an example of adding new observations, see: https://github.com/wiki-ai/editquality/pull/48/files
To change it, contributors just need to update the query in Quarry (https://quarry.wmflabs.org/query/12494) and re-run the feature extraction, etc.

Aklapper updated the task description. Dec 9 2016, 8:12 PM

Copying @Phantom42's comment from the GCI site here:

"Just a bit of a progress report: everything is okay. I successfully created the SQL query and edited the Makefile. Now I am generating new models, with no problems so far; I just need to wait a bit.
Is it okay if the resulting number of observations is slightly smaller than 100k? Because of network errors or something like that, the result will be around 99976. If that is not okay, I will try to restart."

It's okay for it to be less than 100K. That happens because of deleted revisions, deleted parent revisions, etc.
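As a small illustration of why the final count can fall short of the target, here is a sketch that drops sampled revisions whose data can no longer be retrieved. The function name and the set of unavailable IDs are hypothetical; the figure of 24 missing revisions is chosen only to mirror the 99976 mentioned above and says nothing about what was actually missing.

```python
def drop_unextractable(rev_ids, unavailable):
    """Keep only revision IDs that are not in the `unavailable` set."""
    return [rev_id for rev_id in rev_ids if rev_id not in unavailable]


# Hypothetical sample of 100,000 revision IDs.
sampled = list(range(1, 100001))

# Hypothetical revisions that were deleted (or whose parent was deleted).
unavailable = set(range(1, 25))

kept = drop_unextractable(sampled, unavailable)
shortfall = len(sampled) - len(kept)
print(len(kept), shortfall)
```

A shortfall of this size is a tiny fraction of the sample, which is consistent with the reply that ending up slightly under 100K is acceptable.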

Task completed. Link to the GitHub pull request: https://github.com/wiki-ai/editquality/pull/53
If anything is wrong, please tell me and I will fix it.

Aklapper closed this task as Resolved. Dec 21 2016, 3:23 AM
Aklapper assigned this task to nikitavbv.

https://github.com/wiki-ai/editquality/pull/53 has been merged, so I am closing this task as resolved. Thanks a lot!