Scale up the number of observations for idwiki to 100k
Closed, Resolved (Public)

Description

This is a task related to Wikimedia's latest experiments using Artificial Intelligence to judge (score) the quality of human edits on wiki pages.

Your task is to scale up the number of observations for idwiki (the internal name of the database behind the Indonesian-language Wikipedia) to 100,000 by submitting a patch on GitHub.

For a previous example of how to add new observations, see: https://github.com/wiki-ai/editquality/pull/48/files

To change the number of observations, you need to change the query in Quarry (which lets you run SQL queries against Wikipedia and other databases from your browser) at https://quarry.wmflabs.org/query/12494 and re-run the feature extraction, etc.

Event Timeline

Halfak triaged this task as Low priority. Oct 3 2016, 4:50 PM
Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board.

@Ladsgroup: Could you provide more information for contributors? Which code base is this about, and is there any documentation explaining how to change the "number of observations" for certain wikis? Thanks!

Okay. As an example of adding new observations, see: https://github.com/wiki-ai/editquality/pull/48/files
To change it, they just need to change the query in Quarry: https://quarry.wmflabs.org/query/12494 and re-run the feature extraction, etc.

Copying @Phantom42's comment from the GCI site here:

"Just a quick progress report: everything is okay. I successfully created the SQL query and edited the Makefile. Now I am generating the new models; no problems there either, I just need to wait a bit.
Actually, is it okay if the resulting number of observations is slightly smaller than 100k? Because of network errors or something like that, the result will be around 99976. Is that okay? If not, I will try to restart."

It's okay for it to be less than 100K; that happens due to deleted revisions, deleted parent revisions, etc.
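
For anyone who wants to check how far a run falls short of the target, here is a minimal sketch in Python. It is not part of the editquality code base: the helper names `count_observations` and `shortfall` are hypothetical, and it assumes the extracted observations are stored one per line in a text file, which may not match the actual output format.

```python
def count_observations(lines):
    """Count non-empty lines; each line is ASSUMED to hold one observation."""
    return sum(1 for line in lines if line.strip())


def shortfall(observed, target=100_000):
    """Return (missing count, missing fraction of the target)."""
    missing = target - observed
    return missing, missing / target


# Example using the figure reported in this thread (99976 of 100000).
missing, fraction = shortfall(99_976)
print(f"{missing} observations missing ({fraction:.3%} of target)")
# prints "24 observations missing (0.024% of target)"
```

To count a real file, `count_observations` can be passed an open file handle, since it accepts any iterable of lines.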

Task completed. Link to github pull request: https://github.com/wiki-ai/editquality/pull/53
If anything is wrong, please tell me and I will fix it.

Aklapper assigned this task to nikitavbv.

https://github.com/wiki-ai/editquality/pull/53 got merged, so I am closing this task as resolved. Thanks a lot!