- discuss sampling strategy
- generate sample
- create campaign at labels.wmflabs.org
Description
Status | Assigned | Task
---|---|---
Resolved | johl | T127047 Collection of topics for HPI hackathon
Resolved | awight | T187836 [Epic] Audit of pending ORES GUI deployments
Resolved | Glorian_WD | T127470 Deploy item quality classification model for Wikidata
Resolved | Glorian_WD | T157498 Train/test item quality model for Wikidata
Resolved | Glorian_WD | T157495 Complete Wikidata item quality campaign
Resolved | Halfak | T157493 Deploy Wikidata item quality campaign
Resolved | Halfak | T159570 Deploy the pilot of Wikidata item quality campaign
Resolved | Halfak | T155828 Design item_quality form for Wikidata
Resolved | Glorian_WD | T157489 [Discuss] item quality in Wikidata
Resolved | Halfak | T160256 Wikidata items render badly in Wikilabels
Resolved | Glorian_WD | T162530 Implement "unwanted pages" filtering strategy for Wikidata
Event Timeline
I think we'll want a stratified sample for the labeling campaign so that we don't end up with 99% of items in the lowest-quality stratum. We can probably stratify using some basic heuristics, but which ones? Number of statements? We can probably skip labeling the showcase items, since they have already been reviewed. How many showcase items are there?
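To make the stratification idea concrete, here is a minimal sketch that bins items by statement count and draws an equal number from each bin. It is only an illustration: the bin edges, the `(item_id, statement_count)` input shape, and the function names are all assumptions, not the strategy that was actually used for the campaign.

```python
import random
from collections import defaultdict

# Hypothetical bin edges on statement count; chosen for illustration only.
BIN_EDGES = [1, 5, 10, 20, 50]


def stratum_for(statement_count, edges=BIN_EDGES):
    """Return the index of the stratum a statement count falls into."""
    for i, edge in enumerate(edges):
        if statement_count < edge:
            return i
    return len(edges)


def stratified_sample(items, per_stratum, seed=0):
    """Sample up to `per_stratum` item IDs from each statement-count stratum.

    `items` is an iterable of (item_id, statement_count) pairs (assumed shape).
    Returns a dict mapping stratum index to a list of sampled item IDs.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item_id, statement_count in items:
        strata[stratum_for(statement_count)].append(item_id)

    sample = {}
    for stratum, members in sorted(strata.items()):
        # Small strata contribute everything they have.
        k = min(per_stratum, len(members))
        sample[stratum] = rng.sample(members, k)
    return sample
```

Capping each stratum at the same size keeps the many near-empty items from dominating the labeled set, at the cost of the sample no longer reflecting the true distribution of items.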
halfak@wikilabels-01:~/datasets$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign wikidatawiki "Item quality (5k stratified)" item_quality PrintablePageAsOfRevision 1 10 --config /srv/wikilabels/config/config/
{'active': True, 'labels_per_task': 1, 'tasks_per_assignment': 10, 'view': 'PrintablePageAsOfRevision', 'name': 'Item quality (5k stratified)', 'form': 'item_quality', 'id': 51, 'created': datetime.datetime(2017, 4, 8, 18, 20, 48, 468435), 'wiki': 'wikidatawiki'}
halfak@wikilabels-01:~/datasets$ cat wikidatawiki.stratified_revisions.5k_sample.json | sudo -u www-data /srv/wikilabels/venv/bin/wikilabels task_inserts --config /srv/wikilabels/config/config/ 51
@Glorian_WD updated the sampling strategy so I re-deployed. See http://labels.wmflabs.org/campaigns/wikidatawiki/52/?campaign=stats
halfak@wikilabels-01:~$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign wikidatawiki "Item quality (5k sample)" item_quality PrintablePageAsOfRevision 1 10 --config /srv/wikilabels/config/config/
{'view': 'PrintablePageAsOfRevision', 'wiki': 'wikidatawiki', 'form': 'item_quality', 'labels_per_task': 1, 'created': datetime.datetime(2017, 4, 10, 18, 19, 0, 152014), 'active': True, 'id': 52, 'tasks_per_assignment': 10, 'name': 'Item quality (5k sample)'}
halfak@wikilabels-01:~$ cat datasets/wikidatawiki.stratified_revisions.5k_sample.json | sudo -u www-data /srv/wikilabels/venv/bin/wikilabels task_inserts --config /srv/wikilabels/config/config/ 52
There was a bug, so I re-deployed again. See http://labels.wmflabs.org/campaigns/wikidatawiki/53/?campaign=stats