Fix makefile entry for enwiktionary.rev_reverted.20k_2016.tsv
Closed, Resolved, Public

Assigned To

Authored By

	Halfak
	Sep 2 2016, 3:26 PM

Description

A file is already checked in for this dataset, but it looks very different from what the Makefile command generates: it has far more reverted edits. The Makefile entry should reflect how the file was actually generated.

Event Timeline

Halfak created this task. Sep 2 2016, 3:26 PM

Restricted Application added a subscriber: Aklapper. Sep 2 2016, 3:26 PM

See https://github.com/wiki-ai/editquality/blob/master/datasets/enwiktionary.rev_reverted.20k_2016.tsv

It looks like this file may have been generated by sampling from the 200k sample file. I'm going to try generating a dataset from that larger file.

$ cat datasets/enwiktionary.prelabeled_revisions.200k_2016.tsv | grep "reverted" | wc
    821    3284   22988
$ cat datasets/enwiktionary.rev_reverted.20k_2016.tsv | grep "True" | wc
    815    1630   11410

Well, that looks like a promising direction.

$ cat datasets/enwiktionary.rev_reverted.20k_2016.tsv | grep "True" | cut -f1 | sort | head
32446761
32446914
32447343
32447567
32448513
32451957
32452977
32462224
32466357
32468155

$ cat datasets/enwiktionary.prelabeled_revisions.200k_2016.tsv | grep "reverted" | cut -f1 | sort | head
32446761
32446914
32447343
32447567
32448513
32451957
32453530
32462299
32462964
32466357
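
The two listings agree on the first six IDs and then diverge, so comparing only the first ten is inconclusive. Counting how many reverted IDs the two files share would give a fuller picture. A minimal sketch, assuming the revision ID is the first tab-separated column in both files (as the cut -f1 calls above suggest); shared_reverted_ids is a hypothetical helper name, not part of editquality:

```shell
# Hypothetical helper: count reverted revision IDs present in both files.
#   $1: the 20k file, where reverted rows contain "True"
#   $2: the 200k prelabeled file, where reverted rows contain "reverted"
shared_reverted_ids() {
  small_ids=$(mktemp)
  large_ids=$(mktemp)
  # Extract the ID column of the reverted rows; comm needs sorted input.
  grep "True" "$1" | cut -f1 | sort -u > "$small_ids"
  grep "reverted" "$2" | cut -f1 | sort -u > "$large_ids"
  # comm -12 prints only the lines common to both inputs.
  comm -12 "$small_ids" "$large_ids" | wc -l
  rm -f "$small_ids" "$large_ids"
}
```

Calling it with the 20k file as the first argument and the 200k prelabeled file as the second prints the size of the overlap. A count close to 815 would support the sampling hypothesis.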

OK. My plan is to run label_reverted on the 200k dataset and then do this:

(head -n1 datasets/enwiktionary.rev_reverted.200k_2016.tsv;
 (tail -n +2 datasets/enwiktionary.rev_reverted.200k_2016.tsv | \
  grep "False" | shuf -n 20000;
  tail -n +2 datasets/enwiktionary.rev_reverted.200k_2016.tsv | \
  grep "True") | shuf;) > \
 datasets/enwiktionary.rev_reverted.weighted.20k_2016.tsv
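
The plan above keeps the header, draws a fixed number of non-reverted rows, keeps every reverted row, and shuffles the result. The same steps can be sketched as a small function for trying the approach on a toy file first. This is only an illustration: weighted_sample and its parameters are hypothetical names, and it assumes GNU shuf is available and that each data row contains the literal label "True" or "False".

```shell
# Hypothetical helper mirroring the plan above.
#   $1: labeled TSV with a header row
#   $2: number of "False" (non-reverted) rows to keep
weighted_sample() {
  input=$1
  n_false=$2
  # Keep the header as the first output line.
  head -n1 "$input"
  {
    # Random subset of the non-reverted rows.
    tail -n +2 "$input" | grep "False" | shuf -n "$n_false"
    # Every reverted row.
    tail -n +2 "$input" | grep "True"
  } | shuf  # Shuffle so the reverted rows are not grouped at the end.
}
```

With an input of one header, five "False" rows, and two "True" rows, weighted_sample input.tsv 3 emits six lines: the header, three "False" rows, and both "True" rows, in random order after the header.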

https://github.com/wiki-ai/editquality/pull/46

Halfak claimed this task. Sep 2 2016, 10:44 PM

Halfak moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.

Halfak moved this task from Review to Completed on the Machine-Learning-Team (Active Tasks) board. Sep 6 2016, 8:21 PM

Halfak closed this task as Resolved. Sep 7 2016, 1:28 AM
