Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | leila | T171224 [Objective 9.1.1] Article expansion recommendations
Resolved | | leila | T183039 Gather labels as ground truth for translation and synonym section classifiers
Resolved | | diego | T184213 Gather labels as ground truth for section synonym detection
Event Timeline
@diego I assigned this task to you as you're working on the method for finding/surfacing synonyms now. Feel free to assign it back to me or others as work progresses.
/me is super happy that we made it this far this quarter. \o/ Great job! :)
Please find the data to be labeled here: https://drive.google.com/drive/folders/1pzR3P16ck7FyrE7QgIpcSx1TPumTGA9u?usp=sharing
Those are candidate synonym pairs, stratified by section TF-IDF similarity and fastText distance. For more details about the procedure, please check the code here: https://github.com/digitalTranshumant/wmf-interlanguage/blob/master/Synonyms.ipynb
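For readers who have not opened the notebook, here is a rough sketch of what stratifying candidates by two scores could look like. It is not taken from Synonyms.ipynb: the function name `stratified_sample`, the tuple layout, the number of bins, and the assumption that both scores are already scaled to [0, 1] are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(pairs, n_bins=4, per_bin=2, seed=0):
    """Sample up to `per_bin` candidate pairs from each (similarity, distance) stratum.

    `pairs` is a list of (sec_a, sec_b, tfidf_sim, ft_dist) tuples.
    Assumption: both scores are already scaled to the range [0, 1].
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pair in pairs:
        _, _, sim, dist = pair
        # Clamp so that a score of exactly 1.0 falls into the top bin.
        sim_bin = min(int(sim * n_bins), n_bins - 1)
        dist_bin = min(int(dist * n_bins), n_bins - 1)
        strata[(sim_bin, dist_bin)].append(pair)

    sample = []
    for key in sorted(strata):
        bucket = strata[key]
        # Take the whole bucket if it holds fewer than `per_bin` pairs.
        sample.extend(rng.sample(bucket, min(per_bin, len(bucket))))
    return sample
```

Sampling a fixed number of pairs per stratum keeps rare combinations (for example, high TF-IDF similarity but large embedding distance) represented in the sheets, instead of letting the most common combination dominate the labeling effort.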
@bmansurov: Please, we now need to upload the sheets, keeping only columns A (Sec_B) and B (Sec_B), and ask volunteers to tag each pair with one of these three categories: synonym, related, not related.
We (@leila and I) have updated the labels; we will now use: same, overlap, and different. We have translated these into Spanish and requested help from staff and the community to translate the labels into the other 4 languages.
We have also added 3 columns to collect separate assessments in case reviewers disagree.
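One possible way to combine the per-reviewer columns into a single label is a simple majority vote. The thread does not say how disagreements are resolved, so this is only an illustrative sketch; `resolve_label` is a hypothetical helper, and returning `None` on ties is an assumption.

```python
from collections import Counter

# The three labels agreed on in this task.
LABELS = {"same", "overlap", "different"}


def resolve_label(votes):
    """Return the majority label among reviewer votes, or None if there is no clear winner.

    Empty cells (None or blank strings) are ignored.
    """
    cleaned = [v.strip().lower() for v in votes if v and v.strip()]
    unknown = set(cleaned) - LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if not cleaned:
        return None
    counts = Counter(cleaned).most_common()
    # A tie between the top two labels is left unresolved.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]
```

For example, `resolve_label(["same", "same", "overlap"])` yields `"same"`, while three different votes yield `None`, which could flag the pair for another review.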
@diego based on your latest updates, it seems we are not aiming to collect more labels for now. I am resolving this task. Please re-open if you disagree.