
Design a data collection pilot using WikiLabels platform (mining reasons)
Closed, Resolved, Public

Description

Pilot 1: Understanding reasons for adding inline citations

Following the ideas on the meta page, we implement a pilot to mine reasons for adding citations: https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/Citation_Reason_Pilot

  • Design a usable and effective form to collect annotations on statements. @Capt_Swing @Miriam

https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements#Collecting_annotations:_WikiLabels

  • Collect statement samples to be annotated @Miriam

https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements#Collecting_Statements_Data

https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/WikiLabels:_How-to

  • Advertise WL experiment T191041
  • Run the pilot for 3 languages T186353

Event Timeline

Miriam triaged this task as High priority. Feb 2 2018, 7:55 PM
Miriam created this task.

I've added Miriam's form to https://github.com/kodchi/wikilabels/commit/340d53b00009b02a89f1a931298ce149b1ce8487. The commit also includes a view called HighlightedCitation, which isn't quite ready yet.

Created a WikiLabels pull request: https://github.com/wiki-ai/wikilabels/pull/231.

Here's the script that takes input from Besnik's files and outputs a JSON file that can be imported into WikiLabels.

I left some notes. Looks like it's pretty close to merge. Thanks for your work :)

Thanks for the review, @Halfak. I've updated the patch. I've also shared the tasks files (via Google drive) with you and @Miriam.

Missing information:
@Capt_Swing @bmansurov can you help by suggesting the right values for these fields?

<wiki>                  Wiki database id, for example fawiki, dewiki, etc.
<name>                  Name of campaign, note that it will return an error if you define a duplicate name. @Capt_Swing @Miriam
<form>                  The name of the form @bmansurov
<view>                  The view for tasks @bmansurov
<labels-per-task>       The number of times a task can be assigned to different labelers @Capt_Swing
<tasks-per-assignment>  The number of tasks assigned per workset @Capt_Swing
  • wiki is either enwiki, frwiki, or itwiki.
  • form is missing_citations
  • view is RenderedHTML

OK I have the wikilabels PR merged. :)

Feel free to schedule a campaign-loading hack session on my calendar. I'm UTC-5 and I'll be AFK this Friday, but otherwise available.

Miriam renamed this task from Design a data collection experiment using WikiLabels platform to Design a data collection pilot using WikiLabels platform. Mar 29 2018, 5:36 PM
Miriam updated the task description.
Miriam renamed this task from Design a data collection pilot using WikiLabels platform to Design a data collection pilot using WikiLabels platform (mining reasons). Mar 29 2018, 5:44 PM
<wiki>                  Wiki database id, for example fawiki, dewiki, etc.

enwiki for now

<name>                  Name of campaign, note that it will return error if you define a duplicate name. @Capt_Swing @Miriam

Identify statements needing citation? @Capt_Swing what do you think?

<labels-per-task>       The number of times a task can be assigned to different labelers @Capt_Swing

1 for now

<tasks-per-assignment>  The number of tasks assigned per workset @Capt_Swing

10?
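For reference, the values proposed so far can be collected in one place. The following is only a rough sketch, not the actual campaign-loading command: it assumes the positional order shown in the placeholder listing above (wiki, name, form, view, labels-per-task, tasks-per-assignment), and the helper name `build_campaign_args` is hypothetical. The name, labels-per-task, and tasks-per-assignment values are still tentative.

```python
import shlex

# Values proposed in this thread. The name, labels_per_task and
# tasks_per_assignment were still tentative ("for now", "10?").
campaign = {
    "wiki": "enwiki",
    "name": "Identify statements needing citation",
    "form": "missing_citations",
    "view": "RenderedHTML",
    "labels_per_task": 1,
    "tasks_per_assignment": 10,
}

# The pilot targets these three wikis.
ALLOWED_WIKIS = {"enwiki", "frwiki", "itwiki"}


def build_campaign_args(c):
    """Hypothetical helper: validate the values and return them as
    positional arguments in the order of the placeholder listing."""
    if c["wiki"] not in ALLOWED_WIKIS:
        raise ValueError(f"unexpected wiki: {c['wiki']}")
    for key in ("labels_per_task", "tasks_per_assignment"):
        if not isinstance(c[key], int) or c[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    return [
        c["wiki"],
        c["name"],
        c["form"],
        c["view"],
        str(c["labels_per_task"]),
        str(c["tasks_per_assignment"]),
    ]


args = build_campaign_args(campaign)
# Shell-quoted, so the multi-word campaign name stays one argument.
print(shlex.join(args))
```

Quoting matters here because the proposed campaign name contains spaces; passed unquoted, it would be split into several arguments.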

@Miriam, I've set up a testing environment at http://research-wikilabels.wmflabs.org. It has the latest data from Besnik. To get it working for now, after logging in with MediaWiki OAuth you'll have to manually change localhost:8080 to research-wikilabels.wmflabs.org in the URL you are redirected to.
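If doing the host swap by hand gets tedious, it can be expressed as a small helper. This is only an illustrative sketch: the function name `fix_redirect` and the callback path and query parameters in the example URL are made up, and only the two host names come from the note above.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOST = "localhost:8080"
TEST_HOST = "research-wikilabels.wmflabs.org"


def fix_redirect(url):
    """Replace the localhost:8080 host with the test environment host,
    leaving the scheme, path and query string untouched."""
    parts = urlsplit(url)
    if parts.netloc != LOCAL_HOST:
        return url  # nothing to fix
    return urlunsplit(parts._replace(netloc=TEST_HOST))


# Hypothetical redirect URL; the real path and parameters may differ.
example = "http://localhost:8080/auth/callback/?oauth_verifier=abc&oauth_token=def"
print(fix_redirect(example))
```

Only the host part changes, so whatever tokens the OAuth redirect carries in the query string are preserved.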

@Miriam I've made the changes you requested. Here's the summary:

  1. Reference numbers are visible;
  2. The form has been updated to include drop-downs.

You can see the change at http://research-wikilabels.wmflabs.org/. Let me know if everything looks good, and I'll make a pull request.