Update edit session-based labor hours measurements (English Wikipedia)
Open, Low, Public

Description

Requested by @MeganHernandez_WMF and @MSyed

See http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf

Numbers in that paper are up-to-date as of March 2012. Update them and, if possible, provide a simple process to update them again next time.

Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak added a project: Research.
Halfak moved this task to Epics on the Research board.
Halfak subscribed.
Halfak renamed this task from Update labor hours numbers to Update edit session-based labor hours measurements. (Jun 24 2015, 3:47 PM)
Halfak updated the task description.
Halfak set Security to None.
Halfak added subscribers: MeganHernandez_WMF, MSyed.
Halfak renamed this task from Update edit session-based labor hours measurements to Update edit session-based labor hours measurements (English Wikipedia). (Jul 2 2015, 10:25 PM)

I just kicked off new queries to gather sorted lists of revisions (and revisions to deleted pages) so that I could update this analysis.

See my code here: https://github.com/halfak/mwsessions

I expect these queries to run for 24-48 hours before I can start processing data in python.

Everything went as planned. I just kicked off the data processing in python.
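For readers unfamiliar with the approach, here is a minimal sketch of how a timestamp-sorted revision list could be grouped into edit sessions and summed into labor time. It is not the mwsessions API: the function names, the `(user, timestamp)` input shape, and the one-hour inactivity cutoff are all assumptions made for illustration.

```python
# Hypothetical sketch; names and the cutoff value are assumptions,
# not taken from mwsessions.
SESSION_CUTOFF = 3600  # seconds of inactivity that closes a session (assumed)


def sessionize(events, cutoff=SESSION_CUTOFF):
    """Group (user, timestamp) pairs, sorted by timestamp, into sessions.

    Yields (user, [timestamps]) whenever a user's session closes, then
    flushes any sessions still open once the input is exhausted.
    """
    open_sessions = {}
    for user, ts in events:
        session = open_sessions.get(user)
        if session is not None and ts - session[-1] > cutoff:
            # The gap since this user's last edit exceeds the cutoff.
            yield user, session
            session = None
        if session is None:
            session = []
            open_sessions[user] = session
        session.append(ts)
    for user, session in open_sessions.items():
        yield user, session


def labor_seconds(sessions):
    """Sum first-to-last-edit durations over all sessions."""
    return sum(stamps[-1] - stamps[0] for _, stamps in sessions)


events = [("a", 0), ("b", 10), ("b", 20), ("a", 600), ("a", 10000)]
sessions = list(sessionize(events))
total = labor_seconds(sessions)
```

A single-edit session contributes zero seconds here; a real estimate would also need to account for the time spent before the first edit of each session, which this sketch leaves out.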

The processing crashed a few times due to some weird MySQL output in the TSV. I've cleaned that up and written a nice, clean TSV handler to solve the problem in the future too (see https://pythonhosted.org/mysqltsv/). The process is currently on revision 150m out of about 500m.
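To illustrate the kind of MySQL output that can trip up a naive TSV reader, here is a small standard-library sketch that decodes batch-mode escapes and NULL markers. It does not use or reproduce the mysqltsv API; `decode_field` and `read_rows` are hypothetical helpers, and the escape table covers only the sequences I am sure MySQL batch output uses.

```python
# Hypothetical helpers for illustration; not the mysqltsv API.
# Escape sequences that MySQL batch-mode output uses inside field values.
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\", "0": "\0"}


def decode_field(raw):
    """Decode one field; the bare token NULL becomes None."""
    if raw == "NULL":
        return None
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escapes are kept verbatim rather than guessed at.
            out.append(ESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def read_rows(lines):
    """Yield one dict per data line, keyed by the header line's columns."""
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield dict(zip(header, map(decode_field, fields)))


rows = list(read_rows(["rev_id\tcomment\n", "1\ta\\tb\n", "2\tNULL\n"]))
```

One limitation to keep in mind: a column whose value is literally the string `NULL` cannot be told apart from a SQL NULL in this output format.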

I've completed generating the dataset. Analysis is next.

Datasets are ready. I'm putting this on the back burner until I can get some movement on T99172.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you are still realistically working on it or plan to work on it - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)