Update edit session-based labor hours measurements (English Wikipedia)
Open, Low, Public

Description

Requested by @MeganHernandez_WMF and @MSyed

See http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Measure_Participation_in_Wikipedia/geiger13using-preprint.pdf

Numbers in that paper are up-to-date as of March 2012. Update them and, if possible, provide a simple process to update them again next time.

Event Timeline

Halfak created this task. Jun 24 2015, 3:46 PM
Halfak raised the priority of this task to Needs Triage.
Halfak updated the task description.
Halfak added a project: Research.
Halfak moved this task to Epics on the Research board.
Halfak added a subscriber: Halfak.
Restricted Application added a subscriber: Aklapper. Jun 24 2015, 3:46 PM
Halfak renamed this task from Update labor hours numbers to Update edit session-based labor hours measurements. Jun 24 2015, 3:47 PM
Halfak updated the task description.
Halfak set Security to None.
Halfak added subscribers: MeganHernandez_WMF, MSyed.
ggellerman assigned this task to Halfak. Jun 24 2015, 9:25 PM
ggellerman moved this task from Epics to Staged on the Research board.
Halfak renamed this task from Update edit session-based labor hours measurements to Update edit session-based labor hours measurements (English Wikipedia). Jul 2 2015, 10:25 PM
DarTar triaged this task as Low priority. Jul 2 2015, 10:39 PM
Halfak added a comment. Aug 4 2015, 6:18 PM

I just kicked off new queries to gather sorted lists of revisions (and revisions to deleted pages) so that I could update this analysis.

See my code here: https://github.com/halfak/mwsessions

I expect these queries to run for 24-48 hours before I can start processing data in Python.
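
For readers unfamiliar with the approach, here is a minimal sketch of how a list of revisions can be grouped into edit sessions and summed into labor hours. It is not the code in the repository linked above. The function names, the `(user, timestamp)` input shape, and the one-hour inactivity cutoff are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def sessionize(revisions, cutoff=timedelta(hours=1)):
    """Group (user, timestamp) pairs into per-user edit sessions.

    A new session starts whenever the gap since the user's previous
    edit exceeds `cutoff` (an assumed parameter, not taken from this task).
    Returns a list of (user, start, end, edit_count) tuples.
    """
    sessions = []
    current = None  # [user, start, end, edit_count]
    # Sorting by (user, timestamp) puts each user's edits in time order.
    for user, ts in sorted(revisions):
        if current and current[0] == user and ts - current[2] <= cutoff:
            current[2] = ts      # extend the open session
            current[3] += 1
        else:
            if current:
                sessions.append(tuple(current))
            current = [user, ts, ts, 1]
    if current:
        sessions.append(tuple(current))
    return sessions


def labor_hours(sessions):
    """Sum session durations (last edit minus first edit) in hours."""
    total = sum((end - start for _, start, end, _ in sessions), timedelta())
    return total.total_seconds() / 3600
```

A session with a single edit has zero duration in this sketch. Any allowance for the time spent preparing the first edit of a session is left out and would need to be added on top.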

DarTar moved this task from Staged to In Progress on the Research board. Aug 6 2015, 2:32 PM

Everything went as planned. I just kicked off the data processing in Python.

The processing crashed a few times due to some weird MySQL output in the TSV. I've cleaned that up and written a nice, clean TSV handler to solve the problem in the future too (see https://pythonhosted.org/mysqltsv/). The process is currently on revision 150m out of about 500m.
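
The comment above does not say which part of the MySQL output caused the crashes. As an illustration of the usual pitfalls, here is a minimal sketch of reading TSV produced by the `mysql` client in batch mode, where NULL values are printed as the literal text `NULL` and tabs, newlines, NUL bytes, and backslashes inside values are backslash-escaped. It uses only the standard library and is not the mysqltsv API; `decode_field`, `read_rows`, and the column names in the usage note are hypothetical.

```python
import re

# Escape sequences written by the mysql client in batch mode.
_ESCAPES = {"\\n": "\n", "\\t": "\t", "\\0": "\0", "\\\\": "\\"}
_ESCAPE_RE = re.compile(r"\\[nt0\\]")


def decode_field(raw):
    """Turn one raw TSV field into a Python value (str or None)."""
    if raw == "NULL":
        # Caveat: a stored string "NULL" is indistinguishable from SQL NULL.
        return None
    # A single left-to-right pass keeps an escaped backslash from being
    # misread as the start of another escape sequence.
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)], raw)


def read_rows(lines):
    """Yield one dict per data row, keyed by the header line's column names."""
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("\t")
    for number, line in enumerate(lines, start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(header):
            # Fail loudly with the line number instead of crashing later.
            raise ValueError(
                "line %d: expected %d fields, got %d"
                % (number, len(header), len(fields))
            )
        yield dict(zip(header, (decode_field(f) for f in fields)))
```

Passing an open file works because `read_rows` accepts any iterable of lines, for example `for row in read_rows(open("revisions.tsv", encoding="utf-8")): ...`, where the file name is a placeholder.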

I've completed generating the dataset. Analysis is next.

Halfak moved this task from In Progress to Paused on the Research board. Sep 17 2015, 9:55 PM

Datasets are ready. I'm putting this on the back burner until I can get some movement on T99172.

Aklapper removed Halfak as the assignee of this task. Jun 19 2020, 4:29 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)