Overview
When seeking to understand what changes a given Wikipedia edit made, there are three general sources of information: the actual diff of the edit (example), the edit summary, and edit tags. Edit tags are generally simple and constrained to tool information or basic regex matches -- e.g., whether an edit was made via mobile, or easy-to-detect content changes such as a blanked section or a new, short article. The edit diff is much richer -- it shows the specific changes in context -- but takes up too much visual space for tasks such as quickly browsing an edit history or patrolling for suspicious edits. The edit summary is intended as a middle ground -- a succinct but flexible, editor-provided description of what the edit did (and perhaps why).
Edit summaries have drawbacks: vandals may provide misleading summaries, and many editors leave the summary blank or use a canned summary that may or may not accurately reflect the impact of the edit. Despite these drawbacks, edit summaries are invaluable for tasks such as understanding work on Wikipedia, building datasets of edits for study, and contextualizing individual edits.
Task
This research will focus on understanding current usage of edit summaries in order to identify potential opportunities for tool-building to better support researchers and patrollers. It will primarily be qualitative research. A potential set of steps includes:
- Read documentation on edit summaries and go through the edit history for some random articles to get a sense of how they are used.
- Gather an initial sample of the last three edits to 50 random articles (sticking to namespace 0 for now -- i.e., standard Wikipedia articles).
- Code your sample for types of edit summaries and how well each summary described its edit. This should be multi-faceted -- e.g., was the summary misleading? was it complete? did it say what and why? was it manual, canned, or automated? Ideas for what questions to ask will initially come from the documentation -- i.e., evaluating how well edit summaries meet what they are recommended to contain. You will likely want to do this iteratively, revising the codes you apply as you see more summaries and notice new patterns.
- NOTE: this can be for English Wikipedia and/or any other languages the researcher can read fluently.
- NOTE: this combines the conventional and directed approaches described by Hsieh and Shannon, or the inductive and deductive approaches described by Muller.
- Expand sample and apply codes to new edit summaries (and identify any new codes that should be added).
- NOTE: this can be more edits, more languages, or perhaps a stratified sample that looks at specific types of articles such as more controversial or heavily-edited articles.
- NOTE: this is similar to the summative approach described in Hsieh and Shannon.
- Write up taxonomy of edit summaries, common issues with edit summaries, and how common they are based on qualitative coding.
- For each issue, propose what approaches / tools might help to address the issue.
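The gathering step above can be scripted against the MediaWiki Action API. A minimal sketch (assuming English Wikipedia's endpoint; `last_revisions` and `extract_summaries` are illustrative helper names, not part of any existing tool):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# English Wikipedia endpoint; swap in another language edition as needed.
API = "https://en.wikipedia.org/w/api.php"

def last_revisions(title, n=3):
    """Fetch the last n revisions of a page, including edit summaries and tags."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp|comment|tags",
        "rvlimit": n,
        "format": "json",
        "formatversion": 2,  # cleaner JSON: pages come back as a list
    }
    with urlopen(f"{API}?{urlencode(params)}") as resp:
        data = json.load(resp)
    return extract_summaries(data)

def extract_summaries(data):
    """Pull (user, timestamp, summary, tags) tuples out of an API response."""
    page = data["query"]["pages"][0]
    return [
        (rev.get("user"), rev["timestamp"], rev.get("comment", ""), rev.get("tags", []))
        for rev in page.get("revisions", [])
    ]
```

The summaries and tags could then be dumped to a spreadsheet for the qualitative coding step.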
This task is considered [medium]. In general, it's expected that the task will take a month or two of consistent work and is a good fit for someone with some research experience or an interest in being involved in research. The actual time needed, however, will depend greatly on your level of experience.
Rationale
In order to make edit summaries more useful, they likely need to be more complete and trustworthy. This research is a first step to identifying how large of a challenge it would be to augment edit summaries -- either via editor support or addition of edit tags. Depending on the outcomes, it may lead to creating datasets for further research, new tooling, or design recommendations.
Recommended Skills
- This task primarily requires some experience with qualitative methods and, in particular, qualitative coding.
- It might be possible to automate aspects of the sampling -- e.g., using the Random API and Python -- but that is not necessary.
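If you do automate the sampling, the Random API mentioned above is the natural starting point. A hedged sketch (helper names are illustrative; `rnlimit` accepts up to 500 per request for most users):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://en.wikipedia.org/w/api.php"

def random_query_params(n=50, namespace=0):
    """Build parameters for the Random API: n random pages from one namespace."""
    return {
        "action": "query",
        "list": "random",
        "rnnamespace": namespace,  # 0 = standard articles
        "rnlimit": n,
        "format": "json",
    }

def random_article_titles(n=50):
    """Sample n random article titles from English Wikipedia."""
    with urlopen(f"{API}?{urlencode(random_query_params(n))}") as resp:
        data = json.load(resp)
    return [page["title"] for page in data["query"]["random"]]
```

Note that `list=random` gives a quick uniform sample; a stratified sample (e.g., of heavily-edited articles) would need a different selection strategy.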
Acceptance Criteria
- The output of this task will be a Meta report describing the research and findings (example). Depending on researcher and mentor interest, this could be expanded into a more formal publication.
- At each stage of the analysis, the choices should be carefully grounded in past research and validated by the mentor.
- Depending on the approach, PAWS-hosted Jupyter notebooks might also be a viable option for sharing progress.
Process
- If you are interested in this task and it is not assigned to anyone, you may begin work on it. Please leave a comment on the task and tag @Isaac so that he is aware.
- If you have made some progress on the task (at least 10-20 edit summaries and your initial codes) and would like to continue, share a link to your current draft and let @Isaac know so that he can assign the task to you and help you to plot out the next steps.
- Generally, @Isaac will be able to answer any questions about the task and will try to respond quickly when clarification is necessary, but response times may be slower if help is needed with more general debugging, etc.
Resources
- Content analysis: Hsieh and Shannon
- Edit summary documentation / recommendations: English Wikipedia; Meta
- How to "add a project" to Meta (@Isaac will let you know when it's a good point to do this): https://meta.wikimedia.org/wiki/Research:Projects