Write a Ruby gem for analyzing Wikidata edits
Closed, Resolved (Public)

Description

IMPORTANT: Make sure to read the Outreachy participant instructions and communication guidelines thoroughly before commenting on this task. This space is for project-specific questions, so avoid asking questions about getting started, setting up Gerrit, etc. When in doubt, ask your question on Zulip first!

Brief summary

Write a Ruby library (gem) for parsing the differences between Wikidata revisions and extracting statistics about what changed. This library is needed for improving Wikidata statistics on Wiki Education Dashboard and Programs & Events Dashboard (and may have other uses as well).

There are currently no tools for analyzing Wikidata edits to accurately determine statistics like the number of qualifiers added, the number of references added, and so on. Wikidata edit summaries usually indicate what was changed, but they are not comprehensive. Consider this edit, for example: https://www.wikidata.org/w/index.php?title=Q111269579&diff=1596238100&oldid=1596236983

The Wikidata API can provide JSON representations of an item at a specific revision, so we can analyze a revision by comparing the JSON representations of that revision and its parent.
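As a rough illustration of that approach, the sketch below requests one revision through the MediaWiki `action=query` / `prop=revisions` module and extracts the entity JSON together with the parent revision ID. The method names (`revision_uri`, `parse_revision`, `fetch_revision`) are hypothetical and not part of any existing gem, and error handling (missing or deleted revisions, rate limits) is left out.

```ruby
require 'json'
require 'net/http'
require 'uri'

API_ENDPOINT = 'https://www.wikidata.org/w/api.php'

# Build the query URI for one revision's content and parent ID.
def revision_uri(revid)
  uri = URI(API_ENDPOINT)
  uri.query = URI.encode_www_form(
    action: 'query', prop: 'revisions', revids: revid,
    rvprop: 'ids|content', rvslots: 'main',
    format: 'json', formatversion: 2
  )
  uri
end

# Extract the parent revision ID and the parsed entity from an API response body.
# Assumes the formatversion=2 layout, where `pages` is an array.
def parse_revision(body)
  revision = JSON.parse(body).dig('query', 'pages', 0, 'revisions', 0)
  {
    parent_id: revision['parentid'],
    entity: JSON.parse(revision.dig('slots', 'main', 'content'))
  }
end

# Fetch and parse a single revision (performs a network request).
def fetch_revision(revid)
  parse_revision(Net::HTTP.get(revision_uri(revid)))
end
```

Calling `fetch_revision` for a revision, and then again with the returned `:parent_id`, would give the two entity hashes to compare.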

Some of the steps that could be involved in this project:

  • Write code that can fetch JSON representations of a Wikidata item and isolate the differences
  • Explore Wikidata to find a diverse set of example edits representing different kinds of changes
  • Research what statistics would be possible to extract via these JSON differences, and create methods for generating these statistics
  • Write a suite of tests to demonstrate that the example edits are processed correctly
  • Add methods to efficiently fetch and analyze large batches of Wikidata edits at once
  • Coordinate with mentor(s) to publish the gem
  • Stretch Goal: Integrate the gem into Wiki Education Dashboard
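
One possible starting point for the statistics step is sketched below, assuming both revisions are already parsed into Ruby hashes in the standard Wikidata entity JSON layout (`claims` keyed by property, each statement with optional `qualifiers` and `references`). The method names `entity_counts` and `count_changes` are hypothetical. It only reports net differences, so an edit that adds one reference and removes another would show as zero; a real implementation would likely match statements by their `id`.

```ruby
# Count statements, qualifier snaks, and references in a parsed entity hash.
def entity_counts(entity)
  claims = entity['claims']
  # An entity with no claims may serialize `claims` as an empty array.
  statements = claims.is_a?(Hash) ? claims.values.flatten(1) : []
  {
    claims: statements.size,
    qualifiers: statements.sum { |s| (s['qualifiers'] || {}).values.sum(&:size) },
    references: statements.sum { |s| (s['references'] || []).size }
  }
end

# Net change in each count between a parent revision and the current one.
def count_changes(parent, current)
  before = entity_counts(parent)
  after = entity_counts(current)
  after.to_h { |key, count| [key, count - before[key]] }
end
```

A positive value means more items of that kind exist after the edit, and a negative value means fewer.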

Skills required

  • Ruby
  • Familiarity with wikis and/or Wikidata helpful but not required
  • Experience with or interest in user research helpful but not required

Possible mentor(s)

  • Sage Ross (Ragesoss)
  • Will Kent

Microtasks

Event Timeline

srishakatux changed the visibility from "Public (No Login Required)" to "acl*outreachy-mentors (Project)". Feb 2 2023, 3:38 AM

@Ragesoss Thanks for adding this! Whenever you feel ready, you can upload this to the Outreachy program website as explained in step 4 here: https://www.mediawiki.org/wiki/Outreachy/Mentors#_Before_the_program.

srishakatux changed the visibility from "acl*outreachy-mentors (Project)" to "Public (No Login Required)". Mar 6 2023, 8:01 PM

Hi! Please consider resolving this task and moving any pending items to a new task, as GSoC/Outreachy rounds are now over, and this workboard will soon be archived.