
GSoC 2019 Proposal: Build statistics toolset to support WM-HU editor retention grant
Closed, Declined, Public


Profile Information

Name: Harshit Sharma
University: IIT (ISM) Dhanbad
GitHub: @SHarma239
Twitter: @SharmaHS24
Location: India (UTC + 5:30)
Working hours: Between 2 pm and 12 am (midnight), UTC + 5:30

GSoC Proposal Submitted: GSoC2019Wikimedia


About the Project:

The goal of this project is to help the Hungarian Wikipedia community reduce negative experiences and strengthen positive experiences for its contributors. The tool developed in this project displays statistics and editor lists relevant to the Hungarian Wikipedia Editor Retention grant. In short, the project aims to increase participation (the number of participants and their activity) in the Hungarian Wikimedia community. By clearly displaying statistics through the toolset, the project highlights the efforts of contributors and editors. This in turn should improve the atmosphere of the Hungarian Wikimedia community.


  • Top lists of editors who perform a certain task in a given span of time
  • Lists of editors who have been recently transitioning from one group to another
  • List of users identified with unique contributions
  • Funnel view of the Hungarian Wikipedia community
  • Filters for manually provided username
  • Cohort view of the editor community grouped by year or month
  • API to translate records into a machine-readable form
  • Quick feedback option for newbie contributors
  • Frequent highlights for top users from all lists
  • Manual for using the tool
  • Toolset to export the statistical data to a desired format
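The cohort view in the list above could group editors by when they started editing. The sketch below is minimal and assumes each editor's first-edit date is already available as a `date`. The `first_edits` data and the function names are hypothetical and not part of any existing tool.

```python
from collections import Counter
from datetime import date

# Hypothetical input: each editor's first-edit date (in the real tool this
# would come from the wiki replicas or the MediaWiki API).
first_edits = {
    "Alice": date(2018, 1, 15),
    "Bob": date(2018, 1, 30),
    "Csaba": date(2018, 3, 2),
    "Dóra": date(2019, 2, 11),
}


def cohort_key(d: date, by: str = "month") -> str:
    """Return the cohort label for a date, grouped by 'year' or 'month'."""
    if by == "year":
        return f"{d.year}"
    if by == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError("by must be 'year' or 'month'")


def cohort_sizes(first_edits: dict, by: str = "month") -> dict:
    """Count how many editors joined in each cohort, sorted by cohort label."""
    counts = Counter(cohort_key(d, by) for d in first_edits.values())
    return dict(sorted(counts.items()))


print(cohort_sizes(first_edits))             # {'2018-01': 2, '2018-03': 1, '2019-02': 1}
print(cohort_sizes(first_edits, by="year"))  # {'2018': 3, '2019': 1}
```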

I plan to follow nuanced, lateral approaches such as funnel analysis to anticipate predictable problems in building the toolset. The aim is to avoid predictable mistakes before they happen. The alternative is to try an approach, realize it does not work and switch to a new one, which loses precious time.

I have been working on the microtasks for a while now and have made several attempts at displaying results for the assigned microtasks.

Possible Mentors:


1) Pre-Community Bonding Period [ Apr 14 - May 6 ]


  • Gaining information about internal & external factors influencing the size of the active volunteer community
  • Learning about contributions in technical areas such as Gadgets, Extensions, Skins and Bots
  • Studying the existing problems in depth, such as the lack of a systematic way to follow editors and the impatient reactions of new editors who contribute
  • Estimating how much of the problem can be solved
  • Planning an approach that extends the solution to as much of the problem as possible
  • Planning to increase user friendliness at every stage of the toolset
  • Fixing bugs in projects with a similar environment
  • Studying past model usage by the Objective Revision Evaluation Service (ORES)
  • Studying past approaches Wikimedia has followed for solving large problems or fixing loopholes, such as replacing Tidy with an HTML5 parser, to avoid potentially fatal approaches when building the toolset
  • Analyzing and deciding on optional vs. compulsory features of the toolset


  • Blog posts about the researched and analyzed information
  • Main areas to focus on while building the toolset
  • Algorithms/models/approaches to be used in the coding period

2) Community Bonding Period [ May 6 - May 26 ]


  • Communicate and bond with mentors.
  • Discuss APIs to be used during the project.
  • Getting familiar with the MediaWiki database schema, as documented in the Manual, for all relevant tables and attributes
  • Create issues for the project
  • Getting familiar with the Toolforge environment, Slim framework, Heroku and Wikimedia API


  • Bonding report(s)
  • Blog(s) about the experience and the tool framework structure

3) Week 1 [ May 27 - June 2 ]


  • Fixing bugs in listing queries on Quarry
  • Working on queries such as displaying the top list of editors in the last 30 days, including or excluding bots


  • Working SQL queries displaying top lists
  • List of still active users with negligible edits
  • List of now-inactive users with maximum edits
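The Week 1 top-list query could be prototyped in Python before it is written in SQL for Quarry. This minimal sketch assumes the revision records have already been fetched as hypothetical `(username, timestamp, is_bot)` tuples. On the wiki replicas, bot status would typically come from the user's membership in the `bot` group in the `user_groups` table.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical revision records: (username, timestamp, is_bot).
revisions = [
    ("Alice", datetime(2019, 5, 20), False),
    ("Alice", datetime(2019, 5, 25), False),
    ("Bob", datetime(2019, 5, 28), False),
    ("HuBot", datetime(2019, 5, 29), True),
    ("HuBot", datetime(2019, 5, 30), True),
    ("HuBot", datetime(2019, 5, 30), True),
    ("Csaba", datetime(2019, 3, 1), False),  # outside the 30-day window
]


def top_editors(revisions, now, days=30, include_bots=False, limit=10):
    """Rank editors by edit count within the last `days` days."""
    since = now - timedelta(days=days)
    counts = Counter(
        user
        for user, ts, is_bot in revisions
        if ts >= since and (include_bots or not is_bot)
    )
    return counts.most_common(limit)


now = datetime(2019, 6, 1)
print(top_editors(revisions, now))                     # [('Alice', 2), ('Bob', 1)]
print(top_editors(revisions, now, include_bots=True))  # [('HuBot', 3), ('Alice', 2), ('Bob', 1)]
```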

4) Weekend 1 [ June 1 | June 2 ]


  • Test deliverables and revisit reports of Week 1

5) Week 2 [ June 3 - June 9 ]


  • Extracting and analyzing data obtained in JSON format from Wikimedia APIs
  • Analyzing the JSON data to calculate desired fields, such as repeated entries
  • Website displaying the list view


  • Simple website on localhost/GitHub Pages displaying lists of users based on a given parameter, such as most or least edits
  • Documentation of the produced work
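For Week 2, JSON returned by the MediaWiki Action API can be loaded and scanned for repeated entries. This is a minimal sketch. The sample below is a trimmed, made-up payload shaped like a `list=recentchanges` response requested with `rcprop=user|title`, and `repeated_users` is a hypothetical helper.

```python
import json
from collections import Counter

# Made-up, trimmed payload shaped like
# action=query&list=recentchanges&rcprop=user|title&format=json
raw = """
{"query": {"recentchanges": [
  {"user": "Alice", "title": "Budapest"},
  {"user": "Bob",   "title": "Debrecen"},
  {"user": "Alice", "title": "Szeged"},
  {"user": "Alice", "title": "Pécs"}
]}}
"""


def repeated_users(payload: str, min_count: int = 2) -> dict:
    """Return users that appear at least `min_count` times in the change list."""
    changes = json.loads(payload)["query"]["recentchanges"]
    counts = Counter(change["user"] for change in changes)
    return {user: n for user, n in counts.items() if n >= min_count}


print(repeated_users(raw))  # {'Alice': 3}
```

Real responses are paginated through a `continue` object, so a full tool would loop until no continuation is returned.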

6) Weekend 2 [ June 8 | June 9 ]


  • Test deliverables and revisit reports of Week 1 & Week 2

7) Week 3 [June 10 - June 16]


  • Making the website user-friendly
  • Adding features such as showing the relative size of each group, historic trends and retention rate over time


  • Working site offering options for users to access the statistics view easily
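The group-size and retention features of Week 3 reduce to simple ratios. This minimal sketch assumes activity has been pre-aggregated into a hypothetical mapping from each editor to the months (YYYY-MM) in which they edited. It also assumes a definition of retention: the share of editors active in a starting month who are still active in a later month.

```python
# Hypothetical input: months (YYYY-MM) in which each editor made at least one edit.
active_months = {
    "Alice": {"2019-01", "2019-02", "2019-03"},
    "Bob": {"2019-01"},
    "Csaba": {"2019-01", "2019-03"},
    "Dóra": {"2019-02", "2019-03"},
}


def retention_rate(active_months, start, later):
    """Share of editors active in `start` who are also active in `later`."""
    cohort = [user for user, months in active_months.items() if start in months]
    if not cohort:
        return 0.0
    retained = sum(1 for user in cohort if later in active_months[user])
    return retained / len(cohort)


def group_shares(groups):
    """Relative size of each group as a fraction of all grouped editors."""
    total = sum(len(members) for members in groups.values())
    if total == 0:
        return {name: 0.0 for name in groups}
    return {name: len(members) / total for name, members in groups.items()}


print(retention_rate(active_months, "2019-01", "2019-03"))  # 2 of 3 editors retained
print(group_shares({"new": {"Bob"}, "regular": {"Alice", "Csaba", "Dóra"}}))
# {'new': 0.25, 'regular': 0.75}
```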

8) Weekend 3 [ June 15 | June 16 ]


  • Test deliverables and revisit reports of Week 1 & Week 2 & Week 3

9) Week 4 [ June 17 - June 23 ]


  • Making the website responsive to enable a web app view
  • Deploying a simple, basic version of the tool on Heroku that serves the web app view
  • Documentation and testing of the above features and apps


  • Web-app version of the tool
  • Documentation of above work.

10) Weekend 4 [ June 22 | June 23 ]


  • Test deliverables and revisit reports of Week 3 & Week 4

~~~~~~~~~~~~~~~~~~~~~~~ First Evaluation [ June 24 - June 28 ] ~~~~~~~~~~~~~~~~~~~~~~~

11) Week 5 [ June 24 - June 30 ]


  • Testing old features and adding new features to the developed tools
  • Comparing corresponding users/results in lists obtained from different parameters (Funnel Analysis Approach)


  • Comparison results of lists
  • Documentation of new features
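The Week 5 comparison of lists produced by different parameters can be expressed with set operations while keeping each list's ranking order. This is a minimal sketch, and the list names are hypothetical.

```python
# Two hypothetical lists produced by different parameters.
top_last_month = ["Alice", "Bob", "Csaba"]
top_last_year = ["Alice", "Dóra", "Csaba", "Emese"]


def compare_lists(a, b):
    """Compare two user lists, preserving each list's original order."""
    set_a, set_b = set(a), set(b)
    return {
        "in_both": [u for u in a if u in set_b],
        "only_first": [u for u in a if u not in set_b],
        "only_second": [u for u in b if u not in set_a],
    }


print(compare_lists(top_last_month, top_last_year))
# {'in_both': ['Alice', 'Csaba'], 'only_first': ['Bob'], 'only_second': ['Dóra', 'Emese']}
```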

12) Weekend 5 [ June 29 | June 30 ]


  • Test deliverables and revisit reports of Week 4 & Week 5

13) Week 6 [ July 1 - July 7 ]


  • Building a tool on Toolforge for statistics on the number of edits and the number of editors
  • Modifying the tool to provide filters for manually provided usernames
  • Categorizing users based on the previous week's comparison results (Funnel Analysis Approach)


  • Working toolset on Toolforge
  • Categorized display of users
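Categorizing users from a list comparison and filtering by manually provided usernames could look like the sketch below. The category labels and the comparison data are hypothetical placeholders. The username normalization follows the MediaWiki convention that underscores and spaces are equivalent and that the first letter is capitalized. For brevity it assumes no other normalization rules are needed.

```python
def categorize(comparison):
    """Map each user to a hypothetical category label based on a list comparison."""
    labels = {"in_both": "steady", "only_first": "newly active", "only_second": "fading"}
    return {user: labels[key] for key, users in comparison.items() for user in users}


def normalize(name):
    """Approximate MediaWiki username normalization: '_' -> ' ', first letter uppercase."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


def filter_users(categories, usernames):
    """Keep only the manually provided usernames."""
    wanted = {normalize(n) for n in usernames}
    return {user: cat for user, cat in categories.items() if user in wanted}


# Hypothetical comparison of a recent top list against a longer-term one.
comparison = {"in_both": ["Alice", "Csaba"], "only_first": ["Bob"], "only_second": ["Dóra"]}
categories = categorize(comparison)
print(filter_users(categories, ["alice", "Bob"]))  # {'Alice': 'steady', 'Bob': 'newly active'}
```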

14) Weekend 6 [ July 6 | July 7 ]


  • Test deliverables and revisit reports of Week 3 & Week 5 & Week 6

15) Week 7 [ July 8 - July 14 ]


  • Developing a tool to translate results/lists into machine-readable form
  • Developing algorithm(s) to filter/extract records of users who drop off at a certain point, become inactive or leave groups


  • Translated results/lists
  • Documentation
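For Week 7, detecting users who have become inactive and emitting the result in machine-readable form could be sketched as follows. The sketch assumes a hypothetical mapping of each user's last edit date. The 60-day inactivity threshold is chosen only for illustration.

```python
import json
from datetime import date, timedelta

# Hypothetical input: date of each user's most recent edit.
last_edit = {
    "Alice": date(2019, 7, 10),
    "Bob": date(2019, 4, 2),
    "Csaba": date(2019, 6, 1),
}


def inactive_users(last_edit, today, threshold_days=60):
    """Users whose most recent edit is older than `threshold_days`."""
    cutoff = today - timedelta(days=threshold_days)
    return sorted(user for user, d in last_edit.items() if d < cutoff)


def to_json(users, today):
    """Serialize the result as JSON, keeping non-ASCII names readable."""
    return json.dumps({"generated": today.isoformat(), "inactive": users}, ensure_ascii=False)


today = date(2019, 7, 14)
result = inactive_users(last_edit, today)
print(to_json(result, today))  # {"generated": "2019-07-14", "inactive": ["Bob"]}
```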

16) Weekend 7 [ July 13 | July 14 ]


  • Test deliverables and revisit reports of Week 4 & Week 6 & Week 7

17) Week 8 [ July 15 - July 21 ]


  • Adding a specialized highlight view for extreme users (users with corner cases)
  • Developing a funnel view in the tool, using the developed algorithm to point out breaking points
  • Testing of the above features


  • Prototype of funnel view
  • Documentation
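The funnel view's breaking points can be found by computing the conversion rate between consecutive stages and picking the lowest one. This is a minimal sketch, and the stage names and counts are invented for illustration.

```python
# Hypothetical funnel stages, in order; each value is the number of editors reaching it.
stages = [
    ("registered", 1000),
    ("made 1 edit", 400),
    ("made 10 edits", 120),
    ("made 100 edits", 60),
]


def conversion_rates(stages):
    """Conversion rate between each pair of consecutive funnel stages."""
    return [
        (f"{a} -> {b}", n_b / n_a if n_a else 0.0)
        for (a, n_a), (b, n_b) in zip(stages, stages[1:])
    ]


def breaking_point(stages):
    """The transition with the lowest conversion rate, i.e. the biggest drop-off.

    Needs at least two stages; otherwise min() receives an empty list.
    """
    return min(conversion_rates(stages), key=lambda item: item[1])


print(breaking_point(stages))  # ('made 1 edit -> made 10 edits', 0.3)
```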

18) Weekend 8 [ July 20 | July 21 ]


  • Test deliverables and revisit reports of Week 5 & Week 7 & Week 8

~~~~~~~~~~~~~~~~~~~~~~~ Second Evaluation [ July 22 – July 26 ] ~~~~~~~~~~~~~~~~~~~~~~~

19) Week 9 [ July 22 - July 28 ]


  • Augmenting the toolset with an option to export lists to a desired format
  • Adding a feature to highlight users who are targets for intervention
  • Adding a feature to pull in data from ORES and/or FlaggedRevs (Review API)


  • New features using ORES data
  • Export feature
  • Highlight feature
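The Week 9 export feature could serialize the same rows to several formats using only Python's standard library. This is a minimal sketch with a hypothetical `export` helper that supports CSV and JSON.

```python
import csv
import io
import json

# Hypothetical rows from one of the tool's lists.
rows = [
    {"user": "Alice", "edits": 42},
    {"user": "Dóra", "edits": 17},
]


def export(rows, fmt="csv"):
    """Serialize a list of dicts as CSV or JSON text."""
    if fmt == "json":
        return json.dumps(rows, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buf, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


print(export(rows))
print(export(rows, fmt="json"))
```

Further formats could be added as extra branches without changing the callers.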

20) Weekend 9 [ July 27 | July 28 ]


  • Test deliverables and revisit reports of Week 6 & Week 8 & Week 9

21) Week 10 [ July 29 - August 4 ]


  • Polishing the funnel view of the tool
  • Using data pulled in from ORES to make informed estimates, such as user quality or an expected date of discontinuation
  • Adding internationalization features such as translation and localized number and date formats (e.g. for attributes like rev_timestamp)


  • Finalized funnel view
  • Translated results/lists
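MediaWiki stores `rev_timestamp` values as 14-character `YYYYMMDDHHMMSS` strings in UTC, so localized date display starts with parsing them. This is a minimal sketch that assumes the desired output style. The Hungarian date format (`2019. április 11.`) and the non-breaking space as thousands separator are those assumptions. No time-zone conversion is done.

```python
from datetime import datetime

# Hungarian month names for an assumed "YYYY. month D." display style.
HU_MONTHS = [
    "január", "február", "március", "április", "május", "június",
    "július", "augusztus", "szeptember", "október", "november", "december",
]


def parse_rev_timestamp(ts):
    """Parse a MediaWiki rev_timestamp (YYYYMMDDHHMMSS, UTC) into a naive datetime."""
    if isinstance(ts, bytes):  # binary columns may arrive as bytes from a DB driver
        ts = ts.decode("ascii")
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def format_date_hu(dt):
    """Format a date Hungarian-style, e.g. '2019. április 11.'."""
    return f"{dt.year}. {HU_MONTHS[dt.month - 1]} {dt.day}."


def format_number_hu(n):
    """Group digits in threes, separated by a non-breaking space."""
    return f"{n:,}".replace(",", "\u00a0")


dt = parse_rev_timestamp(b"20190411173000")
print(format_date_hu(dt))  # 2019. április 11.
print(format_number_hu(1234567))
```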

22) Weekend 10 [ August 3 | August 4 ]


  • Test deliverables and revisit reports of Week 7 & Week 9 & Week 10

23) Week 11 [ August 5 - August 11 ]


  • Refactoring feature sections into independent building blocks
  • Using data from the edit history reconstruction project
  • Adding features such as the ratio of certain tasks to the total amount of tasks users perform
  • Preparing a descriptive manual for using the tool


  • Independent building blocks
  • New features such as the ratio of certain tasks to the total
  • Descriptive manual
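The Week 11 task-ratio feature is the share of a user's actions that belong to one task type. This is a minimal sketch over a hypothetical `(username, action)` log.

```python
from collections import Counter

# Hypothetical per-user action log: (username, action type).
actions = [
    ("Alice", "edit"), ("Alice", "edit"), ("Alice", "patrol"), ("Alice", "review"),
    ("Bob", "edit"), ("Bob", "edit"),
]


def task_ratio(actions, user, task):
    """Fraction of a user's actions that are of the given type (0.0 if none)."""
    counts = Counter(action for u, action in actions if u == user)
    total = sum(counts.values())
    return counts[task] / total if total else 0.0


print(task_ratio(actions, "Alice", "patrol"))  # 0.25
print(task_ratio(actions, "Bob", "patrol"))    # 0.0
```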

24) Weekend 11 [ August 10 | August 11 ]


  • Test deliverables and revisit reports of Week 8 & Week 10 & Week 11

25) Week 12 [ August 12 - August 18 ]

If time allows,

  • Giving the final touches to the tool, Wikimedia Stats
  • Discussing and adding details for improvements
  • Make the tool client-usable


  • Launching/releasing the tool for use, similar to a beta test

26) Weekend 12 [ August 17 | August 18 ]

  • Test deliverables and revisit reports of Week 9 & Week 11 & Week 12

27) Final Week [ August 19 - August 25 ]


  • Getting feedback from users, mentors and other community members on all channels.
  • Working on project presentation.
  • Manual testing for corner cases
  • Working on the documentation for final submission.


  • Improvements based on the feedback received from mentors and community members
  • Project presentation

~~~~~~~~~~~~~~~~~~~~~~~ Final Evaluation [ August 26 – September 2 ] ~~~~~~~~~~~~~~~~~~~~~~~


Progress Report

  • Write, publish and share blogs
  • Stay online on all public/private chat platforms during working hours and, where possible, outside them
  • Write blog posts bi-weekly and, whenever possible, daily
  • Submit a Project Presentation

Where do you plan to publish your source code?

  • Work on a separate branch on GitHub, pushing code to the forked repository almost daily
  • Create and merge pull requests when a feature is completed

Communication on task

  • GitHub for creating subtasks and for managing bugs and bug fixes
  • Public platforms like GitHub to communicate the status and progress report(s) of the project

About Me

Personal background

Currently, I am a pre-final year undergraduate at the Indian Institute of Technology (ISM) Dhanbad, pursuing a BTech with Environmental Engineering as my major and Computer Science & Engineering as my minor. I am passionate about statistics, learn quickly and find lasting excitement in data handling.

How did you hear about this program?

I first came to know about GSoC, this prestigious program, one day by following pointers on a public Q&A forum. Since then it has been my dream to participate in it and get selected.

Have you submitted your GSoC Proposal?


Do you have any questions about the project? If any, what all?

Yes. While working on the assigned microtasks, I found that the MediaWiki API returned a 'Maximum retries using the API' error after roughly every 15th to 20th reload by the prototype. It then took a while before the API allowed further data extraction. What can we do about this?

Do you know Hungarian? What are you doing about it?

No, but I have started learning it in my leisure hours. I believe firsthand exposure to the language will help improve many areas of the tool and its features.

What excites you about this project?

The Data. The Statistics. Smart approaches like Funnel Analysis. The initiative to make a change.

Have you communicated with your mentors?


What does this project mean to you?


Time during Summer

Completely available. My exams end in April and I have no other commitments this summer, so I'll be able to give 50 hours or more per week for the whole internship period.

Eligible for Google Summer of Code and Outreachy ?

Applying only for GSoC.

Past Experience

Made a project implementing blockchain technology during a 36-hour challenge

Recently, my team and I participated in a HackFest and developed this blockchain-based project within 36 hours.
Project link and detailed description: Metrobiki

Future Commitment(s) to the project | After the end of the internship

Regularly update and reconfigure the set of reports available on the portal.

Event Timeline

Tgr renamed this task from Build statistics toolset to support WM-HU editor retention grant to GSoC 2019 Proposal: Build statistics toolset to support WM-HU editor retention grant. (Apr 11 2019, 5:30 PM)

Dear Hsync7, thank you for your application and for expressing your interest in helping us out in the project.
I received your email as well, sorry that I have not answered until now.

Your application will be evaluated together with the others in the following days. I wish you the best! :)

Dear @Samat,
Thank you for considering my application. And for recognizing my interest in the project.

No problem regarding the reply to my email.

Thank you for your best wishes!

(look for next steps in the email you'll receive shortly with an option to request for a debrief on why your proposal was not accepted)


I have seen the email I received and have sent a reply to it.

I am eagerly waiting for the reply.


I would still like to do the project outside of GSoC. Please let me know the next steps to proceed.