[[https://wikimedia.hu/wiki/Home?uselang=en|Wikimedia Hungary]] is kicking off a one-year project to improve editor retention in the Hungarian Wikipedia (supported by a WMF grant). [[https://meta.wikimedia.org/wiki/Grants:Project/WM_HU/Editor_retention_program|See details here.]] Doing this effectively will require much more detailed statistics than are currently available. The task is to build a contributor statistics portal for the Hungarian Wikipedia that displays the statistics and editor lists relevant to the program.
== Functional requirements ==
The portal should provide the following information:
* a "[[https://en.wikipedia.org/wiki/Funnel_analysis|funnel]]" view of the Hungarian Wikipedia community: given a number of user categories (e.g. users with 1-10 edits, users with 100+ edits in the last 30 days, administrators, users who did more than 10 reviews in the last 30 days),
** show the size of each group
** show the transitions between the groups (the number of editors who moved from one group to another in some given time frame)
** (stretch goal) show historic trends for these groups
* (stretch goal) where it makes sense, the same statistics with the number of edits instead of the number of editors (number of edits coming from editors with 1-10 edits etc).
* lists of editors who are potential targets for intervention: editors transitioning from one group to another (e.g. recently registered editors; editors who have recently stopped participating), editors who reached some milestone that no one has followed up on yet (e.g. recently registered editors who have not been welcomed yet, or editors who recently reached their 1000th edit and have not been congratulated yet), and editors who had some negative interaction (e.g. first edit reverted)
** (stretch goal) annotate lists with data pulled in from other sources (such as the [[https://www.mediawiki.org/wiki/ORES|ORES]] edit scoring service, or the [[https://www.mediawiki.org/wiki/Extension:FlaggedRevs#API|review API]]) to identify users who are special in some way (e.g. well-intentioned but struggling with wiki syntax, or stuck in the review queue)
** an API to expose this information in a machine-readable way
** export in whatever format is convenient to the people who will follow up on these lists (e.g. wikitable or CSV)
* top lists of editors who perform a certain task (e.g. administrative actions, edit reviews, template edits), plus each editor's share of the total number of such tasks performed
** an API to expose this information in a machine-readable way
** export in whatever format is convenient to the people who will follow up on these lists (e.g. wikitable or CSV)
* where it makes sense, support filtering / splitting results on manually provided username lists (this will be used to assess the effectiveness of interventions)
* (stretch goal) a registration cohort view of the editor community: grouping users by the year or month they started editing:
** show the relative size of each group
** show historic trends for these groups
** show retention rate over time (i.e. how many of the editors registered in year X are still active in year Y)
** some combination of this with the groups from the funnel (
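
To make the funnel requirement above more concrete, here is a minimal sketch of how group sizes and group-to-group transitions could be computed from two snapshots of per-editor statistics. Everything in it is an assumption made for illustration: the `EditorStats` fields, the group labels and thresholds, and the simplification that every editor falls into exactly one group (the first matching one). The real categories may overlap (e.g. administrators), and the real data would come from the database rather than from in-memory lists.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EditorStats:
    """Hypothetical per-editor snapshot; field names are illustrative only."""
    name: str
    total_edits: int
    edits_last_30_days: int


# Ordered (label, predicate) pairs; the first matching group wins.
# The thresholds are examples, not the program's actual categories.
GROUPS = [
    ("active", lambda e: e.edits_last_30_days >= 100),
    ("casual", lambda e: e.total_edits > 10),
    ("newcomer", lambda e: e.total_edits >= 1),
]
NO_GROUP = "no edits"
UNKNOWN = "not registered"


def classify(editor):
    """Return the label of the first group whose predicate matches."""
    for label, predicate in GROUPS:
        if predicate(editor):
            return label
    return NO_GROUP


def group_sizes(snapshot):
    """Count the editors in each group for one snapshot."""
    return Counter(classify(e) for e in snapshot)


def transitions(before, after):
    """Count editors per (old group, new group) pair between two snapshots.

    Editors whose group did not change are left out; editors missing from
    the earlier snapshot are reported as coming from UNKNOWN.
    """
    old_groups = {e.name: classify(e) for e in before}
    moves = Counter()
    for editor in after:
        source = old_groups.get(editor.name, UNKNOWN)
        target = classify(editor)
        if source != target:
            moves[(source, target)] += 1
    return moves
```

Calling `transitions(snapshot_30_days_ago, snapshot_today)` would then give the numbers for the arrows of the funnel, while `group_sizes(snapshot_today)` gives the size of each box. The same transition pairs could also serve as a starting point for the intervention lists, by collecting editor names instead of counts.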
== Architecture requirements ==
* The portal is to be hosted on [[https://wikitech.wikimedia.org/wiki/Portal:Toolforge|Toolforge]] (Wikimedia's platform-as-a-service). It should use the data from the [[https://wikitech.wikimedia.org/wiki/Portal:Data_Services#Wiki_Replicas|replica of the wiki database]] and cache the results (and probably prime the cache with a periodic job; depends on how expensive the queries turn out to be.)
** (Stretch goal: use data from the [[https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits|edit history reconstruction project]]. This contains information about the context of editor actions (e.g. how many edits the user had when they made a given edit), but it is currently not publicly available and is hosted on a dedicated set of servers, so you'll need to create a job that runs there, extracts the relevant information and makes it available to the portal. Most of the features above do not require this.)
* The exact set of reports available on the portal will need to be changed frequently, even after the end of the internship, so the portal should be written in a flexible way, with report building blocks that can be easily reconfigured.
* The portal should be written with reuse on other wikis in mind: specifics of the database should be abstracted away to the extent possible, and it should support internationalization (translation, date/number formats etc.)
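
One possible reading of the "reconfigurable building blocks" and caching requirements above is sketched below: reports are plain configuration entries that refer to small registered metric functions, and results are kept in a time-limited cache. This is only an illustration under stated assumptions. The names `METRICS`, `metric`, `TTLCache` and `run_report`, as well as the shape of the configuration dictionaries, are hypothetical, the rows are in-memory dictionaries standing in for database query results, and on Toolforge the in-process cache would probably be replaced by a shared store that a periodic job can prime.

```python
import time

# Registry of building blocks: metric name -> function computing it from rows.
METRICS = {}


def metric(name):
    """Decorator registering a function as a reusable report building block."""
    def register(fn):
        METRICS[name] = fn
        return fn
    return register


@metric("count")
def count_rows(rows):
    """Number of rows, e.g. the size of an editor group."""
    return len(rows)


@metric("top")
def top_rows(rows, key, limit=10):
    """The `limit` rows with the largest value in column `key`."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:limit]


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = compute()
        self._store[key] = (now, value)
        return value


def run_report(config, rows, cache):
    """Run one report described by a config dict such as
    {"name": "top_reviewers", "metric": "top", "params": {"key": "reviews"}}.

    Results are cached under the report name, so two reports with different
    parameters need different names.
    """
    fn = METRICS[config["metric"]]
    params = config.get("params", {})
    return cache.get_or_compute(config["name"], lambda: fn(rows, **params))
```

With this layout, adding or changing a report means editing a configuration entry rather than code, and a periodic job could prime the cache simply by calling `run_report` for every configured report.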
== Applicant requirements ==
* familiarity with PHP (preferred), Python or Node.js
* familiarity with SQL
* familiarity with MediaWiki's API and/or database schema is a plus but not required
* familiarity with Hungarian Wikipedia and being able to speak Hungarian are a plus but not required