
Move wiki talk page archiving into production-grade extension
Open, Needs Triage, Public

Description

Currently a number of bots and gadgets archive wiki talk page sections according to various criteria or on user command; see for example https://en.wikipedia.org/wiki/User:Equazcion/OneClickArchiver, Pywikibot-archivebot.py, https://de.wikipedia.org/wiki/Benutzer:SpBot or https://de.wikipedia.org/wiki/Benutzer:TaxonBot. They are developed and maintained by volunteers and run on private machines or Toolforge. Their UI is usually not localized and differs from the usual MediaWiki experience, their JavaScript is copied and pasted, their scripts are only tested in the developer's browser, etc.

In addition, there are many of them, so it is likely that some have features and bugs that others lack, not to mention the duplicated effort. The source code of some tools is also not easy to find.

Instead, archiving wiki talk page sections should be a first-class citizen of MediaWiki in the form of an extension deployed to the WMF cluster. As fodder for bikeshedding, I propose the name Extension:Archivist.

This extension should:

  1. Archive wiki talk page sections on user command like https://en.wikipedia.org/wiki/User:Equazcion/OneClickArchiver, i.e. add a button "[ archive ]" to section titles that moves the section to the talk page's archive.
  2. Archive wiki talk page sections to the talk page's archive once they are old enough (cf. https://de.wikipedia.org/wiki/Vorlage:Autoarchiv).
  3. Archive wiki talk page sections to the talk page's archive once they have been marked as "done" and are old enough (cf. https://de.wikipedia.org/wiki/Vorlage:Autoarchiv-Erledigt).
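
The two automatic criteria in items 2 and 3 could be expressed as a single selection rule. The following is only a minimal sketch in Python; the `Section` type, its fields and the `require_done` flag are hypothetical names chosen for illustration and are not part of any existing bot or of MediaWiki.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Section:
    """Hypothetical view of one talk page section."""
    heading: str
    last_signature: datetime  # timestamp of the newest signature in the section
    done: bool = False        # True if the section carries a "done" marker


def sections_to_archive(
    sections: Iterable[Section],
    now: datetime,
    max_age: timedelta,
    require_done: bool = False,
) -> List[str]:
    """Return the headings of all sections that meet the archive criteria.

    A section qualifies when its last signature is at least `max_age` old.
    With `require_done=True` it must additionally be marked as done.
    """
    return [
        s.heading
        for s in sections
        if now - s.last_signature >= max_age and (s.done or not require_done)
    ]
```

With `require_done=False` this corresponds to the age-only behaviour of item 2; with `require_done=True` it corresponds to item 3.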

For the implementation, I propose that the extension provide one or more parser functions or tag elements that define a wiki talk page's "archive settings", for example the target page to which a section is to be moved, the age at which a section is to be archived, etc. On page save, these settings would be stored as page properties. Existing templates would be amended to call these parser functions or tag elements, enabling a seamless migration.

In addition, whenever a page that has archive settings is saved, the page would be parsed according to the criteria defined by those settings to determine when the next archiving needs to occur. For example, if a page contains a section whose last signature is x days old and the archive settings request sections to be archived when they are y days old, archiving should be scheduled for CURRENT_DATE + y - x; with several sections, the earliest such time applies. This timestamp is added as a page property. (With a different job queue that allowed time scheduling, the archiving could be scheduled directly for that time on page save.)
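
The scheduling rule above can be sketched as follows. Since x is the age of the last signature, CURRENT_DATE + y - x is simply the last signature's timestamp plus y, so the current date drops out of the calculation. This is a minimal Python illustration under that reading; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def next_archive_time(
    last_signatures: Iterable[datetime],
    max_age: timedelta,
) -> Optional[datetime]:
    """Return the earliest time at which some section becomes old enough.

    `last_signatures` holds the newest signature timestamp of each section,
    `max_age` is the configured age y. Returns None for a page without
    sections, i.e. nothing needs to be scheduled.
    """
    due_times = [ts + max_age for ts in last_signatures]
    return min(due_times, default=None)
```

The result may already lie in the past if a section is older than y when the page is saved; in that case the page is simply due on the next run.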

A maintenance script, called regularly by a cron job, iterates over all pages whose scheduled archiving time is now in the past and schedules the actual archiving job for those pages. This job then parses the talk page again and moves the sections that fit the archive settings to the archive page.
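
As a rough illustration of that periodic pass, the sketch below selects the pages that are due and puts one job per page on a queue. It is written in Python with plain in-memory structures; the `schedule` mapping stands in for the proposed page property and the `deque` for the job queue, both of which are assumptions made for this example only.

```python
from collections import deque
from datetime import datetime
from typing import Deque, Dict, List


def pages_due(schedule: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the titles of all pages whose scheduled time has passed."""
    return sorted(title for title, due in schedule.items() if due <= now)


def enqueue_due_jobs(
    schedule: Dict[str, datetime],
    now: datetime,
    queue: Deque[str],
) -> int:
    """Queue one archiving job per due page and clear its schedule entry.

    The entry is removed so that the next run does not queue the page again;
    the archiving job itself would recompute the next time after it has run.
    Returns the number of jobs queued.
    """
    due = pages_due(schedule, now)
    for title in due:
        queue.append(title)
        del schedule[title]
    return len(due)
```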

Particular attention should probably be paid to making the archiving of a section transaction-like. In the event of an edit conflict, the existing bots and scripts have to do their best to roll back either the section added to the archive page or the section deleted from the talk page, without any guarantee that this will succeed cleanly. Therefore it would be nice to combine those two edits in one atomic transaction. For the "[ archive ]" gadget, it would be useful if this functionality were exposed via the API so that the gadget can just relay the archive command to MediaWiki and afterwards trigger a reload of the page, making use of the safety of the transaction and also eliminating further duplication of code.
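
The all-or-nothing behaviour described above can be illustrated with a small in-memory model. The Python sketch below checks the revisions of both pages and computes both new texts before writing anything, so a conflict leaves both pages untouched. The `Page` and `EditConflict` types, the revision check and the simplified heading detection are assumptions for illustration; a real extension would rely on MediaWiki's own storage and database transactions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class EditConflict(Exception):
    """Raised when a page changed since the caller last read it."""


@dataclass
class Page:
    text: str
    rev_id: int = 1


def split_section(text: str, heading: str) -> Tuple[str, str]:
    """Return (section, remainder) for the level-2 section named `heading`."""
    lines = text.splitlines(keepends=True)
    marker = f"== {heading} =="
    start = next((i for i, l in enumerate(lines) if l.strip() == marker), None)
    if start is None:
        raise KeyError(heading)
    # The section ends at the next level-2 heading or at the end of the page.
    end = next(
        (i for i in range(start + 1, len(lines))
         if lines[i].startswith("==") and not lines[i].startswith("===")),
        len(lines),
    )
    return "".join(lines[start:end]), "".join(lines[:start] + lines[end:])


def archive_section(pages: Dict[str, Page], talk: str, archive: str,
                    heading: str, talk_rev: int, archive_rev: int) -> None:
    """Move one section from `talk` to `archive`, or change nothing at all."""
    talk_page, archive_page = pages[talk], pages[archive]
    if talk_page.rev_id != talk_rev or archive_page.rev_id != archive_rev:
        raise EditConflict("a page was edited in the meantime")
    # Compute both results first; any exception here leaves both pages as-is.
    section, remainder = split_section(talk_page.text, heading)
    old = archive_page.text
    new_archive = old + ("" if not old or old.endswith("\n") else "\n") + section
    # Only now apply both changes together.
    talk_page.text, archive_page.text = remainder, new_archive
    talk_page.rev_id += 1
    archive_page.rev_id += 1
```

A gadget would then only need to pass the page titles, the section heading and the revisions it last saw, and reload the page on success.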

In the same way, it would be useful for the gadget to be able to query the archive settings instead of having to parse the page itself; this could be achieved by exposing the page properties in a fashion similar to T131911 or T154738.

Event Timeline

Would this be suitable for GSoC or Outreachy? We are currently recruiting projects and mentors for May-Aug 2017. @scfc: would you be interested in mentoring? Please add an Outreach-Programs-Projects tag to the projects you can mentor.

I cannot mentor this project and cannot assess what chunk sizes would be reasonable for new developers to tackle.

srishakatux subscribed.

Removing the Possible-Tech-Projects tag as we are planning to kill it soon! This project does not seem to fit in the Outreach-Programs-Projects category in its current state, so I am not adding that tag right now!