Proposal: A semi-automatic find and replace webapp
Open, Low, Public

Description

Over the last few years I've seen many bot requests where a relatively simple find-and-replace type of edit had to be applied to a large number of wikipages. Some common examples:

  • wikilink changes when redirects are not an option
  • template changes where existing transclusions have to be corrected
  • URL fixes when the URL scheme of a widely used website changes (prime example for a regex search and replace with substitution)
  • spelling fixes (highly contextual, has to be semi-automatic) [not really the focus of this proposal; there are better ways]
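
To illustrate the URL case from the list above, here is a minimal sketch of a regex substitution using Python's standard `re` module. The old and new URL schemes and the helper name `fix_urls` are made up for the example; a real request would supply its own pattern.

```python
import re

# Hypothetical example: a site moved from
#   http://example.org/item.php?id=123
# to
#   https://example.org/items/123
# Both URL schemes are assumptions made up for this sketch.
OLD_URL = re.compile(r"http://example\.org/item\.php\?id=(\d+)")
NEW_URL = r"https://example.org/items/\1"


def fix_urls(wikitext: str) -> tuple[str, int]:
    """Return the rewritten wikitext and the number of replacements made."""
    return OLD_URL.subn(NEW_URL, wikitext)


text = "See [http://example.org/item.php?id=42 the entry] for details."
new_text, count = fix_urls(text)
print(new_text)
print(count)
```

The replacement count makes it easy to skip pages where nothing matched before asking anyone to review them.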

As a Google Summer of Code project, I would like to build a modern, simple-to-use webapp that lets users "find and replace" over a large number of wikipages in a semi-automatic fashion. It would use websockets (sockjs) for asynchronous communication (search results and edit requests) and OAuth for authentication.
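
A rough sketch of what "semi-automatic" could mean in practice: every page that would change yields a proposed edit with a diff, and only the edits a user confirms are kept for saving. The names `ProposedEdit`, `propose_edits` and `apply_confirmed` are hypothetical, the pages live in an in-memory dict instead of a wiki, and the `confirm` callback stands in for the user clicking "save" in the webapp.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class ProposedEdit:
    title: str
    old_text: str
    new_text: str

    def diff(self) -> str:
        """Unified diff that would be shown to the user before confirming."""
        lines = difflib.unified_diff(
            self.old_text.splitlines(),
            self.new_text.splitlines(),
            fromfile=self.title,
            tofile=self.title,
            lineterm="",
        )
        return "\n".join(lines)


def propose_edits(pages: dict[str, str], pattern: str, replacement: str) -> list[ProposedEdit]:
    """Collect one proposed edit for every page whose text would change."""
    regex = re.compile(pattern)
    proposals = []
    for title, text in pages.items():
        new_text = regex.sub(replacement, text)
        if new_text != text:
            proposals.append(ProposedEdit(title, text, new_text))
    return proposals


def apply_confirmed(proposals, confirm) -> dict[str, str]:
    """Return the new text only for the edits the user approved."""
    return {p.title: p.new_text for p in proposals if confirm(p)}


# In-memory stand-in for wiki pages (made-up titles and content).
pages = {
    "Page A": "See [[Old target]] for details.",
    "Page B": "Nothing to change here.",
    "Page C": "Also links to [[Old target]].",
}
proposals = propose_edits(pages, r"\[\[Old target\]\]", "[[New target]]")
for proposal in proposals:
    print(proposal.diff())

# Stand-in for the user's decision: approve everything except Page C.
saved = apply_confirmed(proposals, confirm=lambda p: p.title != "Page C")
print(sorted(saved))
```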

I've made a prototype (nothing more than the form currently), which probably explains the idea best: tools.wmflabs.org/find-and-replace.

I'll add a more detailed proposal later (e.g. how local projects should be able to configure find-and-replace, since AWB users have to be whitelisted on enwiki but not on other wikis).

Existing alternatives

Existing tools which provide such functionality are Extension:Replace Text and the AutoWikiBrowser. The former is unlikely to be activated on WMF projects due to performance constraints; the latter has a bigger scope and requires users to install software.
A webapp version of AWB has long been wished for. The goal of this proposal is not to replace AWB, but rather to take a first step and provide a simple-to-use find and replace tool for users who shy away from installing tools like AWB.

Implementation

I've spent some time researching suitable libraries; one main objective is that the result fits well into the tools environment and community. I'm now at a point where I would say the question of which libraries to choose is mostly solved. The major ones include:

  • angularjs for the frontend/client-side rendering
  • angular-translate for i18n support (should integrate well with Translatewiki)
  • Python's Tornado web server with sockjs support, flask-mwoauth for OAuth negotiation (test setup for this combination works on tools)
  • celery as task manager with dedicated worker processes, Redis as message broker
  • pywikibot for the MediaWiki API (currently no OAuth support, see T74065: Pywikibot: Implement support for OAuth. Probably best to hack support in using mwoauth and then replace it when OAuth is officially supported by pywikibot)
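
The split between the web server and dedicated workers from the list above could look roughly like the following. This is only a stand-in using the standard library: `queue.Queue` plays the role of the Redis broker, a thread plays the role of a celery worker, and a second queue plays the role of the sockjs connection back to the browser. The message format (`type`, `title`) and the page data are assumptions made up for the sketch, not part of any of the listed libraries.

```python
import json
import queue
import threading

tasks = queue.Queue()    # stand-in for the Redis message broker
results = queue.Queue()  # stand-in for messages pushed back over the socket

# Made-up page contents; a real worker would query the wiki instead.
PAGES = {"Page A": "foo bar", "Page B": "baz"}


def search_worker() -> None:
    """Take search tasks off the queue and stream matches back as JSON."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: shut the worker down
            break
        for title, text in PAGES.items():
            if task["find"] in text:
                results.put(json.dumps({"type": "match", "title": title}))
        results.put(json.dumps({"type": "done"}))


worker = threading.Thread(target=search_worker)
worker.start()
tasks.put({"find": "foo"})  # what the web server would enqueue per request
tasks.put(None)
worker.join()

messages = []
while not results.empty():
    messages.append(json.loads(results.get()))
print(messages)
```

Streaming one message per match, rather than one reply at the end, is what would let the client show results while a long search is still running.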

About me

I'm a physics student at the University of Göttingen and have mostly been active in the German Wikipedia since 2010. I run a pywikibot-based bot there named AsuraBot. While I've programmed in various languages, writing a webapp is something new for me.

I'm looking for possible mentors; if you're interested, please ping me.

Event Timeline

Sitic created this task. Mar 4 2015, 10:02 AM
Sitic updated the task description.
Sitic raised the priority of this task from to Needs Triage.
Sitic added a project: Possible-Tech-Projects.
Sitic added a subscriber: Sitic.
Restricted Application added a subscriber: Aklapper. Mar 4 2015, 10:02 AM

@Reedy @Magioladitis as you both are AWB developers, I would love to hear your thoughts on this.

@Addshore This sounds a lot like what you're doing for your Dissertation...

Addshore added a comment. Edited Mar 7 2015, 2:59 PM

Yep, I'm developing an extension which will do exactly this and more!

The basic implementation allows you to select a type of target (for example, wikipage), then filter those targets (for example, by namespace, title, size, etc.), and then perform an action or series of actions on them (append, replace, delete, etc.).
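
The target / filter / action idea can be sketched as a small pipeline. This is not the extension's code; the names `Page`, `in_namespace`, `title_starts_with`, `replace_text`, `append_text` and `run_task` are hypothetical and only illustrate how filters and actions might compose.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Page:
    namespace: int
    title: str
    text: str


Filter = Callable[[Page], bool]
Action = Callable[[Page], Page]


def in_namespace(namespace: int) -> Filter:
    return lambda page: page.namespace == namespace


def title_starts_with(prefix: str) -> Filter:
    return lambda page: page.title.startswith(prefix)


def replace_text(old: str, new: str) -> Action:
    return lambda page: replace(page, text=page.text.replace(old, new))


def append_text(suffix: str) -> Action:
    return lambda page: replace(page, text=page.text + suffix)


def run_task(targets: Iterable[Page], filters: List[Filter], actions: List[Action]) -> List[Page]:
    """Apply every action, in order, to each target that passes all filters."""
    changed = []
    for page in targets:
        if all(f(page) for f in filters):
            for action in actions:
                page = action(page)
            changed.append(page)
    return changed


# Made-up targets for the example.
pages = [
    Page(0, "Physics", "colour of light"),
    Page(0, "Chemistry", "colour of flames"),
    Page(1, "Talk:Physics", "colour discussion"),
]
result = run_task(
    pages,
    filters=[in_namespace(0), title_starts_with("Ph")],
    actions=[replace_text("colour", "color"), append_text("\n[[Category:Optics]]")],
)
print(result)
```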

Qgil added a subscriber: Qgil. Mar 7 2015, 3:23 PM

@Addshore, would you consider the possibility of mentoring a GSoC student to distribute the work, if there is enough work for two people here?

Reedy added a comment. Mar 7 2015, 7:11 PM

@Addshore, would you consider the possibility of mentoring a GSoC student to distribute the work, if there is enough work for two people here?

I suspect there might be (the possibilities are endless?), but he wouldn't be able to open source his work until after his dissertation has been completed, submitted, marked and returned. And even then, technically, the work is still "owned" by his university... So he might not actually be allowed to open source it (rare, but possible). I know he licensed it as open source from the beginning, but it could be a bit of a grey area.

I guess this partially hinges on how development continues: whether the plan is to work on Addshore's version, or to write something separate, possibly with the intention of integrating it eventually.

This sounds similar to https://commons.wikimedia.org/wiki/Help:VisualFileChange.js . As a microtask I'd suggest to fix some bug or enhancement request for that.

And even then, technically, the work is still "owned" by his university...

This is unlikely to be true in continental Europe and is certainly false in Italy.

Right, I'm currently in the process of writing my report for my 'MassAction' extension, which uses the jobqueue.
Here are some mocks and diagrams that should give you all a clearer idea of what it is / what it is trying to be before I can actually release the code.

  • [Figure] Viewing of a task that has been created
  • [Figure] Creating of a new task
  • [Figure] Example filters
  • [Figure] Example actions
  • [Figure] Abstract classes
  • [Figure] Current DB
  • [Figure] States of a task

All questions welcome!

So the code for what I described above is now available on gerrit.
https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/MassAction

It also now has a project on Phabricator where I have added a general list of things I think need to be done.
https://phabricator.wikimedia.org/project/profile/1150/

And of course an extension page on mediawiki.org
https://www.mediawiki.org/wiki/Extension:MassAction

Aklapper triaged this task as Low priority. Jul 25 2015, 12:43 PM
Qgil added a comment. Sep 23 2015, 9:06 AM

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

Qgil added a comment. Sep 23 2015, 9:35 AM

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

Sumit added a subscriber: Sumit. Mar 1 2016, 5:37 PM
IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has been accepted as a mentor organization for GSoC '16. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.
Niharika removed a subscriber: Niharika. Mar 2 2016, 6:59 PM
srishakatux added a subscriber: srishakatux.

Adding Outreach-Programs-Projects and removing Possible-Tech-Projects, as we are planning on killing the latter soon!