Proposal: A semi-automatic find and replace webapp
Closed, DeclinedPublic

Assigned To
None
Authored By
Sitic
Mar 4 2015, 10:02 AM

Description

Over the past few years I've seen many bot requests where a relatively simple find-and-replace type of edit had to be applied to a large number of wiki pages. Some common examples:

  • wikilink changes when redirects are not an option
  • template changes where existing transclusions have to be corrected
  • URL fixes when the URL scheme of a widely used website changes (prime example for a regex search and replace with substitution)
  • spelling fixes (highly contextual, has to be semi-automatic) [not really the focus of this proposal, there are better ways]
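
The URL-fix case in the list above can be illustrated with a short sketch. The old and new URL schemes here are invented for the example (`old.example.org` and `example.org` are placeholders, not a real migration), and `fix_urls` is a hypothetical helper name:

```python
import re

# Hypothetical migration: http://old.example.org/item?id=N
# becomes https://example.org/items/N. Both URLs are placeholders.
OLD_URL = re.compile(r"http://old\.example\.org/item\?id=(\d+)")
NEW_URL = r"https://example.org/items/\1"


def fix_urls(wikitext: str) -> tuple[str, int]:
    """Return the rewritten wikitext and the number of substitutions made."""
    return OLD_URL.subn(NEW_URL, wikitext)


if __name__ == "__main__":
    sample = "See [http://old.example.org/item?id=42 the source] for details."
    fixed, count = fix_urls(sample)
    print(count, fixed)
```

Returning the substitution count alongside the text makes it easy to skip pages where nothing would change.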

As a Google Summer of Code project, I would like to build a modern, simple-to-use webapp which allows users to "find and replace" across a large number of wiki pages in a semi-automatic fashion. It would use WebSockets (via SockJS) for asynchronous communication (search results and edit requests) and OAuth for authentication.
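
To make "semi-automatic" concrete, here is a minimal sketch of one way the flow could work: the tool proposes edits, shows a diff for each, and only keeps the ones a human approves. This is an illustration under stated assumptions, not the proposal's actual design. `ProposedEdit`, `propose_edits` and `review` are hypothetical names, and pages are plain `(title, text)` pairs rather than anything fetched from a wiki:

```python
import difflib
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class ProposedEdit:
    title: str
    old_text: str
    new_text: str

    def diff(self) -> str:
        """Unified diff a reviewer would see before approving the edit."""
        return "\n".join(difflib.unified_diff(
            self.old_text.splitlines(), self.new_text.splitlines(),
            fromfile=self.title, tofile=self.title, lineterm=""))


def propose_edits(pages: Iterable[tuple[str, str]], pattern: str,
                  replacement: str) -> Iterator[ProposedEdit]:
    """Yield an edit for every page whose text the substitution would change."""
    regex = re.compile(pattern)
    for title, text in pages:
        new_text = regex.sub(replacement, text)
        if new_text != text:
            yield ProposedEdit(title, text, new_text)


def review(edits: Iterable[ProposedEdit],
           approve: Callable[[ProposedEdit], bool]) -> list[ProposedEdit]:
    """Keep only the edits the reviewer approves; nothing is saved here."""
    return [edit for edit in edits if approve(edit)]
```

In a webapp the `approve` callback would be the user clicking "save" or "skip" on each diff; here it is just a function so the flow can be tried in isolation.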

I've made a prototype (currently nothing more than the form), which probably explains the idea best: tools.wmflabs.org/find-and-replace.

I'll add a more detailed proposal later (e.g. how local projects should be able to configure find-and-replace, since AWB users have to be whitelisted on enwiki, but not on other wikis).

Existing alternatives

Existing tools which provide such functionality are Extension:Replace Text and the AutoWikiBrowser (AWB). The former is unlikely to be activated on WMF projects due to performance constraints; the latter has a bigger scope and requires users to install software.
A webapp version of AWB has long been wished for. The goal of this proposal is not to replace AWB, but rather to take a first step and provide a simple-to-use find and replace tool for users who shy away from installing tools like AWB.

Implementation

I've spent some time researching suitable libraries; one main objective is that the result fits well into the tools environment and community. I'm now at a point where I would say the question of which libraries to choose is mostly solved. The major ones include:

  • AngularJS for the frontend/client-side rendering
  • angular-translate for i18n support (should integrate well with Translatewiki)
  • Python's Tornado web server with SockJS support, flask-mwoauth for OAuth negotiation (a test setup for this combination works on tools)
  • Celery as task manager with dedicated worker processes, Redis as message broker
  • pywikibot for the MediaWiki API (currently no OAuth support, see T74065: Pywikibot: Implement support for OAuth. Probably best to hack support in using mwoauth and then replace it when OAuth is officially supported by pywikibot)
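
The split between a worker that searches and a connection that pushes results to the browser can be sketched with the standard library alone. Note the substitution: this uses `asyncio` as a stand-in for the Tornado/SockJS and Celery/Redis stack listed above, so none of it is the API of those libraries. The function names, the message format and the `send` callback are all assumptions made for the example:

```python
import asyncio
import json
import re


async def search_worker(pages, pattern, results: asyncio.Queue) -> None:
    """Stand-in for a worker process: scan pages and queue each match."""
    regex = re.compile(pattern)
    for title, text in pages:
        if regex.search(text):
            await results.put({"type": "match", "title": title})
        await asyncio.sleep(0)  # yield control, as real I/O would
    await results.put({"type": "done"})


async def stream_results(results: asyncio.Queue, send) -> None:
    """Stand-in for the socket connection: forward each message as JSON."""
    while True:
        message = await results.get()
        send(json.dumps(message))
        if message["type"] == "done":
            break


async def run_search(pages, pattern, send) -> None:
    """Run the worker and the streamer concurrently over a shared queue."""
    results: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        search_worker(pages, pattern, results),
        stream_results(results, send),
    )


if __name__ == "__main__":
    demo_pages = [("A", "foo"), ("B", "bar"), ("C", "food")]
    asyncio.run(run_search(demo_pages, r"foo", print))
```

The point of the queue is that the browser can start showing matches while the search is still running, which is the reason the proposal prefers asynchronous communication over a single request/response.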

About me

I'm a physics student at the University of Göttingen and have been mostly active on the German Wikipedia since 2010. I run a pywikibot-based bot there named AsuraBot. While I've programmed in various languages, writing a webapp is something new for me.

I'm looking for possible mentors; if you're interested, please ping me.

Event Timeline

Sitic raised the priority of this task from to Needs Triage.
Sitic updated the task description.
Sitic added a project: Possible-Tech-Projects.
Sitic subscribed.

@Reedy @Magioladitis as you both are AWB developers, I would love to hear your thoughts on this.

@Addshore This sounds a lot like what you're doing for your Dissertation...

Yep, I'm developing an extension which will do exactly this and more!

The basic implementation allows you to select a type of target (for example wiki page), filter those targets (for example by namespace, title, size etc.) and then perform an action or series of actions on them (append, replace, delete, etc.).
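
The target, filter and action model described in this comment can be sketched roughly as follows. This is not the extension's code (which had not been published at this point in the thread); it is only an illustration of the idea, and every name in it (`Page`, `in_namespace`, `title_contains`, `replace_text`, `append_text`, `run_task`) is hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Page:
    title: str
    namespace: int
    text: str


Filter = Callable[[Page], bool]
Action = Callable[[Page], Page]


def in_namespace(ns: int) -> Filter:
    return lambda page: page.namespace == ns


def title_contains(fragment: str) -> Filter:
    return lambda page: fragment in page.title


def replace_text(old: str, new: str) -> Action:
    return lambda page: Page(page.title, page.namespace,
                             page.text.replace(old, new))


def append_text(suffix: str) -> Action:
    return lambda page: Page(page.title, page.namespace, page.text + suffix)


def run_task(targets: Iterable[Page], filters: list[Filter],
             actions: list[Action]) -> list[Page]:
    """Apply every action, in order, to each target that passes all filters."""
    results = []
    for page in targets:
        if all(f(page) for f in filters):
            for action in actions:
                page = action(page)
            results.append(page)
    return results
```

Keeping filters and actions as small composable functions means a "task" is just two lists, which maps naturally onto a form where the user adds filter and action rows.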

@Addshore, would you consider the possibility of mentoring a GSoC student to distribute the work, if there is enough work for two people here?

I suspect there might be (the possibilities are endless?), but he wouldn't be able to open source his work until after his dissertation has been completed, submitted, marked and returned. And even then, technically, the work is still "owned" by his university... So he might not actually be allowed to open source it (rare, but possible). I know he licensed it as open source from the beginning, but it could be a bit of a grey area.

I guess this partially hinges on how development would continue: whether the plan is to work on Addshore's version, or to write something separate, possibly with the intention of integrating it eventually.

This sounds similar to https://commons.wikimedia.org/wiki/Help:VisualFileChange.js . As a microtask I'd suggest to fix some bug or enhancement request for that.

And even then, technically, the work is still "owned" by his university...

This is unlikely to be true in continental Europe and is certainly false in Italy.

Right, I'm currently in the process of writing my report for my 'MassAction' extension using the jobqueue.
Here are some mocks and diagrams that should give you all a clearer idea of what it is / what it is trying to be before I can actually release the code.

Viewing a task that has been created:

pasted_file (519×453 px, 49 KB)

Creating a new task:
pasted_file (521×413 px, 29 KB)

Example filters:
pasted_file (301×217 px, 14 KB)

pasted_file (91×261 px, 5 KB)

Example actions:
pasted_file (179×323 px, 6 KB)

pasted_file (106×261 px, 6 KB)

Abstract classes:
pasted_file (379×558 px, 89 KB)

Current DB:
pasted_file (255×558 px, 46 KB)

States of a task:
pasted_file (183×558 px, 40 KB)

All questions welcome!

So the code for what I described above is now available on gerrit.
https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/MassAction

It also now has a project on Phabricator where I have added a general list of things I think need to be done.
https://phabricator.wikimedia.org/project/profile/1150/

And of course an extension page on mediawiki.org
https://www.mediawiki.org/wiki/Extension:MassAction

This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Outreachy-Round-11 is around the corner. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

This is a message sent to all Possible-Tech-Projects. The new round of Wikimedia Individual Engagement Grants is open until 29 Sep. For the first time, technical projects are within scope, thanks to the feedback received at Wikimania 2015, before, and after (T105414). If someone is interested in obtaining funds to push this task, this might be a good way.

IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has been accepted as a mentor organization for GSoC '16. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

Declining this task as the proposal didn't receive interest from potential mentors as requested.