patrol.py: implement alternative to mwlib
Closed, Resolved (Public)

Description

mwparserfromhell (mwpfh) or the Parsoid API could be used instead of mwlib to parse the patrol whitelist.

Event Timeline

jayvdb raised the priority of this task to Needs Triage.
jayvdb updated the task description.
jayvdb added a project: Pywikibot.
jayvdb added subscribers: jayvdb, Ricordisamoa, XZise, Unknown Object (MLST).

An alternative is to use JSON to store the whitelist (T95143), which would be an extension of "Load the settings from wiki".

Well, yesterday I worked on porting it to mwparserfromhell after our Python 3 build failed because of mwlib. It's untested but should be pretty usable. But maybe the settings approach is better, especially as it would allow patrolling user pages (currently the script would assume that it's a “key”).

For some time I've been thinking of taking advantage of the Parsoid API for such tasks.
That said, mwparserfromhell is still more stable and simpler, and it does not require server-side code.

Change 202011 had a related patch set uploaded (by XZise):
[FIX] patrol: Use mwparserfromhell

https://gerrit.wikimedia.org/r/202011

Might also want to consider using RESTBase, at least when dealing with Wikimedia wikis.

  1. How would you use RESTBase (I guess that is what you are talking about)?
  2. And would other wikis also need to use RESTBase so that they can use patrol.py?

  1. The same way you would use Parsoid, but with slightly different URLs.
  2. As with a direct Parsoid request, it'd rely on extra server-side code. You'd still need to fall back to something else for wikis which don't run it.

I think there is a misunderstanding. As I understand it, Parsoid is used to convert wikitext into HTML and back. But pywikibot works almost entirely on wikitext, so there is no need for Parsoid (or RESTBase) to convert it into HTML. At the moment the script just reads the wikitext (and mwparserfromhell/mwlib parse that wikitext), and @jayvdb suggested that someone actually write JSON content into the page (which would act like wikitext and can be parsed by Python without external libraries). Using HTML would have problems similar to mwpfh/mwlib, as we'd need to parse the HTML, which could in fact be more problematic.
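
To illustrate the JSON idea, here is a minimal sketch using only the standard library. The schema (user name mapped to a list of page titles), the sample content, and the helper names `load_whitelist` and `is_whitelisted` are all assumptions made up for this example; T95143 does not define a format here. Because users are keys and pages are values, a user page such as "User:Bob" can appear as a patrollable page without being mistaken for a key.

```python
import json

# Hypothetical whitelist page content. The schema is an assumption for
# illustration only: {"<user name>": ["<page title>", ...]}.
WHITELIST_TEXT = """
{
    "Alice": ["Main Page", "Project:Sandbox"],
    "Bob": ["User:Bob"]
}
"""


def load_whitelist(text):
    """Parse JSON whitelist text into a dict of user -> set of page titles."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("whitelist must be a JSON object")
    return {user: set(pages) for user, pages in data.items()}


def is_whitelisted(whitelist, user, title):
    """Return True if `user` is trusted to edit the page `title`."""
    return title in whitelist.get(user, set())


whitelist = load_whitelist(WHITELIST_TEXT)
print(is_whitelisted(whitelist, "Bob", "User:Bob"))
print(is_whitelisted(whitelist, "Alice", "User:Bob"))
```

The real script would fetch the text from the wiki page instead of a constant, and would probably need prefix or namespace matching rather than exact titles.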

Let's not introduce inconsistencies without benefits. We shall use mwparserfromhell.

OK. Just making sure that was considered, assuming someone was still thinking about Parsoid :)

Oh! I forgot that @Ricordisamoa suggested that, hmm. Okay, then your comment makes sense.

@jayvdb suggested that in the first place.

Thanks. I'll consider using it in the future, should the need arise :)

Another reason to remove the dependency on mwlib: it has a nasty list of dependencies, which is both long and has some very restrictive version requirements:

"pyparsing>=1.4.11,<1.6", "timelib>=0.2", "bottle>=0.10", "pyPdf>=1.12", "apipkg>=1.2", "qserve>=0.2.7", "lxml", "py>=1.4", "sqlite3dbm", "simplejson>=2.3", "roman", "gevent", "odfpy>=0.9, <0.10", "Pillow", "setuptools"

https://github.com/pediapress/mwlib/blob/master/setup.py#L44

The worst aspect is that it requires "pyparsing>=1.4.11,<1.6", which prevents using the most recent version of pyparsing, https://pypi.python.org/pypi/pyparsing/2.0.3

The requirement for "odfpy>=0.9, <0.10" is not as bad, as the latest version is https://pypi.python.org/pypi/odfpy/0.9.6
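
A quick way to check the two version ranges above is the rough sketch below. It compares purely numeric dotted versions as tuples and is not how pip or setuptools actually resolves requirements; `parse_version` and `satisfies` are helper names invented for this example.

```python
def parse_version(text):
    """Turn a purely numeric dotted version like '2.0.3' into (2, 0, 3)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper, comparing numeric tuples."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# pyparsing 2.0.3 against mwlib's "pyparsing>=1.4.11,<1.6"
print(satisfies("2.0.3", "1.4.11", "1.6"))  # False

# odfpy 0.9.6 against mwlib's "odfpy>=0.9, <0.10"
print(satisfies("0.9.6", "0.9", "0.10"))  # True
```

So the pyparsing pin rules out the current release, while the odfpy pin still admits the latest one.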

Change 202011 merged by jenkins-bot:
[FIX] patrol: Replace mwlib with mwparserfromhell

https://gerrit.wikimedia.org/r/202011

I'm closing this for now, but would be interested in the whitelists of others. I only compared the output with the result from the example mentioned in the docstring.