mwparserfromhell (mwpfh) or the Parsoid API could be used instead of mwlib to parse the patrol whitelist.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
[FIX] patrol: Replace mwlib with mwparserfromhell | pywikibot/core | master | +67 -54
Status | Assigned | Task
---|---|---
Invalid | None | T72936 Important tasks to be solved (tracking)
Resolved | Xqt | T60053 Pywikibot Python 3 compatibility (tracking)
Resolved | Xqt | T75704 Python 3 library support
Resolved | XZise | T71980 patrol.py depends on mwlib.uparser not available on wmflabs
Resolved | XZise | T95142 patrol.py: implement alternative to mwlib
Event Timeline
An alternative is to use JSON to store the whitelist (T95143), which would be an extension of Load the settings from wiki.
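As a rough illustration of that idea, a JSON whitelist page could be read with nothing but the standard library. This is only a sketch; the page content and key names below are assumptions, not an agreed-upon schema from T95143:

```python
import json

# Hypothetical content of an on-wiki whitelist page stored as JSON
# (the "whitelist" key and the user names are made up for illustration).
page_text = '{"whitelist": ["ExampleUser", "AnotherUser"]}'

# No wikitext parser needed: the stdlib json module handles it directly.
settings = json.loads(page_text)
print(settings["whitelist"])
```

This would sidestep the mwlib/mwparserfromhell dependency entirely for wikis that adopt such a page.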
Well, yesterday I worked on porting it to mwparserfromhell after our Python 3 build failed because of mwlib. It's untested but should be pretty usable. But maybe the settings approach is better, especially as it would allow patrolling user pages (currently the script assumes that it's a “key”).
For some time I've been thinking of taking advantage of the Parsoid API for such tasks.
That said, mwparserfromhell is still more stable, simpler and does not require server-side code.
Change 202011 had a related patch set uploaded (by XZise):
[FIX] patrol: Use mwparserfromhell
We might also want to consider using RESTBase, at least when dealing with Wikimedia wikis.
- How would you use RESTBase (I guess that is what you are talking about?)
- And would other wikis also need to use RESTBase so that they can use patrol.py?
- The same way you would use Parsoid, but with slightly different URLs.
- As with a direct Parsoid request, it'd rely on extra server-side code. You'd still need to fall back to something else for wikis which don't run it.
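To make the "slightly different URLs" concrete, here is a sketch of building a RESTBase HTML endpoint URL. The `/api/rest_v1/page/html/{title}` path is the public RESTBase convention on Wikimedia wikis; the helper name is hypothetical, and as noted above, wikis not running RESTBase would need a fallback:

```python
from urllib.parse import quote

def restbase_html_url(domain, title):
    # Hypothetical helper: build the RESTBase URL that returns the
    # Parsoid-generated HTML for a page on a Wikimedia-hosted wiki.
    return "https://%s/api/rest_v1/page/html/%s" % (domain, quote(title, safe=""))

print(restbase_html_url("en.wikipedia.org", "Pywikibot"))
```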
I think there is a misunderstanding. As I understand it, Parsoid is used to convert wikitext into HTML and back. But pywikibot works almost entirely on wikitext, so there is no need for Parsoid (or RESTBase) to convert it into HTML. At the moment the script just reads the wikitext (and mwparserfromhell/mwlib parse that wikitext), and @jayvdb suggested that someone actually write JSON content into the page (which would act like wikitext and could be parsed by Python without external libraries). Using HTML would have problems similar to mwpfh/mwlib, as we'd need to parse the HTML, which could in fact be more problematic.
OK. Just making sure that was considered, assuming someone was still thinking about Parsoid :)
Oh! I forgot that @Ricordisamoa suggested that. Okay, then your comment makes sense.
Another reason to remove the dependency on mwlib: it has a nasty dependency list of its own, which is both long and carries some very restrictive version pins:
"pyparsing>=1.4.11,<1.6", "timelib>=0.2", "bottle>=0.10", "pyPdf>=1.12", "apipkg>=1.2", "qserve>=0.2.7", "lxml", "py>=1.4", "sqlite3dbm", "simplejson>=2.3", "roman", "gevent", "odfpy>=0.9, <0.10", "Pillow", "setuptools"
https://github.com/pediapress/mwlib/blob/master/setup.py#L44
The worst aspect is that it requires "pyparsing>=1.4.11,<1.6", which prevents using the most recent version of pyparsing, https://pypi.python.org/pypi/pyparsing/2.0.3
The requirement for "odfpy>=0.9, <0.10" is not as bad, as the latest version is https://pypi.python.org/pypi/odfpy/0.9.6
Change 202011 merged by jenkins-bot:
[FIX] patrol: Replace mwlib with mwparserfromhell
I'm closing this for now, but I would be interested in the whitelists of others. I only compared the new parser against the result from the example mentioned in the docstring.