
patrol.py depends on mwlib.uparser not available on wmflabs
Closed, Resolved (Public)

Description

Author: pdsanchez

Description:
On wmflabs I ran python patrol.py and it failed on import mwlib.uparser,
since mwlib is no longer included.

As a result, the script can no longer run.

$ python patrol.py
Traceback (most recent call last):                                                   
>>> Historia del fútbol de Jalisco <<<                                                
File "patrol.py", line 21, in <module>
  import mwlib.uparser  # used to parse the whitelist

Version: compat-(1.0)
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=72206

Details

Reference
bz69980

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:40 AM)
bzimport set Reference to bz69980.
bzimport added a subscriber: Unknown Object (????).

Did it work previously? i.e. has the python package mwlib been uninstalled recently?

Are you able to create a virtualenv on wmflabs and install mwlib in it?

$ [your_ve/bin/]pip install mwlib

(see https://www.digitalocean.com/community/tutorials/common-python-tools-using-virtualenv-installing-with-pip-and-managing-packages for example)

FWIW, in case it is a version issue: mwlib 0.15.14 works for me.

>>> import mwlib._version
>>> mwlib._version.display_version
'0.15.14'
>>> import mwlib.uparser
>>> mwlib.uparser.parseString(raw='== Foo ==', title='Foo')
Article->'Foo'
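
The interactive check above can also be scripted so that a missing package is reported instead of raising a traceback. This is a minimal sketch using only the standard library; `module_status` is a hypothetical helper name, not part of pywikibot or mwlib.

```python
import importlib


def module_status(names):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Also covers ModuleNotFoundError, e.g. when mwlib is not installed.
            status[name] = False
        else:
            status[name] = True
    return status


if __name__ == "__main__":
    # The two parser modules discussed in this task.
    for name, ok in module_status(["mwlib.uparser", "mwparserfromhell"]).items():
        print(name, "available" if ok else "missing")
```

Running it with the interpreter inside the virtualenv and again with the system interpreter shows which environment actually has the package.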

jayvdb set Security to None.
jayvdb moved this task from Backlog to Wikimedia prod/Cloud Services issues on the Pywikibot board.
jayvdb removed a subscriber: Unknown Object (????).

Using mwparserfromhell instead is not that complicated (but unfortunately I don't use compat, so I wouldn't be able to test it). Instead of using process_children, the complete text is parsed and then filtered for links using [[http://mwparserfromhell.readthedocs.org/en/latest/api/mwparserfromhell.html#mwparserfromhell.wikicode.Wikicode.filter_wikilinks|filter_wikilinks()]] (afaik it's not important for patrol.py that the link is a child of something). So lines 225 and 226 become:

for link in mwparserfromhell.parse(wikitext).filter_wikilinks():
    process_node(link, None)

Then process_node needs to use pywikibot's Link (again, I don't know how it works in compat) to parse the text from link.title and determine the namespace and page title. This would also interpret interwiki links correctly. So if process_node sets obj = pywikibot.Link(obj.title, self.site), it's obj.title instead of obj.target. The ifs in lines 171 to 173 could be removed, as well as the complete else block and process_children.
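
A rough, untested-against-compat sketch of the approach described above. `iter_link_titles` and `split_namespace` are hypothetical helper names; `split_namespace` is only a simplified stand-in for what pywikibot's Link would do (it knows a few English namespace names and ignores interwiki prefixes), so it is not a replacement for it.

```python
def split_namespace(title, known_namespaces=("Category", "File", "Template", "User")):
    """Split 'Namespace:Page' into (namespace, page); namespace is '' for main.

    Simplified, hypothetical stand-in for pywikibot.Link: it ignores
    interwiki prefixes and localized namespace names.
    """
    title = title.strip()
    prefix, sep, rest = title.partition(":")
    namespace = prefix.strip().capitalize()
    if sep and namespace in known_namespaces:
        return namespace, rest.strip()
    return "", title


def iter_link_titles(wikitext):
    """Yield the target title of every wikilink found in wikitext."""
    # Third-party dependency, imported lazily so that the helper above
    # stays usable without it.
    import mwparserfromhell

    for link in mwparserfromhell.parse(wikitext).filter_wikilinks():
        # link.title is the part before any '|' in [[title|text]].
        yield str(link.title)
```

With mwparserfromhell installed, feeding each title from `iter_link_titles(whitelist_text)` into `split_namespace` would give the namespace and page name per whitelist entry, which is the information process_node needs.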

If you're talking about installing https://github.com/pediapress/mwlib on toollabs, I could get that done (if this is still required).

It's unclear to me what the purpose of this task is. Is it addressed to the Pywikibot team to replace one dependency with another, more common dependency that is already installed in Tools? Or is it addressed to the Toolforge team to install a missing dependency?

I suggest either closing this task as declined and creating a new one, or repurposing this task, for converting patrol.py to mwparserfromhell.
It is not good to introduce a new, unnecessary, unmaintained dependency.

The main result from T98712 (Package mwlib on tool labs) is that it's probably easier to use a virtualenv for this; building a system package is nontrivial.

To work around this:

$ pip install --user mwlib

Thanks @jayvdb for assisting after some failed attempts. I am told to:

  1. Create a virtualenv as per https://wikitech.wikimedia.org/wiki/Help:Tool_Labs ; then
  2. pip install mwlib (which takes a while)
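
Before running the bot, it can help to confirm that the interpreter in use really is the one from the virtualenv. The following is a small sketch using only the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # The classic virtualenv tool sets sys.real_prefix; the venv module
    # (Python 3.3+) makes sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this reports that no virtualenv is active, packages installed with the virtualenv's pip will not be visible to the script.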

When running the command from inside the (env)

python pwb.py patrol.py -lang:xx -family:wikixxxx -whitelist:"ns and filename of whitelist"

it returns the error

ImportError: No module named requests
Python module requests is required.
Try running 'pip install requests'

Installed "requests", and then I was able to get things running. :-) I did see an "InsecurePlatformWarning" ... A True SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail (https://urllib3.readthedocs.org/en/latest/security.html)

I am not sure whether that is a Tool Labs issue, a Python version issue, a pywikibot issue, or an mwlib issue.

By the way @jayvdb, when you had it running as a perpetual job at enWS, what was your process for that? A cron job that restarted it?

@Billinghurst, in your virtualenv, run $ pip install -r requests-requirements.txt

There is an undocumented argument -repeat which would keep it running, re-running every 60 seconds.
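
The behaviour described for -repeat amounts to a loop like the one below. This is only an illustration of the idea, not how patrol.py implements it; `run_repeatedly` and its parameters are hypothetical names.

```python
import time


def run_repeatedly(task, interval=60, max_runs=None, sleep=time.sleep):
    """Call task() again and again, sleeping `interval` seconds between runs.

    max_runs=None loops forever; a number stops after that many runs.
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        # Only sleep if another run will follow.
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

A loop of this kind keeps a single process alive, whereas a cron job starts a fresh process for each run; the latter recovers on its own if the process dies.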

@jayvdb there is no such file as requests-requirements.txt; I also tried requirements.txt, without success.

I don't know enough about pip to know where the hell I should be looking for the file.

jayvdb assigned this task to XZise.

The problem still exists in compat, but core has now been fixed so that it doesn't depend on mwlib. Instead it now depends on mwparserfromhell, which is installed on Tool Labs.