
generate_family_file.py crashes for a private wiki
Open, Medium, Public

Description

If a MediaWiki site is private, requiring a login in order to read articles, then generate_family_file.py dies when trying to create a family file for that site.

$ ./generate_family_file.py https://privatewiki.example.com/wiki/Main_Page test
Generating family file from https://privatewiki.example.com/wiki/Main_Page
Traceback (most recent call last):
  File "./generate_family_file.py", line 321, in <module>
    FamilyFileGenerator(*sys.argv[1:]).run()
  File "./generate_family_file.py", line 94, in run
    w = Wiki(self.base_url)
  File "./generate_family_file.py", line 262, in __init__
    self._parse_post_117(wp, fromurl)
  File "./generate_family_file.py", line 293, in _parse_post_117
    info = json.loads(data.read().decode(data.charset))['query']['general']
KeyError: u'query'
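
The KeyError above suggests the API reply had no 'query' section. A minimal sketch of a more defensive parse follows, assuming (not confirmed in this task) that a private wiki answers an anonymous request with a JSON 'error' object such as code 'readapidenied'. The function name extract_general_info and both sample bodies are hypothetical and are not part of generate_family_file.py.

```python
import json


def extract_general_info(raw_body):
    """Return the 'general' siteinfo dict, or raise a descriptive error."""
    response = json.loads(raw_body)
    if 'error' in response:
        # Assumption: a read-protected wiki replies with an error object
        # instead of the 'query' section the script expects.
        err = response['error']
        raise RuntimeError('API request refused ({0}): {1}'.format(
            err.get('code', 'unknown'), err.get('info', 'no details given')))
    try:
        return response['query']['general']
    except KeyError:
        raise RuntimeError('Unexpected API response: no query/general section')


# Hypothetical response bodies, for illustration only.
denied = ('{"error": {"code": "readapidenied", '
          '"info": "You need read permission to use this module."}}')
allowed = '{"query": {"general": {"lang": "en", "articlepath": "/wiki/$1"}}}'
```

With a check like this, the script could report that login is required instead of dying with a bare KeyError.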

Version info:

Pywikibot: [https] r-pywikibot-core.git (fd4707e, g2, 2016/12/05, 19:10:45, n/a)
Release version: 2.0rc5
httplib2 version: 0.9.1
  cacerts: /home/smith/pywikibot/core_stable/externals/httplib2/python2/httplib2/cacerts.txt
    certificate test: ok
Python: 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609]
  unicode test: ok
PYWIKIBOT2_DIR: Not set
PYWIKIBOT2_DIR_PWB: .
PYWIKIBOT2_NO_USER_CONFIG: Not set

Event Timeline

@maiden_taiwan I see that you've been testing a lot with the 2.0rc5 release today. Could you also test on the tip of the master branch?

I don't want to be pessimistic, but not much work is being done on the 2.0 branch, and there is an ongoing discussion about just sticking with the tip of master instead of dedicating time to the 2.0 branch (see: T106121). The sooner you switch to master, the faster you will get help and your issue will be resolved.

@Magul - Thanks, I was not aware that I was using an outdated branch. I just followed the instructions in your official docs at https://www.mediawiki.org/wiki/Manual:Pywikibot/Installation.

What is the URL for git cloning your master branch? I am currently using:

https://gerrit.wikimedia.org/r/pywikibot/core.git

Primary repository is available here: ssh://gerrit.wikimedia.org:29418/pywikibot/core

If that does not work (there are some issues with broken connections), we have a (semi-)official mirror here: https://github.com/wikimedia/pywikibot-core

Here's the output of generate_family_file.py on a private wiki using your primary repository:

$ ./generate_family_file.py https://privatewiki.example.com/wiki/Main_Page test
Generating family file from https://privatewiki.example.com/wiki/Main_Page
Traceback (most recent call last):
  File "./generate_family_file.py", line 211, in <module>
    FamilyFileGenerator(*sys.argv[1:]).run()
  File "./generate_family_file.py", line 67, in run
    self.wikis[w.lang] = w
AttributeError: 'MWSite' object has no attribute 'lang'
<type 'exceptions.AttributeError'>
CRITICAL: Closing network session. 

Version info:

Pywikibot: [https] wikimedia-pywikibot-core (0781baa, g7716, 2016/12/21, 08:04:30, n/a)
Release version: 3.0-dev
requests version: 2.9.1
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609]
PYWIKIBOT2_DIR: Not set
PYWIKIBOT2_DIR_PWB: .
PYWIKIBOT2_NO_USER_CONFIG: Not set

Looks like the tip doesn't work for logging into private wikis either. I'll file a bug report.

Current implementation: The dynamic site loader, site_detect.py, cannot determine some of the site information without logging in, and intentionally stops reading. This information includes the site language, which is why the script fails with AttributeError: 'MWSite' object has no attribute 'lang'.

Suggestion: Allow the user to provide the missing site information, either on the command line (using options), in a configuration file, or during script execution if the script can prompt the user interactively.
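
A minimal sketch of that suggestion, assuming Python 3 and covering only the language code. The --lang option, resolve_lang, parse_args, and FakeSite are all hypothetical names made up for illustration; none of them exist in pywikibot. The order tried is: detected value, then command-line option, then interactive prompt.

```python
import argparse


def resolve_lang(site, cli_lang=None, ask=input):
    """Pick a language code: detected value, then CLI option, then prompt."""
    detected = getattr(site, 'lang', None)
    if detected:
        return detected
    if cli_lang:
        return cli_lang
    # Last resort: ask the user interactively.
    answer = ask('Site language could not be detected. Language code? ')
    answer = answer.strip()
    if not answer:
        raise ValueError('No language code provided')
    return answer


def parse_args(argv):
    """Parse a hypothetical command line with an optional --lang override."""
    parser = argparse.ArgumentParser(description='hypothetical wrapper')
    parser.add_argument('url')
    parser.add_argument('name')
    parser.add_argument('--lang', default=None,
                        help='language code to use if detection fails')
    return parser.parse_args(argv)


class FakeSite(object):
    """Stand-in for a detected site object that lacks a 'lang' attribute."""


if __name__ == '__main__':
    args = parse_args(['https://privatewiki.example.com/wiki/Main_Page',
                       'test', '--lang', 'en'])
    print(resolve_lang(FakeSite(), args.lang))
```

Using getattr with a default avoids the AttributeError on w.lang, and passing the prompt function as a parameter keeps the fallback testable without a terminal.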

This happened to me just today on the master branch, latest version. See the merged task for details.

The error behavior has changed in the current version of pywikibot. The error is now RuntimeError: Unable to determine articlepath.

$ ./generate_family_file.py https://privatewiki.example.com/ test
Generating family file from https://privatewiki.example.com/
Traceback (most recent call last):
  File "./generate_family_file.py", line 211, in <module>
    FamilyFileGenerator(*sys.argv[1:]).run()
  File "./generate_family_file.py", line 66, in run
    w = Wiki(self.base_url)
  File "<path>/pywikibot/site_detect.py", line 108, in __init__
    '{0}'.format(self.fromurl))
RuntimeError: Unable to determine articlepath: https://privatewiki.example.com/notes/Home
<type 'exceptions.RuntimeError'>
CRITICAL: Closing network session.

$ ./pwb.py version.py
Pywikibot: [https] wikimedia-pywikibot-core (e2ee16d, g8356, 2017/06/26, 08:48:11, n/a)
Release version: 3.0-dev
requests version: 2.9.1
  cacerts: /etc/ssl/certs/ca-certificates.crt
    certificate test: ok
Python: 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609]
PYWIKIBOT2_DIR: Not set
PYWIKIBOT2_DIR_PWB: .
PYWIKIBOT2_NO_USER_CONFIG: Not set

Change 423234 had a related patch set uploaded (by Dalba; owner: Dalba):
[pywikibot/core@master] Implement a workaround to generate family file for private wikis

https://gerrit.wikimedia.org/r/423234

Change 423234 merged by jenkins-bot:
[pywikibot/core@master] Implement a workaround to generate family file for private wikis

https://gerrit.wikimedia.org/r/423234

Looks like this is still happening.

Example: generate_family_file.py https://www.mywiki.bogus/wiki/Main_Page mywiki
This will create the file mywiki_family.py in pywikibot\families
Please insert a short name (eg: freeciv): localwiki
Generating family file from http://<url>
Private wiki detected. Login is required.
Please enter your username? <username>
Traceback (most recent call last):
  File "C:\rewrite\trunk\pwb.py", line 257, in <module>
    if not main():
  File "C:\rewrite\trunk\pwb.py", line 250, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "C:\rewrite\trunk\pwb.py", line 119, in run_python_file
    main_mod.__dict__)
  File ".\generate_family_file.py", line 226, in <module>
    FamilyFileGenerator(*sys.argv[1:]).run()
  File ".\generate_family_file.py", line 49, in run
    self.wikis[w.lang] = w
AttributeError: 'MWSite' object has no attribute 'lang'
<type 'exceptions.AttributeError'>
CRITICAL: Closing network session.
pwb.py version
Pywikibot: pywikibot.git (3304dd6, s10669, 2018/11/03, 15:26:40, ok)
Release version: 3.1.dev0
requests version: 2.18.4
  cacerts: C:\Python27\lib\site-packages\certifi\cacert.pem
    certificate test: ok
Python: 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)]
PYWIKIBOT_DIR: Not set
PYWIKIBOT_DIR_PWB: C:\rewrite\trunk
PYWIKIBOT_NO_USER_CONFIG: 2
Config base dir: C:\rewrite\trunk
Dalba removed Dalba as the assignee of this task. Nov 4 2018, 2:58 AM
Dalba subscribed.

@Urbanecm Could you create a test private wiki and try to reproduce @Betacommand's issue?

I just found a random private wiki:

tools.zhuyifei1999-test@tools-bastion-02:~$ PYWIKIBOT_NO_USER_CONFIG=1 python /shared/pywikipedia/core/pwb.py /shared/pywikipedia/core/generate_family_file.py https://advisors.wikimedia.org/wiki/Main_Page a_private_wiki
Skipping loading of user-config.py.
family and mylang are not set.
Defaulting to family='test' and mylang='test'.
Generating family file from https://advisors.wikimedia.org/wiki/Main_Page
Private wiki detected. Login is required.
Please enter your username? Meh this user doesn't exist
Traceback (most recent call last):
  File "/shared/pywikipedia/core/pwb.py", line 257, in <module>
    if not main():
  File "/shared/pywikipedia/core/pwb.py", line 250, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "/shared/pywikipedia/core/pwb.py", line 119, in run_python_file
    main_mod.__dict__)
  File "/shared/pywikipedia/core/generate_family_file.py", line 226, in <module>
    FamilyFileGenerator(*sys.argv[1:]).run()
  File "/shared/pywikipedia/core/generate_family_file.py", line 49, in run
    self.wikis[w.lang] = w
AttributeError: 'MWSite' object has no attribute 'lang'
<type 'exceptions.AttributeError'>
CRITICAL: Closing network session.