zh-min-nan wiktionary preloadpages
Closed, ResolvedPublic

Description

zh-min-nan wiktionary returns different names of pages:

I:\py\rewrite>pwb.py interwiki -family:wiktionary -subcats:Gí-giân -cleanup -lang:zh-min-nan -async  -whenneeded:5 -untranslated

NOTE: Number of pages queued is 0, trying to add 50 more.
Retrieving 36 pages from wiktionary:zh-min-nan.
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Bân-lâm-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hôa-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Eng-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Ji?t-gí'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hui-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hoat-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Tek-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Se-pan-gâ-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Phux-tô-gâ-gú'

WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:O?at-lâm-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:A-la-pek-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Ke-te-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Dan-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hun-lân-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:In-nî-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Í-tai-li-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Lo -se-a-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hi-lia?p-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:La-teng-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Pe?h lo -se-a-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hi-pek-lâi-gú'

WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Se-kai-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Peng-te-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Pho-lân-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Lâm-hui-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Bông-kó -gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Pho-su-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Thai-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Má-lâi-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Ido-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Hân-gú'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Thó -ní-kî-gú'

WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Mî-iux?-tó-gú'

WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Bân-tang-oe'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Kheh-oe'
WARNING: preloadpages: Query returned unexpected title 'Lui-pia?t:Tagalog-gú'
Dump nan (wiktionary) appended.
Traceback (most recent call last):
  File "I:\py\rewrite\pwb.py", line 222, in <module>
    run_python_file(filename, argv, argvu, file_package)
  File "I:\py\rewrite\pwb.py", line 81, in run_python_file
    main_mod.__dict__)
  File ".\scripts\interwiki.py", line 2647, in <module>
    main()
  File ".\scripts\interwiki.py", line 2622, in main
    bot.run()
  File ".\scripts\interwiki.py", line 2365, in run
    self.queryStep()
  File ".\scripts\interwiki.py", line 2338, in queryStep
    self.oneQuery()
  File ".\scripts\interwiki.py", line 2334, in oneQuery
    subject.batchLoaded(self)
  File ".\scripts\interwiki.py", line 1321, in batchLoaded
    elif page.isRedirectPage() or page.isCategoryRedirect():
  File "I:\py\rewrite\pywikibot\page.py", line 644, in isCategoryRedirect
    for (template, args) in self.templatesWithParams():
  File "I:\py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "I:\py\rewrite\pywikibot\page.py", line 1869, in templatesWithParams
    templates = textlib.extract_templates_and_params(self.text)
  File "I:\py\rewrite\pywikibot\page.py", line 440, in text
    self._text = self.get(get_redirect=True)
  File "I:\py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "I:\py\rewrite\pywikibot\page.py", line 349, in get
    self._getInternals(sysop)
  File "I:\py\rewrite\pywikibot\page.py", line 373, in _getInternals
    self.site.loadrevisions(self, getText=True, sysop=sysop)
  File "I:\py\rewrite\pywikibot\site.py", line 3167, in loadrevisions
    % (page, pagedata['title']))
pywikibot.exceptions.Error: loadrevisions: Query on [[zh-min-nan:ňłćÚí×:L┼źi-pia
╠Źt:A-la-pek-g├║]] returned data on 'L┼źi-pia╠Źt:L┼źi-pia╠Źt:A-la-pek-g├║'
<class 'pywikibot.exceptions.Error'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

The exception message was actually encoded as UTF-8 but interpreted as cp852, which is why it looks so ugly. It actually says (when encoded as cp852 and decoded as UTF-8):

Query on [[zh-min-nan:分類:Lūi-pia̍t:A-la-pek-gú]] returned data on 'Lūi-pia̍t:Lūi-pia̍t:A-la-pek-gú'

Event Timeline

JAnD raised the priority of this task from to Unbreak Now!.
JAnD updated the task description. (Show Details)
JAnD added a project: Pywikibot.
JAnD subscribed.

It seems that the API returns a different name for the category namespace than the one that is displayed.

For compat it could be solved by adding
self.namespaces[14]['zh-min-nan'] = [u'Lūi-pia̍t', u'分類'] to families/wiktionary_family.py

but for core I don't know where the namespace names are defined.

The namespace is dynamically queried, so it does appear in: http://zh-min-nan.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases

If you search for : 14 you get two results: one contains a name consisting of two characters and the other looks like “Lūi-pia̍t”.
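
As a rough illustration of that lookup, the following sketch collects every name a siteinfo reply gives for one namespace ID. The `sample` dictionary is hand-written from the values quoted in this task and only mimics the shape of the JSON reply (format=json); it is not a captured response. `names_for_namespace` is a hypothetical helper, not part of pywikibot.

```python
def names_for_namespace(siteinfo, ns_id):
    """Return all names (local, canonical, aliases) for one namespace ID."""
    query = siteinfo['query']
    names = []
    # The local and canonical names live in the 'namespaces' mapping,
    # which is keyed by the namespace ID as a string.
    entry = query['namespaces'].get(str(ns_id), {})
    for key in ('*', 'canonical'):
        if entry.get(key) and entry[key] not in names:
            names.append(entry[key])
    # Aliases are listed separately, one entry per alias.
    for alias in query.get('namespacealiases', []):
        if alias['id'] == ns_id and alias['*'] not in names:
            names.append(alias['*'])
    return names


# Assumed sample data, reduced to namespace 14 only.
sample = {
    'query': {
        'namespaces': {
            '14': {'id': 14, 'case': 'case-sensitive',
                   '*': 'Lūi-pia̍t', 'canonical': 'Category'},
        },
        'namespacealiases': [{'id': 14, '*': '分類'}],
    }
}

print(names_for_namespace(sample, 14))
```

All three names refer to the same namespace, so a title using any of them as its prefix should resolve to the same page.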

Now the stack trace is hard to decipher, but I guess it queries 分類:Lūi-pia̍t:… but gets a result for Lūi-pia̍t:Lūi-pia̍t:…. What is strange is that the namespace name appears twice, but the code should actually treat both titles as the same, as APISite.sametitle determines the namespace ID and also supports the aliases.

>>> import pywikibot
>>> s = pywikibot.Site('zh-min-nan', 'wiktionary')
>>> s.sametitle('分類:Lūi-pia̍t:…', 'Lūi-pia̍t:Lūi-pia̍t:…')
True
>>> s.namespaces[14]
Namespace(id=14, custom_name='Lūi-pia̍t', canonical_name='Category', aliases=['分類'], case='case-sensitive')
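
To make the idea behind that comparison concrete, here is a minimal sketch of an alias-aware title comparison. It is not the pywikibot implementation of APISite.sametitle; `NAMESPACE_NAMES`, `split_title` and `same_title` are hypothetical names, and the namespace table is hard-coded from the Namespace object shown above.

```python
import unicodedata

# Assumed table: every known name for namespace 14 on this wiki.
NAMESPACE_NAMES = {14: ['Lūi-pia̍t', 'Category', '分類']}


def _norm(text):
    """Normalize so that composed and decomposed forms compare equal."""
    return unicodedata.normalize('NFC', text.strip())


def split_title(title):
    """Split a title into (namespace ID, remaining title).

    Only the first prefix is treated as a namespace; anything after
    the first colon belongs to the page name.
    """
    prefix, sep, rest = title.partition(':')
    if sep:
        for ns_id, names in NAMESPACE_NAMES.items():
            if _norm(prefix).lower() in (_norm(n).lower() for n in names):
                return ns_id, _norm(rest)
    return 0, _norm(title)


def same_title(first, second):
    """Compare two titles by namespace ID instead of by prefix text."""
    return split_title(first) == split_title(second)


print(same_title('分類:Lūi-pia̍t:A-la-pek-gú',
                 'Lūi-pia̍t:Lūi-pia̍t:A-la-pek-gú'))
```

Under this scheme the doubled prefix is not an error in the comparison itself: the first prefix is resolved to namespace 14 either way, and the second one is simply part of the page name.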

By the way I was trying to get the revisions for that page manually but I can't determine the page name you are using because of all that gibberish. Maybe someone knows how to convert that into Unicode?

Okay, after a bit of trickery I was able to determine that the UTF-8 content was decoded as cp852 instead:

>>> 'ňłćÚí×:L┼źi-pia╠Źt:A-la-pek-g├║'.encode('cp852').decode('utf8')
'分類:Lūi-pia̍t:A-la-pek-gú'
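
The same round trip can be written as a pair of small helpers, which may be handy for other garbled console output. This is only a sketch: `garble` and `repair` are hypothetical names, and cp852 is assumed as the console code page because that is what turned out to apply here. The title is spelled with escapes so that the exact code points are unambiguous.

```python
def garble(text, wrong='cp852'):
    """Reproduce the damage: UTF-8 bytes read with the wrong code page."""
    return text.encode('utf-8').decode(wrong)


def repair(text, wrong='cp852'):
    """Undo the damage: recover the bytes, then decode them as UTF-8."""
    return text.encode(wrong).decode('utf-8')


# The page title from the exception message.
title = '\u5206\u985e:L\u016bi-pia\u030dt:A-la-pek-g\u00fa'

garbled = garble(title)
print(garbled)                   # the unreadable form seen in the traceback
print(repair(garbled) == title)  # the round trip restores the title
```

The repair only works when the wrong code page maps every byte involved to some character, which is the case for the bytes in this title.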

And as I thought the API returned a result for the other namespace name: https://zh-min-nan.wiktionary.org/w/api.php?action=query&prop=revisions&titles=%E5%88%86%E9%A1%9E:L%C5%ABi-pia%CC%8Dt:A-la-pek-g%C3%BA (apart from the fact that it says missing)
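
For reference, a query URL of this form can be rebuilt from the repaired title with the standard library. This is a sketch only: `revisions_url` is a hypothetical helper, and keeping the colon unescaped is a choice made here so that the output matches the link above.

```python
from urllib.parse import quote

API = 'https://zh-min-nan.wiktionary.org/w/api.php'


def revisions_url(title):
    """Build a prop=revisions query URL for a single page title."""
    # quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
    return '{}?action=query&prop=revisions&titles={}'.format(
        API, quote(title, safe=':'))


# The title from the exception message, spelled with escapes.
title = '\u5206\u985e:L\u016bi-pia\u030dt:A-la-pek-g\u00fa'
print(revisions_url(title))
```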

I don't get your error when I try to get the revisions (and the error I get is correct). When I remove one namespace prefix but still use the namespace alias, it does work.

>>> import pywikibot
>>> p = 'ňłćÚí×:L┼źi-pia╠Źt:A-la-pek-g├║'.encode('cp852').decode('utf8')
>>> po = pywikibot.Page(pywikibot.Site('zh-min-nan', 'wiktionary'), p)
>>> po.exists()
False
>>> list(po.revisions(total=1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xzise/Programms/core/pywikibot/page.py", line 1374, in revisions
    step=step, total=total)
  File "/home/xzise/Programms/core/pywikibot/site.py", line 3169, in loadrevisions
    raise NoPage(page)
pywikibot.exceptions.NoPage: Page [[wiktionary:zh-min-nan:Lūi-pia̍t:Lūi-pia̍t:A-la-pek-gú]] doesn't exist.
>>> po = pywikibot.Page(pywikibot.Site('zh-min-nan', 'wiktionary'), '分類:A-la-pek-gú')
>>> po.exists()
True
>>> list(po.revisions(total=1))
[<pywikibot.page.Revision object at 0x7f1d1ebc2fd0>]

@JAnD, could you post your user-config.py? Do you have any transliteration_target and console_encoding set?

XZise set Security to None.

Two more questions:

  • do you get this immediately after starting the interwiki bot? Because I get
Retrieving 35 pages from wiktionary:zh-min-nan.
[[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]]: [[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]] gives new interwiki [[af:Kategorie:Woorde in Arabies]]
[[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]]: [[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]] gives new interwiki [[ar:تصنيف:عربية]]
[[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]]: [[zh-min-nan:Lūi-pia̍t:A-la-pek-gú]] gives new interwiki [[ast:Categoría:Árabe]]

etc. Can you re-run with -debug, and provide the debug log that is then stored in logs/

  • Could you test whether
python pwb.py listpages -family:wiktionary -lang:zh-min-nan -subcats:Gí-giân -v -debug -get

gives the same error for you?

Okay, one question from me now: those warnings that a query returned unexpected titles, are those new? They also rely on APISite.sametitle, so they could be connected to your original problem.

This error appeared in the middle of a run, so maybe it was caused by some server-side change, because later there were new category names on zh-min-nan.wikt

Because of T86621 I am still not able to test it properly on my work PC. I'll try to completely reinstall pywikibot there :-(

The base directory is d:\Py\rewrite
=== Pywikibot framework v2.0 -- Logging header ===
COMMAND: ['listpages', '-family:wiktionary', '-lang:zh-min-nan', '-subcats:G\xed
-gi\xe2n', '-v', '-debug', '-get']
DATE: 2015-01-14 06:54:05.896000 UTC
VERSION: pywikibot-core (161110a, s5977, 2015/01/12, 21:07:52, n/a)
CONFIG FILE DIR: d:\Py\rewrite
PACKAGES:
  _ctypes (C:\Python27\DLLs\_ctypes.pyd) = 1.1.0
  _hashlib (C:\Python27\DLLs\_hashlib.pyd) = ??
  _socket (C:\Python27\DLLs\_socket.pyd) = ??
  _sqlite3 (C:\Python27\DLLs\_sqlite3.pyd) = ??
  _ssl (C:\Python27\DLLs\_ssl.pyd) = ??
  ctypes (C:\Python27\lib\ctypes\) = 1.1.0
  distutils (C:\Python27\lib\distutils\) = 2.7.3
  email (C:\Python27\lib\email\) = 4.0.3
  logging (C:\Python27\lib\logging\) = 0.5.1.2
  mwparserfromhell: No module named mwparserfromhell
  pickle (C:\Python27\lib\pickle.pyc) = $Revision: 72223 $
  pyexpat (C:\Python27\DLLs\pyexpat.pyd) = 2.7.3
  pywikibot ([path unknown]) = ??
  re (C:\Python27\lib\re.pyc) = 2.2.1
  unicodedata (C:\Python27\DLLs\unicodedata.pyd) = ??
  urllib (C:\Python27\lib\urllib.pyc) = 1.17
  urllib2 (C:\Python27\lib\urllib2.pyc) = 2.7
MODULES:
  pywikibot/comms/http.py  2015-01-13 09:16:52.361998
  pywikibot/data/api.py  2015-01-13 09:16:52.344997
  pywikibot/textlib.py 530dc70 2015-01-12 01:02:38
  pywikibot/i18n.py 4a96bbe 2015-01-13 09:05:27.612832
  pywikibot/comms/threadedhttp.py 8de4213 2015-01-12 01:02:38
  pywikibot/date.py 36dc254 2015-01-12 01:02:38
  pywikibot/exceptions.py 2a948c5 2015-01-12 01:02:38
  pywikibot/site.py  2015-01-13 09:17:32.105271
  pywikibot/bot.py  2015-01-13 09:11:58.632197
  pywikibot/throttle.py a311a20 2015-01-12 01:02:38
  pywikibot/page.py  2015-01-13 09:12:04.328523
  pywikibot/family.py  2015-01-13 09:12:02.689430
  pywikibot/plural.py 02a50e4 2015-01-12 01:02:38
  pywikibot/version.py 2229075 2015-01-12 01:02:38
  pywikibot/userinterfaces/terminal_interface.py b0e2743 2015-01-12 01:02:38
  pywikibot/config2.py  2015-01-13 09:12:00.894327
  pywikibot/userinterfaces/terminal_interface_win32.py 7e3fd89 2015-01-12 01:02:
38
  pywikibot/userinterfaces/terminal_interface_base.py 84f7102 2015-01-13 09:13:3
1.471508
  pywikibot/pagegenerators.py aeb2d15 2015-01-12 01:02:38
  pywikibot/tools.py  2015-01-13 09:16:52.399000
  pywikibot/diff.py 09fdfdf 2015-01-12 01:02:38
  pywikibot/login.py db7be8f 2015-01-12 01:02:38
  pywikibot/userinterfaces/transliteration.py 1d8e217 2015-01-12 01:02:38
=== === === === === === === === === === === === === ===
Pywikibot r7fd7983bff6db53bed6a75f5137623890f7a5292
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
Found 1 wiktionary:zh-min-nan processes running, including this one.
ERROR: Traceback (most recent call last):
  File "D:\Py\rewrite\pywikibot\data\api.py", line 983, in submit
    headers=headers, body=body)
  File "D:\Py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "D:\Py\rewrite\pywikibot\comms\http.py", line 248, in request
    baseuri = site.base_url(uri)
  File "D:\Py\rewrite\pywikibot\site.py", line 641, in __getattr__
    % (self.__class__.__name__, attr))
AttributeError: APISite instance has no attribute 'base_url'

/w/api.php?maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&
siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg, maxlag
=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces
%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg
WARNING: Waiting 5 seconds before retrying.
ERROR: Traceback (most recent call last):
  File "D:\Py\rewrite\pywikibot\data\api.py", line 983, in submit
    headers=headers, body=body)
  File "D:\Py\rewrite\pywikibot\tools.py", line 711, in wrapper
    return obj(*__args, **__kw)
  File "D:\Py\rewrite\pywikibot\comms\http.py", line 248, in request
    baseuri = site.base_url(uri)
  File "D:\Py\rewrite\pywikibot\site.py", line 641, in __getattr__
    % (self.__class__.__name__, attr))
AttributeError: APISite instance has no attribute 'base_url'

/w/api.php?maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&
siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg, maxlag
=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces
%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg
WARNING: Waiting 10 seconds before retrying.
JAnD claimed this task.

Probably only some server-side change; now it works again.

XZise removed JAnD as the assignee of this task.Jan 14 2015, 10:48 AM
XZise lowered the priority of this task from Unbreak Now! to Medium.

I'm curious what server side change this might be…

Also, please don't immediately mark your bug reports as “Unbreak Now!” (or another higher priority). Obviously you want that bug to be fixed, but doesn't everybody want that? And as we saw, both this task and T86621, which were marked as “Unbreak Now!”, are bugs unrelated to pywikibot (although I'm not sure what went wrong here).

jayvdb renamed this task from zh-min-nan to zh-min-nan wiktionary preloadpages.Jan 15 2015, 1:22 PM